gov.sandia.cognition.text.document.extractor
Class AbstractDocumentExtractor

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.text.document.extractor.AbstractDocumentExtractor
All Implemented Interfaces:
DocumentExtractor, CloneableSerializable, Serializable, Cloneable
Direct Known Subclasses:
AbstractSingleDocumentExtractor

public abstract class AbstractDocumentExtractor
extends AbstractCloneableSerializable
implements DocumentExtractor

An abstract implementation of the DocumentExtractor interface. It chains together the extraction calls that take a File or a URI so that subclasses only have to handle the URLConnection calls.
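
The chaining described above can be pictured with a minimal sketch. This is not the library's source code: the class names `ChainingExtractorSketch` and `LineExtractorSketch` are hypothetical, plain strings stand in for `Document`, and the delegation order (File to URI to URLConnection) is an assumption inferred from the class description and the method summary.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the chaining idea; not the library's source code.
abstract class ChainingExtractorSketch {
    // The methods a subclass is assumed to supply.
    abstract boolean canExtract(URI uri) throws IOException;

    abstract Iterable<String> extractAll(URLConnection connection) throws IOException;

    // The File overloads delegate to the URI overloads.
    boolean canExtract(File file) throws IOException {
        return canExtract(file.toURI());
    }

    Iterable<String> extractAll(File file) throws IOException {
        return extractAll(file.toURI());
    }

    // The URI overload opens a connection and delegates once more.
    Iterable<String> extractAll(URI uri) throws IOException {
        return extractAll(uri.toURL().openConnection());
    }
}

// Example subclass: treats every line of a .txt resource as one "document".
class LineExtractorSketch extends ChainingExtractorSketch {
    @Override
    boolean canExtract(URI uri) {
        return uri.getPath() != null && uri.getPath().endsWith(".txt");
    }

    @Override
    Iterable<String> extractAll(URLConnection connection) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

With this layout, a caller holding a `File` never touches `URLConnection` directly; only the subclass does.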

Since:
3.0
Author:
Justin Basilico
See Also:
Serialized Form

Constructor Summary
AbstractDocumentExtractor()
          Creates a new AbstractDocumentExtractor.
 
Method Summary
 boolean canExtract(File file)
          Determines if the given file can be extracted by this extractor.
 Iterable<? extends Document> extractAll(File file)
          Attempts to extract all of the documents from the given file.
 Iterable<? extends Document> extractAll(URI uri)
          Attempts to extract all of the documents from the file at the given URI.
 
Methods inherited from class gov.sandia.cognition.util.AbstractCloneableSerializable
clone
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.text.document.extractor.DocumentExtractor
canExtract, canExtract, extractAll
 

Constructor Detail

AbstractDocumentExtractor

public AbstractDocumentExtractor()
Creates a new AbstractDocumentExtractor.

Method Detail

canExtract

public boolean canExtract(File file)
                   throws IOException
Description copied from interface: DocumentExtractor
Determines if the given file can be extracted by this extractor.

Specified by:
canExtract in interface DocumentExtractor
Parameters:
file - The file to check.
Returns:
True if this extractor can extract the file and false otherwise.
Throws:
IOException - If there is an IO error.
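
One plausible use of canExtract is to pick an extractor for a file before attempting extraction. The sketch below assumes a hypothetical one-method interface, `CanExtractSketch`, that mirrors only the signature shown above; `ExtractorSelectionSketch` and `firstCapable` are likewise illustrative names, not part of the library.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Optional;

// Hypothetical one-method view of an extractor, used only for this sketch.
interface CanExtractSketch {
    boolean canExtract(File file) throws IOException;
}

final class ExtractorSelectionSketch {
    private ExtractorSelectionSketch() {
    }

    // Returns the first candidate that reports it can handle the file,
    // or an empty Optional if none of them can.
    static Optional<CanExtractSketch> firstCapable(
            List<? extends CanExtractSketch> candidates, File file) throws IOException {
        for (CanExtractSketch candidate : candidates) {
            if (candidate.canExtract(file)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

Because canExtract declares IOException, the selection helper propagates it rather than treating an IO failure as "cannot extract".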

extractAll

public Iterable<? extends Document> extractAll(File file)
                                        throws DocumentExtractionException,
                                               IOException
Description copied from interface: DocumentExtractor
Attempts to extract all of the documents from the given file.

Specified by:
extractAll in interface DocumentExtractor
Parameters:
file - The file to extract.
Returns:
The documents extracted from the given file, as an Iterable.
Throws:
DocumentExtractionException - If there is an error extracting data from the file.
IOException - If there is an IO error.
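
Because the return type is `Iterable<? extends Document>`, callers can iterate over the result but cannot treat it as a `List` directly. A small helper such as the following can copy the result into a list; `IterableSketch` and `toList` are hypothetical names, and the helper is generic so that it does not depend on the library's `Document` type.

```java
import java.util.ArrayList;
import java.util.List;

final class IterableSketch {
    private IterableSketch() {
    }

    // Copies any Iterable of T (or of a subtype of T) into a new List<T>.
    static <T> List<T> toList(Iterable<? extends T> items) {
        List<T> result = new ArrayList<>();
        for (T item : items) {
            result.add(item);
        }
        return result;
    }
}
```

Under these assumptions, a call like `IterableSketch.toList(extractor.extractAll(file))` would yield a `List<Document>` that supports indexing and size queries.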

extractAll

public Iterable<? extends Document> extractAll(URI uri)
                                        throws DocumentExtractionException,
                                               IOException
Description copied from interface: DocumentExtractor
Attempts to extract all of the documents from the file at the given URI.

Specified by:
extractAll in interface DocumentExtractor
Parameters:
uri - The URI of the file to extract.
Returns:
The documents extracted from the file at the given URI, as an Iterable.
Throws:
DocumentExtractionException - If there is an error extracting data from the file.
IOException - If there is an IO error.