gov.sandia.cognition.text.document.extractor
Class AbstractSingleDocumentExtractor

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.text.document.extractor.AbstractDocumentExtractor
          extended by gov.sandia.cognition.text.document.extractor.AbstractSingleDocumentExtractor
All Implemented Interfaces:
DocumentExtractor, SingleDocumentExtractor, CloneableSerializable, Serializable, Cloneable
Direct Known Subclasses:
TextDocumentExtractor

public abstract class AbstractSingleDocumentExtractor
extends AbstractDocumentExtractor
implements SingleDocumentExtractor

An abstract implementation of the SingleDocumentExtractor interface. It implements each extractAll method by delegating to the corresponding extractDocument method. It also chains the extractDocument overloads together, so that subclasses only need to implement the URLConnection version.
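
The delegation described above can be illustrated with a minimal, self-contained sketch. It does not use the library: Doc, SingleExtractorSketch, PlainText and demo are hypothetical stand-ins invented for illustration. The File to URI to URLConnection chaining order and the single-element list returned by extractAll are assumptions about how such a base class might be written, not a copy of the actual implementation.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Collections;
import java.util.List;

class ChainingSketch {
    /** Hypothetical stand-in for the library's Document type. */
    static final class Doc {
        final String text;
        Doc(String text) { this.text = text; }
    }

    /** Hypothetical base class showing one way to chain the overloads. */
    abstract static class SingleExtractorSketch {
        // The only method a subclass has to supply.
        abstract Doc extractDocument(URLConnection connection) throws IOException;

        // Assumed chaining: File -> URI -> URLConnection.
        Doc extractDocument(File file) throws IOException {
            return extractDocument(file.toURI());
        }

        Doc extractDocument(URI uri) throws IOException {
            return extractDocument(uri.toURL().openConnection());
        }

        // Each extractAll wraps the single extracted document in a list.
        List<Doc> extractAll(File file) throws IOException {
            return Collections.singletonList(extractDocument(file));
        }

        List<Doc> extractAll(URI uri) throws IOException {
            return Collections.singletonList(extractDocument(uri));
        }

        List<Doc> extractAll(URLConnection connection) throws IOException {
            return Collections.singletonList(extractDocument(connection));
        }
    }

    /** Hypothetical subclass: reads the whole stream as UTF-8 text. */
    static final class PlainText extends SingleExtractorSketch {
        @Override
        Doc extractDocument(URLConnection connection) throws IOException {
            try (InputStream in = connection.getInputStream()) {
                return new Doc(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
    }

    /** Writes a temporary file and extracts it through the File overload. */
    static String demo() {
        try {
            File file = File.createTempFile("sketch", ".txt");
            file.deleteOnExit();
            Files.write(file.toPath(), "hello".getBytes(StandardCharsets.UTF_8));
            List<Doc> docs = new PlainText().extractAll(file);
            return docs.size() + ":" + docs.get(0).text;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With this arrangement, a caller can pass a File, a URI or a URLConnection and end up in the same subclass method, which is the design benefit the class description refers to.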

Since:
3.0
Author:
Justin Basilico
See Also:
Serialized Form

Constructor Summary
AbstractSingleDocumentExtractor()
          Creates a new AbstractSingleDocumentExtractor.
 
Method Summary
 List<? extends Document> extractAll(File file)
          Attempts to extract all of the documents from the given file.
 List<? extends Document> extractAll(URI uri)
          Attempts to extract all of the documents from the file at the given URI.
 List<? extends Document> extractAll(URLConnection connection)
          Attempts to extract all of the documents from the file behind the given connection.
 Document extractDocument(File file)
          Attempts to extract a document from the given file.
 Document extractDocument(URI uri)
          Attempts to extract a document from the file at the given URI.
 
Methods inherited from class gov.sandia.cognition.text.document.extractor.AbstractDocumentExtractor
canExtract
 
Methods inherited from class gov.sandia.cognition.util.AbstractCloneableSerializable
clone
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.text.document.extractor.SingleDocumentExtractor
extractDocument
 
Methods inherited from interface gov.sandia.cognition.text.document.extractor.DocumentExtractor
canExtract, canExtract, canExtract
 

Constructor Detail

AbstractSingleDocumentExtractor

public AbstractSingleDocumentExtractor()
Creates a new AbstractSingleDocumentExtractor.

Method Detail

extractAll

public List<? extends Document> extractAll(File file)
                                    throws DocumentExtractionException,
                                           IOException
Description copied from interface: DocumentExtractor
Attempts to extract all of the documents from the given file.

Specified by:
extractAll in interface DocumentExtractor
Overrides:
extractAll in class AbstractDocumentExtractor
Parameters:
file - The file to extract.
Returns:
The list of documents extracted from the given file.
Throws:
DocumentExtractionException - If there is an error extracting data from the file.
IOException - If there is an IO error.

extractAll

public List<? extends Document> extractAll(URI uri)
                                    throws DocumentExtractionException,
                                           IOException
Description copied from interface: DocumentExtractor
Attempts to extract all of the documents from the file at the given URI.

Specified by:
extractAll in interface DocumentExtractor
Overrides:
extractAll in class AbstractDocumentExtractor
Parameters:
uri - The URI of the file to extract.
Returns:
The list of documents extracted from the given file.
Throws:
DocumentExtractionException - If there is an error extracting data from the file.
IOException - If there is an IO error.

extractAll

public List<? extends Document> extractAll(URLConnection connection)
                                    throws DocumentExtractionException,
                                           IOException
Description copied from interface: DocumentExtractor
Attempts to extract all of the documents from the file behind the given connection.

Specified by:
extractAll in interface DocumentExtractor
Parameters:
connection - The connection to the file to extract.
Returns:
The list of documents extracted from the given file.
Throws:
DocumentExtractionException - If there is an error extracting data from the file.
IOException - If there is an IO error.

extractDocument

public Document extractDocument(File file)
                         throws DocumentExtractionException,
                                IOException
Description copied from interface: SingleDocumentExtractor
Attempts to extract a document from the given file.

Specified by:
extractDocument in interface SingleDocumentExtractor
Parameters:
file - The file to extract.
Returns:
The document extracted from the given file.
Throws:
DocumentExtractionException - If there is an error extracting data from the file.
IOException - If there is an IO error.

extractDocument

public Document extractDocument(URI uri)
                         throws DocumentExtractionException,
                                IOException
Description copied from interface: SingleDocumentExtractor
Attempts to extract a document from the file at the given URI.

Specified by:
extractDocument in interface SingleDocumentExtractor
Parameters:
uri - The URI of the file to extract.
Returns:
The document extracted from the given file.
Throws:
DocumentExtractionException - If there is an error extracting data from the file.
IOException - If there is an IO error.