gov.sandia.cognition.text.document.extractor
Interface DocumentExtractor

All Known Subinterfaces:
SingleDocumentExtractor
All Known Implementing Classes:
AbstractDocumentExtractor, AbstractSingleDocumentExtractor, TextDocumentExtractor

public interface DocumentExtractor

Interface for extracting documents from files. A file can be given as a File, a URI, or a URLConnection.

Since:
3.0
Author:
Justin Basilico

Method Summary
 boolean canExtract(File file)
          Determines if the given file can be extracted by this extractor.
 boolean canExtract(URI uri)
          Determines if the file at the given URI can be extracted by this extractor.
 boolean canExtract(URLConnection connection)
          Determines if the file behind the given connection can be extracted by this extractor.
 Iterable<? extends Document> extractAll(File file)
          Attempts to extract all of the documents from the given file.
 Iterable<? extends Document> extractAll(URI uri)
          Attempts to extract all of the documents from the file at the given URI.
 Iterable<? extends Document> extractAll(URLConnection connection)
          Attempts to extract all of the documents from the file behind the given connection.
 

Method Detail

canExtract

boolean canExtract(File file)
                   throws IOException
Determines if the given file can be extracted by this extractor.

Parameters:
file - The file to extract.
Returns:
True if this extractor can extract the file and false otherwise.
Throws:
IOException - If there is an IO error.

canExtract

boolean canExtract(URI uri)
                   throws IOException
Determines if the file at the given URI can be extracted by this extractor.

Parameters:
uri - The URI of the file to extract.
Returns:
True if this extractor can extract the file and false otherwise.
Throws:
IOException - If there is an IO error.

canExtract

boolean canExtract(URLConnection connection)
                   throws IOException
Determines if the file behind the given connection can be extracted by this extractor.

Parameters:
connection - The connection to the file to extract.
Returns:
True if this extractor can extract the file and false otherwise.
Throws:
IOException - If there is an IO error.
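The three canExtract overloads take the same resource in different forms, so an implementation may choose to funnel them into a single check. The following is a minimal sketch of that idea, not the library's implementation: CanExtractSketch and TxtOnlyExtractor are hypothetical stand-ins that only mirror the signatures documented above, and the ".txt" suffix rule is an assumption made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical stand-in mirroring only the three canExtract signatures
// documented above. It is NOT the library's DocumentExtractor interface.
interface CanExtractSketch {
    boolean canExtract(File file) throws IOException;
    boolean canExtract(URI uri) throws IOException;
    boolean canExtract(URLConnection connection) throws IOException;
}

// Hypothetical extractor that accepts any resource whose path ends in ".txt".
class TxtOnlyExtractor implements CanExtractSketch {
    @Override
    public boolean canExtract(File file) throws IOException {
        // A File can always be expressed as an absolute file: URI.
        return canExtract(file.toURI());
    }

    @Override
    public boolean canExtract(URI uri) throws IOException {
        // openConnection() only creates the connection object here;
        // it does not read from the resource.
        return canExtract(uri.toURL().openConnection());
    }

    @Override
    public boolean canExtract(URLConnection connection) throws IOException {
        // Assumed rule for this sketch: decide purely on the path suffix.
        String path = connection.getURL().getPath();
        return path.toLowerCase().endsWith(".txt");
    }
}
```

With this layout only the URLConnection overload carries the decision logic, so the File and URI overloads cannot drift out of sync with it.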

extractAll

Iterable<? extends Document> extractAll(File file)
                                        throws DocumentExtractionException,
                                               IOException
Attempts to extract all of the documents from the given file.

Parameters:
file - The file to extract.
Returns:
The documents extracted from the given file.
Throws:
DocumentExtractionException - If there is an error extracting data from the file.
IOException - If there is an IO error.

extractAll

Iterable<? extends Document> extractAll(URI uri)
                                        throws DocumentExtractionException,
                                               IOException
Attempts to extract all of the documents from the file at the given URI.

Parameters:
uri - The URI of the file to extract.
Returns:
The documents extracted from the file at the given URI.
Throws:
DocumentExtractionException - If there is an error extracting data from the file.
IOException - If there is an IO error.

extractAll

Iterable<? extends Document> extractAll(URLConnection connection)
                                        throws DocumentExtractionException,
                                               IOException
Attempts to extract all of the documents from the file behind the given connection.

Parameters:
connection - The connection to the file to extract.
Returns:
The documents extracted from the file behind the given connection.
Throws:
DocumentExtractionException - If there is an error extracting data from the file.
IOException - If there is an IO error.
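
A typical calling pattern is to ask canExtract first and call extractAll only on an extractor that accepts the file, then iterate the returned Iterable. The sketch below illustrates that pattern under stated assumptions: DocSketch, ExtractionSketchException, FileExtractorSketch, PlainTextSketchExtractor, and ExtractorChain are all hypothetical stand-ins written so the example compiles on its own. They mirror the File overloads documented above but are not the library's types, and the one-document-per-text-file behavior is assumed for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a document: just its text.
interface DocSketch {
    String text();
}

// Hypothetical stand-in for the extraction exception named in the throws clause.
class ExtractionSketchException extends Exception {
    ExtractionSketchException(String message) {
        super(message);
    }
}

// Hypothetical stand-in mirroring only the File overloads documented above.
interface FileExtractorSketch {
    boolean canExtract(File file) throws IOException;

    Iterable<? extends DocSketch> extractAll(File file)
            throws ExtractionSketchException, IOException;
}

// Assumed behavior: an existing ".txt" file yields exactly one document.
class PlainTextSketchExtractor implements FileExtractorSketch {
    @Override
    public boolean canExtract(File file) throws IOException {
        return file.isFile() && file.getName().toLowerCase().endsWith(".txt");
    }

    @Override
    public Iterable<? extends DocSketch> extractAll(File file)
            throws ExtractionSketchException, IOException {
        if (!canExtract(file)) {
            throw new ExtractionSketchException("Unsupported file: " + file);
        }
        final String text =
                new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
        DocSketch doc = () -> text;
        return Collections.singletonList(doc);
    }
}

class ExtractorChain {
    // Returns the texts produced by the first extractor that accepts the file,
    // or an empty list if no extractor accepts it.
    static List<String> extractFirstSupported(
            List<? extends FileExtractorSketch> extractors, File file)
            throws ExtractionSketchException, IOException {
        for (FileExtractorSketch extractor : extractors) {
            if (extractor.canExtract(file)) {
                List<String> texts = new ArrayList<>();
                for (DocSketch doc : extractor.extractAll(file)) {
                    texts.add(doc.text());
                }
                return texts;
            }
        }
        return Collections.emptyList();
    }
}
```

Because extractAll is declared to return an Iterable rather than a List, the caller here copies the results into its own list instead of assuming random access or a known size.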