Package gov.sandia.cognition.text.token

Provides text tokenization algorithms.


Interface Summary
Interface   Description
Token       Interface for a meaningful chunk of text, called a token.
Tokenizer   Interface for a class that converts strings into tokens.
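The two interfaces split the work between a value type (a token) and a converter (a tokenizer). The Java sketch below illustrates that split under stated assumptions. The method names `getText`, `getStart`, `getLength`, and `tokenize`, and the classes `SimpleToken` and `WhitespaceTokenizer`, are hypothetical and are not taken from this package's actual signatures. The whitespace splitter is only a stand-in that exercises the interfaces.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the two roles described in the Interface Summary.
 * Method names (getText, getStart, getLength, tokenize) are assumptions,
 * not the package's actual signatures.
 */
public class TokenInterfacesSketch {

    /** A meaningful chunk of text plus where it came from (hypothetical shape). */
    interface Token {
        String getText();   // the token's characters
        int getStart();     // offset of the first character in the source string
        default int getLength() { return getText().length(); }
    }

    /** Converts a string into tokens (hypothetical shape). */
    interface Tokenizer {
        Iterable<Token> tokenize(String text);
    }

    /** Minimal immutable token, similar in spirit to DefaultToken. */
    static final class SimpleToken implements Token {
        private final String text;
        private final int start;

        SimpleToken(String text, int start) {
            this.text = text;
            this.start = start;
        }

        @Override public String getText() { return text; }
        @Override public int getStart() { return start; }
        @Override public String toString() { return text + "@" + start; }
    }

    /** Illustrative tokenizer that splits on whitespace only (a stand-in). */
    static final class WhitespaceTokenizer implements Tokenizer {
        @Override
        public Iterable<Token> tokenize(String text) {
            List<Token> tokens = new ArrayList<>();
            int i = 0;
            while (i < text.length()) {
                // Skip whitespace between tokens.
                while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                    i++;
                }
                int start = i;
                // Consume the run of non-whitespace characters.
                while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                    i++;
                }
                if (i > start) {
                    tokens.add(new SimpleToken(text.substring(start, i), start));
                }
            }
            return tokens;
        }
    }

    public static void main(String[] args) {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        for (Token token : tokenizer.tokenize("  raw text, here ")) {
            System.out.println(token);
        }
    }
}
```

Running `main` prints `raw@2`, `text,@6`, and `here@12` on separate lines. Keeping the start offset on each token lets a caller map it back to its position in the original string.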

Class Summary
Class                             Description
AbstractCharacterBasedTokenizer   An abstract implementation of a tokenizer that considers each character individually.
AbstractTokenizer                 Abstract implementation of the Tokenizer interface.
DefaultToken                      A default implementation of the Token interface.
LetterNumberTokenizer             A tokenizer that creates tokens from sequences of letters and numbers, treating everything else as a delimiter.
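The class summary suggests a layering: a character-based base class walks the input one character at a time, and a concrete subclass such as LetterNumberTokenizer decides which characters belong inside a token. The sketch below assumes that design. `CharacterBasedTokenizer`, `isTokenPart`, `LetterNumberSketch`, and `Piece` are hypothetical names, and using `Character.isLetterOrDigit` as the letter/number test is an assumption rather than the package's confirmed behavior.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a character-based tokenizer in the spirit of
 * AbstractCharacterBasedTokenizer and LetterNumberTokenizer.
 * All names here (CharacterBasedTokenizer, isTokenPart, Piece) are hypothetical.
 */
public class CharacterTokenizerSketch {

    /** A token's text and its start offset in the input. */
    static final class Piece {
        final String text;
        final int start;

        Piece(String text, int start) {
            this.text = text;
            this.start = start;
        }

        @Override public String toString() { return text + "@" + start; }
    }

    /** Examines one character at a time; subclasses decide what belongs in a token. */
    abstract static class CharacterBasedTokenizer {
        /** Returns true if c is part of a token, false if it is a delimiter. */
        protected abstract boolean isTokenPart(char c);

        public List<Piece> tokenize(String text) {
            List<Piece> pieces = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            int start = -1;
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                if (isTokenPart(c)) {
                    if (current.length() == 0) {
                        start = i; // a new token begins here
                    }
                    current.append(c);
                } else if (current.length() > 0) {
                    // A delimiter ends the token in progress.
                    pieces.add(new Piece(current.toString(), start));
                    current.setLength(0);
                }
            }
            if (current.length() > 0) {
                // Flush a token that runs to the end of the input.
                pieces.add(new Piece(current.toString(), start));
            }
            return pieces;
        }
    }

    /** Keeps runs of letters and digits; everything else is a delimiter (assumed rule). */
    static final class LetterNumberSketch extends CharacterBasedTokenizer {
        @Override
        protected boolean isTokenPart(char c) {
            return Character.isLetterOrDigit(c);
        }
    }

    public static void main(String[] args) {
        CharacterBasedTokenizer tokenizer = new LetterNumberSketch();
        System.out.println(tokenizer.tokenize("Hello, world! R2-D2 in 1977."));
    }
}
```

For the sample input, `main` prints `[Hello@0, world@7, R2@14, D2@17, in@20, 1977@23]`; the hyphen in `R2-D2` acts as a delimiter. Because the sketch classifies individual `char` values, characters encoded as surrogate pairs are treated as delimiters; handling them would require iterating over code points instead.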

Package gov.sandia.cognition.text.token Description

Provides text tokenization algorithms. Tokenization takes raw text and turns it into an initial sequence of terms, called tokens.

Author: Justin Basilico