Package gov.sandia.cognition.text.token

Provides text tokenization algorithms.


Interface Summary
Token        Interface for a meaningful chunk of text, called a token.
Tokenizer    Interface for a class that converts strings into tokens.
 

Class Summary
AbstractCharacterBasedTokenizer    An abstract implementation of a tokenizer that considers each character individually.
AbstractTokenizer                  Abstract implementation of the Tokenizer interface.
DefaultToken                       A default implementation of the Token interface.
LetterNumberTokenizer              A tokenizer that creates tokens from sequences of letters and numbers, treating everything else as a delimiter.
 

Package gov.sandia.cognition.text.token Description

Provides text tokenization algorithms. Tokenization is the first step in processing text: it takes raw text and breaks it into tokens, which form the initial set of terms.
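
To make the idea concrete, the following is a minimal, self-contained sketch of letter-and-number tokenization using only the JDK. It is not the package's own code: the class LetterNumberSketch, the nested SimpleToken type, and its fields are hypothetical names invented for illustration, and the actual Token and LetterNumberTokenizer classes may differ in behavior and API. The sketch assumes that a token is a maximal run of letters or digits and that every other character acts as a delimiter.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not part of gov.sandia.cognition.text.token. */
public class LetterNumberSketch {

    /** Hypothetical token: the matched text plus its position in the input. */
    public static final class SimpleToken {
        public final String text;
        public final int start;
        public final int length;

        public SimpleToken(String text, int start, int length) {
            this.text = text;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return text + " [start=" + start + ", length=" + length + "]";
        }
    }

    /**
     * Splits the input into maximal runs of letters and digits.
     * Any other character is treated as a delimiter and discarded.
     */
    public static List<SimpleToken> tokenize(String input) {
        List<SimpleToken> tokens = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"

        // Iterate one position past the end so a trailing token is flushed.
        for (int i = 0; i <= input.length(); i++) {
            boolean inToken = i < input.length()
                && Character.isLetterOrDigit(input.charAt(i));

            if (inToken && start < 0) {
                start = i; // a new token begins here
            } else if (!inToken && start >= 0) {
                tokens.add(new SimpleToken(
                    input.substring(start, i), start, i - start));
                start = -1; // the token has ended
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (SimpleToken token : tokenize("Hello, world 42!")) {
            System.out.println(token);
        }
    }
}
```

Keeping the start offset and length alongside the text lets later processing map a token back to its location in the raw input; whether the package's own Token exposes exactly this information is not stated in this summary.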

Since:
3.0
Author:
Justin Basilico