gov.sandia.cognition.text.token
Class AbstractCharacterBasedTokenizer

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.text.token.AbstractTokenizer
          extended by gov.sandia.cognition.text.token.AbstractCharacterBasedTokenizer
All Implemented Interfaces:
Tokenizer, CloneableSerializable, Serializable, Cloneable
Direct Known Subclasses:
LetterNumberTokenizer

public abstract class AbstractCharacterBasedTokenizer
extends AbstractTokenizer

An abstract implementation of a tokenizer that examines each character of the input individually. It handles the tokenization loop itself and leaves subclasses to define, through isTokenMember(char), which characters are valid members of a token.
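
The division of labor described above can be illustrated with a minimal, self-contained sketch. It is not the Foundry implementation: the names CharacterTokenizerSketch and LetterDigitSketch are hypothetical, tokens are represented as plain strings rather than Token objects, and the assumption that a token is a maximal run of member characters is made only for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the pattern, NOT the Foundry class.
// Tokens are plain strings here instead of Token objects.
abstract class CharacterTokenizerSketch {

    // Subclasses decide which characters may appear inside a token.
    abstract boolean isTokenMember(char c);

    // Reads one character at a time and collects maximal runs of
    // member characters (an assumption made for this sketch).
    List<String> tokenize(Reader reader) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            int next;
            while ((next = reader.read()) != -1) {
                char c = (char) next;
                if (isTokenMember(c)) {
                    current.append(c);
                } else if (current.length() > 0) {
                    // A non-member character closes the current token.
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Flush a token that runs to the end of the input.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}

// Hypothetical subclass: letters and digits are token members.
class LetterDigitSketch extends CharacterTokenizerSketch {
    @Override
    boolean isTokenMember(char c) {
        return Character.isLetterOrDigit(c);
    }
}
```

In this sketch, calling new LetterDigitSketch().tokenize(new StringReader("Hello, world 42!")) yields the strings "Hello", "world" and "42"; the base class owns the loop, and the subclass supplies only the membership test.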

Since:
3.0
Author:
Justin Basilico
See Also:
Serialized Form

Constructor Summary
AbstractCharacterBasedTokenizer()
          Creates a new AbstractCharacterBasedTokenizer.
 
Method Summary
abstract  boolean isTokenMember(char c)
          Determines if the given character is considered to be part of a token.
 Iterable<Token> tokenize(Reader reader)
          Converts the string from the given reader into an ordered list of tokens.
 
Methods inherited from class gov.sandia.cognition.text.token.AbstractTokenizer
tokenize, tokenize
 
Methods inherited from class gov.sandia.cognition.util.AbstractCloneableSerializable
clone
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.util.CloneableSerializable
clone
 

Constructor Detail

AbstractCharacterBasedTokenizer

public AbstractCharacterBasedTokenizer()
Creates a new AbstractCharacterBasedTokenizer.

Method Detail

tokenize

public Iterable<Token> tokenize(Reader reader)
Description copied from interface: Tokenizer
Converts the string from the given reader into an ordered list of tokens.

Parameters:
reader - The reader to tokenize the data from.
Returns:
The ordered list of tokens.

isTokenMember

public abstract boolean isTokenMember(char c)
Determines if the given character is considered to be part of a token.

Parameters:
c - The character to test.
Returns:
True if the character can be part of a token; otherwise, false.
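
As a rough illustration of what a membership rule might look like, the sketch below defines two standalone predicates with the same shape as isTokenMember(char). The class TokenMemberRules and both method names are hypothetical and are not part of the library; only java.lang.Character is used.

```java
// Hypothetical helper class, not part of the Foundry API.
final class TokenMemberRules {

    private TokenMemberRules() {
    }

    // Assumed rule: only letters and digits belong to a token.
    static boolean lettersAndDigits(char c) {
        return Character.isLetterOrDigit(c);
    }

    // Assumed rule: apostrophes also count, so "don't" would not be
    // split at the apostrophe.
    static boolean wordsWithApostrophes(char c) {
        return Character.isLetterOrDigit(c) || c == '\'';
    }
}
```

Because the parameter is a single char, a rule of this shape sees supplementary Unicode characters as two separate surrogate values, and Character.isLetterOrDigit(char) returns false for each of them.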