gov.sandia.cognition.text.term.vector
Class BagOfWordsTransform

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.math.matrix.DefaultVectorFactoryContainer
          extended by gov.sandia.cognition.text.term.vector.BagOfWordsTransform
All Implemented Interfaces:
Evaluator<Iterable<? extends Termable>,Vector>, VectorFactoryContainer, CloneableSerializable, Serializable, Cloneable

public class BagOfWordsTransform
extends DefaultVectorFactoryContainer
implements Evaluator<Iterable<? extends Termable>,Vector>

Transforms a list of term occurrences into a vector of counts.

Since:
3.0
Author:
Justin Basilico
See Also:
Serialized Form

Field Summary
protected  TermIndex termIndex
          The term index used by the transform.
 
Fields inherited from class gov.sandia.cognition.math.matrix.DefaultVectorFactoryContainer
vectorFactory
 
Constructor Summary
BagOfWordsTransform()
          Creates a new BagOfWordsTransform.
BagOfWordsTransform(TermIndex termIndex)
          Creates a new BagOfWordsTransform with the given term index.
BagOfWordsTransform(TermIndex termIndex, VectorFactory<? extends Vector> vectorFactory)
          Creates a new BagOfWordsTransform with the given term index and vector factory.
 
Method Summary
 Vector convertToVector(Iterable<? extends Termable> terms)
          Converts a given list of terms to a vector by counting the occurrences of each term, using this transform's term index and vector factory.
static Vector convertToVector(Iterable<? extends Termable> terms, TermIndex termIndex, VectorFactory<?> vectorFactory)
          Converts a given list of terms to a vector by counting the occurrences of each term, using the given term index and vector factory.
 Vector convertToVector(Iterable<? extends Termable> terms, VectorFactory<?> vectorFactory)
          Converts a given list of terms to a vector by counting the occurrences of each term, using this transform's term index and the given vector factory.
 Vector evaluate(Iterable<? extends Termable> terms)
          Evaluates the function on the given input and returns the output.
 TermIndex getTermIndex()
          Gets the term index that the transform uses to map terms to their vector indices.
 void setTermIndex(TermIndex termIndex)
          Sets the term index that the transform is to use to map terms to their vector indices.
 
Methods inherited from class gov.sandia.cognition.math.matrix.DefaultVectorFactoryContainer
getVectorFactory, setVectorFactory
 
Methods inherited from class gov.sandia.cognition.util.AbstractCloneableSerializable
clone
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

termIndex

protected TermIndex termIndex
The term index used by the transform. It maps terms to indices in the vector.

Constructor Detail

BagOfWordsTransform

public BagOfWordsTransform()
Creates a new BagOfWordsTransform. Starts with an empty term index.


BagOfWordsTransform

public BagOfWordsTransform(TermIndex termIndex)
Creates a new BagOfWordsTransform with the given term index.

Parameters:
termIndex - The term index to use to map terms to vector indices.

BagOfWordsTransform

public BagOfWordsTransform(TermIndex termIndex,
                           VectorFactory<? extends Vector> vectorFactory)
Creates a new BagOfWordsTransform with the given term index and vector factory.

Parameters:
termIndex - The term index to use to map terms to vector indices.
vectorFactory - The vector factory to use to create the vectors.

Method Detail

evaluate

public Vector evaluate(Iterable<? extends Termable> terms)
Description copied from interface: Evaluator
Evaluates the function on the given input and returns the output.

Specified by:
evaluate in interface Evaluator<Iterable<? extends Termable>,Vector>
Parameters:
terms - The input to evaluate.
Returns:
The output produced by evaluating the input.
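
The sketch below illustrates the Evaluator pattern in plain Java, without the Foundry classes. It assumes, without confirmation from this page, that evaluate simply delegates to convertToVector(terms). The class CountingTransformSketch is hypothetical, java.util.function.Function stands in for Evaluator, a List<String> stands in for TermIndex, and double[] stands in for Vector.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in: Function plays the role of Evaluator,
// a List<String> plays the role of TermIndex, double[] the role of Vector.
final class CountingTransformSketch
    implements Function<Iterable<String>, double[]>
{
    // Position of a term in this list is its vector index.
    private final List<String> vocabulary;

    CountingTransformSketch(final List<String> vocabulary)
    {
        this.vocabulary = vocabulary;
    }

    // Counts how often each vocabulary term occurs in the given terms.
    double[] convertToVector(final Iterable<String> terms)
    {
        final double[] counts = new double[vocabulary.size()];
        for (final String term : terms)
        {
            final int index = vocabulary.indexOf(term);
            if (index >= 0)
            {
                counts[index] += 1.0;
            }
        }
        return counts;
    }

    @Override
    public double[] apply(final Iterable<String> terms)
    {
        // Assumed delegation: the functional entry point only forwards
        // to the counting method.
        return convertToVector(terms);
    }
}
```

Exposing the transform through a functional interface lets it be passed to any code that expects a generic input-to-output function, for example mapping it over a collection of documents.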

convertToVector

public Vector convertToVector(Iterable<? extends Termable> terms)
Converts a given list of terms to a vector by counting the occurrences of each term, using this transform's term index and vector factory.

Parameters:
terms - The terms to count.
Returns:
The bag-of-words vector representation of the terms, which is the count of how many times each term occurs in the document.

convertToVector

public Vector convertToVector(Iterable<? extends Termable> terms,
                              VectorFactory<?> vectorFactory)
Converts a given list of terms to a vector by counting the occurrences of each term, using this transform's term index and the given vector factory.

Parameters:
terms - The terms to count.
vectorFactory - The vector factory to use to create the vector.
Returns:
The bag-of-words vector representation of the terms, which is the count of how many times each term occurs in the document.

convertToVector

public static Vector convertToVector(Iterable<? extends Termable> terms,
                                     TermIndex termIndex,
                                     VectorFactory<?> vectorFactory)
Converts a given list of terms to a vector by counting the occurrences of each term, using the given term index and vector factory.

Parameters:
terms - The terms to count.
termIndex - The term index to use to map terms to their vector indices.
vectorFactory - The vector factory to use to create the vector.
Returns:
The bag-of-words vector representation of the terms, which is the count of how many times each term occurs in the document.
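
The counting step described above can be sketched in plain Java as follows. This is not the Foundry implementation: BagOfWordsSketch and countTerms are hypothetical names, a Map<String, Integer> stands in for TermIndex, and a double[] stands in for Vector. Two behaviors are assumptions: indices run contiguously from 0 to size - 1, and terms missing from the index are skipped.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in; none of these names come from the Foundry API.
final class BagOfWordsSketch
{
    /**
     * Counts term occurrences into a dense array with one slot per
     * indexed term. The map gives each known term its vector position.
     */
    static double[] countTerms(
        final Iterable<String> terms,
        final Map<String, Integer> termIndex)
    {
        // Assumption: indices are contiguous, 0 .. termIndex.size() - 1.
        final double[] counts = new double[termIndex.size()];
        for (final String term : terms)
        {
            final Integer index = termIndex.get(term);
            if (index != null)
            {
                counts[index] += 1.0;
            }
            // Assumption: terms missing from the index are skipped.
        }
        return counts;
    }

    public static void main(final String[] args)
    {
        final Map<String, Integer> termIndex = new LinkedHashMap<>();
        termIndex.put("the", 0);
        termIndex.put("cat", 1);
        termIndex.put("sat", 2);

        final List<String> document =
            Arrays.asList("the", "cat", "sat", "the", "dog");
        System.out.println(Arrays.toString(countTerms(document, termIndex)));
        // prints [2.0, 1.0, 1.0]
    }
}
```

Because the method is static and takes the index as an argument, it can be called without constructing a transform, which mirrors the shape of the static overload documented here.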

getTermIndex

public TermIndex getTermIndex()
Gets the term index that the transform uses to map terms to their vector indices.

Returns:
The term index used by the transform.

setTermIndex

public void setTermIndex(TermIndex termIndex)
Sets the term index that the transform is to use to map terms to their vector indices.

Parameters:
termIndex - The term index for the transform to use.
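
Because the term index defines which vector position each term occupies, replacing it changes the layout of every vector produced afterwards. The sketch below shows this in plain Java under stated assumptions: MutableIndexSketch is a hypothetical class, a List<String> stands in for TermIndex, and the vector dimension is assumed to equal the number of indexed terms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a transform with a replaceable term index.
final class MutableIndexSketch
{
    // Position of a term in this list is its vector index.
    private List<String> termIndex = new ArrayList<>();

    List<String> getTermIndex()
    {
        return termIndex;
    }

    void setTermIndex(final List<String> termIndex)
    {
        this.termIndex = termIndex;
    }

    // Counts occurrences using whichever index is currently set.
    double[] convertToVector(final Iterable<String> terms)
    {
        final double[] counts = new double[termIndex.size()];
        for (final String term : terms)
        {
            final int index = termIndex.indexOf(term);
            if (index >= 0)
            {
                counts[index] += 1.0;
            }
        }
        return counts;
    }
}
```

In this sketch, vectors created before and after a call to setTermIndex may differ in dimension and in term ordering, so they should not be compared element by element.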