gov.sandia.cognition.text.term.vector.weighter.global
Class AbstractFrequencyBasedGlobalTermWeighter

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.text.term.vector.AbstractVectorSpaceModel
          extended by gov.sandia.cognition.text.term.vector.weighter.global.AbstractGlobalTermWeighter
              extended by gov.sandia.cognition.text.term.vector.weighter.global.AbstractFrequencyBasedGlobalTermWeighter
All Implemented Interfaces:
VectorFactoryContainer, VectorSpaceModel, GlobalTermWeighter, CloneableSerializable, Serializable, Cloneable
Direct Known Subclasses:
AbstractEntropyBasedGlobalTermWeighter, InverseDocumentFrequencyGlobalTermWeighter

public abstract class AbstractFrequencyBasedGlobalTermWeighter
extends AbstractGlobalTermWeighter

An abstract GlobalTermWeighter that keeps track of term frequencies in documents. For each term, it keeps track of both the document frequency (the number of documents the term appears in) and the global frequency (the total number of times the term appears). It also keeps track of the total number of documents.
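
The bookkeeping described above can be illustrated with a minimal, self-contained sketch. It uses plain double arrays rather than the library's Vector type, and the class name FrequencyTrackerSketch is hypothetical, not part of this API. It assumes that a term occurs in a document whenever its count is nonzero; the actual implementation may differ.

```java
/** Hypothetical stand-in that illustrates the bookkeeping; not the library class. */
class FrequencyTrackerSketch {
    /** Total number of documents added so far. */
    int documentCount = 0;
    /** For each term index, the number of documents containing the term. */
    final double[] termDocumentFrequencies;
    /** For each term index, the total number of occurrences across all documents. */
    final double[] termGlobalFrequencies;

    FrequencyTrackerSketch(int dimensionality) {
        termDocumentFrequencies = new double[dimensionality];
        termGlobalFrequencies = new double[dimensionality];
    }

    /** Adds one document, given as a vector of per-term counts. */
    void add(double[] counts) {
        for (int i = 0; i < counts.length; i++) {
            // Assumption: a nonzero count means the term occurs in the document.
            if (counts[i] != 0.0) {
                termDocumentFrequencies[i] += 1.0;
                termGlobalFrequencies[i] += counts[i];
            }
        }
        documentCount++;
    }
}
```

For example, adding the count vectors {2, 0, 1} and {3, 0, 0} gives a document frequency of 2 and a global frequency of 5 for the first term.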

Since:
3.0
Author:
Justin Basilico
See Also:
Serialized Form

Field Summary
protected  int documentCount
          The number of documents the weight is computed over.
protected  Vector termDocumentFrequencies
          The vector containing the number of documents that each term occurs in.
protected  Vector termGlobalFrequencies
          A vector containing the total number of times that each term occurred in the document set.
 
Fields inherited from class gov.sandia.cognition.text.term.vector.weighter.global.AbstractGlobalTermWeighter
vectorFactory
 
Constructor Summary
AbstractFrequencyBasedGlobalTermWeighter()
          Creates a new AbstractFrequencyBasedGlobalTermWeighter.
AbstractFrequencyBasedGlobalTermWeighter(VectorFactory<? extends Vector> vectorFactory)
          Creates a new AbstractFrequencyBasedGlobalTermWeighter.
 
Method Summary
 void add(Vector counts)
          Adds a document to the model.
 AbstractFrequencyBasedGlobalTermWeighter clone()
          This makes public the clone method on the Object class and removes the exception that it throws.
 int getDocumentCount()
          Gets the number of documents that this object is using for its model.
 Vector getTermDocumentFrequencies()
          Gets the vector containing the number of documents that each term appears in.
 Vector getTermGlobalFrequencies()
          Gets the vector containing the number of times that each term appears.
protected  void growVectors(int newDimensionality)
          Called when the dimensionality of the term vector grows.
protected  void initializeVectors(int dimensionality)
          Initializes internal vectors to the given dimensionality.
 boolean remove(Vector counts)
          Removes the document from the model.
protected  void setDocumentCount(int documentCount)
          Sets the document count.
protected  void setTermDocumentFrequencies(Vector termDocumentFrequencies)
          Sets the vector containing the number of documents that each term appears in.
protected  void setTermGlobalFrequencies(Vector termGlobalFrequencies)
          Sets the vector containing the number of times that each term appears.
 
Methods inherited from class gov.sandia.cognition.text.term.vector.weighter.global.AbstractGlobalTermWeighter
getVectorFactory, setVectorFactory
 
Methods inherited from class gov.sandia.cognition.text.term.vector.AbstractVectorSpaceModel
add, addAll, remove, removeAll
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.text.term.vector.weighter.global.GlobalTermWeighter
getDimensionality, getGlobalWeights
 
Methods inherited from interface gov.sandia.cognition.text.term.vector.VectorSpaceModel
add, addAll, remove, removeAll
 

Field Detail

documentCount

protected int documentCount
The number of documents the weight is computed over.


termDocumentFrequencies

protected Vector termDocumentFrequencies
The vector containing the number of documents that each term occurs in.


termGlobalFrequencies

protected Vector termGlobalFrequencies
A vector containing the total number of times that each term occurred in the document set.

Constructor Detail

AbstractFrequencyBasedGlobalTermWeighter

public AbstractFrequencyBasedGlobalTermWeighter()
Creates a new AbstractFrequencyBasedGlobalTermWeighter.


AbstractFrequencyBasedGlobalTermWeighter

public AbstractFrequencyBasedGlobalTermWeighter(VectorFactory<? extends Vector> vectorFactory)
Creates a new AbstractFrequencyBasedGlobalTermWeighter.

Parameters:
vectorFactory - The vector factory to use.
Method Detail

clone

public AbstractFrequencyBasedGlobalTermWeighter clone()
Description copied from class: AbstractCloneableSerializable
This makes public the clone method on the Object class and removes the exception that it throws. Its default behavior is to create a clone of the exact type of object that clone is called on, copying all primitives but keeping all references, which means it is a shallow copy. Extensions of this class may want to override this method (but call super.clone()) to implement a "smart copy", that is, a copy that targets the most common use case for copying the object. Because the default behavior is a shallow copy, extending classes only need to handle fields that need a deeper copy (or those that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method. Note: The contract of this method is that you must use super.clone() as the basis for your implementation.

Specified by:
clone in interface CloneableSerializable
Overrides:
clone in class AbstractCloneableSerializable
Returns:
A clone of this object.
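
The "smart copy" pattern described for clone can be sketched in plain Java as follows. The class CloneSketch and its fields are hypothetical and use arrays rather than the library's Vector type; the sketch only shows the general shape of starting from super.clone() and deep-copying the mutable fields, and is not the library's own implementation.

```java
/** Hypothetical class showing a clone built on super.clone(). */
class CloneSketch implements Cloneable {
    int documentCount;
    double[] termDocumentFrequencies;

    @Override
    public CloneSketch clone() {
        try {
            // super.clone() produces a shallow copy of the exact runtime type:
            // primitives are copied, references are shared.
            CloneSketch copy = (CloneSketch) super.clone();
            // Deep-copy the mutable array so the clone does not share state.
            copy.termDocumentFrequencies = (this.termDocumentFrequencies == null)
                ? null
                : this.termDocumentFrequencies.clone();
            return copy;
        } catch (CloneNotSupportedException e) {
            // Cannot happen because this class implements Cloneable.
            throw new AssertionError("Cloneable is implemented", e);
        }
    }
}
```

With this arrangement, modifying the clone's array leaves the original's array unchanged, while primitive fields such as documentCount are copied by value.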

add

public void add(Vector counts)
Description copied from interface: VectorSpaceModel
Adds a document to the model.

Parameters:
counts - The document to add.

remove

public boolean remove(Vector counts)
Description copied from interface: VectorSpaceModel
Removes the document from the model.

Parameters:
counts - The document to remove.
Returns:
True if this object changed as a result of the removal.
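
One way to picture removal is as the inverse of adding a document. The sketch below is an illustration under assumptions, not the library's implementation: RemoveSketch is a hypothetical class using plain arrays, and returning false when the model holds no documents is an assumed convention for "nothing changed".

```java
/** Hypothetical sketch in which remove undoes a previous add. */
class RemoveSketch {
    int documentCount = 0;
    final double[] termDocumentFrequencies;
    final double[] termGlobalFrequencies;

    RemoveSketch(int dimensionality) {
        termDocumentFrequencies = new double[dimensionality];
        termGlobalFrequencies = new double[dimensionality];
    }

    void add(double[] counts) {
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] != 0.0) {
                termDocumentFrequencies[i] += 1.0;
                termGlobalFrequencies[i] += counts[i];
            }
        }
        documentCount++;
    }

    /** Returns true if the model changed as a result of the removal. */
    boolean remove(double[] counts) {
        // Assumption: with no documents in the model, nothing can be removed.
        if (documentCount == 0) {
            return false;
        }
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] != 0.0) {
                termDocumentFrequencies[i] -= 1.0;
                termGlobalFrequencies[i] -= counts[i];
            }
        }
        documentCount--;
        return true;
    }
}
```

Note that this sketch does not verify that the removed document was previously added; a caller that removes an unknown document would leave the counts inconsistent.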

initializeVectors

protected void initializeVectors(int dimensionality)
Initializes internal vectors to the given dimensionality.

Parameters:
dimensionality - The dimensionality to initialize to.

growVectors

protected void growVectors(int newDimensionality)
Called when the dimensionality of the term vector grows.

Parameters:
newDimensionality - The new dimensionality.
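
As a rough illustration of what growing the internal vectors involves, the sketch below pads two frequency arrays with zeros while preserving the existing entries. GrowSketch is a hypothetical class built on plain arrays, and rejecting a smaller dimensionality is an assumption made for the example; how the library actually resizes its Vector fields is not stated here.

```java
import java.util.Arrays;

/** Hypothetical sketch of growing frequency arrays when new terms appear. */
class GrowSketch {
    double[] termDocumentFrequencies = new double[0];
    double[] termGlobalFrequencies = new double[0];

    void growVectors(int newDimensionality) {
        // Assumption: growing never shrinks the arrays.
        if (newDimensionality < termDocumentFrequencies.length) {
            throw new IllegalArgumentException("newDimensionality is too small");
        }
        // Arrays.copyOf keeps existing values and zero-fills the new tail.
        termDocumentFrequencies =
            Arrays.copyOf(termDocumentFrequencies, newDimensionality);
        termGlobalFrequencies =
            Arrays.copyOf(termGlobalFrequencies, newDimensionality);
    }
}
```

New terms start with a document frequency and a global frequency of zero, which matches the fact that no previously added document contained them.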

getDocumentCount

public int getDocumentCount()
Description copied from interface: VectorSpaceModel
Gets the number of documents that this object is using for its model.

Returns:
The number of documents used for the model.

setDocumentCount

protected void setDocumentCount(int documentCount)
Sets the document count.

Parameters:
documentCount - The document count.

getTermDocumentFrequencies

public Vector getTermDocumentFrequencies()
Gets the vector containing the number of documents that each term appears in.

Returns:
The term document frequencies.

setTermDocumentFrequencies

protected void setTermDocumentFrequencies(Vector termDocumentFrequencies)
Sets the vector containing the number of documents that each term appears in.

Parameters:
termDocumentFrequencies - The document frequencies.

getTermGlobalFrequencies

public Vector getTermGlobalFrequencies()
Gets the vector containing the number of times that each term appears.

Returns:
The term global frequencies.

setTermGlobalFrequencies

protected void setTermGlobalFrequencies(Vector termGlobalFrequencies)
Sets the vector containing the number of times that each term appears.

Parameters:
termGlobalFrequencies - The term global frequencies.