gov.sandia.cognition.text.term.vector.weighter.global
Class DominanceGlobalTermWeighter

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.text.term.vector.AbstractVectorSpaceModel
          extended by gov.sandia.cognition.text.term.vector.weighter.global.AbstractGlobalTermWeighter
              extended by gov.sandia.cognition.text.term.vector.weighter.global.AbstractFrequencyBasedGlobalTermWeighter
                  extended by gov.sandia.cognition.text.term.vector.weighter.global.AbstractEntropyBasedGlobalTermWeighter
                      extended by gov.sandia.cognition.text.term.vector.weighter.global.DominanceGlobalTermWeighter
All Implemented Interfaces:
VectorFactoryContainer, VectorSpaceModel, GlobalTermWeighter, CloneableSerializable, Serializable, Cloneable

public class DominanceGlobalTermWeighter
extends AbstractEntropyBasedGlobalTermWeighter

Implements the dominance global term weighting scheme. It is based on the entropy global weighting scheme, but the global weight favors terms with high entropy instead of discounting them. This quantity is called the term dominance. For term i, the global weight D(i) is:

    D(i) = exp(H(i)) / n
    H(i) = - sum_j { p_ij log(p_ij) }
    p_ij = tf_ij / gf_i

where

    n     = the total number of documents
    gf_i  = the total number of times that term i appears
    tf_ij = the number of times that term i appears in document j

This class uses an optimization for computing H(i):

    H(i) = - (sum_j (tf_ij log(tf_ij))) / gf_i + log(gf_i)

which allows sum_j (tf_ij log(tf_ij)) to be computed incrementally and then divided by gf_i when needed, instead of recomputing p_ij each time.
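The formulas above can be checked with a small standalone sketch. It does not use the Cognitive Foundry classes: the class name DominanceFormulaSketch, the method dominance, and the double[][] layout (one row per document, one column per term) are assumptions made for illustration. Giving a term that never appears a weight of 0 is also an assumption, not documented behavior of this class.

```java
/**
 * Hypothetical standalone sketch of the dominance formula.
 * It is not part of the library and only mirrors the documented math.
 */
class DominanceFormulaSketch {

    /**
     * Computes D(i) = exp(H(i)) / n using the optimized form
     * H(i) = -(sum_j tf_ij log(tf_ij)) / gf_i + log(gf_i).
     *
     * @param counts counts[j][i] is tf_ij, the count of term i in document j.
     * @return the dominance weight for each term.
     */
    static double[] dominance(double[][] counts) {
        int n = counts.length;
        int termCount = counts[0].length;
        double[] gf = new double[termCount];          // gf_i
        double[] tfLogTfSum = new double[termCount];  // sum_j tf_ij log(tf_ij)

        for (double[] document : counts) {
            for (int i = 0; i < termCount; i++) {
                double tf = document[i];
                if (tf > 0.0) {
                    gf[i] += tf;
                    tfLogTfSum[i] += tf * Math.log(tf);
                }
            }
        }

        double[] result = new double[termCount];
        for (int i = 0; i < termCount; i++) {
            if (gf[i] > 0.0) {
                double entropy = -tfLogTfSum[i] / gf[i] + Math.log(gf[i]);
                result[i] = Math.exp(entropy) / n;
            }
            // Assumption: a term that never appears keeps weight 0.
        }
        return result;
    }

    public static void main(String[] args) {
        // Two documents, three terms.
        double[][] counts = {
            {1.0, 4.0, 0.0},
            {1.0, 0.0, 0.0},
        };
        double[] d = dominance(counts);
        // Term 0 is spread evenly over both documents, so D is about 1.0.
        // Term 1 is concentrated in one document, so D is about 0.5.
        for (int i = 0; i < d.length; i++) {
            System.out.println("D(" + i + ") = " + d[i]);
        }
    }
}
```

With n documents, a term spread evenly over all of them has H(i) = log(n) and therefore D(i) = 1, while a term confined to a single document has H(i) = 0 and D(i) = 1/n.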

Since:
3.0
Author:
Justin Basilico
See Also:
Serialized Form

Field Summary
protected  Vector dominance
          A vector caching the global dominance weight of the document collection.
 
Fields inherited from class gov.sandia.cognition.text.term.vector.weighter.global.AbstractEntropyBasedGlobalTermWeighter
termEntropiesSum
 
Fields inherited from class gov.sandia.cognition.text.term.vector.weighter.global.AbstractFrequencyBasedGlobalTermWeighter
documentCount, termDocumentFrequencies, termGlobalFrequencies
 
Fields inherited from class gov.sandia.cognition.text.term.vector.weighter.global.AbstractGlobalTermWeighter
vectorFactory
 
Constructor Summary
DominanceGlobalTermWeighter()
          Creates a new DominanceGlobalTermWeighter.
DominanceGlobalTermWeighter(VectorFactory<? extends Vector> vectorFactory)
          Creates a new DominanceGlobalTermWeighter.
 
Method Summary
 void add(Vector counts)
          Adds a document to the model.
 DominanceGlobalTermWeighter clone()
          This makes public the clone method on the Object class and removes the exception that it throws.
 int getDimensionality()
          Gets the dimensionality of the global weights.
 Vector getDominance()
          Gets the dominance weight (global weight) vector for all of the terms.
 Vector getGlobalWeights()
          Gets the current vector of global weights.
 boolean remove(Vector counts)
          Removes the document from the model.
protected  void setDominance(Vector dominance)
          Sets the cached dominance weight vector.
 
Methods inherited from class gov.sandia.cognition.text.term.vector.weighter.global.AbstractEntropyBasedGlobalTermWeighter
getTermEntropiesSum, growVectors, initializeVectors, setTermEntropiesSum
 
Methods inherited from class gov.sandia.cognition.text.term.vector.weighter.global.AbstractFrequencyBasedGlobalTermWeighter
getDocumentCount, getTermDocumentFrequencies, getTermGlobalFrequencies, setDocumentCount, setTermDocumentFrequencies, setTermGlobalFrequencies
 
Methods inherited from class gov.sandia.cognition.text.term.vector.weighter.global.AbstractGlobalTermWeighter
getVectorFactory, setVectorFactory
 
Methods inherited from class gov.sandia.cognition.text.term.vector.AbstractVectorSpaceModel
add, addAll, remove, removeAll
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.text.term.vector.VectorSpaceModel
add, addAll, remove, removeAll
 

Field Detail

dominance

protected Vector dominance
A vector caching the global dominance weight of the document collection. It may be null. Use getDominance() to compute the proper value if it has not been updated yet.
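
One way to picture the nullable cache together with the incrementally maintained sum from the class description is the following sketch. It is not the library's implementation: the class IncrementalDominanceSketch, its fields, and the use of plain double[] arrays are hypothetical, and remove here simply assumes the document was added earlier.

```java
/**
 * Hypothetical sketch of a lazily cached dominance vector that is
 * invalidated whenever a document is added or removed.
 */
class IncrementalDominanceSketch {
    private final double[] gf;          // gf_i, total count per term
    private final double[] tfLogTfSum;  // sum_j tf_ij log(tf_ij) per term
    private int documentCount;          // n
    private double[] dominance;         // cache; null means "stale"

    IncrementalDominanceSketch(int termCount) {
        this.gf = new double[termCount];
        this.tfLogTfSum = new double[termCount];
    }

    void add(double[] counts) {
        update(counts, 1.0);
        documentCount++;
        dominance = null; // invalidate the cache
    }

    /** Assumes the same counts were previously passed to add. */
    void remove(double[] counts) {
        update(counts, -1.0);
        documentCount--;
        dominance = null; // invalidate the cache
    }

    private void update(double[] counts, double sign) {
        for (int i = 0; i < counts.length; i++) {
            double tf = counts[i];
            if (tf > 0.0) {
                gf[i] += sign * tf;
                tfLogTfSum[i] += sign * tf * Math.log(tf);
            }
        }
    }

    /** Recomputes the cached vector only if it is stale. */
    double[] getDominance() {
        if (dominance == null) {
            double[] fresh = new double[gf.length];
            for (int i = 0; i < gf.length; i++) {
                if (gf[i] > 0.0 && documentCount > 0) {
                    double entropy = -tfLogTfSum[i] / gf[i] + Math.log(gf[i]);
                    fresh[i] = Math.exp(entropy) / documentCount;
                }
            }
            dominance = fresh;
        }
        return dominance;
    }
}
```

Because only the running sums are updated per document, adding or removing a document costs time proportional to the number of terms, and the exponentials are evaluated only when the weights are next requested.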

Constructor Detail

DominanceGlobalTermWeighter

public DominanceGlobalTermWeighter()
Creates a new DominanceGlobalTermWeighter.


DominanceGlobalTermWeighter

public DominanceGlobalTermWeighter(VectorFactory<? extends Vector> vectorFactory)
Creates a new DominanceGlobalTermWeighter.

Parameters:
vectorFactory - The vector factory.
Method Detail

clone

public DominanceGlobalTermWeighter clone()
Description copied from class: AbstractCloneableSerializable
This makes public the clone method on the Object class and removes the exception that it throws. Its default behavior is to automatically create a clone of the exact type of object that the clone is called on and to copy all primitives but to keep all references, which means it is a shallow copy. Extensions of this class may want to override this method (but call super.clone()) to implement a "smart copy", that is, one that targets the most common use case for creating a copy of the object. Because the default behavior is a shallow copy, extending classes only need to handle fields that need a deeper copy (or those that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method. Note: The contract of this method is that you must use super.clone() as the basis for your implementation.

Specified by:
clone in interface CloneableSerializable
Overrides:
clone in class AbstractEntropyBasedGlobalTermWeighter
Returns:
A clone of this object.

add

public void add(Vector counts)
Description copied from interface: VectorSpaceModel
Adds a document to the model.

Specified by:
add in interface VectorSpaceModel
Overrides:
add in class AbstractEntropyBasedGlobalTermWeighter
Parameters:
counts - The document to add.

remove

public boolean remove(Vector counts)
Description copied from interface: VectorSpaceModel
Removes the document from the model.

Specified by:
remove in interface VectorSpaceModel
Overrides:
remove in class AbstractEntropyBasedGlobalTermWeighter
Parameters:
counts - The document to remove.
Returns:
True if this object changed as a result of the removal.

getGlobalWeights

public Vector getGlobalWeights()
Description copied from interface: GlobalTermWeighter
Gets the current vector of global weights.

Returns:
The global weights.

getDimensionality

public int getDimensionality()
Description copied from interface: GlobalTermWeighter
Gets the dimensionality of the global weights.

Returns:
The dimensionality of the global weights. -1 if unknown.

getDominance

public Vector getDominance()
Gets the dominance weight (global weight) vector for all of the terms.

Returns:
The dominance weight (global weight) vector for all of the terms.

setDominance

protected void setDominance(Vector dominance)
Sets the cached dominance weight vector.

Parameters:
dominance - The cached dominance weight vector.