gov.sandia.cognition.text.term.vector.weighter.global
Class InverseDocumentFrequencyGlobalTermWeighter

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.text.term.vector.AbstractVectorSpaceModel
          extended by gov.sandia.cognition.text.term.vector.weighter.global.AbstractGlobalTermWeighter
              extended by gov.sandia.cognition.text.term.vector.weighter.global.AbstractFrequencyBasedGlobalTermWeighter
                  extended by gov.sandia.cognition.text.term.vector.weighter.global.InverseDocumentFrequencyGlobalTermWeighter
All Implemented Interfaces:
VectorFactoryContainer, VectorSpaceModel, GlobalTermWeighter, CloneableSerializable, Serializable, Cloneable

@PublicationReference(author="Wikipedia",
                      title="tf-idf",
                      type=WebPage,
                      url="http://en.wikipedia.org/wiki/tf-idf",
                      year=2009)
public class InverseDocumentFrequencyGlobalTermWeighter
extends AbstractFrequencyBasedGlobalTermWeighter

Implements the inverse-document-frequency (IDF) global term weighting scheme. IDF is a commonly used term weighting approach that gives a higher weight to terms that appear in fewer documents in the collection. Its formula is: idf_i = log(n / df_i), where n is the total number of documents and df_i is the number of documents in which term i appears.
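
The formula idf_i = log(n / df_i) can be illustrated with a small stand-alone sketch. It does not use this class or any Cognitive Foundry type; the class name IdfSketch, the choice of the natural logarithm, and the weight of 0 for a term that appears in no document are all assumptions made for illustration, not behavior documented here.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not part of the Cognitive Foundry API.
class IdfSketch {
    /**
     * Computes idf_i = log(n / df_i) for each term i.
     * Assumption: natural logarithm; the documentation does not state the base.
     */
    static double[] idf(int documentCount, int[] documentFrequencies) {
        double[] result = new double[documentFrequencies.length];
        for (int i = 0; i < result.length; i++) {
            int df = documentFrequencies[i];
            // Assumption: a term that appears in no document gets weight 0,
            // which avoids dividing by zero.
            result[i] = df > 0 ? Math.log((double) documentCount / df) : 0.0;
        }
        return result;
    }

    public static void main(String[] args) {
        // Four documents; the terms appear in 4, 2, 1, and 0 of them.
        double[] weights = idf(4, new int[] {4, 2, 1, 0});
        System.out.println(Arrays.toString(weights));
    }
}
```

In this sketch a term that occurs in every document receives weight log(1) = 0, and the weight grows as the term becomes rarer.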

Since:
3.0
Author:
Justin Basilico
See Also:
Serialized Form

Field Summary
protected  Vector inverseDocumentFrequency
          The (cached) value of the inverse document frequency.
 
Fields inherited from class gov.sandia.cognition.text.term.vector.weighter.global.AbstractFrequencyBasedGlobalTermWeighter
documentCount, termDocumentFrequencies, termGlobalFrequencies
 
Fields inherited from class gov.sandia.cognition.text.term.vector.weighter.global.AbstractGlobalTermWeighter
vectorFactory
 
Constructor Summary
InverseDocumentFrequencyGlobalTermWeighter()
          Creates a new InverseDocumentFrequencyGlobalTermWeighter.
InverseDocumentFrequencyGlobalTermWeighter(VectorFactory<? extends Vector> vectorFactory)
          Creates a new InverseDocumentFrequencyGlobalTermWeighter.
 
Method Summary
 void add(Vector counts)
          Adds a document to the model.
 InverseDocumentFrequencyGlobalTermWeighter clone()
          This makes public the clone method on the Object class and removes the exception that it throws.
 int getDimensionality()
          Gets the dimensionality of the global weights.
 Vector getGlobalWeights()
          Gets the current vector of global weights.
 Vector getInverseDocumentFrequency()
          Gets the inverse-document-frequency (IDF) global weight values.
 boolean remove(Vector counts)
          Removes the document from the model.
protected  void setInverseDocumentFrequency(Vector inverseDocumentFrequency)
          Sets the cached inverse-document-frequency (IDF) global weight values.
 
Methods inherited from class gov.sandia.cognition.text.term.vector.weighter.global.AbstractFrequencyBasedGlobalTermWeighter
getDocumentCount, getTermDocumentFrequencies, getTermGlobalFrequencies, growVectors, initializeVectors, setDocumentCount, setTermDocumentFrequencies, setTermGlobalFrequencies
 
Methods inherited from class gov.sandia.cognition.text.term.vector.weighter.global.AbstractGlobalTermWeighter
getVectorFactory, setVectorFactory
 
Methods inherited from class gov.sandia.cognition.text.term.vector.AbstractVectorSpaceModel
add, addAll, remove, removeAll
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.text.term.vector.VectorSpaceModel
add, addAll, remove, removeAll
 

Field Detail

inverseDocumentFrequency

protected Vector inverseDocumentFrequency
The (cached) value of the inverse document frequency. The cached value is cleared out whenever a document is added or removed. It is recomputed from the other state values on request.
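
The caching behavior described above (clear on add or remove, recompute on request) can be sketched as follows. This is a simplified illustration using plain arrays, not the class's actual implementation; CachedIdfSketch, isCached, and the treatment of terms with zero document frequency are hypothetical, and each document is assumed to be an array of term counts with the same length as the model.

```java
// Hypothetical stand-alone sketch of a lazily recomputed IDF cache.
class CachedIdfSketch {
    private int documentCount = 0;
    private final int[] documentFrequencies;
    private double[] cachedIdf = null; // null means "needs recomputing"

    CachedIdfSketch(int dimensionality) {
        this.documentFrequencies = new int[dimensionality];
    }

    /** Adds a document given as term counts and clears the cache. */
    void add(int[] counts) {
        update(counts, +1);
    }

    /** Removes a previously added document and clears the cache. */
    void remove(int[] counts) {
        update(counts, -1);
    }

    private void update(int[] counts, int delta) {
        for (int i = 0; i < documentFrequencies.length; i++) {
            if (counts[i] > 0) {
                documentFrequencies[i] += delta;
            }
        }
        documentCount += delta;
        cachedIdf = null; // invalidate; the next getter call recomputes
    }

    boolean isCached() {
        return cachedIdf != null;
    }

    /** Returns the cached IDF values, recomputing them first if needed. */
    double[] getInverseDocumentFrequency() {
        if (cachedIdf == null) {
            double[] idf = new double[documentFrequencies.length];
            for (int i = 0; i < idf.length; i++) {
                int df = documentFrequencies[i];
                // Assumption: zero weight for terms seen in no document.
                idf[i] = df > 0 ? Math.log((double) documentCount / df) : 0.0;
            }
            cachedIdf = idf;
        }
        return cachedIdf;
    }
}
```

Clearing the cache on every change keeps add and remove cheap, at the cost of one full recomputation the next time the weights are requested.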

Constructor Detail

InverseDocumentFrequencyGlobalTermWeighter

public InverseDocumentFrequencyGlobalTermWeighter()
Creates a new InverseDocumentFrequencyGlobalTermWeighter.


InverseDocumentFrequencyGlobalTermWeighter

public InverseDocumentFrequencyGlobalTermWeighter(VectorFactory<? extends Vector> vectorFactory)
Creates a new InverseDocumentFrequencyGlobalTermWeighter.

Parameters:
vectorFactory - The vector factory to use.

Method Detail

clone

public InverseDocumentFrequencyGlobalTermWeighter clone()
Description copied from class: AbstractCloneableSerializable
This makes public the clone method on the Object class and removes the exception that it throws. Its default behavior is to automatically create a clone of the exact type of object that the clone is called on and to copy all primitives but to keep all references, which means it is a shallow copy. Extensions of this class may want to override this method (but call super.clone()) to implement a "smart copy", that is, to target the most common use case for creating a copy of the object. Because the default behavior is a shallow copy, extending classes only need to handle fields that need a deeper copy (or those that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method. Note: The contract of this method is that you must use super.clone() as the basis for your implementation.

Specified by:
clone in interface CloneableSerializable
Overrides:
clone in class AbstractFrequencyBasedGlobalTermWeighter
Returns:
A clone of this object.
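
The "smart copy" contract described above can be sketched in plain Java. This is a hedged illustration rather than this class's code: WeighterSketch and its fields are hypothetical, and because the sketch extends Object directly instead of AbstractCloneableSerializable, it has to handle CloneNotSupportedException itself.

```java
// Hypothetical stand-alone sketch of the super.clone() "smart copy" pattern.
class WeighterSketch implements Cloneable {
    int documentCount;       // primitive: copied by super.clone()
    double[] cachedWeights;  // reference: shared unless copied explicitly

    @Override
    public WeighterSketch clone() {
        try {
            // Start from super.clone(), which yields a shallow copy of the
            // exact runtime type.
            WeighterSketch copy = (WeighterSketch) super.clone();
            // Deepen only the fields that must not be shared with the original.
            copy.cachedWeights =
                cachedWeights == null ? null : cachedWeights.clone();
            return copy;
        } catch (CloneNotSupportedException e) {
            // Cannot happen: this class implements Cloneable.
            throw new AssertionError(e);
        }
    }
}
```

After cloning, changes to the copy's array do not affect the original, while primitive fields such as documentCount are simply copied by value.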

add

public void add(Vector counts)
Description copied from interface: VectorSpaceModel
Adds a document to the model.

Specified by:
add in interface VectorSpaceModel
Overrides:
add in class AbstractFrequencyBasedGlobalTermWeighter
Parameters:
counts - The document to add.

remove

public boolean remove(Vector counts)
Description copied from interface: VectorSpaceModel
Removes the document from the model.

Specified by:
remove in interface VectorSpaceModel
Overrides:
remove in class AbstractFrequencyBasedGlobalTermWeighter
Parameters:
counts - The document to remove.
Returns:
True if this object changed as a result of the removal.

getDimensionality

public int getDimensionality()
Description copied from interface: GlobalTermWeighter
Gets the dimensionality of the global weights.

Returns:
The dimensionality of the global weights. -1 if unknown.

getGlobalWeights

public Vector getGlobalWeights()
Description copied from interface: GlobalTermWeighter
Gets the current vector of global weights.

Returns:
The global weights.

getInverseDocumentFrequency

public Vector getInverseDocumentFrequency()
Gets the inverse-document-frequency (IDF) global weight values.

Returns:
The inverse-document-frequency (IDF) values.

setInverseDocumentFrequency

protected void setInverseDocumentFrequency(Vector inverseDocumentFrequency)
Sets the cached inverse-document-frequency (IDF) global weight values.

Parameters:
inverseDocumentFrequency - The cached inverse-document-frequency (IDF) global weight values.