gov.sandia.cognition.text.topic
Class LatentSemanticAnalysis

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.text.topic.LatentSemanticAnalysis
All Implemented Interfaces:
BatchLearner<Collection<? extends Vectorizable>,LatentSemanticAnalysis.Transform>, CloneableSerializable, Serializable, Cloneable

@PublicationReferences(references={
    @PublicationReference(
        author={"Scott Deerwester","Susan T. Dumais","George W. Furnas","Thomas K. Landauer","Richard Harshman"},
        title="Indexing by Latent Semantic Analysis",
        year=1990,
        type=Journal,
        publication="Journal of the American Society for Information Science",
        pages={391,407},
        url="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.8490"),
    @PublicationReference(
        author={"Thomas K. Landauer","Peter W. Foltz","Darrell Laham"},
        title="An Introduction to Latent Semantic Analysis",
        year=1998,
        type=Journal,
        publication="Discourse Processes",
        pages={259,284},
        url="http://lsa.colorado.edu/papers/dp1.LSAintro.pdf"),
    @PublicationReference(
        author="Wikipedia",
        title="Latent semantic analysis",
        year=2009,
        type=WebPage,
        url="http://en.wikipedia.org/wiki/Latent_semantic_analysis")
})
public class LatentSemanticAnalysis
extends AbstractCloneableSerializable
implements BatchLearner<Collection<? extends Vectorizable>,LatentSemanticAnalysis.Transform>

Implements the Latent Semantic Analysis (LSA) algorithm using Singular Value Decomposition (SVD).
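
As background, the rank-reduced SVD that LSA is generally built on can be written as follows. The symbols are notation introduced here for illustration and do not come from the class itself: A is assumed to be a term-by-document matrix with m terms and n documents, r is the requested rank, and k is the rank actually used. How the class maps these factors onto LatentSemanticAnalysis.Transform is not stated on this page.

```latex
A \approx A_k = U_k \Sigma_k V_k^{\top},
\qquad A \in \mathbb{R}^{m \times n},\;
U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k},\;
V_k \in \mathbb{R}^{n \times k},
\qquad k = \min(r, n)
```

The last term reflects the documented behavior that the rank is reduced to the number of documents when fewer documents than the requested rank are supplied.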

Since:
3.0
Author:
Justin Basilico
See Also:
Serialized Form

Nested Class Summary
static class LatentSemanticAnalysis.Transform
          The result of performing latent semantic analysis (LSA).
 
Field Summary
static int DEFAULT_REQUESTED_RANK
          The default requested rank is 10.
protected  int requestedRank
          The rank requested for the resulting LSA.
 
Constructor Summary
LatentSemanticAnalysis()
          Creates a new LatentSemanticAnalysis with default parameters.
LatentSemanticAnalysis(int requestedRank)
          Creates a new LatentSemanticAnalysis with the given requested rank.
 
Method Summary
 int getRequestedRank()
          Gets the requested rank for the analysis.
 LatentSemanticAnalysis.Transform learn(Collection<? extends Vectorizable> documents)
          The learn method creates an object of ResultType using data of type DataType, using some form of "learning" algorithm.
 void setRequestedRank(int requestedRank)
          Sets the requested rank of the analysis.
 
Methods inherited from class gov.sandia.cognition.util.AbstractCloneableSerializable
clone
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.util.CloneableSerializable
clone
 

Field Detail

DEFAULT_REQUESTED_RANK

public static final int DEFAULT_REQUESTED_RANK
The default requested rank is 10.

See Also:
Constant Field Values

requestedRank

protected int requestedRank
The rank requested for the resulting LSA. The result may have a smaller rank if the requested rank is greater than the number of documents. Must be positive.

Constructor Detail

LatentSemanticAnalysis

public LatentSemanticAnalysis()
Creates a new LatentSemanticAnalysis with default parameters.


LatentSemanticAnalysis

public LatentSemanticAnalysis(int requestedRank)
Creates a new LatentSemanticAnalysis with the given requested rank.

Parameters:
requestedRank - The rank requested for the resulting analysis. Must be positive.
Method Detail

learn

public LatentSemanticAnalysis.Transform learn(Collection<? extends Vectorizable> documents)
Description copied from interface: BatchLearner
The learn method creates an object of ResultType using data of type DataType, using some form of "learning" algorithm.

Specified by:
learn in interface BatchLearner<Collection<? extends Vectorizable>,LatentSemanticAnalysis.Transform>
Parameters:
documents - The data that the learning algorithm will use to create an object of ResultType.
Returns:
The object that is created based on the given data using the learning algorithm.
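
To make the input to learn more concrete: LSA conventionally arranges the documents as the columns of a term-by-document matrix before the SVD is taken. The sketch below shows only that arrangement step, using plain arrays. TermDocumentSketch and toTermByDocument are hypothetical names, the one-column-per-document layout is the textbook convention rather than something this page confirms about the class, and the library itself accepts Vectorizable objects rather than double[].

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of gov.sandia.cognition.
final class TermDocumentSketch {

    private TermDocumentSketch() {}

    /**
     * Copies each document vector into one column of a term-by-document
     * matrix, so rows index terms and columns index documents.
     */
    static double[][] toTermByDocument(final List<double[]> documents) {
        if (documents.isEmpty()) {
            throw new IllegalArgumentException("documents must not be empty");
        }
        final int termCount = documents.get(0).length;
        final int documentCount = documents.size();
        final double[][] matrix = new double[termCount][documentCount];
        for (int j = 0; j < documentCount; j++) {
            final double[] document = documents.get(j);
            if (document.length != termCount) {
                throw new IllegalArgumentException(
                    "all documents must have the same dimensionality");
            }
            for (int i = 0; i < termCount; i++) {
                matrix[i][j] = document[i];
            }
        }
        return matrix;
    }

    public static void main(final String[] args) {
        // Two documents over a three-term vocabulary (made-up counts).
        final List<double[]> documents = Arrays.asList(
            new double[] {1.0, 0.0, 2.0},
            new double[] {0.0, 3.0, 1.0});
        final double[][] matrix = toTermByDocument(documents);
        System.out.println(matrix.length + " terms x " + matrix[0].length + " documents");
    }
}
```

With two documents, as in this example, the documented rank rule would limit the analysis to at most rank 2 regardless of the requested rank.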

getRequestedRank

public int getRequestedRank()
Gets the requested rank for the analysis.

Returns:
The requested rank for the analysis.

setRequestedRank

public void setRequestedRank(int requestedRank)
Sets the requested rank of the analysis. The analysis will attempt to find the requested number of latent topics. If the number of documents is less than the requested rank, the actual rank of the analysis will be reduced to the number of documents.

Parameters:
requestedRank - The requested rank of the analysis. Must be positive.
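
The rank rule described for setRequestedRank can be sketched in plain Java. This is a minimal illustration, not the library's implementation: RankSketch, checkRequestedRank, and effectiveRank are hypothetical names, and throwing IllegalArgumentException for a non-positive rank is an assumption about how "must be positive" might be enforced.

```java
// Hypothetical helper, not part of gov.sandia.cognition.
final class RankSketch {

    /** Mirrors the documented default requested rank of 10. */
    static final int DEFAULT_REQUESTED_RANK = 10;

    private RankSketch() {}

    /**
     * Rejects non-positive ranks. The exception type is an assumption; the
     * documentation only states that the requested rank must be positive.
     */
    static int checkRequestedRank(final int requestedRank) {
        if (requestedRank <= 0) {
            throw new IllegalArgumentException("requestedRank must be positive");
        }
        return requestedRank;
    }

    /**
     * Reduces the rank to the number of documents when fewer documents
     * than the requested rank are available.
     */
    static int effectiveRank(final int requestedRank, final int documentCount) {
        return Math.min(checkRequestedRank(requestedRank), documentCount);
    }

    public static void main(final String[] args) {
        System.out.println(effectiveRank(DEFAULT_REQUESTED_RANK, 4));   // prints 4
        System.out.println(effectiveRank(DEFAULT_REQUESTED_RANK, 250)); // prints 10
    }
}
```

With the default requested rank of 10, a collection of 4 documents yields at most 4 latent topics, while a collection of 250 documents yields the full 10.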