gov.sandia.cognition.text.topic
Class ProbabilisticLatentSemanticAnalysis

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
          extended by gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm<ResultType>
              extended by gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner<Collection<? extends Vectorizable>,ProbabilisticLatentSemanticAnalysis.Result>
                  extended by gov.sandia.cognition.text.topic.ProbabilisticLatentSemanticAnalysis
All Implemented Interfaces:
AnytimeAlgorithm<ProbabilisticLatentSemanticAnalysis.Result>, IterativeAlgorithm, StoppableAlgorithm, AnytimeBatchLearner<Collection<? extends Vectorizable>,ProbabilisticLatentSemanticAnalysis.Result>, BatchLearner<Collection<? extends Vectorizable>,ProbabilisticLatentSemanticAnalysis.Result>, VectorFactoryContainer, CloneableSerializable, Randomized, Serializable, Cloneable

@PublicationReferences(references={@PublicationReference(author="Thomas Hofmann",title="Probabilistic Latent Semantic Analysis",year=1999,type=Conference,publication="Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI)",pages={289,296},url="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.1187"),@PublicationReference(author="Thomas Hofmann",title="Probabilistic Latent Semantic Indexing",year=1999,type=Conference,publication="Proceedings of the 22nd Conference of the ACM Special Interest Group on Information Retrieval (SIGIR)",pages={50,57},url="http://portal.acm.org/citation.cfm?id=312649"),@PublicationReference(author="Thomas Hofmann",title="Unsupervised Learning by Probabilistic Latent Semantic Analysis",year=2001,type=Journal,publication="Machine Learning",pages={177,196},url="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.6341")})
public class ProbabilisticLatentSemanticAnalysis
extends AbstractAnytimeBatchLearner<Collection<? extends Vectorizable>,ProbabilisticLatentSemanticAnalysis.Result>
implements Randomized, VectorFactoryContainer

An implementation of the Probabilistic Latent Semantic Analysis (PLSA) algorithm.
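
For orientation, the expectation-maximization (EM) updates that PLSA is built on can be sketched in plain Java. This is only a minimal illustration on a dense count array under the symmetric formulation P(d,w) = sum over z of P(z) P(d|z) P(w|z); it is not the code of this class. The names PlsaSketch, emStep, and all fields below are hypothetical, and the sketch leaves out the stopping test on the change in log-likelihood.

```java
import java.util.Random;

/** Hypothetical PLSA sketch on a dense count array; not the Cognitive Foundry code. */
final class PlsaSketch {
    final double[] pz;     // P(z), one entry per latent variable
    final double[][] pdz;  // P(d|z), indexed [z][d]
    final double[][] pwz;  // P(w|z), indexed [z][w]

    PlsaSketch(int latentCount, int documentCount, int termCount, Random random) {
        pz = new double[latentCount];
        pdz = new double[latentCount][documentCount];
        pwz = new double[latentCount][termCount];
        for (int z = 0; z < latentCount; z++) {
            pz[z] = 1.0 / latentCount;
            randomDistribution(pdz[z], random);
            randomDistribution(pwz[z], random);
        }
    }

    /** Fills p with strictly positive random values that sum to one. */
    private static void randomDistribution(double[] p, Random random) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            p[i] = random.nextDouble() + 1.0e-3;
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
    }

    /**
     * Runs one EM iteration over counts[d][w] and returns the log-likelihood
     * of the data under the parameters held before this update.
     */
    double emStep(double[][] counts) {
        int k = pz.length;
        double[] nz = new double[k];
        double[][] ndz = new double[k][pdz[0].length];
        double[][] nwz = new double[k][pwz[0].length];
        double[] joint = new double[k];
        double total = 0.0;
        double logLikelihood = 0.0;
        for (int d = 0; d < counts.length; d++) {
            for (int w = 0; w < counts[d].length; w++) {
                double n = counts[d][w];
                if (n == 0.0) {
                    continue;
                }
                double pdw = 0.0;
                for (int z = 0; z < k; z++) {
                    joint[z] = pz[z] * pdz[z][d] * pwz[z][w];
                    pdw += joint[z];
                }
                logLikelihood += n * Math.log(pdw);
                total += n;
                // E-step: expected count n(d,w) * P(z|d,w) for each z.
                for (int z = 0; z < k; z++) {
                    double expected = n * joint[z] / pdw;
                    nz[z] += expected;
                    ndz[z][d] += expected;
                    nwz[z][w] += expected;
                }
            }
        }
        // M-step: renormalize the expected counts into probabilities.
        for (int z = 0; z < k; z++) {
            for (int d = 0; d < ndz[z].length; d++) {
                pdz[z][d] = ndz[z][d] / nz[z];
            }
            for (int w = 0; w < nwz[z].length; w++) {
                pwz[z][w] = nwz[z][w] / nz[z];
            }
            pz[z] = nz[z] / total;
        }
        return logLikelihood;
    }
}
```

Calling emStep repeatedly should yield a non-decreasing sequence of log-likelihood values, which is the quantity the minimumChange parameter of this class is compared against.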

Since:
3.0
Author:
Justin Basilico
See Also:
Serialized Form

Nested Class Summary
static class ProbabilisticLatentSemanticAnalysis.LatentData
          The information about each latent variable.
static class ProbabilisticLatentSemanticAnalysis.Result
          The dimensionality transform created by probabilistic latent semantic analysis.
static class ProbabilisticLatentSemanticAnalysis.StatusPrinter
          Prints out the status of the probabilistic latent semantic analysis algorithm.
 
Field Summary
protected  double changeOfLogLikelihood
          The change in log-likelihood between the previous iteration and the current one.
static int DEFAULT_MAX_ITERATIONS
          The default maximum number of iterations is 250.
static double DEFAULT_MINIMUM_CHANGE
          The default minimum change is 1.0E-10.
static int DEFAULT_REQUESTED_RANK
          The default requested rank is 10.
protected  int documentCount
          The number of documents.
protected  Matrix documentsByTerms
          The document-by-term matrix.
protected  int latentCount
          The number of latent variables.
protected  ProbabilisticLatentSemanticAnalysis.LatentData[] latents
          The information about each of the latent variables.
protected  double logLikelihood
          The current log-likelihood of the algorithm.
protected  MatrixFactory<? extends Matrix> matrixFactory
          The matrix factory.
protected  double minimumChange
          The minimum change required in log-likelihood to continue iterating.
protected  Random random
          The random number generator to use.
protected  int requestedRank
          The requested rank to reduce the dimensionality to.
protected  ProbabilisticLatentSemanticAnalysis.Result result
          The result being produced by the algorithm.
protected  int termCount
          The number of terms.
protected  VectorFactory<? extends Vector> vectorFactory
          The vector factory.
 
Fields inherited from class gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner
data, keepGoing
 
Fields inherited from class gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm
maxIterations
 
Fields inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
DEFAULT_ITERATION, iteration
 
Constructor Summary
ProbabilisticLatentSemanticAnalysis()
          Creates a new ProbabilisticLatentSemanticAnalysis with default parameters.
ProbabilisticLatentSemanticAnalysis(int requestedRank)
          Creates a new ProbabilisticLatentSemanticAnalysis with the given rank and otherwise default parameters.
ProbabilisticLatentSemanticAnalysis(int requestedRank, double minimumChange, Random random)
          Creates a new ProbabilisticLatentSemanticAnalysis with the given parameters.
ProbabilisticLatentSemanticAnalysis(Random random)
          Creates a new ProbabilisticLatentSemanticAnalysis with default parameters and the given random number generator.
 
Method Summary
protected  void cleanupAlgorithm()
          Called to clean up the learning algorithm's state after learning has finished.
 MatrixFactory<? extends Matrix> getMatrixFactory()
          Gets the matrix factory to use.
 double getMinimumChange()
          Gets the minimum change in log-likelihood to allow before stopping the algorithm.
 Random getRandom()
          Gets the random number generator used by this object.
 int getRequestedRank()
          Gets the requested rank to conduct the analysis for.
 ProbabilisticLatentSemanticAnalysis.Result getResult()
          Gets the current result of the algorithm.
 VectorFactory<? extends Vector> getVectorFactory()
          Gets the vector factory to use.
protected  boolean initializeAlgorithm()
          Called to initialize the learning algorithm's state based on the data that is stored in the data field.
 void setMatrixFactory(MatrixFactory<? extends Matrix> matrixFactory)
          Sets the matrix factory to use.
 void setMinimumChange(double minimumChange)
          Sets the minimum change in log-likelihood to allow before stopping the algorithm.
 void setRandom(Random random)
          Sets the random number generator used by this object.
 void setRequestedRank(int requestedRank)
          Sets the requested rank to conduct the analysis for.
 void setVectorFactory(VectorFactory<? extends Vector> vectorFactory)
          Sets the vector factory to use.
protected  boolean step()
          Called to take a single step of the learning algorithm.
 
Methods inherited from class gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner
clone, getData, getKeepGoing, learn, setData, setKeepGoing, stop
 
Methods inherited from class gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm
getMaxIterations, isResultValid, setMaxIterations
 
Methods inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
addIterativeAlgorithmListener, fireAlgorithmEnded, fireAlgorithmStarted, fireStepEnded, fireStepStarted, getIteration, getListeners, removeIterativeAlgorithmListener, setIteration, setListeners
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.algorithm.AnytimeAlgorithm
getMaxIterations, setMaxIterations
 
Methods inherited from interface gov.sandia.cognition.algorithm.IterativeAlgorithm
addIterativeAlgorithmListener, getIteration, removeIterativeAlgorithmListener
 
Methods inherited from interface gov.sandia.cognition.algorithm.StoppableAlgorithm
isResultValid
 

Field Detail

DEFAULT_REQUESTED_RANK

public static final int DEFAULT_REQUESTED_RANK
The default requested rank is 10.

See Also:
Constant Field Values

DEFAULT_MAX_ITERATIONS

public static final int DEFAULT_MAX_ITERATIONS
The default maximum number of iterations is 250.

See Also:
Constant Field Values

DEFAULT_MINIMUM_CHANGE

public static final double DEFAULT_MINIMUM_CHANGE
The default minimum change is 1.0E-10.

See Also:
Constant Field Values

requestedRank

protected int requestedRank
The requested rank to reduce the dimensionality to.


minimumChange

protected double minimumChange
The minimum change required in log-likelihood to continue iterating. Used as a stopping criterion.
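
A minimal sketch of how such a criterion could be evaluated is shown here. It assumes the change is the absolute difference between successive log-likelihood values and that iteration continues while the change is at least minimumChange; whether the class uses an inclusive or a strict comparison is not stated on this page. ConvergenceSketch and shouldContinue are hypothetical names.

```java
/** Hypothetical stopping-criterion sketch; not the Cognitive Foundry code. */
final class ConvergenceSketch {
    /**
     * Returns true while the log-likelihood still changes by at least
     * minimumChange between two successive iterations.
     */
    static boolean shouldContinue(double previousLogLikelihood,
                                  double currentLogLikelihood,
                                  double minimumChange) {
        double change = Math.abs(currentLogLikelihood - previousLogLikelihood);
        // Assumption: inclusive comparison, so a change equal to the minimum continues.
        return change >= minimumChange;
    }
}
```

With the default of 1.0E-10, a move from -100.0 to -99.0 continues, while two identical values stop. Under this inclusive comparison a minimumChange of 0.0 never stops on its own, so the maximum number of iterations remains the only bound.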


random

protected Random random
The random number generator to use.


vectorFactory

protected VectorFactory<? extends Vector> vectorFactory
The vector factory.


matrixFactory

protected MatrixFactory<? extends Matrix> matrixFactory
The matrix factory.


documentsByTerms

protected transient Matrix documentsByTerms
The document-by-term matrix.


termCount

protected transient int termCount
The number of terms.


documentCount

protected transient int documentCount
The number of documents.


latentCount

protected transient int latentCount
The number of latent variables.


latents

protected transient ProbabilisticLatentSemanticAnalysis.LatentData[] latents
The information about each of the latent variables.


logLikelihood

protected transient double logLikelihood
The current log-likelihood of the algorithm.


changeOfLogLikelihood

protected transient double changeOfLogLikelihood
The change in log-likelihood between the previous iteration and the current one.


result

protected transient ProbabilisticLatentSemanticAnalysis.Result result
The result being produced by the algorithm.

Constructor Detail

ProbabilisticLatentSemanticAnalysis

public ProbabilisticLatentSemanticAnalysis()
Creates a new ProbabilisticLatentSemanticAnalysis with default parameters.


ProbabilisticLatentSemanticAnalysis

public ProbabilisticLatentSemanticAnalysis(Random random)
Creates a new ProbabilisticLatentSemanticAnalysis with default parameters and the given random number generator.

Parameters:
random - The random number generator to use.

ProbabilisticLatentSemanticAnalysis

public ProbabilisticLatentSemanticAnalysis(int requestedRank)
Creates a new ProbabilisticLatentSemanticAnalysis with the given rank and otherwise default parameters.

Parameters:
requestedRank - The requested rank. Must be positive.

ProbabilisticLatentSemanticAnalysis

public ProbabilisticLatentSemanticAnalysis(int requestedRank,
                                           double minimumChange,
                                           Random random)
Creates a new ProbabilisticLatentSemanticAnalysis with the given parameters.

Parameters:
requestedRank - The requested rank. Must be positive.
minimumChange - The minimum change in log-likelihood required to continue iterating. Must be non-negative.
random - The random number generator to use.
Method Detail

initializeAlgorithm

protected boolean initializeAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to initialize the learning algorithm's state based on the data that is stored in the data field. The return value indicates if the algorithm can be run or not based on the initialization.

Specified by:
initializeAlgorithm in class AbstractAnytimeBatchLearner<Collection<? extends Vectorizable>,ProbabilisticLatentSemanticAnalysis.Result>
Returns:
True if the learning algorithm can be run and false if it cannot.

step

protected boolean step()
Description copied from class: AbstractAnytimeBatchLearner
Called to take a single step of the learning algorithm.

Specified by:
step in class AbstractAnytimeBatchLearner<Collection<? extends Vectorizable>,ProbabilisticLatentSemanticAnalysis.Result>
Returns:
True if another step can be taken and false if the algorithm should halt.

cleanupAlgorithm

protected void cleanupAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to clean up the learning algorithm's state after learning has finished.

Specified by:
cleanupAlgorithm in class AbstractAnytimeBatchLearner<Collection<? extends Vectorizable>,ProbabilisticLatentSemanticAnalysis.Result>
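
The three protected hooks above (initializeAlgorithm, step, cleanupAlgorithm) follow a template-method pattern. The sketch below shows one plausible way a driver could call them; the actual control flow lives in AbstractAnytimeBatchLearner and is not shown on this page, so the loop, the class name AnytimeLoopSketch, and the run method are assumptions made for illustration.

```java
/** Hypothetical driver for the initialize/step/cleanup hooks; not the Cognitive Foundry code. */
abstract class AnytimeLoopSketch {
    /** Number of steps taken so far in the current run. */
    protected int iteration;

    protected abstract boolean initializeAlgorithm();

    protected abstract boolean step();

    protected abstract void cleanupAlgorithm();

    /** Runs the hooks in order and returns the number of steps taken. */
    final int run(int maxIterations) {
        iteration = 0;
        if (!initializeAlgorithm()) {
            // Assumption: nothing to clean up when initialization refuses to run.
            return 0;
        }
        boolean keepGoing = true;
        while (keepGoing && iteration < maxIterations) {
            iteration++;
            keepGoing = step();  // false means the algorithm asked to halt
        }
        cleanupAlgorithm();
        return iteration;
    }
}
```

In this sketch a subclass whose step returns false on its third call stops after three iterations, and any subclass is cut off once maxIterations is reached.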

getResult

public ProbabilisticLatentSemanticAnalysis.Result getResult()
Description copied from interface: AnytimeAlgorithm
Gets the current result of the algorithm.

Specified by:
getResult in interface AnytimeAlgorithm<ProbabilisticLatentSemanticAnalysis.Result>
Returns:
Current result of the algorithm.

getRandom

public Random getRandom()
Description copied from interface: Randomized
Gets the random number generator used by this object.

Specified by:
getRandom in interface Randomized
Returns:
The random number generator used by this object.

setRandom

public void setRandom(Random random)
Description copied from interface: Randomized
Sets the random number generator used by this object.

Specified by:
setRandom in interface Randomized
Parameters:
random - The random number generator for this object to use.
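
Supplying a seeded java.util.Random, through setRandom or the constructors that accept one, is the usual way to make a randomized algorithm repeatable. The sketch below demonstrates only the underlying java.util.Random behavior (equal seeds give equal sequences); how this class consumes the generator is not described on this page, and SeededRandomSketch and draws are hypothetical names.

```java
import java.util.Random;

/** Hypothetical helper showing that equal seeds produce equal random sequences. */
final class SeededRandomSketch {
    /** Returns the first count doubles drawn from a generator with the given seed. */
    static double[] draws(long seed, int count) {
        Random random = new Random(seed);
        double[] values = new double[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextDouble();
        }
        return values;
    }
}
```

Two calls to draws with the same seed return element-wise identical arrays, which is the property that makes a fixed-seed run reproducible.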

getVectorFactory

public VectorFactory<? extends Vector> getVectorFactory()
Gets the vector factory to use.

Specified by:
getVectorFactory in interface VectorFactoryContainer
Returns:
The vector factory to use.

setVectorFactory

public void setVectorFactory(VectorFactory<? extends Vector> vectorFactory)
Sets the vector factory to use.

Parameters:
vectorFactory - The vector factory to use.

getMatrixFactory

public MatrixFactory<? extends Matrix> getMatrixFactory()
Gets the matrix factory to use.

Returns:
The matrix factory to use.

setMatrixFactory

public void setMatrixFactory(MatrixFactory<? extends Matrix> matrixFactory)
Sets the matrix factory to use.

Parameters:
matrixFactory - The matrix factory to use.

getRequestedRank

public int getRequestedRank()
Gets the requested rank to conduct the analysis for. It is the number of latent variables to use.

Returns:
The requested rank. Must be positive.

setRequestedRank

public void setRequestedRank(int requestedRank)
Sets the requested rank to conduct the analysis for. It is the number of latent variables to use.

Parameters:
requestedRank - The requested rank. Must be positive.

getMinimumChange

public double getMinimumChange()
Gets the minimum change in log-likelihood to allow before stopping the algorithm.

Returns:
The minimum change in log-likelihood to allow before stopping. Must be non-negative.

setMinimumChange

public void setMinimumChange(double minimumChange)
Sets the minimum change in log-likelihood to allow before stopping the algorithm.

Parameters:
minimumChange - The minimum change in log-likelihood to allow before stopping. Must be non-negative.