gov.sandia.cognition.learning.algorithm.clustering
Class DirichletProcessClustering

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
          extended by gov.sandia.cognition.algorithm.AnytimeAlgorithmWrapper<Collection<GaussianCluster>,DirichletProcessMixtureModel<Vector>>
              extended by gov.sandia.cognition.learning.algorithm.clustering.DirichletProcessClustering
All Implemented Interfaces:
AnytimeAlgorithm<Collection<GaussianCluster>>, IterativeAlgorithm, IterativeAlgorithmListener, MeasurablePerformanceAlgorithm, StoppableAlgorithm, AnytimeBatchLearner<Collection<? extends Vector>,Collection<GaussianCluster>>, BatchLearner<Collection<? extends Vector>,Collection<GaussianCluster>>, BatchClusterer<Vector,GaussianCluster>, CloneableSerializable, Randomized, Serializable, Cloneable

@PublicationReferences(references={@PublicationReference(author="Michael I. Jordan",title="Dirichlet Processes, Chinese Restaurant Processes and All That",type=Conference,publication="NIPS",year=2005,url="http://www.cs.berkeley.edu/~jordan/nips-tutorial05.ps"),@PublicationReference(author="Radford M. Neal",title="Markov Chain Sampling Methods for Dirichlet Process Mixture Models",type=Journal,year=2000,publication="Journal of Computational and Graphical Statistics, Vol. 9, No. 2",pages={249,265},notes="Based in part on Algorithm 2 from Neal"),@PublicationReference(author={"Michael D. Escobar","Mike West"},title="Bayesian Density Estimation and Inference Using Mixtures",type=Journal,publication="Journal of the American Statistical Association",year=1995)})
public class DirichletProcessClustering
extends AnytimeAlgorithmWrapper<Collection<GaussianCluster>,DirichletProcessMixtureModel<Vector>>
implements BatchClusterer<Vector,GaussianCluster>, AnytimeBatchLearner<Collection<? extends Vector>,Collection<GaussianCluster>>, Randomized, MeasurablePerformanceAlgorithm

Clustering algorithm that wraps a Dirichlet Process Mixture Model (DPMM). The DPMM infers the number of clusters and, for each cluster, the mean and (optionally, enabled by default) the covariance of the Vector data. Details: the algorithm first draws samples from the posterior of the Dirichlet process mixture model given the data, using Gibbs sampling. From the resulting samples (the number of which is a parameter), it selects the maximum a posteriori (MAP) clustering, that is, the sample with the highest posterior probability, using the Chinese Restaurant Process as the prior on the clustering.
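
The selection step described above can be sketched in plain Java. This is a minimal illustration, not the library's implementation: the class CrpPriorSketch, its method names, the concentration parameter alpha, and the idea that each sample arrives with a precomputed data log-likelihood are all assumptions made for the example. The prior used is the standard Chinese Restaurant Process probability of one assignment of n points into K clusters of sizes n_k, namely alpha^K * prod_k (n_k - 1)! / prod_{i=0}^{n-1} (alpha + i).

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: scores sampled clusterings under a CRP prior (hypothetical names). */
final class CrpPriorSketch {

    /**
     * Log-probability of one assignment of points to clusters under a
     * Chinese Restaurant Process with concentration alpha.
     */
    static double logCrpPrior(int[] clusterSizes, double alpha) {
        double logProb = 0.0;
        int n = 0;
        for (int size : clusterSizes) {
            logProb += Math.log(alpha);      // one factor of alpha per cluster
            for (int j = 2; j < size; j++) {
                logProb += Math.log(j);      // accumulates log((size - 1)!)
            }
            n += size;
        }
        for (int i = 0; i < n; i++) {
            logProb -= Math.log(alpha + i);  // normalizer over all n points
        }
        return logProb;
    }

    /** Index of the sample that maximizes log-likelihood plus log CRP prior. */
    static int selectMap(List<int[]> sizesPerSample, double[] logLikelihoods, double alpha) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int s = 0; s < sizesPerSample.size(); s++) {
            double score = logLikelihoods[s] + logCrpPrior(sizesPerSample.get(s), alpha);
            if (score > bestScore) {
                bestScore = score;
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up samples: cluster sizes and data log-likelihoods for 6 points.
        List<int[]> samples = Arrays.asList(new int[] {6}, new int[] {3, 3});
        double[] logLikelihoods = {-40.0, -25.0};
        System.out.println("MAP sample index: " + selectMap(samples, logLikelihoods, 1.0));
    }
}
```

Working in log space avoids underflow when the likelihood terms are very small, which is the usual reason to compare samples by summed log-probabilities rather than by products.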

Since:
3.0
Author:
Kevin R. Dixon
See Also:
Serialized Form

Field Summary
static int DEFAULT_DIMENSIONALITY
          Default dimensionality, 2.
static int DEFAULT_SAMPLES
          Default number of samples, 1000.
static String PERFORMANCE_DESCRIPTION
          Description of the performance value returned, "Number of Clusters".
 
Fields inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
DEFAULT_ITERATION, iteration
 
Constructor Summary
DirichletProcessClustering()
          Creates a new instance of DirichletProcessClustering.
DirichletProcessClustering(DirichletProcessMixtureModel<Vector> algorithm)
          Creates a new instance of DirichletProcessClustering that wraps the given Dirichlet Process Mixture Model.
DirichletProcessClustering(int dimensionality)
          Creates a new instance of DirichletProcessClustering for observations of the given dimensionality.
 
Method Summary
 DirichletProcessClustering clone()
          This makes public the clone method on the Object class and removes the exception that it throws.
 Collection<? extends Vector> getData()
          Gets the data to use for learning.
 boolean getKeepGoing()
          Gets the keep going value, which indicates if the algorithm should continue on to another step.
 NamedValue<Integer> getPerformance()
          Gets the name-value pair that describes the current performance of the algorithm.
 Random getRandom()
          Gets the random number generator used by this object.
 ArrayList<GaussianCluster> getResult()
          Gets the current result of the algorithm.
 ArrayList<GaussianCluster> learn(Collection<? extends Vector> data)
          The learn method creates an object of ResultType using data of type DataType, using some form of "learning" algorithm.
 void setRandom(Random random)
          Sets the random number generator used by this object.
 
Methods inherited from class gov.sandia.cognition.algorithm.AnytimeAlgorithmWrapper
algorithmEnded, algorithmStarted, getAlgorithm, getIteration, getMaxIterations, isResultValid, readResolve, setAlgorithm, setMaxIterations, stepEnded, stepStarted, stop
 
Methods inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
addIterativeAlgorithmListener, fireAlgorithmEnded, fireAlgorithmStarted, fireStepEnded, fireStepStarted, getListeners, removeIterativeAlgorithmListener, setIteration, setListeners
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.algorithm.AnytimeAlgorithm
getMaxIterations, setMaxIterations
 
Methods inherited from interface gov.sandia.cognition.algorithm.IterativeAlgorithm
addIterativeAlgorithmListener, getIteration, removeIterativeAlgorithmListener
 
Methods inherited from interface gov.sandia.cognition.algorithm.StoppableAlgorithm
isResultValid, stop
 

Field Detail

PERFORMANCE_DESCRIPTION

public static final String PERFORMANCE_DESCRIPTION
Description of the performance value returned, "Number of Clusters".

See Also:
Constant Field Values

DEFAULT_DIMENSIONALITY

public static final int DEFAULT_DIMENSIONALITY
Default dimensionality, 2.

See Also:
Constant Field Values

DEFAULT_SAMPLES

public static final int DEFAULT_SAMPLES
Default number of samples, 1000.

See Also:
Constant Field Values
Constructor Detail

DirichletProcessClustering

public DirichletProcessClustering()
Creates a new instance of DirichletProcessClustering


DirichletProcessClustering

public DirichletProcessClustering(int dimensionality)
Creates a new instance of DirichletProcessClustering for observations of the given dimensionality.

Parameters:
dimensionality - Dimensionality of the observations

DirichletProcessClustering

public DirichletProcessClustering(DirichletProcessMixtureModel<Vector> algorithm)
Creates a new instance of DirichletProcessClustering that wraps the given Dirichlet Process Mixture Model.

Parameters:
algorithm - Dirichlet Process Mixture model that is being wrapped
Method Detail

clone

public DirichletProcessClustering clone()
Description copied from class: AbstractCloneableSerializable
This makes public the clone method on the Object class and removes the exception that it throws. Its default behavior is to create a clone of the exact type of the object that clone is called on, copying all primitives but keeping all references, which means it is a shallow copy. Extensions of this class may want to override this method (but call super.clone()) to implement a "smart copy", that is, a copy that targets the most common use case for duplicating the object. Because the default behavior is a shallow copy, extending classes only need to handle fields that require a deeper copy (or that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method. Note: The contract of this method is that you must use super.clone() as the basis for your implementation.

Specified by:
clone in interface CloneableSerializable
Overrides:
clone in class AnytimeAlgorithmWrapper<Collection<GaussianCluster>,DirichletProcessMixtureModel<Vector>>
Returns:
A clone of this object.
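
The "smart copy" contract described above can be sketched with plain Java classes. This is a minimal illustration under stated assumptions: CloneableBase and Accumulator are hypothetical stand-ins written for this example, not classes from the library, and they only mimic the documented behavior (a public clone() without a checked exception, built on super.clone()).

```java
import java.util.ArrayList;

/** Hypothetical stand-in for a base class that makes clone() public and unchecked. */
abstract class CloneableBase implements Cloneable {
    @Override
    public CloneableBase clone() {
        try {
            // Shallow copy of the exact runtime type: primitives copied, references shared.
            return (CloneableBase) super.clone();
        } catch (CloneNotSupportedException e) {
            // Not expected, because this class implements Cloneable.
            throw new RuntimeException(e);
        }
    }
}

/** Hypothetical subclass with one primitive field and one mutable reference field. */
final class Accumulator extends CloneableBase {
    int count;
    ArrayList<Double> values = new ArrayList<>();

    @Override
    public Accumulator clone() {
        // Start from super.clone(), as the contract requires.
        Accumulator copy = (Accumulator) super.clone();
        // Deepen only the field that must not be shared between copies.
        copy.values = new ArrayList<>(this.values);
        return copy;
    }

    public static void main(String[] args) {
        Accumulator original = new Accumulator();
        original.count = 3;
        original.values.add(1.0);
        Accumulator copy = original.clone();
        copy.values.add(2.0); // does not affect the original's list
        System.out.println(original.values.size() + " vs " + copy.values.size());
    }
}
```

Without the line that replaces copy.values, both objects would share one list, which is the shallow-copy pitfall the description warns about.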

getResult

public ArrayList<GaussianCluster> getResult()
Description copied from interface: AnytimeAlgorithm
Gets the current result of the algorithm.

Specified by:
getResult in interface AnytimeAlgorithm<Collection<GaussianCluster>>
Returns:
Current result of the algorithm.

learn

public ArrayList<GaussianCluster> learn(Collection<? extends Vector> data)
Description copied from interface: BatchLearner
The learn method creates an object of ResultType using data of type DataType, using some form of "learning" algorithm.

Specified by:
learn in interface BatchLearner<Collection<? extends Vector>,Collection<GaussianCluster>>
Parameters:
data - The data that the learning algorithm will use to create an object of ResultType.
Returns:
The object that is created based on the given data using the learning algorithm.

getRandom

public Random getRandom()
Description copied from interface: Randomized
Gets the random number generator used by this object.

Specified by:
getRandom in interface Randomized
Returns:
The random number generator used by this object.

setRandom

public void setRandom(Random random)
Description copied from interface: Randomized
Sets the random number generator used by this object.

Specified by:
setRandom in interface Randomized
Parameters:
random - The random number generator for this object to use.

getPerformance

public NamedValue<Integer> getPerformance()
Description copied from interface: MeasurablePerformanceAlgorithm
Gets the name-value pair that describes the current performance of the algorithm. For most algorithms, this is the value that they are attempting to optimize.

Specified by:
getPerformance in interface MeasurablePerformanceAlgorithm
Returns:
The name-value pair that describes the current performance of the algorithm.

getKeepGoing

public boolean getKeepGoing()
Description copied from interface: AnytimeBatchLearner
Gets the keep going value, which indicates if the algorithm should continue on to another step.

Specified by:
getKeepGoing in interface AnytimeBatchLearner<Collection<? extends Vector>,Collection<GaussianCluster>>
Returns:
The keep going value.
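
How a keep-going flag typically interacts with an iteration limit and a stop request can be sketched as follows. This is a generic illustration only: AnytimeLoopSketch, its fields, and the stopAt parameter are hypothetical and do not reproduce the library's internal loop.

```java
/** Hypothetical anytime loop: steps until stopped or out of iterations. */
final class AnytimeLoopSketch {
    private final int maxIterations;
    private int iteration = 0;
    private boolean keepGoing = false;
    private double result = 0.0;

    AnytimeLoopSketch(int maxIterations) {
        this.maxIterations = maxIterations;
    }

    boolean getKeepGoing() { return keepGoing; }
    int getIteration() { return iteration; }
    double getResult() { return result; }

    /** Requests that the loop end after the current step. */
    void stop() {
        keepGoing = false;
    }

    /** Runs steps; the partial result remains readable after an early stop. */
    double run(int stopAt) {
        keepGoing = true;
        while (keepGoing && iteration < maxIterations) {
            iteration++;
            result += 1.0 / iteration;   // stand-in for one unit of real work
            if (iteration == stopAt) {
                stop();                  // simulates an external stop request
            }
        }
        keepGoing = false;
        return result;
    }

    public static void main(String[] args) {
        AnytimeLoopSketch loop = new AnytimeLoopSketch(10);
        loop.run(3);
        System.out.println("stopped after " + loop.getIteration() + " iterations");
    }
}
```

The point of the pattern is that the current result is usable whenever the loop ends, whether it ran to the iteration limit or was stopped early.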

getData

public Collection<? extends Vector> getData()
Description copied from interface: AnytimeBatchLearner
Gets the data to use for learning. This is set when learning starts and then cleared out once learning is finished.

Specified by:
getData in interface AnytimeBatchLearner<Collection<? extends Vector>,Collection<GaussianCluster>>
Returns:
The data to use for learning.