gov.sandia.cognition.learning.algorithm.clustering
Class OptimizedKMeansClusterer<DataType>

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
          extended by gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm<ResultType>
              extended by gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner<Collection<? extends DataType>,Collection<ClusterType>>
                  extended by gov.sandia.cognition.learning.algorithm.clustering.KMeansClusterer<DataType,CentroidCluster<DataType>>
                      extended by gov.sandia.cognition.learning.algorithm.clustering.OptimizedKMeansClusterer<DataType>
Type Parameters:
DataType - The type of the data to cluster. This is typically defined by the divergence function used.
All Implemented Interfaces:
AnytimeAlgorithm<Collection<CentroidCluster<DataType>>>, IterativeAlgorithm, MeasurablePerformanceAlgorithm, StoppableAlgorithm, AnytimeBatchLearner<Collection<? extends DataType>,Collection<CentroidCluster<DataType>>>, BatchLearner<Collection<? extends DataType>,Collection<CentroidCluster<DataType>>>, BatchClusterer<DataType,CentroidCluster<DataType>>, DivergenceFunctionContainer<CentroidCluster<DataType>,DataType>, CloneableSerializable, Serializable, Cloneable

@CodeReview(reviewer="Kevin R. Dixon",
            date="2008-07-22",
            changesNeeded=false,
            comments={"Added PublicationReference","Code generally looks fine."})
@PublicationReference(author="C. Elkan",
                      title="Using the Triangle Inequality to Accelerate k-Means",
                      type=Conference,
                      year=2003,
                      publication="Proceedings of the Twentieth International Conference on Machine Learning",
                      pages={147,153},
                      url="www-cse.ucsd.edu/~elkan/kmeansicml03.pdf")
public class OptimizedKMeansClusterer<DataType>
extends KMeansClusterer<DataType,CentroidCluster<DataType>>

This class implements an optimized version of the k-means algorithm that makes use of the triangle inequality to compute the same answer as k-means while using less distance calculations. The only restriction that the algorithm places is that the divergence function it is given must be a metric because it must obey the triangle inequality.

Since:
1.0
Author:
Justin Basilico, Kevin R. Dixon
See Also:
Serialized Form

Field Summary
protected  double[][] clusterDistances
          The distances between clusters.
protected  double[][] lowerBounds
          The lower bounds on the distances to the clusters.
protected  double[] upperBounds
          The upper bounds on the distance to the current assigned cluster.
 
Fields inherited from class gov.sandia.cognition.learning.algorithm.clustering.KMeansClusterer
assignments, clusterCounts, clusters, creator, DEFAULT_MAX_ITERATIONS, DEFAULT_NUM_REQUESTED_CLUSTERS, divergenceFunction, initializer, numRequestedClusters
 
Fields inherited from class gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner
data, keepGoing
 
Fields inherited from class gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm
maxIterations
 
Fields inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
DEFAULT_ITERATION, iteration
 
Constructor Summary
OptimizedKMeansClusterer(int numClusters, int maxIterations, FixedClusterInitializer<CentroidCluster<DataType>,DataType> initializer, Metric<? super DataType> metric, ClusterCreator<CentroidCluster<DataType>,DataType> creator)
          Creates a new instance of OptimizedKMeansClusterer.
 
Method Summary
 OptimizedKMeansClusterer<DataType> clone()
          This makes public the clone method on the Object class and removes the exception that it throws.
protected  void computeClusterDistances()
          Computes the distances between the clusters.
 DataType getClusterCentroid(int clusterIndex)
          Gets the centroid for the given cluster index.
 Metric<? super DataType> getMetric()
          Gets the metric being used by the algorithm.
protected  boolean initializeAlgorithm()
          Called to initialize the learning algorithm's state based on the data that is stored in the data field.
protected  boolean step()
          Do a step of the clustering algorithm.
 
Methods inherited from class gov.sandia.cognition.learning.algorithm.clustering.KMeansClusterer
assignDataFromIndices, assignDataToClusters, cleanupAlgorithm, createClustersFromAssignments, getAssignments, getClosestClusterIndex, getCluster, getClusterCounts, getClusters, getCreator, getDivergenceFunction, getInitializer, getNumChanged, getNumClusters, getNumElements, getNumRequestedClusters, getPerformance, getResult, setAssignment, setClusters, setCreator, setData, setDivergenceFunction, setInitializer, setNumChanged, setNumRequestedClusters
 
Methods inherited from class gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner
getData, getKeepGoing, learn, setKeepGoing, stop
 
Methods inherited from class gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm
getMaxIterations, isResultValid, setMaxIterations
 
Methods inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
addIterativeAlgorithmListener, fireAlgorithmEnded, fireAlgorithmStarted, fireStepEnded, fireStepStarted, getIteration, getListeners, removeIterativeAlgorithmListener, setIteration, setListeners
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.learning.algorithm.BatchLearner
learn
 
Methods inherited from interface gov.sandia.cognition.algorithm.AnytimeAlgorithm
getMaxIterations, setMaxIterations
 
Methods inherited from interface gov.sandia.cognition.algorithm.IterativeAlgorithm
addIterativeAlgorithmListener, getIteration, removeIterativeAlgorithmListener
 
Methods inherited from interface gov.sandia.cognition.algorithm.StoppableAlgorithm
isResultValid
 

Field Detail

lowerBounds

protected double[][] lowerBounds
The lower bounds on the distances to the clusters.


upperBounds

protected double[] upperBounds
The upper bounds on the distance to the current assigned cluster.


clusterDistances

protected double[][] clusterDistances
The distances between clusters.

Constructor Detail

OptimizedKMeansClusterer

public OptimizedKMeansClusterer(int numClusters,
                                int maxIterations,
                                FixedClusterInitializer<CentroidCluster<DataType>,DataType> initializer,
                                Metric<? super DataType> metric,
                                ClusterCreator<CentroidCluster<DataType>,DataType> creator)
Creates a new instance of OptimizedKMeansClusterer.

Parameters:
numClusters - The number of clusters to create.
maxIterations - Number of iterations before stopping
initializer - The cluster initializer.
metric - The metric to use.
creator - The cluster creator to use.
Method Detail

clone

public OptimizedKMeansClusterer<DataType> clone()
Description copied from class: AbstractCloneableSerializable
This makes public the clone method on the Object class and removes the exception that it throws. Its default behavior is to automatically create a clone of the exact type of object that the clone is called on and to copy all primitives but to keep all references, which means it is a shallow copy. Extensions of this class may want to override this method (but call super.clone() to implement a "smart copy". That is, to target the most common use case for creating a copy of the object. Because of the default behavior being a shallow copy, extending classes only need to handle fields that need to have a deeper copy (or those that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method. Note: The contract of this method is that you must use super.clone() as the basis for your implementation.

Specified by:
clone in interface CloneableSerializable
Overrides:
clone in class KMeansClusterer<DataType,CentroidCluster<DataType>>
Returns:
A clone of this object.

initializeAlgorithm

protected boolean initializeAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to initialize the learning algorithm's state based on the data that is stored in the data field. The return value indicates if the algorithm can be run or not based on the initialization.

Overrides:
initializeAlgorithm in class KMeansClusterer<DataType,CentroidCluster<DataType>>
Returns:
True if the learning algorithm can be run and false if it cannot.

computeClusterDistances

protected void computeClusterDistances()
Computes the distances between the clusters.


step

protected boolean step()
Description copied from class: KMeansClusterer
Do a step of the clustering algorithm. Return the number of elements the changed their cluster membership. If this is zero then the clustering is complete.

Overrides:
step in class KMeansClusterer<DataType,CentroidCluster<DataType>>
Returns:
true means keep going, false means stop clustering.

getClusterCentroid

public DataType getClusterCentroid(int clusterIndex)
Gets the centroid for the given cluster index.

Parameters:
clusterIndex - The index of the cluster to get the centroid for.
Returns:
The centroid for the given cluster.

getMetric

public Metric<? super DataType> getMetric()
Gets the metric being used by the algorithm.

Returns:
The metric being used.