gov.sandia.cognition.learning.algorithm.clustering
Class ParallelizedKMeansClusterer<DataType,ClusterType extends Cluster<DataType>>

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
          extended by gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm<ResultType>
              extended by gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner<Collection<? extends DataType>,Collection<ClusterType>>
                  extended by gov.sandia.cognition.learning.algorithm.clustering.KMeansClusterer<DataType,ClusterType>
                      extended by gov.sandia.cognition.learning.algorithm.clustering.ParallelizedKMeansClusterer<DataType,ClusterType>
Type Parameters:
DataType - The type of the data to cluster. This is typically defined by the divergence function used.
ClusterType - The type of Cluster created by the algorithm. This is typically defined by the cluster creator function used.
All Implemented Interfaces:
AnytimeAlgorithm<Collection<ClusterType>>, IterativeAlgorithm, MeasurablePerformanceAlgorithm, ParallelAlgorithm, StoppableAlgorithm, AnytimeBatchLearner<Collection<? extends DataType>,Collection<ClusterType>>, BatchLearner<Collection<? extends DataType>,Collection<ClusterType>>, BatchClusterer<DataType,ClusterType>, DivergenceFunctionContainer<ClusterType,DataType>, CloneableSerializable, Serializable, Cloneable

@PublicationReference(author="Halil Bisgin",
                      title="Parallel Clustering Algorithms with Application to Climatology",
                      type=Thesis,
                      year=2007,
                      url="http://www.halilbisgin.com/thesis/thesis.pdf")
public class ParallelizedKMeansClusterer<DataType,ClusterType extends Cluster<DataType>>
extends KMeansClusterer<DataType,ClusterType>
implements ParallelAlgorithm

This is a parallel implementation of the k-means clustering algorithm. By default it uses n-1 of the n available cores/hyperthreads on the machine and spreads both the data-point-to-cluster assignment (E-step) and the cluster re-estimation (M-step) across these computational units. The output of this algorithm is exact: for an identical dataset and random seed, it should return the same results as the serial version of k-means.

Since:
3.0
Author:
Kevin R. Dixon
See Also:
Serialized Form

Nested Class Summary
protected  class ParallelizedKMeansClusterer.AssignDataToCluster
          Callable task for the evaluate() method.
protected  class ParallelizedKMeansClusterer.CreateClustersFromAssignments
          Callable task that creates clusters from assigned data.
 
Field Summary
 
Fields inherited from class gov.sandia.cognition.learning.algorithm.clustering.KMeansClusterer
assignments, clusterCounts, clusters, creator, DEFAULT_MAX_ITERATIONS, DEFAULT_NUM_REQUESTED_CLUSTERS, divergenceFunction, initializer, numRequestedClusters
 
Fields inherited from class gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner
data, keepGoing
 
Fields inherited from class gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm
maxIterations
 
Fields inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
DEFAULT_ITERATION, iteration
 
Constructor Summary
ParallelizedKMeansClusterer()
          Default constructor
ParallelizedKMeansClusterer(int numRequestedClusters, int maxIterations, ThreadPoolExecutor threadPool, FixedClusterInitializer<ClusterType,DataType> initializer, ClusterDivergenceFunction<? super ClusterType,? super DataType> divergenceFunction, ClusterCreator<ClusterType,DataType> creator)
          Creates a new instance of ParallelizedKMeansClusterer
 
Method Summary
protected  int[] assignDataToClusters(Collection<? extends DataType> data)
          Creates the cluster assignments given the current locations of clusters
 ParallelizedKMeansClusterer<DataType,ClusterType> clone()
          This makes public the clone method on the Object class and removes the exception that it throws.
protected  void createAssignmentTasks()
          Creates the assignment tasks given the number of threads requested
protected  void createClustersFromAssignments()
          Creates the set of clusters using the current cluster assignments.
 int getNumThreads()
          Gets the number of threads in the thread pool.
 ThreadPoolExecutor getThreadPool()
          Gets the thread pool for the algorithm to use.
protected  boolean initializeAlgorithm()
          Called to initialize the learning algorithm's state based on the data that is stored in the data field.
 void setThreadPool(ThreadPoolExecutor threadPool)
          Sets the thread pool for the algorithm to use.
 
Methods inherited from class gov.sandia.cognition.learning.algorithm.clustering.KMeansClusterer
assignDataFromIndices, cleanupAlgorithm, getAssignments, getClosestClusterIndex, getCluster, getClusterCounts, getClusters, getCreator, getDivergenceFunction, getInitializer, getNumChanged, getNumClusters, getNumElements, getNumRequestedClusters, getPerformance, getResult, setAssignment, setClusters, setCreator, setData, setDivergenceFunction, setInitializer, setNumChanged, setNumRequestedClusters, step
 
Methods inherited from class gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner
getData, getKeepGoing, learn, setKeepGoing, stop
 
Methods inherited from class gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm
getMaxIterations, isResultValid, setMaxIterations
 
Methods inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
addIterativeAlgorithmListener, fireAlgorithmEnded, fireAlgorithmStarted, fireStepEnded, fireStepStarted, getIteration, getListeners, removeIterativeAlgorithmListener, setIteration, setListeners
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.learning.algorithm.BatchLearner
learn
 
Methods inherited from interface gov.sandia.cognition.algorithm.AnytimeAlgorithm
getMaxIterations, setMaxIterations
 
Methods inherited from interface gov.sandia.cognition.algorithm.IterativeAlgorithm
addIterativeAlgorithmListener, getIteration, removeIterativeAlgorithmListener
 
Methods inherited from interface gov.sandia.cognition.algorithm.StoppableAlgorithm
isResultValid
 

Constructor Detail

ParallelizedKMeansClusterer

public ParallelizedKMeansClusterer()
Default constructor


ParallelizedKMeansClusterer

public ParallelizedKMeansClusterer(int numRequestedClusters,
                                   int maxIterations,
                                   ThreadPoolExecutor threadPool,
                                   FixedClusterInitializer<ClusterType,DataType> initializer,
                                   ClusterDivergenceFunction<? super ClusterType,? super DataType> divergenceFunction,
                                   ClusterCreator<ClusterType,DataType> creator)
Creates a new instance of ParallelizedKMeansClusterer

Parameters:
numRequestedClusters - The number of clusters requested (k).
maxIterations - Maximum number of iterations before stopping
threadPool - Thread pool to use for parallelization
initializer - The initializer for the clusters.
divergenceFunction - The divergence function.
creator - The cluster creator.
Method Detail

clone

public ParallelizedKMeansClusterer<DataType,ClusterType> clone()
Description copied from class: AbstractCloneableSerializable
This makes public the clone method on the Object class and removes the exception that it throws. Its default behavior is to automatically create a clone of the exact type of object that the clone is called on and to copy all primitives but to keep all references, which means it is a shallow copy. Extensions of this class may want to override this method (but call super.clone()) to implement a "smart copy", that is, a copy that targets the most common use case for copying the object. Because the default behavior is a shallow copy, extending classes only need to handle fields that need a deeper copy (or those that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method. Note: The contract of this method is that you must use super.clone() as the basis for your implementation.

Specified by:
clone in interface CloneableSerializable
Overrides:
clone in class KMeansClusterer<DataType,ClusterType extends Cluster<DataType>>
Returns:
A clone of this object.
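
The "smart copy" contract described above can be illustrated with a minimal sketch in plain Java. The class SmartCopyExample and its fields are hypothetical and are not part of this library; the sketch only shows the pattern of starting from super.clone() and deepening the fields that must not be shared.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration only; not part of the library.
class SmartCopyExample implements Cloneable {
    // Primitive field: copied by value by super.clone().
    public int iteration = 0;
    // Reference field: shared between original and copy after a shallow copy.
    public List<double[]> centroids = new ArrayList<>();

    @Override
    public SmartCopyExample clone() {
        try {
            // The contract: use super.clone() as the basis of the copy.
            SmartCopyExample copy = (SmartCopyExample) super.clone();
            // Deepen only the fields that need their own storage.
            copy.centroids = new ArrayList<>();
            for (double[] centroid : this.centroids) {
                copy.centroids.add(centroid.clone());
            }
            return copy;
        } catch (CloneNotSupportedException e) {
            // Cannot happen because this class implements Cloneable.
            throw new AssertionError("Cloneable is implemented", e);
        }
    }
}
```

With this pattern, changing a centroid in the copy leaves the original untouched, while primitive fields such as iteration are carried over unchanged.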

getThreadPool

public ThreadPoolExecutor getThreadPool()
Description copied from interface: ParallelAlgorithm
Gets the thread pool for the algorithm to use.

Specified by:
getThreadPool in interface ParallelAlgorithm
Returns:
Thread pool used for parallelization.

setThreadPool

public void setThreadPool(ThreadPoolExecutor threadPool)
Description copied from interface: ParallelAlgorithm
Sets the thread pool for the algorithm to use.

Specified by:
setThreadPool in interface ParallelAlgorithm
Parameters:
threadPool - Thread pool used for parallelization.
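
The class description states that the default is to use n-1 of the available cores/hyperthreads. A pool of that size could be built with the standard JDK as in the sketch below. ThreadPoolSketch and createPool are hypothetical names introduced for illustration, and this is not necessarily how the library builds its default pool.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of the library.
class ThreadPoolSketch {
    // Builds a fixed-size pool with n-1 worker threads, but never fewer than one.
    static ThreadPoolExecutor createPool() {
        int cores = Runtime.getRuntime().availableProcessors();
        int numThreads = Math.max(1, cores - 1);
        // Core and maximum sizes are equal, so the pool size is fixed.
        return new ThreadPoolExecutor(
                numThreads, numThreads,
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }
}
```

A pool created this way can be passed to setThreadPool(ThreadPoolExecutor). The caller is responsible for calling shutdown() on the pool when it is no longer needed.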

getNumThreads

public int getNumThreads()
Description copied from interface: ParallelAlgorithm
Gets the number of threads in the thread pool.

Specified by:
getNumThreads in interface ParallelAlgorithm
Returns:
Number of threads in the thread pool

createAssignmentTasks

protected void createAssignmentTasks()
Creates the assignment tasks given the number of threads requested


initializeAlgorithm

protected boolean initializeAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to initialize the learning algorithm's state based on the data that is stored in the data field. The return value indicates if the algorithm can be run or not based on the initialization.

Overrides:
initializeAlgorithm in class KMeansClusterer<DataType,ClusterType extends Cluster<DataType>>
Returns:
True if the learning algorithm can be run and false if it cannot.

assignDataToClusters

protected int[] assignDataToClusters(Collection<? extends DataType> data)
Description copied from class: KMeansClusterer
Creates the cluster assignments given the current locations of clusters

Overrides:
assignDataToClusters in class KMeansClusterer<DataType,ClusterType extends Cluster<DataType>>
Parameters:
data - Data to assign
Returns:
Assignments of the data to each of the k clusters
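
A parallel assignment step (E-step) can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the library's implementation: the names ParallelAssignmentSketch, divergence, closest, and assign are hypothetical, data points are plain double[] vectors, and squared Euclidean distance stands in for the configurable divergence function. Each task handles one contiguous chunk of the data and writes only its own slots of the result array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch for illustration only; not the library's code.
class ParallelAssignmentSketch {
    // Assumption: squared Euclidean distance stands in for the divergence function.
    static double divergence(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid with the smallest divergence (first one wins ties).
    static int closest(double[] point, double[][] centroids) {
        int best = 0;
        double bestValue = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            double value = divergence(point, centroids[k]);
            if (value < bestValue) {
                bestValue = value;
                best = k;
            }
        }
        return best;
    }

    // Splits the data into contiguous chunks, one task per chunk.
    // numThreads must be at least 1.
    static int[] assign(double[][] data, double[][] centroids, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            int[] assignments = new int[data.length];
            int chunk = (data.length + numThreads - 1) / numThreads;
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int start = 0; start < data.length; start += chunk) {
                final int from = start;
                final int to = Math.min(data.length, start + chunk);
                tasks.add(() -> {
                    for (int i = from; i < to; i++) {
                        assignments[i] = closest(data[i], centroids);
                    }
                    return null;
                });
            }
            // Wait for every chunk and surface any task failure.
            for (Future<Void> future : pool.invokeAll(tasks)) {
                future.get();
            }
            return assignments;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each point's assignment depends only on that point and the current centroids, the result does not depend on how the data is chunked, which is consistent with the class's claim that the parallel output matches the serial version.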

createClustersFromAssignments

protected void createClustersFromAssignments()
Description copied from class: KMeansClusterer
Creates the set of clusters using the current cluster assignments.

Overrides:
createClustersFromAssignments in class KMeansClusterer<DataType,ClusterType extends Cluster<DataType>>
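
A parallel cluster re-estimation step (M-step) might look like the following plain-Java sketch. It is an illustration under stated assumptions, not the library's implementation: the names ParallelUpdateSketch, mean, and update are hypothetical, data points are double[] vectors, the cluster "creator" is assumed to be a simple arithmetic mean, and one task is submitted per cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch for illustration only; not the library's code.
class ParallelUpdateSketch {
    // Mean of all points assigned to the given cluster, or null if it has none.
    static double[] mean(double[][] data, int[] assignments, int cluster, int dim) {
        double[] sum = new double[dim];
        int count = 0;
        for (int i = 0; i < data.length; i++) {
            if (assignments[i] == cluster) {
                for (int j = 0; j < dim; j++) {
                    sum[j] += data[i][j];
                }
                count++;
            }
        }
        if (count == 0) {
            return null;
        }
        for (int j = 0; j < dim; j++) {
            sum[j] /= count;
        }
        return sum;
    }

    // Recomputes k centroids, one task per cluster. Assumes data is non-empty
    // and numThreads is at least 1.
    static double[][] update(double[][] data, int[] assignments, int k, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            final int dim = data[0].length;
            List<Callable<double[]>> tasks = new ArrayList<>();
            for (int c = 0; c < k; c++) {
                final int cluster = c;
                tasks.add(() -> mean(data, assignments, cluster, dim));
            }
            // invokeAll returns futures in the same order as the tasks.
            List<Future<double[]>> futures = pool.invokeAll(tasks);
            double[][] centroids = new double[k][];
            for (int c = 0; c < k; c++) {
                centroids[c] = futures.get(c).get();
            }
            return centroids;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Each task sums its cluster's points in data order, so the floating-point result for a cluster is the same regardless of how many threads run, which keeps the sketch consistent with a serial computation.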