gov.sandia.cognition.learning.algorithm.clustering
Class KMeansClustererWithRemoval<DataType,ClusterType extends Cluster<DataType>>

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
          extended by gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm<ResultType>
              extended by gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner<Collection<? extends DataType>,Collection<ClusterType>>
                  extended by gov.sandia.cognition.learning.algorithm.clustering.KMeansClusterer<DataType,ClusterType>
                      extended by gov.sandia.cognition.learning.algorithm.clustering.KMeansClustererWithRemoval<DataType,ClusterType>
Type Parameters:
DataType - The type of the data to cluster. This is typically defined by the divergence function used.
ClusterType - The type of Cluster created by the algorithm. This is typically defined by the cluster creator function used.
All Implemented Interfaces:
AnytimeAlgorithm<Collection<ClusterType>>, IterativeAlgorithm, MeasurablePerformanceAlgorithm, StoppableAlgorithm, AnytimeBatchLearner<Collection<? extends DataType>,Collection<ClusterType>>, BatchLearner<Collection<? extends DataType>,Collection<ClusterType>>, BatchClusterer<DataType,ClusterType>, DivergenceFunctionContainer<ClusterType,DataType>, CloneableSerializable, Serializable, Cloneable

@CodeReview(reviewer="Kevin R. Dixon",
            date="2008-07-22",
            changesNeeded=false,
            comments={"Made setRemovalThreshold check to ensure removalThreshold is < 1.0","Cleaned up javadoc.","Code generally looks fine."})
public class KMeansClustererWithRemoval<DataType,ClusterType extends Cluster<DataType>>
extends KMeansClusterer<DataType,ClusterType>

A k-means clustering algorithm that removes clusters whose membership is too small to pass a simple statistical significance test.

Since:
1.0
Author:
Kevin R. Dixon
See Also:
Serialized Form

Field Summary
 
Fields inherited from class gov.sandia.cognition.learning.algorithm.clustering.KMeansClusterer
assignments, clusterCounts, clusters, creator, DEFAULT_MAX_ITERATIONS, DEFAULT_NUM_REQUESTED_CLUSTERS, divergenceFunction, initializer, numRequestedClusters
 
Fields inherited from class gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner
data, keepGoing
 
Fields inherited from class gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm
maxIterations
 
Fields inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
DEFAULT_ITERATION, iteration
 
Constructor Summary
KMeansClustererWithRemoval()
          Default constructor
KMeansClustererWithRemoval(int numRequestedClusters, int maxIterations, FixedClusterInitializer<ClusterType,DataType> initializer, ClusterDivergenceFunction<ClusterType,DataType> divergenceFunction, ClusterCreator<ClusterType,DataType> creator, double removalThreshold)
          Creates a new instance of KMeansClustererWithRemoval using the given parameters.
 
Method Summary
 double getRemovalThreshold()
          Getter for removalThreshold
protected  void removeCluster(int clusterIndex)
          Removes the cluster at the specified index, and does the internal bookkeeping as well
 void setRemovalThreshold(double removalThreshold)
          Setter for removalThreshold
protected  boolean step()
          Do a step of the clustering algorithm.
 
Methods inherited from class gov.sandia.cognition.learning.algorithm.clustering.KMeansClusterer
assignDataFromIndices, assignDataToClusters, cleanupAlgorithm, clone, createClustersFromAssignments, getAssignments, getClosestClusterIndex, getCluster, getClusterCounts, getClusters, getCreator, getDivergenceFunction, getInitializer, getNumChanged, getNumClusters, getNumElements, getNumRequestedClusters, getPerformance, getResult, initializeAlgorithm, setAssignment, setClusters, setCreator, setData, setDivergenceFunction, setInitializer, setNumChanged, setNumRequestedClusters
 
Methods inherited from class gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner
getData, getKeepGoing, learn, setKeepGoing, stop
 
Methods inherited from class gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm
getMaxIterations, isResultValid, setMaxIterations
 
Methods inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
addIterativeAlgorithmListener, fireAlgorithmEnded, fireAlgorithmStarted, fireStepEnded, fireStepStarted, getIteration, getListeners, removeIterativeAlgorithmListener, setIteration, setListeners
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.learning.algorithm.BatchLearner
learn
 
Methods inherited from interface gov.sandia.cognition.algorithm.AnytimeAlgorithm
getMaxIterations, setMaxIterations
 
Methods inherited from interface gov.sandia.cognition.algorithm.IterativeAlgorithm
addIterativeAlgorithmListener, getIteration, removeIterativeAlgorithmListener
 
Methods inherited from interface gov.sandia.cognition.algorithm.StoppableAlgorithm
isResultValid
 

Constructor Detail

KMeansClustererWithRemoval

public KMeansClustererWithRemoval()
Default constructor


KMeansClustererWithRemoval

public KMeansClustererWithRemoval(int numRequestedClusters,
                                  int maxIterations,
                                  FixedClusterInitializer<ClusterType,DataType> initializer,
                                  ClusterDivergenceFunction<ClusterType,DataType> divergenceFunction,
                                  ClusterCreator<ClusterType,DataType> creator,
                                  double removalThreshold)
Creates a new instance of KMeansClustererWithRemoval using the given parameters.

Parameters:
numRequestedClusters - The number of clusters requested (k).
maxIterations - Maximum number of iterations before stopping.
initializer - The initializer for the clusters.
divergenceFunction - The divergence function.
creator - The cluster creator.
removalThreshold - Fraction of the expected number of data points assigned to a cluster below which the cluster will be removed. (Suppose there are 1000 data points, 10 clusters, and removalThreshold=0.1. A cluster may be removed only if it has fewer than 0.1*1000/10 = 10 elements assigned to it.)
Method Detail

getRemovalThreshold

public double getRemovalThreshold()
Getter for removalThreshold

Returns:
Fraction of the expected number of data points assigned to a cluster below which the cluster will be removed. (Suppose there are 1000 data points, 10 clusters, and removalThreshold=0.1. A cluster may be removed only if it has fewer than 0.1*1000/10 = 10 elements assigned to it.)

setRemovalThreshold

public void setRemovalThreshold(double removalThreshold)
Setter for removalThreshold

Parameters:
removalThreshold - Fraction of the expected number of data points assigned to a cluster below which the cluster will be removed. (Suppose there are 1000 data points, 10 clusters, and removalThreshold=0.1. A cluster may be removed only if it has fewer than 0.1*1000/10 = 10 elements assigned to it.) Must be less than 1.0.
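
The arithmetic in the parameter description can be sketched in a few lines of standalone Java. This is an illustration only, not the library's implementation: the class RemovalThresholdSketch and its methods are hypothetical names, and throwing IllegalArgumentException for a threshold of 1.0 or more is an assumption about how the "less than 1.0" requirement might be enforced.

```java
public class RemovalThresholdSketch {

    /** Rejects thresholds that are not below 1.0 (exception type is an assumption). */
    public static double checkedThreshold(double removalThreshold) {
        if (removalThreshold >= 1.0) {
            throw new IllegalArgumentException(
                "removalThreshold must be less than 1.0, got " + removalThreshold);
        }
        return removalThreshold;
    }

    /** Membership below which a cluster becomes a removal candidate. */
    public static double minimumMembership(
            int numDataPoints, int numClusters, double removalThreshold) {
        // Expected membership if the data were spread evenly over the clusters.
        double expectedPerCluster = (double) numDataPoints / numClusters;
        return checkedThreshold(removalThreshold) * expectedPerCluster;
    }

    /** True if a cluster with memberCount elements falls below the cutoff. */
    public static boolean isRemovable(
            int memberCount, int numDataPoints, int numClusters, double removalThreshold) {
        return memberCount
            < minimumMembership(numDataPoints, numClusters, removalThreshold);
    }

    public static void main(String[] args) {
        // The documented example: 1000 data points, 10 clusters, threshold 0.1.
        System.out.println(minimumMembership(1000, 10, 0.1));
        System.out.println(isRemovable(9, 1000, 10, 0.1));
        System.out.println(isRemovable(11, 1000, 10, 0.1));
    }
}
```

With the documented values, the cutoff is 10 elements, so a cluster holding 9 elements is a removal candidate and one holding 11 is not.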

removeCluster

protected void removeCluster(int clusterIndex)
Removes the cluster at the specified index, and does the internal bookkeeping as well

Parameters:
clusterIndex - zero-based cluster index to remove
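
The kind of bookkeeping a removal involves can be sketched as follows. This is a hypothetical standalone illustration, not the library's code: the class RemoveClusterSketch, the UNASSIGNED marker, and the use of plain lists and an int array are all assumptions made for the example. The point is that removing a cluster at a zero-based index shifts every later cluster down by one, so stored assignments must be adjusted to match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoveClusterSketch {

    /** Marker for a data point left without a cluster (an assumption of this sketch). */
    public static final int UNASSIGNED = -1;

    /**
     * Removes the cluster at clusterIndex and keeps the per-cluster counts
     * and per-point assignments consistent with the shortened cluster list.
     */
    public static void removeCluster(
            List<String> clusters, List<Integer> counts,
            int[] assignments, int clusterIndex) {
        // Both calls use List.remove(int index), since clusterIndex is an int.
        clusters.remove(clusterIndex);
        counts.remove(clusterIndex);

        for (int i = 0; i < assignments.length; i++) {
            if (assignments[i] == clusterIndex) {
                // The point's cluster no longer exists.
                assignments[i] = UNASSIGNED;
            } else if (assignments[i] > clusterIndex) {
                // Later clusters moved down one position.
                assignments[i]--;
            }
        }
    }

    public static void main(String[] args) {
        List<String> clusters = new ArrayList<>(Arrays.asList("a", "b", "c"));
        List<Integer> counts = new ArrayList<>(Arrays.asList(2, 1, 2));
        int[] assignments = {0, 1, 2, 2, 0};

        removeCluster(clusters, counts, assignments, 1);

        System.out.println(clusters);
        System.out.println(counts);
        System.out.println(Arrays.toString(assignments));
    }
}
```

In this sketch, points that belonged to the removed cluster are simply marked unassigned; a real clusterer would reassign them to a surviving cluster on a later step.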

step

protected boolean step()
Description copied from class: KMeansClusterer
Do a step of the clustering algorithm. The step tracks the number of elements that changed their cluster membership. If this is zero, then the clustering is complete.

Overrides:
step in class KMeansClusterer<DataType,ClusterType extends Cluster<DataType>>
Returns:
true means keep going, false means stop clustering.
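
The boolean contract of step() (true to keep going, false to stop) is typically consumed by a driver loop that also honors an iteration cap such as maxIterations. The following is a minimal sketch of such a loop, assuming a hypothetical Stepper interface and StepLoopSketch class. It is not the loop used by AbstractAnytimeBatchLearner.

```java
public class StepLoopSketch {

    /** Hypothetical stand-in for an algorithm exposing a boolean step(). */
    public interface Stepper {
        /** Returns true to keep going, false to stop. */
        boolean step();
    }

    /**
     * Calls step() until it returns false or maxIterations is reached.
     *
     * @return the number of steps actually executed
     */
    public static int run(Stepper stepper, int maxIterations) {
        int iteration = 0;
        boolean keepGoing = true;
        while (keepGoing && iteration < maxIterations) {
            iteration++;
            keepGoing = stepper.step();
        }
        return iteration;
    }

    public static void main(String[] args) {
        // A stepper that asks to continue twice and then reports completion.
        int[] calls = {0};
        int stepsTaken = run(() -> ++calls[0] < 3, 10);
        System.out.println(stepsTaken);
    }
}
```

The loop stops at whichever comes first: the algorithm reporting that it is done, or the iteration cap.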