gov.sandia.cognition.learning.algorithm.clustering.initializer
Class DistanceSamplingClusterInitializer<ClusterType extends Cluster<DataType>,DataType>

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.learning.function.distance.DefaultDivergenceFunctionContainer<DataType,DataType>
          extended by gov.sandia.cognition.learning.algorithm.clustering.initializer.AbstractMinDistanceFixedClusterInitializer<ClusterType,DataType>
              extended by gov.sandia.cognition.learning.algorithm.clustering.initializer.DistanceSamplingClusterInitializer<ClusterType,DataType>
Type Parameters:
ClusterType - Type of Cluster<DataType> used in the learn() method.
DataType - The algorithm operates on a Collection<DataType>, so DataType will be something like Vector or String.
All Implemented Interfaces:
FixedClusterInitializer<ClusterType,DataType>, DivergenceFunctionContainer<DataType,DataType>, CloneableSerializable, Randomized, Serializable, Cloneable

@PublicationReference(author={"David Arthur","Sergei Vassilvitskii"},
                      title="k-means++: the advantages of careful seeding",
                      year=2007,
                      type=Conference,
                      publication="Proceedings of the eighteenth annual ACM-SIAM Symposium on Discrete algorithms (SODA)",
                      url="http://portal.acm.org/citation.cfm?id=1283383.1283494")
public class DistanceSamplingClusterInitializer<ClusterType extends Cluster<DataType>,DataType>
extends AbstractMinDistanceFixedClusterInitializer<ClusterType,DataType>

Implements a FixedClusterInitializer that selects a random point as the first cluster and then randomly samples each successive cluster, weighting each candidate point by the squared minimum distance from that point to the clusters already selected. This is also known as the k-means++ initialization algorithm.
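
The seeding procedure described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Foundry implementation: the class KMeansPlusPlusSketch and its seed method are hypothetical names, the data are one-dimensional doubles, and the distance is the absolute difference between two values.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration of k-means++ seeding; not part of the Foundry. */
final class KMeansPlusPlusSketch {

    /** Returns the indices of up to k seed points chosen from 1-D data. */
    static int[] seed(double[] points, int k, Random random) {
        int n = points.length;
        int count = Math.max(0, Math.min(k, n));
        int[] seeds = new int[count];
        if (count == 0) {
            return seeds;
        }

        // Squared distance from each point to its nearest seed so far.
        double[] minSquared = new double[n];
        Arrays.fill(minSquared, Double.POSITIVE_INFINITY);

        // The first seed is drawn uniformly at random.
        seeds[0] = random.nextInt(n);

        for (int c = 1; c < count; c++) {
            // Fold the most recent seed into the minimum squared distances.
            double last = points[seeds[c - 1]];
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                double d = points[i] - last;
                minSquared[i] = Math.min(minSquared[i], d * d);
                total += minSquared[i];
            }
            if (total <= 0.0) {
                // Every point coincides with a seed; stop early.
                return Arrays.copyOf(seeds, c);
            }

            // Sample an index with probability minSquared[i] / total.
            double target = random.nextDouble() * total;
            int chosen = -1;
            for (int i = 0; i < n; i++) {
                if (minSquared[i] <= 0.0) {
                    continue; // already a seed, or a duplicate of one
                }
                chosen = i; // fallback if rounding leaves target >= 0
                target -= minSquared[i];
                if (target < 0.0) {
                    break;
                }
            }
            seeds[c] = chosen;
        }
        return seeds;
    }
}
```

Because a point that is already a seed has a minimum distance of zero, it has zero weight and is not drawn again in this sketch.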

Since:
3.1
Author:
Justin Basilico
See Also:
Serialized Form

Field Summary
 
Fields inherited from class gov.sandia.cognition.learning.algorithm.clustering.initializer.AbstractMinDistanceFixedClusterInitializer
creator, random
 
Fields inherited from class gov.sandia.cognition.learning.function.distance.DefaultDivergenceFunctionContainer
divergenceFunction
 
Constructor Summary
DistanceSamplingClusterInitializer()
          Creates a new, empty instance of DistanceSamplingClusterInitializer.
DistanceSamplingClusterInitializer(DivergenceFunction<? super DataType,? super DataType> divergenceFunction, ClusterCreator<ClusterType,DataType> creator, Random random)
          Creates a new instance of DistanceSamplingClusterInitializer.
 
Method Summary
 DistanceSamplingClusterInitializer<ClusterType,DataType> clone()
          This makes public the clone method on the Object class and removes the exception that it throws.
protected  int selectNextClusterIndex(double[] minDistances, boolean[] selected)
          Select the index for the next cluster based on the given minimum distances and array indicating which clusters have already been selected.
 
Methods inherited from class gov.sandia.cognition.learning.algorithm.clustering.initializer.AbstractMinDistanceFixedClusterInitializer
getCreator, getRandom, initializeClusters, setCreator, setRandom
 
Methods inherited from class gov.sandia.cognition.learning.function.distance.DefaultDivergenceFunctionContainer
getDivergenceFunction, setDivergenceFunction
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DistanceSamplingClusterInitializer

public DistanceSamplingClusterInitializer()
Creates a new, empty instance of DistanceSamplingClusterInitializer.


DistanceSamplingClusterInitializer

public DistanceSamplingClusterInitializer(DivergenceFunction<? super DataType,? super DataType> divergenceFunction,
                                          ClusterCreator<ClusterType,DataType> creator,
                                          Random random)
Creates a new instance of DistanceSamplingClusterInitializer.

Parameters:
divergenceFunction - The divergence function to use.
creator - The cluster creator to use.
random - The random number generator to use.
Method Detail

clone

public DistanceSamplingClusterInitializer<ClusterType,DataType> clone()
Description copied from class: AbstractCloneableSerializable
This makes public the clone method on the Object class and removes the exception that it throws. Its default behavior is to automatically create a clone of the exact type of object that clone is called on, copying all primitives but keeping all references, which means it is a shallow copy. Extensions of this class may want to override this method (but call super.clone()) to implement a "smart copy", that is, a copy that targets the most common use case for copying the object. Because the default behavior is a shallow copy, extending classes only need to handle fields that need a deeper copy (or those that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method. Note: The contract of this method is that you must use super.clone() as the basis for your implementation.

Specified by:
clone in interface CloneableSerializable
Overrides:
clone in class AbstractMinDistanceFixedClusterInitializer<ClusterType extends Cluster<DataType>,DataType>
Returns:
A clone of this object.
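
The "smart copy" pattern described for clone can be sketched with plain java.lang.Cloneable. This is an illustration only: SmartCopySketch and its fields are hypothetical and do not appear in the Foundry, and the checked exception is handled here by hand, whereas AbstractCloneableSerializable is described above as removing it for you.

```java
/** Hypothetical class showing a shallow super.clone() plus one deep-copied field. */
final class SmartCopySketch implements Cloneable {
    int clusterCount = 3;                                     // primitive: copied by value
    double[] weights = {1.0, 2.0};                            // mutable state: needs a deeper copy
    StringBuilder sharedLabel = new StringBuilder("shared");  // reference: kept as-is

    @Override
    public SmartCopySketch clone() {
        try {
            // Start from the shallow copy, as the contract requires.
            SmartCopySketch copy = (SmartCopySketch) super.clone();
            // Deepen only the fields that must not be shared.
            copy.weights = this.weights.clone();
            return copy;
        } catch (CloneNotSupportedException e) {
            // Cannot happen because this class implements Cloneable.
            throw new AssertionError("Cloneable is implemented", e);
        }
    }
}
```

After cloning, changes to the copy's weights array do not affect the original, while sharedLabel still refers to the same object in both instances.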

selectNextClusterIndex

protected int selectNextClusterIndex(double[] minDistances,
                                     boolean[] selected)
Description copied from class: AbstractMinDistanceFixedClusterInitializer
Select the index for the next cluster based on the given minimum distances and array indicating which clusters have already been selected.

Specified by:
selectNextClusterIndex in class AbstractMinDistanceFixedClusterInitializer<ClusterType extends Cluster<DataType>,DataType>
Parameters:
minDistances - The array of minimum distances.
selected - The array corresponding to whether or not an item has already been selected.
Returns:
The index of the next cluster to include. -1 means that there is nothing left to include.
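
One way to satisfy this contract is sketched below. It is an assumption-laden illustration rather than the library's code: SelectNextIndexSketch is a hypothetical class, the Random is passed in as a parameter instead of being read from a field, the sketch assumes minDistances holds unsquared distances and squares them itself, and the fallback to the first unselected index when all weights are zero is a design choice made for this example.

```java
import java.util.Random;

/** Hypothetical sketch of distance-weighted index selection. */
final class SelectNextIndexSketch {

    /** Returns an unselected index drawn by squared distance, or -1 if none remain. */
    static int selectNextIndex(double[] minDistances, boolean[] selected, Random random) {
        // Total weight over the items that are still available.
        double total = 0.0;
        for (int i = 0; i < minDistances.length; i++) {
            if (!selected[i]) {
                total += minDistances[i] * minDistances[i];
            }
        }

        if (total <= 0.0) {
            // No usable weights: take the first unselected item, if any.
            for (int i = 0; i < selected.length; i++) {
                if (!selected[i]) {
                    return i;
                }
            }
            return -1; // nothing left to include
        }

        // Walk the cumulative weights until the random target is passed.
        double target = random.nextDouble() * total;
        int last = -1;
        for (int i = 0; i < minDistances.length; i++) {
            double weight = minDistances[i] * minDistances[i];
            if (selected[i] || weight <= 0.0) {
                continue;
            }
            last = i; // fallback if rounding leaves target >= 0
            target -= weight;
            if (target < 0.0) {
                return i;
            }
        }
        return last;
    }
}
```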