gov.sandia.cognition.learning.algorithm.clustering
Class KMeansFactory

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.util.AbstractRandomized
          extended by gov.sandia.cognition.learning.algorithm.clustering.KMeansFactory
All Implemented Interfaces:
Factory<ParallelizedKMeansClusterer<Vector,CentroidCluster<Vector>>>, CloneableSerializable, Randomized, Serializable, Cloneable

@CodeReview(reviewer="Justin Basilico",
            date="2009-06-29",
            changesNeeded=false,
            comments="Changed it to not have a default random, otherwise its good.")
public class KMeansFactory
extends AbstractRandomized
implements Factory<ParallelizedKMeansClusterer<Vector,CentroidCluster<Vector>>>

Creates a parallelized version of the k-means clustering algorithm for the typical use: clustering vector data with a Euclidean distance metric.
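
For readers unfamiliar with the technique, the following is a minimal, single-threaded sketch of k-means over plain double[] vectors with a Euclidean distance. It uses only the Java standard library. The class KMeansSketch and its methods are hypothetical names introduced for illustration, and the random-point initialization is an assumption; none of this reflects how ParallelizedKMeansClusterer is actually implemented.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration of k-means; not part of the Cognitive Foundry API. */
final class KMeansSketch {

    /** Euclidean distance between two vectors of equal length. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Index of the centroid closest to the point (ties go to the lower index). */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (euclidean(point, centroids[c]) < euclidean(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Runs a fixed number of assign-then-average iterations and returns the centroids. */
    static double[][] cluster(double[][] data, int k, int iterations, Random random) {
        if (k <= 0 || k > data.length) {
            throw new IllegalArgumentException("k must be in [1, data.length]");
        }
        int dim = data[0].length;

        // Assumed initialization: copy k distinct data points chosen at random
        // (partial Fisher-Yates shuffle over the point indices).
        int[] order = new int[data.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + random.nextInt(order.length - c);
            int tmp = order[c];
            order[c] = order[j];
            order[j] = tmp;
            centroids[c] = data[order[c]].clone();
        }

        for (int it = 0; it < iterations; it++) {
            // Assignment step: accumulate each point into its nearest centroid.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] point : data) {
                int c = nearest(point, centroids);
                counts[c]++;
                for (int d = 0; d < dim; d++) {
                    sums[c][d] += point[d];
                }
            }
            // Update step: move each non-empty centroid to the mean of its points.
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < dim; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}};
        double[][] centroids = cluster(data, 2, 10, new Random(42));
        System.out.println(Arrays.deepToString(centroids));
    }
}
```

Passing a seeded Random, as in main above, makes the initialization and therefore the result repeatable from run to run.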

Since:
3.0
Author:
Kevin R. Dixon
See Also:
Serialized Form

Field Summary
static int DEFAULT_NUM_CLUSTERS
          The default number of clusters is 10.
 
Fields inherited from class gov.sandia.cognition.util.AbstractRandomized
random
 
Constructor Summary
KMeansFactory()
          Creates a new instance of KMeansFactory.
KMeansFactory(int numClusters)
          Creates a new instance of KMeansFactory.
KMeansFactory(int numClusters, Random random)
          Creates a new instance of KMeansFactory.
 
Method Summary
 ParallelizedKMeansClusterer<Vector,CentroidCluster<Vector>> create()
          Creates a new instance of an object.
static ParallelizedKMeansClusterer<Vector,CentroidCluster<Vector>> create(int numClusters, Random random)
          Creates a new parallelized k-means clustering algorithm for vector data with the given number of clusters (k) and random number generator.
static ParallelizedKMeansClusterer<Vector,CentroidCluster<Vector>> create(int numClusters, Semimetric<? super Vector> distanceMetric, Random random)
          Creates a new parallelized k-means clustering algorithm for vector data with the given number of clusters (k), distance metric, and random number generator.
 int getNumClusters()
          Gets the number of clusters for the algorithm to attempt to create.
 void setNumClusters(int numClusters)
          Sets the number of clusters for the algorithm to attempt to create.
 
Methods inherited from class gov.sandia.cognition.util.AbstractRandomized
clone, getRandom, setRandom
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_NUM_CLUSTERS

public static final int DEFAULT_NUM_CLUSTERS
The default number of clusters is 10.

See Also:
Constant Field Values

Constructor Detail

KMeansFactory

public KMeansFactory()
Creates a new instance of KMeansFactory.


KMeansFactory

public KMeansFactory(int numClusters)
Creates a new instance of KMeansFactory.

Parameters:
numClusters - The number of clusters to use.

KMeansFactory

public KMeansFactory(int numClusters,
                     Random random)
Creates a new instance of KMeansFactory.

Parameters:
numClusters - The number of clusters to use.
random - The random number generator.

Method Detail

create

public static ParallelizedKMeansClusterer<Vector,CentroidCluster<Vector>> create(int numClusters,
                                                                                 Random random)
Creates a new parallelized k-means clustering algorithm for vector data with the given number of clusters (k) and random number generator.

Parameters:
numClusters - Number of clusters to use (k).
random - Random number generator. Used in the cluster initialization.
Returns:
k-means clustering algorithm for Vector-based clustering.

create

public static ParallelizedKMeansClusterer<Vector,CentroidCluster<Vector>> create(int numClusters,
                                                                                 Semimetric<? super Vector> distanceMetric,
                                                                                 Random random)
Creates a new parallelized k-means clustering algorithm for vector data with the given number of clusters (k), distance metric, and random number generator.

Parameters:
numClusters - Number of clusters to use (k).
distanceMetric - The distance metric to use for comparing vectors.
random - Random number generator. Used in the cluster initialization.
Returns:
k-means clustering algorithm for Vector-based clustering.
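
To illustrate why the choice of distance metric matters, the sketch below shows the same point being assigned to different centroids under a Euclidean and a Manhattan distance. It is standard-library Java only. MetricSketch, EUCLIDEAN, MANHATTAN, and nearest are hypothetical names, and ToDoubleBiFunction merely stands in for the role that Semimetric plays in this method's signature.

```java
import java.util.function.ToDoubleBiFunction;

/** Hypothetical illustration of a pluggable distance; not part of the Cognitive Foundry API. */
final class MetricSketch {

    /** Straight-line distance: square root of the summed squared differences. */
    static final ToDoubleBiFunction<double[], double[]> EUCLIDEAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(sum);
    };

    /** City-block distance: sum of the absolute differences. */
    static final ToDoubleBiFunction<double[], double[]> MANHATTAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    };

    /** Index of the centroid closest to the point under the given metric. */
    static int nearest(double[] point, double[][] centroids,
                       ToDoubleBiFunction<double[], double[]> metric) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (metric.applyAsDouble(point, centroids[c])
                    < metric.applyAsDouble(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] point = {0, 0};
        double[][] centroids = {{3, 3}, {0, 5}};
        System.out.println("Euclidean picks centroid " + nearest(point, centroids, EUCLIDEAN));
        System.out.println("Manhattan picks centroid " + nearest(point, centroids, MANHATTAN));
    }
}
```

Because cluster membership is decided by this nearest-centroid comparison, swapping the metric can change the resulting clusters even when the data, k, and random seed stay the same.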

create

public ParallelizedKMeansClusterer<Vector,CentroidCluster<Vector>> create()
Description copied from interface: Factory
Creates a new instance of an object.

Specified by:
create in interface Factory<ParallelizedKMeansClusterer<Vector,CentroidCluster<Vector>>>
Returns:
A newly created object.

getNumClusters

public int getNumClusters()
Gets the number of clusters for the algorithm to attempt to create.

Returns:
The number of clusters.

setNumClusters

public void setNumClusters(int numClusters)
Sets the number of clusters for the algorithm to attempt to create. Must be positive.

Parameters:
numClusters - The number of clusters for the algorithm to attempt to create.
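
The configure-then-create pattern described by the constructors, getNumClusters, setNumClusters, and the instance create() method can be sketched with a small stand-in class. Everything below is hypothetical and uses only the Java standard library: FactorySketch and Product are illustrative names, and rejecting a non-positive value with IllegalArgumentException is an assumption, since this page states only that the value must be positive.

```java
import java.util.Random;

/** Hypothetical stand-in for a configurable factory; not the Cognitive Foundry class. */
final class FactorySketch {

    /** Mirrors the documented default of 10 clusters. */
    static final int DEFAULT_NUM_CLUSTERS = 10;

    /** Hypothetical product type standing in for a configured clusterer. */
    static final class Product {
        final int numClusters;
        final Random random;

        Product(int numClusters, Random random) {
            this.numClusters = numClusters;
            this.random = random;
        }
    }

    private int numClusters;
    private final Random random;

    FactorySketch(int numClusters, Random random) {
        setNumClusters(numClusters);
        this.random = random;
    }

    int getNumClusters() {
        return numClusters;
    }

    void setNumClusters(int numClusters) {
        // Assumption: a non-positive value is rejected with an exception.
        if (numClusters <= 0) {
            throw new IllegalArgumentException("numClusters must be positive: " + numClusters);
        }
        this.numClusters = numClusters;
    }

    /** Each call builds a new product from the factory's current settings. */
    Product create() {
        return new Product(numClusters, random);
    }

    public static void main(String[] args) {
        FactorySketch factory = new FactorySketch(DEFAULT_NUM_CLUSTERS, new Random(1));
        factory.setNumClusters(3);
        Product product = factory.create();
        System.out.println("numClusters = " + product.numClusters);
    }
}
```

Holding the settings in the factory lets callers configure the number of clusters once and then request as many independent instances as they need.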