gov.sandia.cognition.learning.algorithm.clustering
Class AffinityPropagation<DataType>

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
          extended by gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm<ResultType>
              extended by gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner<Collection<? extends DataType>,Collection<CentroidCluster<DataType>>>
                  extended by gov.sandia.cognition.learning.algorithm.clustering.AffinityPropagation<DataType>
Type Parameters:
DataType - The type of data the algorithm is to cluster, which it passes to the divergence function. For example, this could be Vector or String.
All Implemented Interfaces:
AnytimeAlgorithm<Collection<CentroidCluster<DataType>>>, IterativeAlgorithm, MeasurablePerformanceAlgorithm, StoppableAlgorithm, AnytimeBatchLearner<Collection<? extends DataType>,Collection<CentroidCluster<DataType>>>, BatchLearner<Collection<? extends DataType>,Collection<CentroidCluster<DataType>>>, BatchClusterer<DataType,CentroidCluster<DataType>>, DivergenceFunctionContainer<DataType,DataType>, CloneableSerializable, Serializable, Cloneable

@CodeReview(reviewer="Kevin R. Dixon",
            date="2008-07-22",
            changesNeeded=false,
            comments={"Removed transient declaration on members.","Fixed a few typos in javadoc.","Added PublicationReference annotation.","Added comment about use of direct-member access.","Code generally looked fine."})
@PublicationReference(author={"Brendan J. Frey","Delbert Dueck"},
                      title="Clustering by Passing Messages Between Data Points.",
                      type=Journal,
                      publication="Science",
                      notes="Volume 315, number 5814",
                      pages={972,976},
                      year=2007)
public class AffinityPropagation<DataType>
extends AbstractAnytimeBatchLearner<Collection<? extends DataType>,Collection<CentroidCluster<DataType>>>
implements BatchClusterer<DataType,CentroidCluster<DataType>>, MeasurablePerformanceAlgorithm, DivergenceFunctionContainer<DataType,DataType>

The AffinityPropagation algorithm requires three parameters: a divergence function, a value to use for self-divergence, and a damping factor (called lambda in the paper; 0.5 is the default). It clusters by passing messages between each point to determine the best exemplar for the point.

This implementation takes a divergence function instead of a similarity function and sets the similarity value to the negative of the divergence value, as described in the paper for Euclidean distance.

The self-divergence value is what controls how many clusters are generated. Typically this value is set to the mean or median of all the divergence values or the maximum divergence. In general, a smaller value will mean more clusters and a larger value will mean less clusters. In the paper this is called self-similarity (s(k,k)) but since this implementation uses a divergence metric, we use self-divergence instead.

Since:
2.0
Author:
Justin Basilico
See Also:
Serialized Form

Field Summary
protected  int[] assignments
          The assignments of each example to an exemplar (cluster).
protected  double[][] availabilities
          The array of example-example availabilities.
protected  int changedCount
          The number of examples that have changed assignments in the last iteration.
protected  HashMap<Integer,CentroidCluster<DataType>> clusters
          The clusters that have been found so far.
protected  double dampingFactor
          The damping factor (lambda).
static double DEFAULT_DAMPING_FACTOR
          The default damping factor (lambda) is 0.5.
static int DEFAULT_MAX_ITERATIONS
          The default maximum number of iterations is 100.
static double DEFAULT_SELF_DIVERGENCE
          The default self similarity is 0.0.
protected  DivergenceFunction<? super DataType,? super DataType> divergence
          The divergence function to use.
protected  int exampleCount
          The number of examples.
protected  ArrayList<DataType> examples
          The examples.
protected  double oneMinusDampingFactor
          The cached value of one minus the damping factor.
protected  double[][] responsibilities
          The array of example-example responsibilities.
protected  double[][] similarities
          The array of example-example similarities.
 
Fields inherited from class gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner
data, keepGoing
 
Fields inherited from class gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm
maxIterations
 
Fields inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
DEFAULT_ITERATION, iteration
 
Constructor Summary
AffinityPropagation()
          Creates a new instance of AffinityPropagation.
AffinityPropagation(DivergenceFunction<? super DataType,? super DataType> divergence, double selfDivergence)
          Creates a new instance of AffinityPropagation.
AffinityPropagation(DivergenceFunction<? super DataType,? super DataType> divergence, double selfDivergence, double dampingFactor)
          Creates a new instance of AffinityPropagation.
AffinityPropagation(DivergenceFunction<? super DataType,? super DataType> divergence, double selfDivergence, double dampingFactor, int maxIterations)
          Creates a new instance of AffinityPropagation.
 
Method Summary
protected  void assignCluster(int i, int newAssignment)
          Assigns example "i" to the new cluster index.
protected  void cleanupAlgorithm()
          Called to clean up the learning algorithm's state after learning has finished.
 AffinityPropagation<DataType> clone()
          This makes public the clone method on the Object class and removes the exception that it throws.
protected  int[] getAssignments()
          Gets the assignments of examples to exemplars (clusters).
protected  double[][] getAvailabilities()
          Gets the availability values.
 int getChangedCount()
          Gets the number of cluster assignments that have changed in the most recent iteration.
protected  HashMap<Integer,CentroidCluster<DataType>> getClusters()
          Gets the current clusters, which is a sparse mapping of exemplar identifier to cluster object.
 double getDampingFactor()
          Gets the damping factor.
 DivergenceFunction<? super DataType,? super DataType> getDivergence()
          Gets the divergence function used by the algorithm.
 DivergenceFunction<? super DataType,? super DataType> getDivergenceFunction()
          Gets the divergence function used by this object.
protected  ArrayList<DataType> getExamples()
          Gets the array list of examples to cluster.
 NamedValue<Integer> getPerformance()
          Gets the performance, which is the number changed on the last iteration.
protected  double[][] getResponsibilities()
          Gets the responsibility values.
 ArrayList<CentroidCluster<DataType>> getResult()
          Gets the current result of the algorithm.
 double getSelfDivergence()
          Gets the value used for self-divergence, which controls how many clusters are generated.
protected  double[][] getSimilarities()
          Gets the array of similarities.
protected  boolean initializeAlgorithm()
          Called to initialize the learning algorithm's state based on the data that is stored in the data field.
protected  void setAssignments(int[] assignments)
          Sets the assignments of examples to exemplars (clusters).
protected  void setAvailabilities(double[][] availabilities)
          Sets the availability values.
protected  void setChangedCount(int changedCount)
          Sets the number of cluster assignments that have changed in the most recent iteration.
protected  void setClusters(HashMap<Integer,CentroidCluster<DataType>> clusters)
          Sets the current clusters, which is a sparse mapping of exemplar identifier to cluster object.
 void setDampingFactor(double dampingFactor)
          Sets the damping factor.
 void setDivergence(DivergenceFunction<? super DataType,? super DataType> divergence)
          Sets the divergence function used by the algorithm.
protected  void setExamples(ArrayList<DataType> examples)
          Sets the array list of examples to cluster.
protected  void setResponsibilities(double[][] responsibilities)
          Sets the responsibility values.
 void setSelfDivergence(double selfDivergence)
          Sets the value used for self-divergence, which controls how many clusters are generated.
protected  void setSimilarities(double[][] similarities)
          Sets the array of similarities.
protected  boolean step()
          Called to take a single step of the learning algorithm.
protected  void updateAssignments()
          Updates the assignments of all the examples to their exemplars (clusters) using the current availability and responsibility values.
protected  void updateAvailabilities()
          Updates the availabilities matrix based on the current responsibility values.
protected  void updateResponsibilities()
          Updates the responsibilities matrix using the similarity values and the current availability values.
 
Methods inherited from class gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner
getData, getKeepGoing, learn, setData, setKeepGoing, stop
 
Methods inherited from class gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm
getMaxIterations, isResultValid, setMaxIterations
 
Methods inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
addIterativeAlgorithmListener, fireAlgorithmEnded, fireAlgorithmStarted, fireStepEnded, fireStepStarted, getIteration, getListeners, removeIterativeAlgorithmListener, setIteration, setListeners
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.learning.algorithm.BatchLearner
learn
 
Methods inherited from interface gov.sandia.cognition.algorithm.AnytimeAlgorithm
getMaxIterations, setMaxIterations
 
Methods inherited from interface gov.sandia.cognition.algorithm.IterativeAlgorithm
addIterativeAlgorithmListener, getIteration, removeIterativeAlgorithmListener
 
Methods inherited from interface gov.sandia.cognition.algorithm.StoppableAlgorithm
isResultValid
 

Field Detail

DEFAULT_MAX_ITERATIONS

public static final int DEFAULT_MAX_ITERATIONS
The default maximum number of iterations is 100.

See Also:
Constant Field Values

DEFAULT_SELF_DIVERGENCE

public static final double DEFAULT_SELF_DIVERGENCE
The default self similarity is 0.0.

See Also:
Constant Field Values

DEFAULT_DAMPING_FACTOR

public static final double DEFAULT_DAMPING_FACTOR
The default damping factor (lambda) is 0.5.

See Also:
Constant Field Values

divergence

protected DivergenceFunction<? super DataType,? super DataType> divergence
The divergence function to use.


dampingFactor

protected double dampingFactor
The damping factor (lambda). It must be between 0.0 and 1.0.


oneMinusDampingFactor

protected double oneMinusDampingFactor
The cached value of one minus the damping factor.


exampleCount

protected transient int exampleCount
The number of examples.


examples

protected ArrayList<DataType> examples
The examples.


similarities

protected double[][] similarities
The array of example-example similarities.


responsibilities

protected double[][] responsibilities
The array of example-example responsibilities.


availabilities

protected double[][] availabilities
The array of example-example availabilities.


assignments

protected int[] assignments
The assignments of each example to an exemplar (cluster).


changedCount

protected int changedCount
The number of examples that have changed assignments in the last iteration.


clusters

protected HashMap<Integer,CentroidCluster<DataType>> clusters
The clusters that have been found so far. It is a sparse mapping since we expect there to be few clusters.

Constructor Detail

AffinityPropagation

public AffinityPropagation()
Creates a new instance of AffinityPropagation.


AffinityPropagation

public AffinityPropagation(DivergenceFunction<? super DataType,? super DataType> divergence,
                           double selfDivergence)
Creates a new instance of AffinityPropagation.

Parameters:
divergence - The divergence function to use to determine the divergence between two examples.
selfDivergence - The value for self-divergence to use, which controls the number of clusters created.

AffinityPropagation

public AffinityPropagation(DivergenceFunction<? super DataType,? super DataType> divergence,
                           double selfDivergence,
                           double dampingFactor)
Creates a new instance of AffinityPropagation.

Parameters:
divergence - The divergence function to use to determine the divergence between two examples.
selfDivergence - The value for self-divergence to use, which controls the number of clusters created.
dampingFactor - The damping factor (lambda). Must be between 0.0 and 1.0.

AffinityPropagation

public AffinityPropagation(DivergenceFunction<? super DataType,? super DataType> divergence,
                           double selfDivergence,
                           double dampingFactor,
                           int maxIterations)
Creates a new instance of AffinityPropagation.

Parameters:
divergence - The divergence function to use to determine the divergence between two examples.
selfDivergence - The value for self-divergence to use, which controls the number of clusters created.
dampingFactor - The damping factor (lambda). Must be between 0.0 and 1.0.
maxIterations - The maximum number of iterations.
Method Detail

clone

public AffinityPropagation<DataType> clone()
Description copied from class: AbstractCloneableSerializable
This makes public the clone method on the Object class and removes the exception that it throws. Its default behavior is to automatically create a clone of the exact type of object that the clone is called on and to copy all primitives but to keep all references, which means it is a shallow copy. Extensions of this class may want to override this method (but call super.clone() to implement a "smart copy". That is, to target the most common use case for creating a copy of the object. Because of the default behavior being a shallow copy, extending classes only need to handle fields that need to have a deeper copy (or those that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method. Note: The contract of this method is that you must use super.clone() as the basis for your implementation.

Specified by:
clone in interface CloneableSerializable
Overrides:
clone in class AbstractAnytimeBatchLearner<Collection<? extends DataType>,Collection<CentroidCluster<DataType>>>
Returns:
A clone of this object.

initializeAlgorithm

protected boolean initializeAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to initialize the learning algorithm's state based on the data that is stored in the data field. The return value indicates if the algorithm can be run or not based on the initialization.

Specified by:
initializeAlgorithm in class AbstractAnytimeBatchLearner<Collection<? extends DataType>,Collection<CentroidCluster<DataType>>>
Returns:
True if the learning algorithm can be run and false if it cannot.

step

protected boolean step()
Description copied from class: AbstractAnytimeBatchLearner
Called to take a single step of the learning algorithm.

Specified by:
step in class AbstractAnytimeBatchLearner<Collection<? extends DataType>,Collection<CentroidCluster<DataType>>>
Returns:
True if another step can be taken and false it the algorithm should halt.

updateResponsibilities

protected void updateResponsibilities()
Updates the responsibilities matrix using the similarity values and the current availability values.


updateAvailabilities

protected void updateAvailabilities()
Updates the availabilities matrix based on the current responsibility values.


updateAssignments

protected void updateAssignments()
Updates the assignments of all the examples to their exemplars (clusters) using the current availability and responsibility values. An example i is assigned to the cluster k that maximizes the sum a(i,k) + r(i,k).


assignCluster

protected void assignCluster(int i,
                             int newAssignment)
Assigns example "i" to the new cluster index. If the assignment for the example has changed, the method updates all of the information about cluster membership for this new assignment.

Parameters:
i - The index of the example to assign to the cluster.
newAssignment - The new assignment for "i".

cleanupAlgorithm

protected void cleanupAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to clean up the learning algorithm's state after learning has finished.

Specified by:
cleanupAlgorithm in class AbstractAnytimeBatchLearner<Collection<? extends DataType>,Collection<CentroidCluster<DataType>>>

getResult

public ArrayList<CentroidCluster<DataType>> getResult()
Description copied from interface: AnytimeAlgorithm
Gets the current result of the algorithm.

Specified by:
getResult in interface AnytimeAlgorithm<Collection<CentroidCluster<DataType>>>
Returns:
Current result of the algorithm.

getDivergence

public DivergenceFunction<? super DataType,? super DataType> getDivergence()
Gets the divergence function used by the algorithm.

Returns:
The divergence function.

setDivergence

public void setDivergence(DivergenceFunction<? super DataType,? super DataType> divergence)
Sets the divergence function used by the algorithm.

Parameters:
divergence - The divergence function.

getSelfDivergence

public double getSelfDivergence()
Gets the value used for self-divergence, which controls how many clusters are generated. A smaller value usually means more clusters and a larger value means less. The range of the value should be within the range of divergence values generated by the divergence function

Returns:
The value for self-divergence.

setSelfDivergence

public void setSelfDivergence(double selfDivergence)
Sets the value used for self-divergence, which controls how many clusters are generated. A smaller value usually means more clusters and a larger value means less. The range of the value should be within the range of divergence values generated by the divergence function.

Parameters:
selfDivergence - The value for self-divergence.

getDampingFactor

public double getDampingFactor()
Gets the damping factor.

Returns:
The damping factor.

setDampingFactor

public void setDampingFactor(double dampingFactor)
Sets the damping factor.

Parameters:
dampingFactor - The damping factor. Must be between 0.0 and 1.0.

getExamples

protected ArrayList<DataType> getExamples()
Gets the array list of examples to cluster.

Returns:
The array list of examples to cluster.

setExamples

protected void setExamples(ArrayList<DataType> examples)
Sets the array list of examples to cluster.

Parameters:
examples - The array list of examples to cluster.

getSimilarities

protected double[][] getSimilarities()
Gets the array of similarities.

Returns:
The array of similarities.

setSimilarities

protected void setSimilarities(double[][] similarities)
Sets the array of similarities.

Parameters:
similarities - The array of similarities.

getResponsibilities

protected double[][] getResponsibilities()
Gets the responsibility values.

Returns:
The responsibilities.

setResponsibilities

protected void setResponsibilities(double[][] responsibilities)
Sets the responsibility values.

Parameters:
responsibilities - The responsibilities.

getAvailabilities

protected double[][] getAvailabilities()
Gets the availability values.

Returns:
The availabilities.

setAvailabilities

protected void setAvailabilities(double[][] availabilities)
Sets the availability values.

Parameters:
availabilities - The availabilities.

getAssignments

protected int[] getAssignments()
Gets the assignments of examples to exemplars (clusters).

Returns:
The assignments of examples to exemplars (clusters).

setAssignments

protected void setAssignments(int[] assignments)
Sets the assignments of examples to exemplars (clusters).

Parameters:
assignments - The assignments of examples to exemplars (clusters).

getChangedCount

public int getChangedCount()
Gets the number of cluster assignments that have changed in the most recent iteration.

Returns:
The number of changed cluster assignments.

setChangedCount

protected void setChangedCount(int changedCount)
Sets the number of cluster assignments that have changed in the most recent iteration.

Parameters:
changedCount - The number of changed cluster assignments.

getClusters

protected HashMap<Integer,CentroidCluster<DataType>> getClusters()
Gets the current clusters, which is a sparse mapping of exemplar identifier to cluster object.

Returns:
The current clusters.

setClusters

protected void setClusters(HashMap<Integer,CentroidCluster<DataType>> clusters)
Sets the current clusters, which is a sparse mapping of exemplar identifier to cluster object.

Parameters:
clusters - The current clusters.

getDivergenceFunction

public DivergenceFunction<? super DataType,? super DataType> getDivergenceFunction()
Description copied from interface: DivergenceFunctionContainer
Gets the divergence function used by this object.

Specified by:
getDivergenceFunction in interface DivergenceFunctionContainer<DataType,DataType>
Returns:
The divergence function.

getPerformance

public NamedValue<Integer> getPerformance()
Gets the performance, which is the number changed on the last iteration.

Specified by:
getPerformance in interface MeasurablePerformanceAlgorithm
Returns:
The performance of the algorithm.