gov.sandia.cognition.learning.algorithm.pca
Class KernelPrincipalComponentsAnalysis<DataType>

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.learning.function.kernel.DefaultKernelContainer<DataType>
          extended by gov.sandia.cognition.learning.algorithm.pca.KernelPrincipalComponentsAnalysis<DataType>
Type Parameters:
DataType - The type of data that the analysis is to be done over. It must match the input type of the kernel function that is given.
All Implemented Interfaces:
BatchLearner<Collection<? extends DataType>,KernelPrincipalComponentsAnalysis.Function<DataType>>, KernelContainer<DataType>, CloneableSerializable, Serializable, Cloneable

@PublicationReferences(references={@PublicationReference(author={"Bernard Scholkopf","Alexander Smola","Klaus-Robert Muller"},title="Nonlinear Component Analysis as a Kernel Eigenvalue Problem",year=1996,type=TechnicalReport,url="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.29.1366"),@PublicationReference(author={"John  Shawe-Taylor","Nello Christianini"},title="Kernel Methods for Pattern Analysis",year=2004,type=Book,pages={150,153})})
public class KernelPrincipalComponentsAnalysis<DataType>
extends DefaultKernelContainer<DataType>
implements BatchLearner<Collection<? extends DataType>,KernelPrincipalComponentsAnalysis.Function<DataType>>

An implementation of the Kernel Principal Components Analysis (KPCA) algorithm. KPCA generalizes standard PCA for use with a Mercer kernel: given a kernel function and data, it finds vector-valued principal components for that data. This yields a transform from arbitrary data, of whatever type the kernel accepts, into a vector space. The implementation uses a closed-form solution based on an eigen-decomposition of the (centered) kernel matrix. Doing so requires computing the whole kernel matrix, which makes the algorithm computationally intensive: the kernel matrix alone requires O(n^2) kernel evaluations and storage, where n is the size of the data. Thus, this analysis may not scale well to large datasets.
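
To make the O(n^2) cost concrete, the following is a minimal sketch of building a full kernel (Gram) matrix in plain Java. It is not the library's code: the class GramMatrixSketch, its methods, and the scalar Gaussian kernel are hypothetical names chosen for illustration, and a generic ToDoubleBiFunction stands in for the library's Kernel interface.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Illustrative sketch only. GramMatrixSketch and its methods are hypothetical
// names and are not part of the documented library.
class GramMatrixSketch
{
    /** Computes the full n-by-n kernel matrix K[i][j] = k(x_i, x_j). */
    static <T> double[][] gramMatrix(
        List<T> data, ToDoubleBiFunction<T, T> kernel)
    {
        int n = data.size();
        double[][] k = new double[n][n];
        for (int i = 0; i < n; i++)
        {
            // A Mercer kernel is symmetric, so only the upper triangle is
            // evaluated and then mirrored. That is n(n+1)/2 evaluations,
            // which is still O(n^2), and the storage is n*n doubles.
            for (int j = i; j < n; j++)
            {
                double value = kernel.applyAsDouble(data.get(i), data.get(j));
                k[i][j] = value;
                k[j][i] = value;
            }
        }
        return k;
    }

    /** A Gaussian kernel on scalars, assumed here purely as an example. */
    static double rbf(double x, double y)
    {
        double d = x - y;
        return Math.exp(-d * d);
    }
}
```

Doubling the number of data points roughly quadruples both the number of kernel evaluations and the memory needed for the matrix, which is why large datasets become expensive.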

Since:
3.1
Author:
Justin Basilico
See Also:
Serialized Form

Nested Class Summary
static class KernelPrincipalComponentsAnalysis.Function<DataType>
          The resulting transformation function learned by Kernel Principal Components Analysis.
 
Field Summary
protected  boolean centerData
          Whether or not the data should be centered before doing KPCA.
protected  int componentCount
          The number of components to create from the analysis.
static boolean DEFAULT_CENTER_DATA
          The default setting for centering data is true.
static int DEFAULT_COMPONENT_COUNT
          The default number of components to create is 10.
 
Fields inherited from class gov.sandia.cognition.learning.function.kernel.DefaultKernelContainer
kernel
 
Constructor Summary
KernelPrincipalComponentsAnalysis()
          Creates a new Kernel Principal Components Analysis with a null kernel and a default component count.
KernelPrincipalComponentsAnalysis(Kernel<? super DataType> kernel, int componentCount)
          Creates a new Kernel Principal Components Analysis with the given kernel and component count.
KernelPrincipalComponentsAnalysis(Kernel<? super DataType> kernel, int componentCount, boolean centerData)
          Creates a new Kernel Principal Components Analysis with the given kernel, component count, and centering setting.
 
Method Summary
 int getComponentCount()
          Gets the number of components the analysis attempts to find.
 boolean isCenterData()
          Gets whether or not the data needs to be centered in the kernel space before applying the algorithm.
 KernelPrincipalComponentsAnalysis.Function<DataType> learn(Collection<? extends DataType> data)
          The learn method creates an object of ResultType using data of type DataType, using some form of "learning" algorithm.
 void setCenterData(boolean centerData)
          Sets whether or not the data needs to be centered in the kernel space before applying the algorithm.
 void setComponentCount(int componentCount)
          Sets the number of components the analysis attempts to find.
 
Methods inherited from class gov.sandia.cognition.learning.function.kernel.DefaultKernelContainer
clone, getKernel, setKernel
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.util.CloneableSerializable
clone
 

Field Detail

DEFAULT_COMPONENT_COUNT

public static final int DEFAULT_COMPONENT_COUNT
The default number of components to create is 10.

See Also:
Constant Field Values

DEFAULT_CENTER_DATA

public static final boolean DEFAULT_CENTER_DATA
The default setting for centering data is true.

See Also:
Constant Field Values

componentCount

protected int componentCount
The number of components to create from the analysis. Must be positive.


centerData

protected boolean centerData
Whether or not the data should be centered before doing KPCA.

Constructor Detail

KernelPrincipalComponentsAnalysis

public KernelPrincipalComponentsAnalysis()
Creates a new Kernel Principal Components Analysis with a null kernel and a default component count.


KernelPrincipalComponentsAnalysis

public KernelPrincipalComponentsAnalysis(Kernel<? super DataType> kernel,
                                         int componentCount)
Creates a new Kernel Principal Components Analysis with the given kernel and component count. It will perform centering.

Parameters:
kernel - The kernel to use in the analysis.
componentCount - The number of components for the analysis to create. Must be positive.

KernelPrincipalComponentsAnalysis

public KernelPrincipalComponentsAnalysis(Kernel<? super DataType> kernel,
                                         int componentCount,
                                         boolean centerData)
Creates a new Kernel Principal Components Analysis with the given kernel, component count, and centering setting.

Parameters:
kernel - The kernel to use in the analysis.
componentCount - The number of components for the analysis to create. Must be positive.
centerData - True to center the data in the kernel space before applying the analysis. Only set this to false if the data is pre-centered. If in doubt, set to true.
Method Detail

learn

public KernelPrincipalComponentsAnalysis.Function<DataType> learn(Collection<? extends DataType> data)
Description copied from interface: BatchLearner
The learn method creates an object of ResultType using data of type DataType, using some form of "learning" algorithm.

Specified by:
learn in interface BatchLearner<Collection<? extends DataType>,KernelPrincipalComponentsAnalysis.Function<DataType>>
Parameters:
data - The data that the learning algorithm will use to create an object of ResultType.
Returns:
The object that is created based on the given data using the learning algorithm.
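
As a rough illustration of what a learned kernel transform does, the sketch below maps one input to a vector by taking, for each component, a weighted sum of kernel values against the training data. It is written under stated assumptions and is not the library's Function class: KernelProjectionSketch, project, and the coefficient layout are hypothetical, the coefficients are supplied by hand rather than learned, and centering of the kernel values is deliberately left out.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Illustrative sketch only. KernelProjectionSketch is a hypothetical name and
// does not reproduce KernelPrincipalComponentsAnalysis.Function.
class KernelProjectionSketch
{
    /**
     * Projects one input onto each component.
     * output[c] = sum over i of coefficients[c][i] * k(x_i, input),
     * where x_i is the i-th training example. No centering is applied.
     */
    static <T> double[] project(
        T input,
        List<T> trainingData,
        double[][] coefficients,
        ToDoubleBiFunction<T, T> kernel)
    {
        int n = trainingData.size();

        // Evaluate the kernel once per training example.
        double[] kernelValues = new double[n];
        for (int i = 0; i < n; i++)
        {
            kernelValues[i] = kernel.applyAsDouble(trainingData.get(i), input);
        }

        // One output dimension per component.
        double[] output = new double[coefficients.length];
        for (int c = 0; c < coefficients.length; c++)
        {
            double sum = 0.0;
            for (int i = 0; i < n; i++)
            {
                sum += coefficients[c][i] * kernelValues[i];
            }
            output[c] = sum;
        }
        return output;
    }
}
```

The output dimensionality equals the number of components, regardless of the type of the input data.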

getComponentCount

public int getComponentCount()
Gets the number of components the analysis attempts to find. If there are fewer data points than the number of components, then the number of data points is used instead.

Returns:
The number of components for the analysis. Must be positive.

setComponentCount

public void setComponentCount(int componentCount)
Sets the number of components the analysis attempts to find. If there are fewer data points than the number of components, then the number of data points is used instead.

Parameters:
componentCount - The number of components for the analysis. Must be positive.
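
The two rules stated for the component count (it must be positive, and it is capped at the number of data points) can be sketched as follows. ComponentCountSketch and its methods are hypothetical helpers, and throwing IllegalArgumentException for a non-positive count is an assumption, not documented behavior of this class.

```java
// Illustrative sketch only. ComponentCountSketch is a hypothetical name, and
// the exception type is an assumption rather than documented behavior.
class ComponentCountSketch
{
    /** Rejects counts that are zero or negative. */
    static int checkPositive(int componentCount)
    {
        if (componentCount <= 0)
        {
            throw new IllegalArgumentException(
                "componentCount must be positive, was " + componentCount);
        }
        return componentCount;
    }

    /**
     * Returns the number of components that can actually be produced:
     * the requested count, capped at the number of data points.
     */
    static int effectiveComponentCount(int componentCount, int dataCount)
    {
        return Math.min(checkPositive(componentCount), dataCount);
    }
}
```

For example, requesting 10 components on a dataset of 4 points would give 4 under this sketch.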

isCenterData

public boolean isCenterData()
Gets whether or not the data needs to be centered in the kernel space before applying the algorithm. Only set this to false if the data has been pre-centered. If in doubt, set it to true.

Returns:
True if the algorithm will apply to the centered version of the input data. False if it will just apply directly to the given data.

setCenterData

public void setCenterData(boolean centerData)
Sets whether or not the data needs to be centered in the kernel space before applying the algorithm. Only set this to false if the data has been pre-centered. If in doubt, set it to true.

Parameters:
centerData - True if the algorithm will apply to the centered version of the input data. False if it will just apply directly to the given data.
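
Centering in the kernel space is usually done on the kernel matrix itself, without ever forming the feature vectors. The sketch below shows the standard double-centering step, K'[i][j] = K[i][j] - mean(row i) - mean(row j) + mean(K), for a symmetric kernel matrix. KernelCenteringSketch is a hypothetical name, and this is an illustration of the general technique rather than the library's implementation.

```java
// Illustrative sketch only. KernelCenteringSketch is a hypothetical name and
// does not reproduce the library's internal centering code.
class KernelCenteringSketch
{
    /**
     * Double-centers a symmetric kernel matrix so that it equals the kernel
     * matrix of the data after subtracting its mean in the kernel space.
     */
    static double[][] center(double[][] k)
    {
        int n = k.length;
        double[] rowMean = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++)
        {
            double sum = 0.0;
            for (int j = 0; j < n; j++)
            {
                sum += k[i][j];
            }
            rowMean[i] = sum / n;
            total += sum;
        }
        double totalMean = total / ((double) n * n);

        // Because k is assumed symmetric, column means equal row means.
        double[][] centered = new double[n][n];
        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < n; j++)
            {
                centered[i][j] =
                    k[i][j] - rowMean[i] - rowMean[j] + totalMean;
            }
        }
        return centered;
    }
}
```

With a linear kernel on the scalars 1, 2, 3, the centered matrix matches the kernel matrix of the mean-subtracted points -1, 0, 1, and every row of the centered matrix sums to zero. If the data is already centered in the kernel space, this step changes nothing, which is the case where centerData can safely be false.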