gov.sandia.cognition.learning.algorithm.tree
Class VectorThresholdVarianceLearner

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.learning.algorithm.tree.VectorThresholdVarianceLearner
All Implemented Interfaces:
BatchLearner<Collection<? extends InputOutputPair<? extends Vectorizable,Double>>,VectorElementThresholdCategorizer>, DeciderLearner<Vectorizable,Double,Boolean,VectorElementThresholdCategorizer>, CloneableSerializable, Serializable, Cloneable

public class VectorThresholdVarianceLearner
extends AbstractCloneableSerializable
implements DeciderLearner<Vectorizable,Double,Boolean,VectorElementThresholdCategorizer>

The VectorThresholdVarianceLearner finds the best split of a dataset of vectors: it uses the reduction in output variance to choose both the vector index to split on and the threshold for that element. This is the split criterion used in the CART regression tree algorithm.
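
For orientation, the variance-reduction criterion is usually written as below in descriptions of CART. This is the textbook form, not a formula taken from this class: the symbols S (the data), i (the vector index), t (the threshold), and the convention of which side receives values equal to t are all assumptions made for illustration, and the weighting this class actually applies may differ.

```latex
\Delta(i, t) = \mathrm{Var}(S)
  - \frac{|S_{<}|}{|S|}\,\mathrm{Var}(S_{<})
  - \frac{|S_{\ge}|}{|S|}\,\mathrm{Var}(S_{\ge}),
\qquad
S_{<} = \{(x, y) \in S : x_i < t\},\quad
S_{\ge} = \{(x, y) \in S : x_i \ge t\}
```

Here Var denotes the variance of the output values y in the given subset. The learner looks for the pair (i, t) with the largest reduction.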

Since:
2.0
Author:
Justin Basilico
See Also:
Serialized Form

Constructor Summary
VectorThresholdVarianceLearner()
          Creates a new instance of VectorThresholdVarianceLearner.
 
Method Summary
 DefaultPair<Double,Double> computeBestGainThreshold(Collection<? extends InputOutputPair<? extends Vectorizable,Double>> data, int dimension, double baseVariance)
          Computes the best gain-threshold pair for the given dimension on the given data, where the gain is the reduction in variance.
protected  int getDimensionality(Collection<? extends InputOutputPair<? extends Vectorizable,?>> data)
          Figures out the dimensionality of the Vector data.
 VectorElementThresholdCategorizer learn(Collection<? extends InputOutputPair<? extends Vectorizable,Double>> data)
          Learns a VectorElementThresholdCategorizer from the given data by picking the vector element and threshold that maximize the gain (here, the reduction in variance).
 
Methods inherited from class gov.sandia.cognition.util.AbstractCloneableSerializable
clone
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.util.CloneableSerializable
clone
 

Constructor Detail

VectorThresholdVarianceLearner

public VectorThresholdVarianceLearner()
Creates a new instance of VectorThresholdVarianceLearner.

Method Detail

learn

public VectorElementThresholdCategorizer learn(Collection<? extends InputOutputPair<? extends Vectorizable,Double>> data)
Learns a VectorElementThresholdCategorizer from the given data by picking the vector element and threshold that maximize the gain (here, the reduction in variance).

Specified by:
learn in interface BatchLearner<Collection<? extends InputOutputPair<? extends Vectorizable,Double>>,VectorElementThresholdCategorizer>
Parameters:
data - The data to learn from.
Returns:
The learned threshold categorizer, or null if there is no good categorizer.

getDimensionality

protected int getDimensionality(Collection<? extends InputOutputPair<? extends Vectorizable,?>> data)
Figures out the dimensionality of the Vector data.

Parameters:
data - The data.
Returns:
The dimensionality of the vectors in the data.

computeBestGainThreshold

public DefaultPair<Double,Double> computeBestGainThreshold(Collection<? extends InputOutputPair<? extends Vectorizable,Double>> data,
                                                           int dimension,
                                                           double baseVariance)
Computes the best gain-threshold pair for the given dimension on the given data, where the gain is the reduction in variance. It does this by sorting the data according to the value in that dimension and then walking the sorted values to find the threshold that yields the best gain.

Parameters:
data - The data to use.
dimension - The dimension to compute the best threshold over.
baseVariance - The variance of the output values of the data, before any split.
Returns:
The pair containing the best gain found along this dimension and the corresponding threshold.
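
The sort-and-walk search described for computeBestGainThreshold can be sketched in plain Java as follows. This is a standalone illustration, not the library's implementation: the class VarianceThresholdSketch, its GainThreshold holder, and the use of double arrays instead of Vectorizable inputs are hypothetical. It also assumes population variance, a threshold placed at the midpoint between adjacent distinct values, and a null result when no split exists; the actual class may make different choices on each point.

```java
import java.util.Arrays;

/** Hypothetical standalone sketch of a variance-reduction threshold search. */
public class VarianceThresholdSketch {

    /** Illustrative result holder: the best gain and the threshold achieving it. */
    public static final class GainThreshold {
        public final double gain;
        public final double threshold;

        public GainThreshold(double gain, double threshold) {
            this.gain = gain;
            this.threshold = threshold;
        }
    }

    /** Population variance of the given values (assumed variance definition). */
    public static double variance(double[] values) {
        double sum = 0.0;
        double sumSq = 0.0;
        for (double v : values) {
            sum += v;
            sumSq += v * v;
        }
        return variance(sum, sumSq, values.length);
    }

    /** Population variance from running sums, clamped at zero against round-off. */
    public static double variance(double sum, double sumSq, int count) {
        double mean = sum / count;
        return Math.max(0.0, sumSq / count - mean * mean);
    }

    /**
     * Sorts the examples by inputs[.][dimension], then walks the sorted order,
     * scoring each boundary between two distinct values.
     * Returns null when every example has the same value in that dimension.
     */
    public static GainThreshold bestGainThreshold(
            double[][] inputs, double[] outputs, int dimension, double baseVariance) {
        int n = inputs.length;

        // Sort example indices by the value in the chosen dimension.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order,
            (a, b) -> Double.compare(inputs[a][dimension], inputs[b][dimension]));

        // Totals over all outputs, so the right side is derived by subtraction.
        double totalSum = 0.0;
        double totalSumSq = 0.0;
        for (double y : outputs) {
            totalSum += y;
            totalSumSq += y * y;
        }

        double leftSum = 0.0;
        double leftSumSq = 0.0;
        GainThreshold best = null;
        for (int k = 1; k < n; k++) {
            // Move example k-1 (in sorted order) to the left side.
            double y = outputs[order[k - 1]];
            leftSum += y;
            leftSumSq += y * y;

            double previous = inputs[order[k - 1]][dimension];
            double current = inputs[order[k]][dimension];
            if (previous == current) {
                continue; // No threshold can separate equal values.
            }

            double leftVariance = variance(leftSum, leftSumSq, k);
            double rightVariance =
                variance(totalSum - leftSum, totalSumSq - leftSumSq, n - k);
            // Gain: base variance minus the size-weighted variance of both sides.
            double gain = baseVariance
                - (k * leftVariance + (n - k) * rightVariance) / n;
            if (best == null || gain > best.gain) {
                best = new GainThreshold(gain, (previous + current) / 2.0);
            }
        }
        return best;
    }
}
```

With the one-dimensional inputs 1, 2, 3, 4 and outputs 0, 0, 10, 10, the base variance is 25 and the sketch selects threshold 2.5 with gain 25, since both sides of that split have zero variance. Keeping running sums lets each candidate threshold be scored in constant time after the initial sort, so one dimension costs O(n log n) overall.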