gov.sandia.cognition.learning.algorithm.ensemble
Class BaggingCategorizerLearner<InputType,CategoryType>

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
          extended by gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm<ResultType>
              extended by gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner<Collection<? extends InputOutputPair<? extends InputType,OutputType>>,ResultType>
                  extended by gov.sandia.cognition.learning.algorithm.AbstractAnytimeSupervisedBatchLearner<InputType,OutputType,EnsembleType>
                      extended by gov.sandia.cognition.learning.algorithm.ensemble.AbstractBaggingLearner<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>,WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>>>
                          extended by gov.sandia.cognition.learning.algorithm.ensemble.BaggingCategorizerLearner<InputType,CategoryType>
Type Parameters:
InputType - The input type for supervised learning. Passed on to the internal learning algorithm. Also the input type for the learned ensemble.
CategoryType - The output type for supervised learning. Passed on to the internal learning algorithm. Also the output type of the learned ensemble.
All Implemented Interfaces:
AnytimeAlgorithm<WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>>>, IterativeAlgorithm, StoppableAlgorithm, AnytimeBatchLearner<Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>>>, BatchLearner<Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>>>, BatchLearnerContainer<BatchLearner<? super Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>>>, SupervisedBatchLearner<InputType,CategoryType,WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>>>, CloneableSerializable, Randomized, Serializable, Cloneable
Direct Known Subclasses:
CategoryBalancedBaggingLearner

@PublicationReference(title="Bagging Predictors",
                      author="Leo Breiman",
                      year=1996,
                      type=Journal,
                      publication="Machine Learning",
                      pages={123,140},
                      url="http://www.springerlink.com/index/L4780124W2874025.pdf")
public class BaggingCategorizerLearner<InputType,CategoryType>
extends AbstractBaggingLearner<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>,WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>>>

Learns a categorization ensemble by randomly sampling with replacement (duplicates allowed) some percentage of the size of the data (defaults to 100%) on each iteration to train a new ensemble member. The random sample is referred to as a bag. Each learned ensemble member is given equal weight. The idea is that by randomly sampling from the data and learning a categorizer that has high variance with respect to the input data (such as a decision tree), one can improve the performance of that categorizer. By default, the algorithm runs for maxIterations steps, creating one ensemble member per step. However, one can also use the out-of-bag (OOB) error on each iteration as a stopping criterion. The OOB error is determined by looking at the performance of the categorizer on the examples that it has not seen, that is, the examples left out of its bag.
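
The sampling step described above can be illustrated with a small standalone sketch. It is not the library's implementation: the class BaggingSketch and its methods are hypothetical names, and it assumes that percentToSample is a fraction (1.0 meaning 100%) and that the bag size is that fraction of the data size, rounded down and never below 1. It shows how a bag is drawn with replacement, which examples end up out of bag, and how an OOB error for one member could be computed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.IntPredicate;

/** Hypothetical illustration of one bagging iteration; not the library's code. */
public class BaggingSketch {

    /** Draws example indices from [0, dataSize) uniformly, with replacement. */
    public static int[] fillBag(int dataSize, double percentToSample, Random random) {
        // Assumption: the bag holds the sampled fraction of the data, at least 1 example.
        int bagSize = Math.max(1, (int) (percentToSample * dataSize));
        int[] bag = new int[bagSize];
        for (int i = 0; i < bagSize; i++) {
            bag[i] = random.nextInt(dataSize); // duplicates are allowed
        }
        return bag;
    }

    /** Returns the indices that never appear in the bag (the out-of-bag examples). */
    public static List<Integer> outOfBag(int dataSize, int[] bag) {
        boolean[] inBag = new boolean[dataSize];
        for (int index : bag) {
            inBag[index] = true;
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < dataSize; i++) {
            if (!inBag[i]) {
                result.add(i);
            }
        }
        return result;
    }

    /** Fraction of out-of-bag examples a member gets wrong; 0.0 if there are none. */
    public static double outOfBagError(List<Integer> oob, IntPredicate isCorrect) {
        if (oob.isEmpty()) {
            return 0.0;
        }
        int wrong = 0;
        for (int index : oob) {
            if (!isCorrect.test(index)) {
                wrong++;
            }
        }
        return (double) wrong / oob.size();
    }

    public static void main(String[] args) {
        int[] bag = fillBag(10, 1.0, new Random(42));
        List<Integer> oob = outOfBag(10, bag);
        System.out.println("bag size = " + bag.length + ", out-of-bag = " + oob);
    }
}
```

Because the bag is drawn with replacement, some examples are usually left out even at 100% sampling, which is what makes an OOB estimate possible without a separate validation set.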

Since:
3.0
Author:
Justin Basilico
See Also:
Serialized Form

Field Summary
 
Fields inherited from class gov.sandia.cognition.learning.algorithm.ensemble.AbstractBaggingLearner
bag, dataInBag, dataList, DEFAULT_MAX_ITERATIONS, DEFAULT_PERCENT_TO_SAMPLE, ensemble, learner, percentToSample, random
 
Fields inherited from class gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner
data, keepGoing
 
Fields inherited from class gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm
maxIterations
 
Fields inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
DEFAULT_ITERATION, iteration
 
Constructor Summary
BaggingCategorizerLearner()
          Creates a new instance of BaggingCategorizerLearner.
BaggingCategorizerLearner(BatchLearner<? super Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>> learner)
          Creates a new instance of BaggingCategorizerLearner.
BaggingCategorizerLearner(BatchLearner<? super Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>> learner, int maxIterations, double percentToSample, Random random)
          Creates a new instance of BaggingCategorizerLearner.
 
Method Summary
protected  void addEnsembleMember(Evaluator<? super InputType,? extends CategoryType> member)
          Adds a new member to the ensemble.
protected  WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>> createInitialEnsemble()
          Create the initial, empty ensemble for the algorithm to use.
 
Methods inherited from class gov.sandia.cognition.learning.algorithm.ensemble.AbstractBaggingLearner
cleanupAlgorithm, fillBag, getBag, getDataInBag, getDataList, getEnsemble, getLearner, getPercentToSample, getRandom, getResult, initializeAlgorithm, setBag, setDataInBag, setDataList, setEnsemble, setLearner, setPercentToSample, setRandom, step
 
Methods inherited from class gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner
clone, getData, getKeepGoing, learn, setData, setKeepGoing, stop
 
Methods inherited from class gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm
getMaxIterations, isResultValid, setMaxIterations
 
Methods inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
addIterativeAlgorithmListener, fireAlgorithmEnded, fireAlgorithmStarted, fireStepEnded, fireStepStarted, getIteration, getListeners, removeIterativeAlgorithmListener, setIteration, setListeners
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.learning.algorithm.BatchLearner
learn
 
Methods inherited from interface gov.sandia.cognition.util.CloneableSerializable
clone
 
Methods inherited from interface gov.sandia.cognition.algorithm.AnytimeAlgorithm
getMaxIterations, setMaxIterations
 
Methods inherited from interface gov.sandia.cognition.algorithm.IterativeAlgorithm
addIterativeAlgorithmListener, getIteration, removeIterativeAlgorithmListener
 
Methods inherited from interface gov.sandia.cognition.algorithm.StoppableAlgorithm
isResultValid
 

Constructor Detail

BaggingCategorizerLearner

public BaggingCategorizerLearner()
Creates a new instance of BaggingCategorizerLearner.


BaggingCategorizerLearner

public BaggingCategorizerLearner(BatchLearner<? super Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>> learner)
Creates a new instance of BaggingCategorizerLearner.

Parameters:
learner - The learner to use to create the categorizer on each iteration.

BaggingCategorizerLearner

public BaggingCategorizerLearner(BatchLearner<? super Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>> learner,
                                 int maxIterations,
                                 double percentToSample,
                                 Random random)
Creates a new instance of BaggingCategorizerLearner.

Parameters:
learner - The learner to use to create the categorizer on each iteration.
maxIterations - The maximum number of iterations to run for, which is also the number of learners to create.
percentToSample - The percentage of the total size of the data to sample on each iteration. Must be positive.
random - The random number generator to use.
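
As a rough illustration of how these parameters interact, the sketch below draws one bag per iteration from a supplied random number generator. The class BaggingParametersSketch is hypothetical and does not use the library; the validation rules and the bag-size formula are assumptions based only on the parameter descriptions above. It also shows that supplying a Random with a fixed seed makes the sampled bags reproducible.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of the constructor arguments' roles; not the library's code. */
public class BaggingParametersSketch {

    /** Draws maxIterations bags of indices, one per ensemble member to be trained. */
    public static int[][] sampleBags(int dataSize, int maxIterations,
                                     double percentToSample, Random random) {
        if (dataSize < 1) {
            throw new IllegalArgumentException("dataSize must be at least 1");
        }
        if (maxIterations < 1) {
            throw new IllegalArgumentException("maxIterations must be at least 1");
        }
        if (percentToSample <= 0.0) {
            throw new IllegalArgumentException("percentToSample must be positive");
        }
        // Assumption: percentToSample is a fraction, so 1.0 samples 100% of the data size.
        int bagSize = Math.max(1, (int) (percentToSample * dataSize));
        int[][] bags = new int[maxIterations][bagSize];
        for (int[] bag : bags) {
            for (int i = 0; i < bagSize; i++) {
                bag[i] = random.nextInt(dataSize); // sampling with replacement
            }
        }
        return bags;
    }

    public static void main(String[] args) {
        // Two generators with the same seed produce identical bags.
        int[][] first = sampleBags(10, 3, 1.0, new Random(1234));
        int[][] second = sampleBags(10, 3, 1.0, new Random(1234));
        System.out.println("reproducible = " + Arrays.deepEquals(first, second));
    }
}
```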

Method Detail

createInitialEnsemble

protected WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>> createInitialEnsemble()
Description copied from class: AbstractBaggingLearner
Create the initial, empty ensemble for the algorithm to use.

Specified by:
createInitialEnsemble in class AbstractBaggingLearner<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>,WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>>>
Returns:
A new ensemble for the algorithm to use.

addEnsembleMember

protected void addEnsembleMember(Evaluator<? super InputType,? extends CategoryType> member)
Description copied from class: AbstractBaggingLearner
Adds a new member to the ensemble.

Specified by:
addEnsembleMember in class AbstractBaggingLearner<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>,WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>>>
Parameters:
member - The new member to add to the ensemble.
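
The two methods above build an ensemble in which every member has equal weight. A minimal standalone sketch of such an ensemble follows. EqualVoteEnsembleSketch is a hypothetical class, not WeightedVotingCategorizerEnsemble, and the weight of 1.0 per member and the tie-breaking rule are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical equal-weight voting ensemble; not the library's ensemble class. */
public class EqualVoteEnsembleSketch<I, C> {

    private final List<Function<? super I, ? extends C>> members = new ArrayList<>();

    /** Adds a member; every member implicitly carries the same weight. */
    public void addMember(Function<? super I, ? extends C> member) {
        members.add(member);
    }

    /** Returns the number of members added so far. */
    public int size() {
        return members.size();
    }

    /** Returns the category with the most votes, or null for an empty ensemble. */
    public C evaluate(I input) {
        Map<C, Double> votes = new LinkedHashMap<>();
        for (Function<? super I, ? extends C> member : members) {
            C category = member.apply(input);
            votes.merge(category, 1.0, Double::sum); // assumed weight of 1.0 per member
        }
        C best = null;
        double bestVotes = 0.0;
        for (Map.Entry<C, Double> entry : votes.entrySet()) {
            // Strict comparison: on a tie, the category that was voted for first wins.
            if (entry.getValue() > bestVotes) {
                best = entry.getKey();
                bestVotes = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        EqualVoteEnsembleSketch<Integer, String> ensemble = new EqualVoteEnsembleSketch<>();
        ensemble.addMember(x -> x > 0 ? "pos" : "neg");
        ensemble.addMember(x -> x > 5 ? "pos" : "neg");
        ensemble.addMember(x -> x > 10 ? "pos" : "neg");
        System.out.println(ensemble.evaluate(7));
    }
}
```

With equal weights, the ensemble's output reduces to a plain majority vote over its members.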