gov.sandia.cognition.learning.algorithm.ensemble
Class OnlineBaggingCategorizerLearner<InputType,CategoryType,MemberType extends Evaluator<? super InputType,? extends CategoryType>>

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.learning.algorithm.AbstractBatchAndIncrementalLearner<InputOutputPair<? extends InputType,OutputType>,ResultType>
          extended by gov.sandia.cognition.learning.algorithm.AbstractSupervisedBatchAndIncrementalLearner<InputType,CategoryType,VotingCategorizerEnsemble<InputType,CategoryType,MemberType>>
              extended by gov.sandia.cognition.learning.algorithm.ensemble.OnlineBaggingCategorizerLearner<InputType,CategoryType,MemberType>
Type Parameters:
InputType - The input type for supervised learning. Passed on to the internal learning algorithm. Also the input type for the learned ensemble.
CategoryType - The output type for supervised learning. Passed on to the internal learning algorithm. Also the output type of the learned ensemble.
MemberType - The type of ensemble member created by the base algorithm.
All Implemented Interfaces:
BatchAndIncrementalLearner<InputOutputPair<? extends InputType,CategoryType>,VotingCategorizerEnsemble<InputType,CategoryType,MemberType>>, BatchLearner<Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,VotingCategorizerEnsemble<InputType,CategoryType,MemberType>>, IncrementalLearner<InputOutputPair<? extends InputType,CategoryType>,VotingCategorizerEnsemble<InputType,CategoryType,MemberType>>, SupervisedBatchAndIncrementalLearner<InputType,CategoryType,VotingCategorizerEnsemble<InputType,CategoryType,MemberType>>, SupervisedBatchLearner<InputType,CategoryType,VotingCategorizerEnsemble<InputType,CategoryType,MemberType>>, SupervisedIncrementalLearner<InputType,CategoryType,VotingCategorizerEnsemble<InputType,CategoryType,MemberType>>, CloneableSerializable, Randomized, Serializable, Cloneable

@PublicationReference(author={"Nikunj C. Oza","Stuart Russell"},
                      title="Online Bagging and Boosting",
                      year=2001,
                      type=Conference,
                      publication="In Artificial Intelligence and Statistics",
                      pages={105,112},
                      url="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.8889")
public class OnlineBaggingCategorizerLearner<InputType,CategoryType,MemberType extends Evaluator<? super InputType,? extends CategoryType>>
extends AbstractSupervisedBatchAndIncrementalLearner<InputType,CategoryType,VotingCategorizerEnsemble<InputType,CategoryType,MemberType>>
implements Randomized

An implementation of an online version of the Bagging algorithm for learning an ensemble of categorizers.
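
The following is a minimal, self-contained sketch of the online bagging idea from the Oza and Russell reference cited above, not the source of this class. It assumes each incoming example is given to each member k times, with k drawn from a Poisson distribution whose rate is the percent to sample. The names `OnlineBaggingSketch`, `NearestMeanMember`, and `samplePoisson` are hypothetical. The toy nearest-mean member stands in for whatever base IncrementalLearner would be supplied, and a plain majority vote stands in for VotingCategorizerEnsemble.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Illustrative sketch of online bagging; hypothetical names, not the library's code. */
final class OnlineBaggingSketch {

    /** Toy incremental member: nearest class mean on a single numeric feature. */
    static final class NearestMeanMember {
        // label -> {running sum of inputs, number of inputs}
        private final Map<String, double[]> stats = new HashMap<>();

        void update(double input, String label) {
            double[] s = stats.computeIfAbsent(label, key -> new double[2]);
            s[0] += input;
            s[1] += 1.0;
        }

        /** Returns the label whose running mean is closest, or null if untrained. */
        String predict(double input) {
            String best = null;
            double bestDistance = Double.POSITIVE_INFINITY;
            for (Map.Entry<String, double[]> e : stats.entrySet()) {
                double mean = e.getValue()[0] / e.getValue()[1];
                double distance = Math.abs(input - mean);
                if (distance < bestDistance) {
                    bestDistance = distance;
                    best = e.getKey();
                }
            }
            return best;
        }
    }

    private final List<NearestMeanMember> members = new ArrayList<>();
    private final double percentToSample;
    private final Random random;

    OnlineBaggingSketch(int ensembleSize, double percentToSample, Random random) {
        if (ensembleSize <= 0 || percentToSample <= 0.0) {
            throw new IllegalArgumentException("ensembleSize and percentToSample must be positive");
        }
        this.percentToSample = percentToSample;
        this.random = random;
        for (int i = 0; i < ensembleSize; i++) {
            members.add(new NearestMeanMember()); // the ensemble is filled up front
        }
    }

    /** Online bagging step: each member sees the example k ~ Poisson(percentToSample) times. */
    void update(double input, String label) {
        for (NearestMeanMember member : members) {
            int k = samplePoisson(percentToSample);
            for (int i = 0; i < k; i++) {
                member.update(input, label);
            }
        }
    }

    /** Unweighted majority vote over the members that already have an opinion. */
    String predict(double input) {
        Map<String, Integer> votes = new HashMap<>();
        for (NearestMeanMember member : members) {
            String vote = member.predict(input);
            if (vote != null) {
                votes.merge(vote, 1, Integer::sum);
            }
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestCount) {
                bestCount = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    int size() {
        return members.size();
    }

    /** Knuth's multiplication method for a Poisson draw; adequate for small rates. */
    private int samplePoisson(double lambda) {
        double limit = Math.exp(-lambda);
        int k = 0;
        double product = random.nextDouble();
        while (product > limit) {
            k++;
            product *= random.nextDouble();
        }
        return k;
    }
}
```

In this sketch a caller would construct the object once, call `update` for every labeled example as it arrives, and call `predict` at any point in the stream. Because each member draws its own counts, the members end up trained on different resamplings of the same stream, which is where the ensemble's diversity comes from.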

Since:
3.1.1
Author:
Justin Basilico
See Also:
Serialized Form

Field Summary
static int DEFAULT_ENSEMBLE_SIZE
          The default ensemble size is 100.
static double DEFAULT_PERCENT_TO_SAMPLE
          The default percent to sample is 1.0 (which represents 100%).
protected  int ensembleSize
          The size of the ensemble to create.
protected  IncrementalLearner<? super InputOutputPair<? extends InputType,CategoryType>,MemberType> learner
          The base learner used for each ensemble member.
protected  double percentToSample
          The percentage of the data to sample for each ensemble member.
protected  Random random
          The random number generator to use.
 
Constructor Summary
OnlineBaggingCategorizerLearner()
          Creates a new OnlineBaggingCategorizerLearner with a null learner and default parameters.
OnlineBaggingCategorizerLearner(IncrementalLearner<? super InputOutputPair<? extends InputType,CategoryType>,MemberType> learner)
          Creates a new OnlineBaggingCategorizerLearner with the given base learner and default parameters.
OnlineBaggingCategorizerLearner(IncrementalLearner<? super InputOutputPair<? extends InputType,CategoryType>,MemberType> learner, int ensembleSize, double percentToSample, Random random)
          Creates a new OnlineBaggingCategorizerLearner with the given parameters.
 
Method Summary
static
<InputType,CategoryType,MemberType extends Evaluator<? super InputType,? extends CategoryType>>
OnlineBaggingCategorizerLearner<InputType,CategoryType,MemberType>
create(IncrementalLearner<? super InputOutputPair<? extends InputType,CategoryType>,MemberType> learner, int ensembleSize, double percentToSample, Random random)
          Convenience method for creating an OnlineBaggingCategorizerLearner.
 VotingCategorizerEnsemble<InputType,CategoryType,MemberType> createInitialLearnedObject()
          Creates a new initial learned object, before any data is given.
 int getEnsembleSize()
          Gets the size of the ensemble to create.
 IncrementalLearner<? super InputOutputPair<? extends InputType,CategoryType>,MemberType> getLearner()
          Gets the incremental (online) learning algorithm to use to learn all of the ensemble members.
 double getPercentToSample()
          Gets the percent of the data to attempt to sample for each ensemble member.
 Random getRandom()
          Gets the random number generator used by this object.
 void setEnsembleSize(int ensembleSize)
          Sets the size of the ensemble to create.
 void setLearner(IncrementalLearner<? super InputOutputPair<? extends InputType,CategoryType>,MemberType> learner)
          Sets the incremental (online) learning algorithm to use to learn all of the ensemble members.
 void setPercentToSample(double percentToSample)
          Sets the percent of the data to attempt to sample for each ensemble member.
 void setRandom(Random random)
          Sets the random number generator used by this object.
 void update(VotingCategorizerEnsemble<InputType,CategoryType,MemberType> target, InputOutputPair<? extends InputType,CategoryType> data)
          Updates an object of ResultType with the given new data of type DataType, using some form of "learning" algorithm.
 void update(VotingCategorizerEnsemble<InputType,CategoryType,MemberType> target, InputType input, CategoryType category)
          Updates an object of ResultType with the given new supervised input-output pair, using some form of "learning" algorithm.
 
Methods inherited from class gov.sandia.cognition.learning.algorithm.AbstractBatchAndIncrementalLearner
clone, learn, learn, update
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.learning.algorithm.BatchAndIncrementalLearner
learn
 
Methods inherited from interface gov.sandia.cognition.learning.algorithm.BatchLearner
learn
 
Methods inherited from interface gov.sandia.cognition.learning.algorithm.IncrementalLearner
update
 
Methods inherited from interface gov.sandia.cognition.util.CloneableSerializable
clone
 

Field Detail

DEFAULT_ENSEMBLE_SIZE

public static final int DEFAULT_ENSEMBLE_SIZE
The default ensemble size is 100.

See Also:
Constant Field Values

DEFAULT_PERCENT_TO_SAMPLE

public static final double DEFAULT_PERCENT_TO_SAMPLE
The default percent to sample is 1.0 (which represents 100%).

See Also:
Constant Field Values

learner

protected IncrementalLearner<? super InputOutputPair<? extends InputType,CategoryType>,MemberType> learner
The base learner used for each ensemble member.


ensembleSize

protected int ensembleSize
The size of the ensemble to create. Must be positive.


percentToSample

protected double percentToSample
The percentage of the data to sample for each ensemble member, expressed as a fraction (1.0 represents 100%). Must be positive. Used as the rate parameter of the Poisson distribution that determines how many times each example is given to each ensemble member.
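
The Poisson draw mentioned above can be sketched as follows. This is an assumption about one reasonable way to sample, using Knuth's multiplication method, and is not taken from the library source. `PoissonSketch`, `samplePoisson`, and `empiricalMean` are hypothetical names.

```java
import java.util.Random;

/** Illustrative sketch only; hypothetical names, not the library's code. */
final class PoissonSketch {

    /** Draws from Poisson(lambda) with Knuth's multiplication method (fine for small lambda). */
    static int samplePoisson(double lambda, Random random) {
        if (lambda <= 0.0) {
            throw new IllegalArgumentException("lambda must be positive");
        }
        final double limit = Math.exp(-lambda);
        int k = 0;
        double product = random.nextDouble();
        // Count how many uniform factors can be multiplied in before the product drops to the limit.
        while (product > limit) {
            k++;
            product *= random.nextDouble();
        }
        return k;
    }

    /** Averages many draws; the result should be close to lambda. */
    static double empiricalMean(double lambda, int draws, long seed) {
        Random random = new Random(seed);
        long total = 0;
        for (int i = 0; i < draws; i++) {
            total += samplePoisson(lambda, random);
        }
        return (double) total / draws;
    }
}
```

Averaging a large number of draws with `empiricalMean` gives a value near the rate passed in, which is the sense in which the field controls how much data each member sees.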


random

protected Random random
The random number generator to use.

Constructor Detail

OnlineBaggingCategorizerLearner

public OnlineBaggingCategorizerLearner()
Creates a new OnlineBaggingCategorizerLearner with a null learner and default parameters.


OnlineBaggingCategorizerLearner

public OnlineBaggingCategorizerLearner(IncrementalLearner<? super InputOutputPair<? extends InputType,CategoryType>,MemberType> learner)
Creates a new OnlineBaggingCategorizerLearner with the given base learner and default parameters.

Parameters:
learner - The base learner to use for each ensemble member.

OnlineBaggingCategorizerLearner

public OnlineBaggingCategorizerLearner(IncrementalLearner<? super InputOutputPair<? extends InputType,CategoryType>,MemberType> learner,
                                       int ensembleSize,
                                       double percentToSample,
                                       Random random)
Creates a new OnlineBaggingCategorizerLearner with the given parameters.

Parameters:
learner - The base learner to use for each ensemble member.
ensembleSize - The size of the ensemble to create. Must be positive.
percentToSample - The percentage of the data to sample for learning each ensemble member. Must be positive.
random - The random number generator to use.
Method Detail

createInitialLearnedObject

public VotingCategorizerEnsemble<InputType,CategoryType,MemberType> createInitialLearnedObject()
Description copied from interface: IncrementalLearner
Creates a new initial learned object, before any data is given.

Specified by:
createInitialLearnedObject in interface IncrementalLearner<InputOutputPair<? extends InputType,CategoryType>,VotingCategorizerEnsemble<InputType,CategoryType,MemberType extends Evaluator<? super InputType,? extends CategoryType>>>
Returns:
The initial learned object.

update

public void update(VotingCategorizerEnsemble<InputType,CategoryType,MemberType> target,
                   InputType input,
                   CategoryType category)
Description copied from interface: SupervisedIncrementalLearner
Updates an object of ResultType with the given new supervised input-output pair, using some form of "learning" algorithm.

Specified by:
update in interface SupervisedIncrementalLearner<InputType,CategoryType,VotingCategorizerEnsemble<InputType,CategoryType,MemberType extends Evaluator<? super InputType,? extends CategoryType>>>
Parameters:
target - The object to update.
input - The supervised input to learn from.
category - The supervised output to learn from.

update

public void update(VotingCategorizerEnsemble<InputType,CategoryType,MemberType> target,
                   InputOutputPair<? extends InputType,CategoryType> data)
Description copied from interface: IncrementalLearner
Updates an object of ResultType with the given new data of type DataType, using some form of "learning" algorithm.

Specified by:
update in interface IncrementalLearner<InputOutputPair<? extends InputType,CategoryType>,VotingCategorizerEnsemble<InputType,CategoryType,MemberType extends Evaluator<? super InputType,? extends CategoryType>>>
Overrides:
update in class AbstractSupervisedBatchAndIncrementalLearner<InputType,CategoryType,VotingCategorizerEnsemble<InputType,CategoryType,MemberType extends Evaluator<? super InputType,? extends CategoryType>>>
Parameters:
target - The object to update.
data - The new data for the learning algorithm to use to update the object.

getLearner

public IncrementalLearner<? super InputOutputPair<? extends InputType,CategoryType>,MemberType> getLearner()
Gets the incremental (online) learning algorithm to use to learn all of the ensemble members.

Returns:
The base learning algorithm.

setLearner

public void setLearner(IncrementalLearner<? super InputOutputPair<? extends InputType,CategoryType>,MemberType> learner)
Sets the incremental (online) learning algorithm to use to learn all of the ensemble members.

Parameters:
learner - The base learning algorithm.

getEnsembleSize

public int getEnsembleSize()
Gets the size of the ensemble to create. When the ensemble is initially created, it is filled with this many members.

Returns:
The size of the ensemble to create. Must be positive.

setEnsembleSize

public void setEnsembleSize(int ensembleSize)
Sets the size of the ensemble to create. When the ensemble is initially created, it is filled with this many members.

Parameters:
ensembleSize - The size of the ensemble to create. Must be positive.

getPercentToSample

public double getPercentToSample()
Gets the percent of the data to attempt to sample for each ensemble member, expressed as a fraction (1.0 represents 100%). The algorithm uses this value as the rate parameter of the Poisson distribution that determines how many times each member is given each example, so it is the expected number of times a member is trained on any single example. Because this is an online algorithm, the value holds only in expectation: no ensemble member is guaranteed to see exactly this fraction of the data.

Returns:
The percentage of the data to sample for each ensemble member. Must be positive.
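
As a worked version of the description above: write N for the number of examples seen so far, λ for percentToSample, and k_{m,i} for the number of times member m is given example i. These symbols are notation introduced here, and the draws are assumed to be independent.

```latex
\begin{align*}
k_{m,i} &\sim \mathrm{Poisson}(\lambda), \qquad \lambda = \texttt{percentToSample} \\
\mathbb{E}\Big[\sum_{i=1}^{N} k_{m,i}\Big] &= N\lambda \\
\Pr(k_{m,i} = 0) &= e^{-\lambda} \approx 0.368 \quad \text{for } \lambda = 1
\end{align*}
```

Under these assumptions, the default value of 1.0 gives each member N training presentations on average while it skips roughly 37% of the distinct examples, which mirrors what batch bootstrap sampling does.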

setPercentToSample

public void setPercentToSample(double percentToSample)
Sets the percent of the data to attempt to sample for each ensemble member, expressed as a fraction (1.0 represents 100%). The algorithm uses this value as the rate parameter of the Poisson distribution that determines how many times each member is given each example, so it is the expected number of times a member is trained on any single example. Because this is an online algorithm, the value holds only in expectation: no ensemble member is guaranteed to see exactly this fraction of the data.

Parameters:
percentToSample - The percentage of the data to sample for each ensemble member. Must be positive.

getRandom

public Random getRandom()
Description copied from interface: Randomized
Gets the random number generator used by this object.

Specified by:
getRandom in interface Randomized
Returns:
The random number generator used by this object.

setRandom

public void setRandom(Random random)
Description copied from interface: Randomized
Sets the random number generator used by this object.

Specified by:
setRandom in interface Randomized
Parameters:
random - The random number generator for this object to use.

create

public static <InputType,CategoryType,MemberType extends Evaluator<? super InputType,? extends CategoryType>> OnlineBaggingCategorizerLearner<InputType,CategoryType,MemberType> create(IncrementalLearner<? super InputOutputPair<? extends InputType,CategoryType>,MemberType> learner,
                                                                                                                                                                                        int ensembleSize,
                                                                                                                                                                                        double percentToSample,
                                                                                                                                                                                        Random random)
Convenience method for creating an OnlineBaggingCategorizerLearner.

Type Parameters:
InputType - The input type for supervised learning. Passed on to the internal learning algorithm. Also the input type for the learned ensemble.
CategoryType - The output type for supervised learning. Passed on to the internal learning algorithm. Also the output type of the learned ensemble.
MemberType - The type of ensemble member created by the base algorithm.
Parameters:
learner - The base learner to use for each ensemble member.
ensembleSize - The size of the ensemble to create. Must be positive.
percentToSample - The percentage of the data to sample for learning each ensemble member. Must be positive.
random - The random number generator to use.
Returns:
A new online bagging learner.