gov.sandia.cognition.learning.algorithm.svm
Class PrimalEstimatedSubGradient

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
          extended by gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm<ResultType>
              extended by gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner<Collection<? extends InputOutputPair<? extends InputType,OutputType>>,ResultType>
                  extended by gov.sandia.cognition.learning.algorithm.AbstractAnytimeSupervisedBatchLearner<Vectorizable,Boolean,LinearBinaryCategorizer>
                      extended by gov.sandia.cognition.learning.algorithm.svm.PrimalEstimatedSubGradient
All Implemented Interfaces:
AnytimeAlgorithm<LinearBinaryCategorizer>, IterativeAlgorithm, StoppableAlgorithm, AnytimeBatchLearner<Collection<? extends InputOutputPair<? extends Vectorizable,Boolean>>,LinearBinaryCategorizer>, BatchLearner<Collection<? extends InputOutputPair<? extends Vectorizable,Boolean>>,LinearBinaryCategorizer>, SupervisedBatchLearner<Vectorizable,Boolean,LinearBinaryCategorizer>, CloneableSerializable, Randomized, Serializable, Cloneable

@PublicationReference(author={"Shai Shalev-Shwartz","Yoram Singer","Nathan Srebro"},
                      title="Pegasos: Primal Estimated sub-GrAdient SOlver for SVM",
                      year=2007,
                      type=Conference,
                      publication="Proceedings of the 24th International Conference on Machine Learning",
                      url="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.74.8513")
public class PrimalEstimatedSubGradient
extends AbstractAnytimeSupervisedBatchLearner<Vectorizable,Boolean,LinearBinaryCategorizer>
implements Randomized

An implementation of the Primal Estimated Sub-Gradient Solver (PEGASOS) algorithm for learning a linear support vector machine (SVM).
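
For orientation, the update rule from the cited Pegasos paper can be sketched in plain Java. This is a minimal illustration and not this class's source code: the class name PegasosStepSketch, the use of double arrays with +1/-1 labels, the absence of a bias term, and the optional projection step are all assumptions made for the example.

```java
/** Hypothetical, self-contained sketch of one mini-batch Pegasos update; not the library's code. */
final class PegasosStepSketch
{
    /** Dot product of two equal-length vectors. */
    static double dot(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++)
        {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Applies one update to w in place.
     *
     * @param w current weight vector (modified in place)
     * @param batchInputs the k sampled inputs
     * @param batchLabels the k sampled labels, each +1 or -1
     * @param lambda regularization weight, must be positive
     * @param t 1-based iteration number
     */
    static void step(double[] w, double[][] batchInputs, int[] batchLabels,
        double lambda, int t)
    {
        final int k = batchInputs.length;
        final double eta = 1.0 / (lambda * t); // step size for iteration t
        final double[] update = new double[w.length];

        // Sum y * x over the examples whose margin y * (w . x) is below 1.
        for (int i = 0; i < k; i++)
        {
            if (batchLabels[i] * dot(w, batchInputs[i]) < 1.0)
            {
                for (int c = 0; c < w.length; c++)
                {
                    update[c] += batchLabels[i] * batchInputs[i][c];
                }
            }
        }

        // w = (1 - eta * lambda) * w + (eta / k) * update
        for (int c = 0; c < w.length; c++)
        {
            w[c] = (1.0 - eta * lambda) * w[c] + (eta / k) * update[c];
        }

        // Optional projection onto the ball of radius 1 / sqrt(lambda).
        final double norm = Math.sqrt(dot(w, w));
        final double radius = 1.0 / Math.sqrt(lambda);
        if (norm > radius)
        {
            for (int c = 0; c < w.length; c++)
            {
                w[c] *= radius / norm;
            }
        }
    }
}
```

With w = (0, 0), lambda = 0.25, and the two examples ((1, 0), +1) and ((0, -1), -1), the first call to step gives (2, 2) before projection and roughly (1.414, 1.414) after it.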

Since:
3.1
Author:
Justin Basilico
See Also:
Serialized Form

Field Summary
protected  ArrayList<? extends InputOutputPair<? extends Vectorizable,Boolean>> dataList
          The data represented as a list.
protected  int dataSampleSize
          The minimum of the sample size and the data size.
protected  int dataSize
          The size of the data in the training set.
static int DEFAULT_MAX_ITERATIONS
          The default maximum number of iterations is 10000.
static double DEFAULT_REGULARIZATION_WEIGHT
          The default regularization weight is 1.0E-4.
static int DEFAULT_SAMPLE_SIZE
          The default sample size is 100.
protected  int dimensionality
          The dimensionality of the dataset.
protected  Random random
          The random number generator to use.
protected  double regularizationWeight
          The weight assigned to the regularization term in the algorithm, which is often represented as lambda.
protected  LinearBinaryCategorizer result
          The categorizer learned as a result of the algorithm.
protected  int sampleSize
          The sample size requested by the user.
protected  Vector update
          A vector used to compute the update for the weight vector.
 
Fields inherited from class gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner
data, keepGoing
 
Fields inherited from class gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm
maxIterations
 
Fields inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
DEFAULT_ITERATION, iteration
 
Constructor Summary
PrimalEstimatedSubGradient()
          Creates a new PrimalEstimatedSubGradient with default parameters.
PrimalEstimatedSubGradient(int sampleSize, double regularizationWeight, int maxIterations, Random random)
          Creates a new PrimalEstimatedSubGradient with the given parameters.
 
Method Summary
protected  void cleanupAlgorithm()
          Called to clean up the learning algorithm's state after learning has finished.
 Random getRandom()
          Gets the random number generator used by this object.
 double getRegularizationWeight()
          Gets the regularization weight (lambda) assigned to the regularization term of the algorithm.
 LinearBinaryCategorizer getResult()
          Gets the current result of the algorithm.
 int getSampleSize()
          Gets the sample size, which is the number of examples sampled without replacement on each iteration of the algorithm.
protected  boolean initializeAlgorithm()
          Called to initialize the learning algorithm's state based on the data that is stored in the data field.
 void setRandom(Random random)
          Sets the random number generator used by this object.
 void setRegularizationWeight(double regularizationWeight)
          Sets the regularization weight (lambda) assigned to the regularization term of the algorithm.
 void setSampleSize(int sampleSize)
          Sets the sample size, which is the number of examples sampled without replacement on each iteration of the algorithm.
protected  boolean step()
          Called to take a single step of the learning algorithm.
 
Methods inherited from class gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner
clone, getData, getKeepGoing, learn, setData, setKeepGoing, stop
 
Methods inherited from class gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm
getMaxIterations, isResultValid, setMaxIterations
 
Methods inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
addIterativeAlgorithmListener, fireAlgorithmEnded, fireAlgorithmStarted, fireStepEnded, fireStepStarted, getIteration, getListeners, removeIterativeAlgorithmListener, setIteration, setListeners
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.learning.algorithm.BatchLearner
learn
 
Methods inherited from interface gov.sandia.cognition.util.CloneableSerializable
clone
 
Methods inherited from interface gov.sandia.cognition.algorithm.AnytimeAlgorithm
getMaxIterations, setMaxIterations
 
Methods inherited from interface gov.sandia.cognition.algorithm.IterativeAlgorithm
addIterativeAlgorithmListener, getIteration, removeIterativeAlgorithmListener
 
Methods inherited from interface gov.sandia.cognition.algorithm.StoppableAlgorithm
isResultValid
 

Field Detail

DEFAULT_SAMPLE_SIZE

public static final int DEFAULT_SAMPLE_SIZE
The default sample size is 100.

See Also:
Constant Field Values

DEFAULT_REGULARIZATION_WEIGHT

public static final double DEFAULT_REGULARIZATION_WEIGHT
The default regularization weight is 1.0E-4.

See Also:
Constant Field Values

DEFAULT_MAX_ITERATIONS

public static final int DEFAULT_MAX_ITERATIONS
The default maximum number of iterations is 10000.

See Also:
Constant Field Values

sampleSize

protected int sampleSize
The sample size requested by the user. The actual sample size may be smaller when the requested size exceeds the number of examples in the training set.


regularizationWeight

protected double regularizationWeight
The weight assigned to the regularization term in the algorithm, which is often represented as lambda.


random

protected Random random
The random number generator to use.


dataSize

protected transient int dataSize
The size of the data in the training set.


dataList

protected transient ArrayList<? extends InputOutputPair<? extends Vectorizable,Boolean>> dataList
The data represented as a list.


dimensionality

protected transient int dimensionality
The dimensionality of the dataset.


dataSampleSize

protected transient int dataSampleSize
The minimum of the sample size and the data size.
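
The clamping described above, combined with sampling without replacement, can be sketched as follows. This page does not say how the class actually draws its sample, so the partial Fisher-Yates shuffle and the name SampleWithoutReplacementSketch are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch: draw min(sampleSize, dataSize) distinct indices. */
final class SampleWithoutReplacementSketch
{
    /**
     * Returns distinct indices in the range [0, dataSize).
     * The result has min(sampleSize, dataSize) entries.
     */
    static int[] sample(int dataSize, int sampleSize, Random random)
    {
        // Effective sample size: never ask for more examples than exist.
        final int dataSampleSize = Math.min(sampleSize, dataSize);

        final int[] indices = new int[dataSize];
        for (int i = 0; i < dataSize; i++)
        {
            indices[i] = i;
        }

        // Partial Fisher-Yates shuffle: after i swaps, the first i entries
        // are a uniform sample without replacement.
        for (int i = 0; i < dataSampleSize; i++)
        {
            final int j = i + random.nextInt(dataSize - i);
            final int tmp = indices[i];
            indices[i] = indices[j];
            indices[j] = tmp;
        }
        return Arrays.copyOf(indices, dataSampleSize);
    }
}
```

For example, sample(5, 100, random) returns all five indices in some order, because the requested size of 100 is clamped to the data size.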


update

protected transient Vector update
A vector used to compute the update for the weight vector. It acts as a workspace so that multiple vectors do not need to be created in the algorithm, thus reducing the overall number of objects created.


result

protected transient LinearBinaryCategorizer result
The categorizer learned as a result of the algorithm.

Constructor Detail

PrimalEstimatedSubGradient

public PrimalEstimatedSubGradient()
Creates a new PrimalEstimatedSubGradient with default parameters.


PrimalEstimatedSubGradient

public PrimalEstimatedSubGradient(int sampleSize,
                                  double regularizationWeight,
                                  int maxIterations,
                                  Random random)
Creates a new PrimalEstimatedSubGradient with the given parameters.

Parameters:
sampleSize - The number of examples sampled from the dataset on each iteration. Must be positive.
regularizationWeight - The regularization weight (lambda). Must be positive.
maxIterations - The maximum number of iterations. Must be positive.
random - The random number generator to use.
Method Detail

initializeAlgorithm

protected boolean initializeAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to initialize the learning algorithm's state based on the data that is stored in the data field. The return value indicates if the algorithm can be run or not based on the initialization.

Specified by:
initializeAlgorithm in class AbstractAnytimeBatchLearner<Collection<? extends InputOutputPair<? extends Vectorizable,Boolean>>,LinearBinaryCategorizer>
Returns:
True if the learning algorithm can be run and false if it cannot.

step

protected boolean step()
Description copied from class: AbstractAnytimeBatchLearner
Called to take a single step of the learning algorithm.

Specified by:
step in class AbstractAnytimeBatchLearner<Collection<? extends InputOutputPair<? extends Vectorizable,Boolean>>,LinearBinaryCategorizer>
Returns:
True if another step can be taken and false if the algorithm should halt.

cleanupAlgorithm

protected void cleanupAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to clean up the learning algorithm's state after learning has finished.

Specified by:
cleanupAlgorithm in class AbstractAnytimeBatchLearner<Collection<? extends InputOutputPair<? extends Vectorizable,Boolean>>,LinearBinaryCategorizer>
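
The three hooks above (initializeAlgorithm, step, cleanupAlgorithm) suggest a template-method loop. The sketch below shows one plausible way a base class could drive them, using the inherited field names maxIterations, iteration, and keepGoing. The classes AnytimeLoopSketch and RunningTotalSketch are hypothetical, and the real control flow in AbstractAnytimeBatchLearner may differ, for instance in what it returns when initialization fails.

```java
/** Hypothetical sketch of an anytime learning loop; hook names mirror the documented ones. */
abstract class AnytimeLoopSketch<ResultType>
{
    protected int maxIterations;
    protected int iteration;
    protected boolean keepGoing;

    protected AnytimeLoopSketch(int maxIterations)
    {
        this.maxIterations = maxIterations;
    }

    protected abstract boolean initializeAlgorithm();
    protected abstract boolean step();
    protected abstract void cleanupAlgorithm();
    public abstract ResultType getResult();

    /** Requests that the loop end after the current step. */
    public void stop()
    {
        this.keepGoing = false;
    }

    public ResultType learn()
    {
        this.iteration = 0;
        if (!this.initializeAlgorithm())
        {
            return null; // Assumption: no result when initialization fails.
        }
        this.keepGoing = true;
        while (this.keepGoing && this.iteration < this.maxIterations)
        {
            this.iteration++;
            if (!this.step())
            {
                this.keepGoing = false; // step() asked to halt
            }
        }
        this.cleanupAlgorithm();
        return this.getResult();
    }
}

/** Toy subclass: adds the iteration number to a running total until a target is reached. */
final class RunningTotalSketch extends AnytimeLoopSketch<Integer>
{
    private final int target;
    private int total;
    boolean cleanedUp;

    RunningTotalSketch(int target, int maxIterations)
    {
        super(maxIterations);
        this.target = target;
    }

    @Override
    protected boolean initializeAlgorithm()
    {
        this.total = 0;
        this.cleanedUp = false;
        return this.target > 0; // cannot run without a positive target
    }

    @Override
    protected boolean step()
    {
        this.total += this.iteration;
        return this.total < this.target;
    }

    @Override
    protected void cleanupAlgorithm()
    {
        this.cleanedUp = true;
    }

    @Override
    public Integer getResult()
    {
        return this.total;
    }
}
```

In this toy, new RunningTotalSketch(10, 100).learn() stops when step() returns false after four iterations, while new RunningTotalSketch(10, 2).learn() stops at the iteration limit instead.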

getResult

public LinearBinaryCategorizer getResult()
Description copied from interface: AnytimeAlgorithm
Gets the current result of the algorithm.

Specified by:
getResult in interface AnytimeAlgorithm<LinearBinaryCategorizer>
Returns:
Current result of the algorithm.

getSampleSize

public int getSampleSize()
Gets the sample size, which is the number of examples sampled without replacement on each iteration of the algorithm.

Returns:
The sample size, which is positive.

setSampleSize

public void setSampleSize(int sampleSize)
Sets the sample size, which is the number of examples sampled without replacement on each iteration of the algorithm.

Parameters:
sampleSize - The sample size. Must be positive.

getRegularizationWeight

public double getRegularizationWeight()
Gets the regularization weight (lambda) assigned to the regularization term of the algorithm.

Returns:
The regularization weight, which is positive.

setRegularizationWeight

public void setRegularizationWeight(double regularizationWeight)
Sets the regularization weight (lambda) assigned to the regularization term of the algorithm.

Parameters:
regularizationWeight - The regularization weight. Must be positive.

getRandom

public Random getRandom()
Description copied from interface: Randomized
Gets the random number generator used by this object.

Specified by:
getRandom in interface Randomized
Returns:
The random number generator used by this object.

setRandom

public void setRandom(Random random)
Description copied from interface: Randomized
Sets the random number generator used by this object.

Specified by:
setRandom in interface Randomized
Parameters:
random - The random number generator for this object to use.
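
Because the algorithm samples examples at random, supplying a java.util.Random with a fixed seed is the usual way to make runs repeatable. The sketch below demonstrates only the java.util.Random behavior this relies on: two generators created with the same seed produce the same sequence. The helper class SeededRandomSketch is hypothetical, and whether a full learning run is reproducible also depends on everything else in the run being deterministic.

```java
import java.util.Random;

/** Hypothetical helper showing that equal seeds give equal draws. */
final class SeededRandomSketch
{
    /** Draws count integers in the range [0, bound) from the given generator. */
    static int[] draw(Random random, int count, int bound)
    {
        final int[] values = new int[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = random.nextInt(bound);
        }
        return values;
    }
}
```

Calling draw(new Random(1234), 5, 100) twice yields identical arrays, so a call such as setRandom(new Random(1234)) before learning gives the learner the same stream of random numbers on every run.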