gov.sandia.cognition.learning.experiment
Class RandomByTwoFoldCreator<DataType>

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.util.AbstractRandomized
          extended by gov.sandia.cognition.learning.experiment.RandomByTwoFoldCreator<DataType>
Type Parameters:
DataType - The type of data to create folds over.
All Implemented Interfaces:
ValidationFoldCreator<DataType,DataType>, CloneableSerializable, Randomized, Serializable, Cloneable

public class RandomByTwoFoldCreator<DataType>
extends AbstractRandomized
implements ValidationFoldCreator<DataType,DataType>

A validation fold creator that randomly splits a given collection of data in half a given number of times and returns two folds for each split: one that trains on the first half and tests on the second, and one with the roles swapped. The number of folds is therefore twice the configured number of splits. Each split reorders the data, so this class should not be used for data whose sequence order matters. The default configuration uses 5 splits, which gives 5x2 cross-validation, a common validation technique.
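
The splitting scheme described above can be illustrated with plain java.util collections. This is a minimal sketch, not the library's implementation: the class RandomByTwoSketch, its nested Fold type (a stand-in for PartitionedDataset), and the handling of odd-sized data are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of random by-two fold creation.
final class RandomByTwoSketch
{
    /** Hypothetical stand-in for PartitionedDataset: one training/testing pair. */
    public static final class Fold<T>
    {
        public final List<T> training;
        public final List<T> testing;

        Fold(List<T> training, List<T> testing)
        {
            this.training = training;
            this.testing = testing;
        }
    }

    public static <T> List<Fold<T>> createFolds(
        Collection<? extends T> data, int numSplits, Random random)
    {
        if (numSplits <= 0)
        {
            throw new IllegalArgumentException("numSplits must be positive");
        }

        List<Fold<T>> folds = new ArrayList<>(2 * numSplits);
        for (int i = 0; i < numSplits; i++)
        {
            // Shuffle a copy so the caller's collection is left untouched.
            List<T> shuffled = new ArrayList<>(data);
            Collections.shuffle(shuffled, random);

            // Assumption: for an odd count, the second half gets the extra item.
            int half = shuffled.size() / 2;
            List<T> first = new ArrayList<>(shuffled.subList(0, half));
            List<T> second =
                new ArrayList<>(shuffled.subList(half, shuffled.size()));

            // Each split yields two folds with the halves' roles swapped.
            folds.add(new Fold<>(first, second));
            folds.add(new Fold<>(second, first));
        }
        return folds;
    }

    public static void main(String[] args)
    {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        List<Fold<Integer>> folds = createFolds(data, 5, new Random(42L));
        System.out.println(folds.size()); // 5 splits x 2 folds = 10
    }
}
```

Every item lands in exactly one of the two halves of each fold, which is why the original ordering of the data is lost.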

Since:
3.0
Author:
Justin Basilico
See Also:
Serialized Form

Field Summary
static int DEFAULT_NUM_SPLITS
          The default number of splits is 5.
protected  int numSplits
          The number of splits.
 
Fields inherited from class gov.sandia.cognition.util.AbstractRandomized
random
 
Constructor Summary
RandomByTwoFoldCreator()
          Creates a new RandomByTwoFoldCreator with the default number of splits.
RandomByTwoFoldCreator(int numSplits)
          Creates a new RandomByTwoFoldCreator with a given number of splits.
RandomByTwoFoldCreator(int numSplits, Random random)
          Creates a new RandomByTwoFoldCreator with a given number of splits and random number generator.
 
Method Summary
 List<PartitionedDataset<DataType>> createFolds(Collection<? extends DataType> data)
          Creates a list of partitioned (training and testing) datasets from the given single dataset.
 int getNumSplits()
          Gets the number of splits to perform.
 void setNumSplits(int numSplits)
          Sets the number of splits to perform.
 
Methods inherited from class gov.sandia.cognition.util.AbstractRandomized
clone, getRandom, setRandom
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_NUM_SPLITS

public static final int DEFAULT_NUM_SPLITS
The default number of splits is 5.

See Also:
Constant Field Values

numSplits

protected int numSplits
The number of splits. The number of folds is twice this number.

Constructor Detail

RandomByTwoFoldCreator

public RandomByTwoFoldCreator()
Creates a new RandomByTwoFoldCreator with the default number of splits.


RandomByTwoFoldCreator

public RandomByTwoFoldCreator(int numSplits)
Creates a new RandomByTwoFoldCreator with a given number of splits.

Parameters:
numSplits - The number of splits to create. The number of folds created is twice this number. It must be positive.

RandomByTwoFoldCreator

public RandomByTwoFoldCreator(int numSplits,
                              Random random)
Creates a new RandomByTwoFoldCreator with a given number of splits and random number generator.

Parameters:
numSplits - The number of splits to create. The number of folds created is twice this number. It must be positive.
random - The random number generator to use.
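
Supplying a seeded java.util.Random is the usual way to make randomized splits repeatable. The sketch below shows the general idea with Collections.shuffle; whether this class shuffles in exactly this way is an assumption, and SeededShuffleSketch and shuffledCopy are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration: the same seed reproduces the same ordering.
final class SeededShuffleSketch
{
    static List<Integer> shuffledCopy(List<Integer> data, long seed)
    {
        // Copy first so the input list keeps its original order.
        List<Integer> copy = new ArrayList<>(data);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args)
    {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6);
        List<Integer> a = shuffledCopy(data, 7L);
        List<Integer> b = shuffledCopy(data, 7L);
        System.out.println(a.equals(b)); // true: same seed, same order
    }
}
```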

Method Detail

createFolds

public List<PartitionedDataset<DataType>> createFolds(Collection<? extends DataType> data)
Description copied from interface: ValidationFoldCreator
Creates a list of partitioned (training and testing) datasets from the given single dataset.

Specified by:
createFolds in interface ValidationFoldCreator<DataType,DataType>
Parameters:
data - The data to create multiple folds from.
Returns:
The list of partitioned datasets.

getNumSplits

public int getNumSplits()
Gets the number of splits to perform. When folds are created from a dataset, twice this number of partitions is returned. The value is positive.

Returns:
The number of splits to perform, which is positive.

setNumSplits

public void setNumSplits(int numSplits)
Sets the number of splits to perform. When folds are created from a dataset, twice this number of partitions is returned. The value must be positive.

Parameters:
numSplits - The number of splits to perform. Must be positive.
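
This page does not say what happens when a non-positive value is passed. One common way to enforce such a constraint is to reject the value in the setter. The sketch below assumes an IllegalArgumentException is thrown; NumSplitsHolderSketch and getNumFolds are hypothetical and not part of this class.

```java
// Hypothetical illustration of a guarded numSplits property.
final class NumSplitsHolderSketch
{
    public static final int DEFAULT_NUM_SPLITS = 5;

    private int numSplits = DEFAULT_NUM_SPLITS;

    public int getNumSplits()
    {
        return this.numSplits;
    }

    public void setNumSplits(int numSplits)
    {
        // Assumption: a non-positive value is rejected with an exception.
        if (numSplits <= 0)
        {
            throw new IllegalArgumentException(
                "numSplits must be positive, got " + numSplits);
        }
        this.numSplits = numSplits;
    }

    /** Hypothetical helper: two folds are produced per split. */
    public int getNumFolds()
    {
        return 2 * this.numSplits;
    }

    public static void main(String[] args)
    {
        NumSplitsHolderSketch holder = new NumSplitsHolderSketch();
        holder.setNumSplits(3);
        System.out.println(holder.getNumFolds()); // 2 x 3 = 6
    }
}
```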