gov.sandia.cognition.statistics.method
Class BernoulliConfidence

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.statistics.method.BernoulliConfidence
All Implemented Interfaces:
ConfidenceIntervalEvaluator<Collection<Boolean>>, CloneableSerializable, Serializable, Cloneable

public class BernoulliConfidence
extends AbstractCloneableSerializable
implements ConfidenceIntervalEvaluator<Collection<Boolean>>

Computes the Bernoulli confidence interval. In other words, computes a range for the Bernoulli parameter based on the given data and the desired level of confidence. This answers the question, "What is the true range of classification rates given a collection of correct/incorrect guesses at a given level of confidence?" For example, if my classifier gets { Correct, Wrong, Correct, Correct, Correct, Wrong, Correct, Correct }, then the true classification rate p of my classifier at 50% confidence satisfies Pr{ 0.5335 <= p <= 0.9665 } >= 0.5.

Since:
2.0
Author:
Kevin R. Dixon
See Also:
Serialized Form

Field Summary
static BernoulliConfidence INSTANCE
          This class has no member fields, so a shared static instance is provided.
 
Constructor Summary
BernoulliConfidence()
          Creates a new instance of BernoulliConfidence
 
Method Summary
 ConfidenceInterval computeConfidenceInterval(Collection<Boolean> data, double confidence)
          Computes the ConfidenceInterval for the Bernoulli parameter based on the given data and the desired level of confidence.
 ConfidenceInterval computeConfidenceInterval(double mean, double variance, int numSamples, double confidence)
          Computes the confidence interval given the mean and variance of the samples, the number of samples, and the desired confidence level.
static ConfidenceInterval computeConfidenceInterval(double bernoulliParameter, int numSamples, double confidence)
          Computes the ConfidenceInterval for the Bernoulli parameter based on the estimated parameter, the number of samples, and the desired level of confidence.
static int computeSampleSize(double accuracy, double confidence)
          Computes the number of samples needed to estimate the Bernoulli parameter "p" (mean) within "accuracy" with probability at least "confidence".
 
Methods inherited from class gov.sandia.cognition.util.AbstractCloneableSerializable
clone
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INSTANCE

public static final BernoulliConfidence INSTANCE
This class has no member fields, so a shared static instance is provided.

Constructor Detail

BernoulliConfidence

public BernoulliConfidence()
Creates a new instance of BernoulliConfidence

Method Detail

computeConfidenceInterval

public ConfidenceInterval computeConfidenceInterval(Collection<Boolean> data,
                                                    double confidence)
Computes the ConfidenceInterval for the Bernoulli parameter based on the given data and the desired level of confidence. This answers the question, "What is the true range of classification rates given a collection of correct/incorrect guesses at a given level of confidence?" For example, if my classifier gets { Correct, Wrong, Correct, Correct, Correct, Wrong, Correct, Correct }, then the true classification rate p of my classifier at 50% confidence satisfies Pr{ 0.5335 <= p <= 0.9665 } >= 0.5.

Specified by:
computeConfidenceInterval in interface ConfidenceIntervalEvaluator<Collection<Boolean>>
Parameters:
data - Correct/Wrong data
confidence - Confidence level to place on the confidence interval, must be in (0,1]
Returns:
Range of values for the accuracy of the classifier at the desired confidence
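
The numbers in the example above are consistent with a Chebyshev-style bound, where the half-width of the interval is sqrt( p(1-p) / (n (1-confidence)) ) around the observed success rate p. That formula is an inference from the documented example, not something this page states, and the class and method names below (BernoulliIntervalSketch, interval) are hypothetical. A minimal standalone sketch under that assumption, which does not call the library:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Locale;

/** Illustrative sketch only; this is not the library's source code. */
final class BernoulliIntervalSketch {

    /**
     * Returns {lower, upper} bounds for the Bernoulli parameter.
     * ASSUMPTION: a Chebyshev-style bound,
     * halfWidth = sqrt(p(1-p) / (n (1-confidence))).
     */
    static double[] interval(Collection<Boolean> data, double confidence) {
        if (data.isEmpty()) {
            throw new IllegalArgumentException("data must not be empty");
        }
        // The sketch excludes confidence == 1 to avoid dividing by zero.
        if (!(confidence > 0.0 && confidence < 1.0)) {
            throw new IllegalArgumentException("confidence must be in (0,1)");
        }
        int n = data.size();
        int correct = 0;
        for (boolean guess : data) {
            if (guess) {
                correct++;
            }
        }
        double p = (double) correct / n; // observed success rate
        double halfWidth = Math.sqrt(p * (1.0 - p) / (n * (1.0 - confidence)));
        return new double[] { p - halfWidth, p + halfWidth };
    }

    public static void main(String[] args) {
        // { Correct, Wrong, Correct, Correct, Correct, Wrong, Correct, Correct }
        Collection<Boolean> guesses =
            Arrays.asList(true, false, true, true, true, false, true, true);
        double[] ci = interval(guesses, 0.5);
        System.out.println(
            String.format(Locale.ROOT, "%.4f <= p <= %.4f", ci[0], ci[1]));
    }
}
```

With 6 of 8 guesses correct (p = 0.75) and confidence 0.5, the half-width is sqrt(0.1875 / 4), about 0.2165, which reproduces the documented bounds 0.5335 and 0.9665. The sketch does not clamp the bounds to [0,1]; whether the library does is not stated here.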

computeConfidenceInterval

@PublicationReference(author="Wikipedia",
                      title="",
                      type=WebPage,
                      year=2009,
                      url="http://en.wikipedia.org/wiki/Margin_of_error")
public static ConfidenceInterval computeConfidenceInterval(double bernoulliParameter,
                                                           int numSamples,
                                                           double confidence)
Computes the ConfidenceInterval for the Bernoulli parameter based on the estimated parameter, the number of samples, and the desired level of confidence. This answers the question, "What is the true range of classification rates given an observed success rate over a number of correct/incorrect guesses at a given level of confidence?" For example, if my classifier gets { Correct, Wrong, Correct, Correct, Correct, Wrong, Correct, Correct } (bernoulliParameter = 0.75, numSamples = 8), then the true classification rate p of my classifier at 50% confidence satisfies Pr{ 0.5335 <= p <= 0.9665 } >= 0.5.

Parameters:
bernoulliParameter - Estimated Bernoulli parameter (classifier success rate), must be in [0,1]
numSamples - Number of samples used in the determination
confidence - Confidence level to place on the confidence interval, must be in (0,1]
Returns:
Range of values for the accuracy of the classifier at the desired confidence

computeConfidenceInterval

public ConfidenceInterval computeConfidenceInterval(double mean,
                                                    double variance,
                                                    int numSamples,
                                                    double confidence)
Description copied from interface: ConfidenceIntervalEvaluator
Computes the confidence interval given the mean and variance of the samples, the number of samples, and the desired confidence level.

Specified by:
computeConfidenceInterval in interface ConfidenceIntervalEvaluator<Collection<Boolean>>
Parameters:
mean - Mean of the distribution.
variance - Variance of the distribution.
numSamples - Number of samples in the underlying data
confidence - Confidence value to assume for the ConfidenceInterval
Returns:
ConfidenceInterval capturing the range of the mean of the data at the desired level of confidence

computeSampleSize

@PublicationReference(author="Wikipedia",
                      title="",
                      type=WebPage,
                      year=2009,
                      url="http://en.wikipedia.org/wiki/Margin_of_error")
public static int computeSampleSize(double accuracy,
                                    double confidence)
Computes the number of samples needed to estimate the Bernoulli parameter "p" (mean) within "accuracy" with probability at least "confidence". Answers the question, "How many people do I need to survey to estimate how many people would vote for Budweiser as the King of Beers within a desired accuracy and a set confidence?" For example, to correctly determine the accuracy within 0.01 with confidence=0.95, we need up to 50000 samples.

Parameters:
accuracy - Desired accuracy to estimate, on the interval (0,1]
confidence - Desired confidence, on the interval (0,1]
Returns:
Maximum number of samples needed to achieve the accuracy with the level of confidence
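
The figure of 50000 samples in the description above is consistent with a Chebyshev-style bound that uses the worst-case Bernoulli variance p(1-p) = 0.25, giving n = 0.25 / (accuracy^2 (1-confidence)). As with the interval, this formula is inferred from the documented example rather than stated on this page, and the names below (BernoulliSampleSizeSketch, sampleSize) are hypothetical. A minimal standalone sketch under that assumption:

```java
/** Illustrative sketch only; this is not the library's source code. */
final class BernoulliSampleSizeSketch {

    /**
     * Returns an upper bound on the number of samples needed.
     * ASSUMPTION: Chebyshev-style bound with worst-case variance 0.25,
     * n = 0.25 / (accuracy^2 (1-confidence)), rounded up.
     */
    static int sampleSize(double accuracy, double confidence) {
        if (!(accuracy > 0.0 && accuracy <= 1.0)) {
            throw new IllegalArgumentException("accuracy must be in (0,1]");
        }
        // The sketch excludes confidence == 1 to avoid dividing by zero.
        if (!(confidence > 0.0 && confidence < 1.0)) {
            throw new IllegalArgumentException("confidence must be in (0,1)");
        }
        double worstCaseVariance = 0.25; // p(1-p) is largest at p = 0.5
        double n = worstCaseVariance / (accuracy * accuracy * (1.0 - confidence));
        return (int) Math.ceil(n);
    }

    public static void main(String[] args) {
        System.out.println(sampleSize(0.01, 0.95));
    }
}
```

For accuracy = 0.01 and confidence = 0.95 this gives 0.25 / (0.0001 * 0.05) = 50000, matching the documented example. Rounding up with Math.ceil is a choice made for the sketch; how the library rounds is not stated here.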