gov.sandia.cognition.statistics.method
Class ChiSquareConfidence

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.statistics.method.ChiSquareConfidence
All Implemented Interfaces:
NullHypothesisEvaluator<Collection<? extends Number>>, CloneableSerializable, Serializable, Cloneable

@ConfidenceTestAssumptions(name="Chi-Squre test",
                           alsoKnownAs="Pearson\'s Chi-Square test",
                           description="The chi-square test determines if the given data were generated from the same discrete distributions.",
                           assumptions={"A large sample, typically above 30.","Typically, each bin from the discrete distribution must have at least 5 samples.","The underlying discrete distribution must obey the weak law of large numbers.","The observations are assumed to be independent."},
                           nullHypothesis="The frequency of events in the two datasets is consistent.",
                           dataPaired=true,
                           dataSameSize=true,
                           distribution=ChiSquareDistribution.CDF.class,
                           reference=@PublicationReference(author="Wikipedia",title="Pearson\'s chi-square test",type=WebPage,year=2009,url="http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test"))
public class ChiSquareConfidence
extends AbstractCloneableSerializable
implements NullHypothesisEvaluator<Collection<? extends Number>>

This is the chi-square goodness-of-fit test. This test allows us to compare observations against expected results, where the observations and expectations are recorded for discrete groups/conditions/bins. The null hypothesis is that the observed values were drawn from the same distribution as the expected values.
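
For reference, the textbook form of the Pearson statistic behind this kind of comparison is written out below. The symbols O_i, E_i, and k are notation introduced here for illustration and do not appear in the class itself, and this page does not state how the implementation chooses its degrees of freedom, so the k - 1 figure is the standard assumption for fully specified expected counts rather than a documented property of this class.

```latex
% O_i: observed count in bin i, E_i: expected count in bin i, k: number of bins
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
% Under the null hypothesis, and with enough samples per bin, \chi^2 is
% approximately chi-square distributed with k - 1 degrees of freedom.
```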

The chi-square goodness-of-fit test is a discrete version of the more general Kolmogorov-Smirnov test. If your data were drawn from a continuous distribution, the K-S test is the recommended choice instead.

Since:
2.0
Author:
Kevin R. Dixon
See Also:
Serialized Form

Nested Class Summary
static class ChiSquareConfidence.Statistic
          Confidence Statistic for a chi-square test
 
Field Summary
static ChiSquareConfidence INSTANCE
          Default instance variable since the class has no members.
 
Constructor Summary
ChiSquareConfidence()
          Creates a new instance of ChiSquareConfidence
 
Method Summary
static
<DomainType>
ChiSquareConfidence.Statistic
evaluateNullHypothesis(Collection<? extends DomainType> data, ProbabilityMassFunction<DomainType> pmf)
          Computes the chi-square test between a collection of data and a Probability Mass Function that may have created the observed data.
 ChiSquareConfidence.Statistic evaluateNullHypothesis(Collection<? extends Number> data1, Collection<? extends Number> data2)
          Computes the probability that two datasets were generated by the same distribution.
 
Methods inherited from class gov.sandia.cognition.util.AbstractCloneableSerializable
clone
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.util.CloneableSerializable
clone
 

Field Detail

INSTANCE

public static final ChiSquareConfidence INSTANCE
Default instance variable since the class has no members.

Constructor Detail

ChiSquareConfidence

public ChiSquareConfidence()
Creates a new instance of ChiSquareConfidence

Method Detail

evaluateNullHypothesis

public static <DomainType> ChiSquareConfidence.Statistic evaluateNullHypothesis(Collection<? extends DomainType> data,
                                                                                ProbabilityMassFunction<DomainType> pmf)
Computes the chi-square test between a collection of data and a Probability Mass Function that may have created the observed data.

Type Parameters:
DomainType - Domain type of the PMF.
Parameters:
data - Data observed from some discrete distribution.
pmf - Probability mass function that may have created the observed data.
Returns:
Chi-square test results.
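
The comparison of observed data against a PMF can be sketched without the library as follows. This is a minimal illustration, not the implementation of this method: the class name PmfChiSquareSketch is hypothetical, a plain Map<T, Double> stands in for ProbabilityMassFunction<DomainType>, every probability in the map is assumed to be positive, and observations that fall outside the map's keys are simply ignored.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; it does not use or reproduce the library code.
public class PmfChiSquareSketch {

    /**
     * Textbook Pearson statistic: sum over bins of (observed - expected)^2 / expected,
     * where expected = sampleSize * probability of the bin.
     * Assumes every probability in pmf is strictly positive.
     */
    public static <T> double chiSquare(Collection<? extends T> data, Map<T, Double> pmf) {
        // Count how often each value was observed.
        Map<T, Integer> counts = new HashMap<>();
        for (T value : data) {
            counts.merge(value, 1, Integer::sum);
        }

        int sampleSize = data.size();
        double statistic = 0.0;
        for (Map.Entry<T, Double> bin : pmf.entrySet()) {
            double expected = sampleSize * bin.getValue();
            double observed = counts.getOrDefault(bin.getKey(), 0);
            double delta = observed - expected;
            statistic += delta * delta / expected;
        }
        return statistic;
    }

    public static void main(String[] args) {
        // 60 heads and 40 tails, tested against a fair coin.
        List<String> flips = new ArrayList<>(Collections.nCopies(60, "H"));
        flips.addAll(Collections.nCopies(40, "T"));
        Map<String, Double> fairCoin = Map.of("H", 0.5, "T", 0.5);

        System.out.println("chi-square = " + chiSquare(flips, fairCoin));
    }
}
```

With 100 flips the expected count is 50 per bin, so each bin contributes (10)^2 / 50 = 2 and the statistic is 4. Turning that into a null-hypothesis probability requires the chi-square CDF, which the sketch deliberately leaves out.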

evaluateNullHypothesis

public ChiSquareConfidence.Statistic evaluateNullHypothesis(Collection<? extends Number> data1,
                                                            Collection<? extends Number> data2)
Description copied from interface: NullHypothesisEvaluator
Computes the probability that two datasets were generated by the same distribution. A NullHypothesisProbability near 1 means the data are consistent with a single shared distribution, a value near 0 means they are likely NOT from the same distribution, and a value below 0.05 is the conventional threshold for statistical significance. This is the "p-value" commonly reported in the social sciences.

Specified by:
evaluateNullHypothesis in interface NullHypothesisEvaluator<Collection<? extends Number>>
Parameters:
data1 - First dataset to consider
data2 - Second dataset to consider
Returns:
Probability that the two datasets were generated by the same source. A NullHypothesisProbability below 0.05 is the conventional point at which social scientists conclude that two distributions were generated by different sources.
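
A paired, same-size comparison of two collections of bin counts can be sketched as below. This is an illustration under stated assumptions, not the library's code: the class name PairedChiSquareSketch is hypothetical, the first collection is assumed to hold observed counts and the second expected counts (this page does not say which role data1 and data2 play), and all expected counts are assumed to be positive.

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-alone sketch; it does not use or reproduce the library code.
public class PairedChiSquareSketch {

    /**
     * Textbook Pearson statistic over paired bins:
     * sum of (observed - expected)^2 / expected.
     * Assumes both collections list the same bins in the same order.
     */
    public static double chiSquare(Collection<? extends Number> observed,
                                   Collection<? extends Number> expected) {
        if (observed.size() != expected.size()) {
            throw new IllegalArgumentException(
                "Both collections must have the same number of bins");
        }

        double statistic = 0.0;
        Iterator<? extends Number> expectedIterator = expected.iterator();
        for (Number observedCount : observed) {
            double e = expectedIterator.next().doubleValue();
            double delta = observedCount.doubleValue() - e;
            statistic += delta * delta / e;
        }
        return statistic;
    }

    public static void main(String[] args) {
        List<Integer> observed = List.of(18, 22, 20);
        List<Integer> expected = List.of(20, 20, 20);

        System.out.println("chi-square = " + chiSquare(observed, expected));
        // Standard assumption for fully specified expected counts: bins - 1.
        System.out.println("degrees of freedom = " + (observed.size() - 1));
    }
}
```

Here the statistic is 4/20 + 4/20 + 0 = 0.4 on 2 degrees of freedom, far too small to reject the null hypothesis at the 0.05 level. The size check mirrors the dataPaired and dataSameSize flags declared in the class annotation.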