gov.sandia.cognition.statistics.method
Class ChiSquareConfidence
java.lang.Object
gov.sandia.cognition.util.AbstractCloneableSerializable
gov.sandia.cognition.statistics.method.ChiSquareConfidence
- All Implemented Interfaces:
- NullHypothesisEvaluator<Collection<? extends Number>>, CloneableSerializable, Serializable, Cloneable
@ConfidenceTestAssumptions(name="Chi-Square test",
alsoKnownAs="Pearson\'s Chi-Square test",
description="The chi-square test determines if the given data were generated from the same discrete distributions.",
assumptions={"A large sample, typically above 30.","Typically, each bin from the discrete distribution must have at least 5 samples.","The underlying discrete distribution must obey the weak law of large numbers.","The observations are assumed to be independent."},
nullHypothesis="The frequency of events in the two datasets is consistent.",
dataPaired=true,
dataSameSize=true,
distribution=ChiSquareDistribution.CDF.class,
reference=@PublicationReference(author="Wikipedia",title="Pearson\'s chi-square test",type=WebPage,year=2009,url="http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test"))
public class ChiSquareConfidence
- extends AbstractCloneableSerializable
- implements NullHypothesisEvaluator<Collection<? extends Number>>
This is the chi-square goodness-of-fit test. This test allows us to compare
observations against expected results, where the observations and
expectations are recorded for discrete groups/conditions/bins. The null
hypothesis is that the observed values were drawn from the same distribution
as the expected values.
The chi-square goodness-of-fit test is a discrete version of the more
general Kolmogorov-Smirnov test. If your data were drawn from a continuous
distribution, I would recommend the K-S test instead.
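
As a rough illustration of the comparison described above, the textbook Pearson statistic sums (observed - expected)^2 / expected over the bins. The sketch below is not taken from this class's source: the class name ChiSquareStatisticSketch, its method names, and the choice of bins - 1 degrees of freedom are assumptions made for illustration only.

```java
import java.util.List;

// Hypothetical helper, not part of gov.sandia.cognition.
final class ChiSquareStatisticSketch {

    /**
     * Pearson chi-square statistic: the sum over bins of
     * (observed - expected)^2 / expected.
     */
    static double chiSquare(List<? extends Number> observed,
                            List<? extends Number> expected) {
        if (observed.size() != expected.size()) {
            throw new IllegalArgumentException("Both lists need one entry per bin");
        }
        double sum = 0.0;
        for (int i = 0; i < observed.size(); i++) {
            double o = observed.get(i).doubleValue();
            double e = expected.get(i).doubleValue();
            if (e <= 0.0) {
                // Assumption: every bin must have a positive expected count.
                throw new IllegalArgumentException("Expected count must be positive in bin " + i);
            }
            double delta = o - e;
            sum += delta * delta / e;
        }
        return sum;
    }

    /** Assumed degrees of freedom when the expected counts are fully specified. */
    static int degreesOfFreedom(int bins) {
        return bins - 1;
    }
}
```

For example, observed counts of 10, 20, 30 against expected counts of 20, 20, 20 give 100/20 + 0 + 100/20 = 10 with 2 degrees of freedom under these assumptions.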
- Since:
- 2.0
- Author:
- Kevin R. Dixon
- See Also:
- Serialized Form
INSTANCE
public static final ChiSquareConfidence INSTANCE
- Default instance variable since the class has no members.
ChiSquareConfidence
public ChiSquareConfidence()
- Creates a new instance of ChiSquareConfidence
evaluateNullHypothesis
public static <DomainType> ChiSquareConfidence.Statistic evaluateNullHypothesis(Collection<? extends DomainType> data,
ProbabilityMassFunction<DomainType> pmf)
- Computes the chi-square test between a collection of data and a
probability mass function (PMF) that may have created the observed data.
- Type Parameters:
DomainType
- Domain type of the PMF.
- Parameters:
data
- Data observed from some discrete distribution.
pmf
- Probability mass function that may have created the observed data.
- Returns:
- Chi-square test results.
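
One plausible way to set up such a comparison is to count how often each value occurs in the data and compare those counts with n * p(x) for each value x in the PMF's domain. The sketch below stands in a plain Map<T, Double> for the probability mass function. It does not use the library's ProbabilityMassFunction interface, and the class name PmfChiSquareSketch and the handling of zero-probability bins are assumptions.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of gov.sandia.cognition.
final class PmfChiSquareSketch {

    /**
     * Chi-square statistic between observed data and a PMF given as a map
     * from each domain value to its probability.
     */
    static <T> double chiSquare(Collection<? extends T> data, Map<T, Double> pmf) {
        // Tally how many times each value was observed.
        Map<T, Integer> counts = new HashMap<>();
        for (T x : data) {
            counts.merge(x, 1, Integer::sum);
        }
        int n = data.size();
        double sum = 0.0;
        for (Map.Entry<T, Double> entry : pmf.entrySet()) {
            double expected = n * entry.getValue();
            double observed = counts.getOrDefault(entry.getKey(), 0);
            // Assumption: bins with zero expected count are skipped.
            if (expected > 0.0) {
                double delta = observed - expected;
                sum += delta * delta / expected;
            }
        }
        return sum;
    }
}
```

With four coin flips H, H, H, T and a fair-coin PMF, the expected count is 2 per side, so the statistic is 1/2 + 1/2 = 1. Observed values that fall outside the map's keys still count toward n in this sketch but add no term of their own.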
evaluateNullHypothesis
public ChiSquareConfidence.Statistic evaluateNullHypothesis(Collection<? extends Number> data1,
Collection<? extends Number> data2)
- Description copied from interface:
NullHypothesisEvaluator
- Computes the null-hypothesis probability for two datasets, where the
null hypothesis is that both were generated by the same distribution.
NullHypothesisProbability near 1 means the data are consistent with the
same distribution, NullHypothesisProbability near 0 means they are
likely NOT from the same distribution, and NullHypothesisProbability less
than 0.05 is the standard statistical significance test. This is the
"p-value" that social scientists like to use.
- Specified by:
evaluateNullHypothesis
in interface NullHypothesisEvaluator<Collection<? extends Number>>
- Parameters:
data1
- First dataset to consider.
data2
- Second dataset to consider.
- Returns:
- Null-hypothesis probability (p-value) that the two datasets were generated
by the same source. A NullHypothesisProbability less than 0.05
is the standard point at which social scientists say two distributions
were generated by different sources.
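
To make the 0.05 threshold concrete, the sketch below turns a chi-square statistic into an upper-tail probability and compares it with a significance level. It relies on the closed form that holds only when the degrees of freedom are even, and the class and method names are hypothetical. The library itself lists ChiSquareDistribution.CDF as the distribution it uses, which this sketch does not call.

```java
// Hypothetical helper, not part of gov.sandia.cognition.
final class ChiSquarePValueSketch {

    /**
     * Upper-tail probability P(X >= statistic) for a chi-square variable with
     * an even number of degrees of freedom, using the closed form
     * exp(-y) * sum_{i=0}^{m-1} y^i / i!, where y = statistic / 2 and m = dof / 2.
     */
    static double pValueEvenDof(double statistic, int dof) {
        if (dof <= 0 || dof % 2 != 0) {
            throw new IllegalArgumentException("This sketch only handles even, positive dof");
        }
        if (statistic < 0.0) {
            throw new IllegalArgumentException("Statistic must be non-negative");
        }
        double y = statistic / 2.0;
        int m = dof / 2;
        double term = 1.0; // y^0 / 0!
        double sum = 0.0;
        for (int i = 0; i < m; i++) {
            sum += term;
            term *= y / (i + 1); // advance to y^(i+1) / (i+1)!
        }
        return Math.exp(-y) * sum;
    }

    /** True when the p-value falls below the chosen significance level. */
    static boolean rejectsNullHypothesis(double pValue, double alpha) {
        return pValue < alpha;
    }
}
```

For a statistic of 10 with 2 degrees of freedom the closed form reduces to exp(-5), which is well below 0.05, so the null hypothesis would be rejected at that level.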