gov.sandia.cognition.statistics.method
Class KolmogorovSmirnovConfidence

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.statistics.method.KolmogorovSmirnovConfidence
All Implemented Interfaces:
NullHypothesisEvaluator<Collection<? extends Number>>, CloneableSerializable, Serializable, Cloneable

@ConfidenceTestAssumptions(name="Kolmogorov-Smirnov test",
                           alsoKnownAs="K-S test",
                           description={"Determines if two datasets were drawn from the same univariate distribution.","Robust, nonparametric test that makes no assumptions on the underlying distribution (continuous, discrete, etc.)."},
                           assumptions="The data were sampled independently from each other.",
                           nullHypothesis="The data were drawn from the same distribution.",
                           dataPaired=false,
                           dataSameSize=false,
                           distribution=KolmogorovDistribution.CDF.class,
                           reference=@PublicationReference(author="Wikipedia",title="Kolmogorov-Smirnov test",type=WebPage,year=2009,url="http://en.wikipedia.org/wiki/Kolmogorov-Smirnov_test"))
public class KolmogorovSmirnovConfidence
extends AbstractCloneableSerializable
implements NullHypothesisEvaluator<Collection<? extends Number>>

Performs a Kolmogorov-Smirnov confidence test, often simply called the "K-S test". This is a powerful nonparametric test that estimates the probability that two datasets were generated by the same distribution. Beyond independent sampling, it makes minimal assumptions about the underlying data or distributions. In particular, the distributions are NOT assumed to be Gaussian.

Since:
2.0
Author:
Kevin R. Dixon
See Also:
Serialized Form

Nested Class Summary
static class KolmogorovSmirnovConfidence.Statistic
          Computes the ConfidenceStatistic associated with a K-S test
 
Field Summary
static KolmogorovSmirnovConfidence INSTANCE
          Default instance of the K-S test.
 
Constructor Summary
KolmogorovSmirnovConfidence()
          Creates a new instance of KolmogorovSmirnovConfidence
 
Method Summary
protected static double[] computeAscendingArray(Collection<? extends Number> data)
          Returns an array of ascending sorted values from the given Collection
static KolmogorovSmirnovConfidence.Statistic evaluateGaussianHypothesis(Collection<Double> data)
          Evaluates the Hypothesis that the given data were generated according to a UnivariateGaussian distribution.
static <DomainType extends Number> KolmogorovSmirnovConfidence.Statistic evaluateNullHypothesis(Collection<? extends DomainType> data1, CumulativeDistributionFunction<DomainType> function)
          This is the standard K-S test for determining if the given data were generated by the given CDF.
 KolmogorovSmirnovConfidence.Statistic evaluateNullHypothesis(Collection<? extends Number> data1, Collection<? extends Number> data2)
          This is the standard K-S test for two distributions of data.
 
Methods inherited from class gov.sandia.cognition.util.AbstractCloneableSerializable
clone
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gov.sandia.cognition.util.CloneableSerializable
clone
 

Field Detail

INSTANCE

public static final KolmogorovSmirnovConfidence INSTANCE
Default instance of the K-S test.

Constructor Detail

KolmogorovSmirnovConfidence

public KolmogorovSmirnovConfidence()
Creates a new instance of KolmogorovSmirnovConfidence

Method Detail

computeAscendingArray

protected static double[] computeAscendingArray(Collection<? extends Number> data)
Returns an array of ascending sorted values from the given Collection

Parameters:
data - Collection of Numbers to sort into ascending order
Returns:
Array of ascending sorted values
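
The following is a minimal, self-contained sketch of what such a helper might do, assuming each element is converted with Number.doubleValue() before sorting. The class and method names (AscendingArraySketch, toAscendingArray) are hypothetical and this is not the library's actual implementation.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class AscendingArraySketch {

    /** Copies each Number's double value into an array, then sorts it ascending. */
    static double[] toAscendingArray(Collection<? extends Number> data) {
        double[] values = new double[data.size()];
        int i = 0;
        for (Number n : data) {
            values[i++] = n.doubleValue();
        }
        Arrays.sort(values);
        return values;
    }

    public static void main(String[] args) {
        // Mixed Number subtypes are accepted because of the wildcard bound.
        List<Number> mixed = Arrays.asList(3, 1.5, 2L);
        System.out.println(Arrays.toString(toAscendingArray(mixed))); // prints [1.5, 2.0, 3.0]
    }
}
```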

evaluateNullHypothesis

@PublicationReference(author={"William H. Press","Saul A. Teukolsky","William T. Vetterling","Brian P. Flannery"},
                      title="Numerical Recipes in C, Second Edition",
                      type=Book,
                      year=1992,
                      pages={625,626},
                      notes={"Section 14.3","Function kstwo()"},
                      url="http://www.nrbook.com/a/bookcpdf.php")
public KolmogorovSmirnovConfidence.Statistic evaluateNullHypothesis(Collection<? extends Number> data1,
                                                                    Collection<? extends Number> data2)
This is the standard two-sample K-S test. Determines the probability that the two datasets were generated by the same underlying distribution. This is a parameter-free test, so the assumptions on the underlying data are minimal.

Specified by:
evaluateNullHypothesis in interface NullHypothesisEvaluator<Collection<? extends Number>>
Parameters:
data1 - First dataset to consider
data2 - Second dataset to consider
Returns:
ConfidenceStatistic from the K-S test.
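
To illustrate the idea behind the two-sample test, here is a self-contained sketch that computes the K-S statistic D (the largest gap between the two empirical CDFs) and an approximate null-hypothesis probability. The probability uses the asymptotic Kolmogorov series with the small-sample correction given in the cited Numerical Recipes section. The class and method names are hypothetical, the tie handling is a choice made for this sketch, and none of it should be read as the library's own code.

```java
import java.util.Arrays;
import java.util.Collection;

final class TwoSampleKsSketch {

    /** Largest absolute gap between the two empirical CDFs (the K-S statistic D). */
    static double statistic(Collection<? extends Number> data1,
                            Collection<? extends Number> data2) {
        double[] a = data1.stream().mapToDouble(Number::doubleValue).sorted().toArray();
        double[] b = data2.stream().mapToDouble(Number::doubleValue).sorted().toArray();
        int i = 0, j = 0;
        double d = 0.0;
        while (i < a.length && j < b.length) {
            // Advance both samples past the smallest unprocessed value,
            // so that tied values are consumed together.
            double x = Math.min(a[i], b[j]);
            while (i < a.length && a[i] <= x) i++;
            while (j < b.length && b[j] <= x) j++;
            double gap = Math.abs((double) i / a.length - (double) j / b.length);
            d = Math.max(d, gap);
        }
        return d;
    }

    /** Approximate probability of seeing a gap at least this large under the null hypothesis. */
    static double pValue(double d, int n1, int n2) {
        double en = Math.sqrt((double) n1 * n2 / (n1 + n2));
        double lambda = (en + 0.12 + 0.11 / en) * d;
        return kolmogorovTail(lambda);
    }

    /** Evaluates 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lambda^2). */
    static double kolmogorovTail(double lambda) {
        double a2 = -2.0 * lambda * lambda;
        double sign = 2.0, sum = 0.0, previous = 0.0;
        for (int k = 1; k <= 100; k++) {
            double term = sign * Math.exp(a2 * k * k);
            sum += term;
            if (Math.abs(term) <= 1e-3 * previous || Math.abs(term) <= 1e-8 * sum) {
                return sum;
            }
            sign = -sign;
            previous = Math.abs(term);
        }
        return 1.0; // no convergence: lambda is tiny, so the probability is effectively 1
    }

    public static void main(String[] args) {
        Collection<Integer> low = Arrays.asList(1, 2, 3);
        Collection<Integer> high = Arrays.asList(4, 5, 6);
        double d = statistic(low, high);
        System.out.println("D = " + d + ", p = " + pValue(d, low.size(), high.size()));
    }
}
```

A small probability suggests the two samples come from different distributions. With only a handful of points the asymptotic series is a rough approximation.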

evaluateNullHypothesis

@PublicationReference(author={"William H. Press","Saul A. Teukolsky","William T. Vetterling","Brian P. Flannery"},
                      title="Numerical Recipes in C, Second Edition",
                      type=Book,
                      year=1992,
                      pages=625,
                      notes={"Section 14.3","Function ksone()"})
public static <DomainType extends Number> KolmogorovSmirnovConfidence.Statistic evaluateNullHypothesis(Collection<? extends DomainType> data1,
                                                                                                       CumulativeDistributionFunction<DomainType> function)
This is the standard K-S test for determining if the given data were generated by the given CDF. Computes the probability that the data were drawn from the distribution described by the given CDF. This is a parameter-free test, so the assumptions on the underlying data are minimal. For example, to test if a dataset is normally distributed, call evaluateNullHypothesis( data, new UnivariateGaussian.CumulativeDistribution() ).

Type Parameters:
DomainType - Type of Number to consider
Parameters:
data1 - Dataset to consider
function - CDF to compare against the given data
Returns:
ConfidenceStatistic from the K-S test.
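
As a rough illustration of the one-sample case, the sketch below computes the K-S statistic D between a dataset's empirical CDF and a reference CDF. The reference CDF is modeled here as a plain java.util.function.DoubleUnaryOperator rather than the library's CumulativeDistributionFunction, and the class name is hypothetical. Converting D into a probability (via the Kolmogorov distribution) is left out.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.function.DoubleUnaryOperator;

final class OneSampleKsSketch {

    /** Largest absolute gap between the empirical CDF of the data and the reference CDF. */
    static double statistic(Collection<? extends Number> data, DoubleUnaryOperator cdf) {
        double[] x = data.stream().mapToDouble(Number::doubleValue).sorted().toArray();
        int n = x.length;
        double d = 0.0;
        double below = 0.0; // empirical CDF just before the current point
        for (int k = 0; k < n; k++) {
            double above = (k + 1.0) / n; // empirical CDF at the current point
            double model = cdf.applyAsDouble(x[k]);
            // The empirical CDF jumps at x[k], so check both sides of the step.
            double gap = Math.max(Math.abs(below - model), Math.abs(above - model));
            d = Math.max(d, gap);
            below = above;
        }
        return d;
    }

    public static void main(String[] args) {
        // Reference distribution for the example: uniform on [0, 1].
        DoubleUnaryOperator uniformCdf = u -> Math.min(1.0, Math.max(0.0, u));
        Collection<Double> data = Arrays.asList(0.25, 0.5, 0.75);
        System.out.println("D = " + statistic(data, uniformCdf));
    }
}
```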

evaluateGaussianHypothesis

public static KolmogorovSmirnovConfidence.Statistic evaluateGaussianHypothesis(Collection<Double> data)
Evaluates the hypothesis that the given data were generated according to a UnivariateGaussian distribution. A high null-hypothesis probability is not conclusive proof that the data were generated by a Gaussian. However, a low null-hypothesis probability is strong evidence that the data were NOT generated by a Gaussian.

Parameters:
data - Data to test against the hypothesis that they were generated according to a Gaussian distribution
Returns:
Confidence statistic from the K-S test
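
The asymmetry described above can be sketched as a simple decision rule. The example assumes the caller has already obtained a null-hypothesis probability from the test and chosen a significance level alpha. The class, enum, and method names are hypothetical and are not part of the library.

```java
final class KsDecisionSketch {

    enum Decision { REJECT_NULL, FAIL_TO_REJECT_NULL }

    /**
     * A low probability is evidence against the null hypothesis;
     * a high probability only means the test found no evidence against it.
     */
    static Decision decide(double nullHypothesisProbability, double alpha) {
        return nullHypothesisProbability < alpha
                ? Decision.REJECT_NULL
                : Decision.FAIL_TO_REJECT_NULL;
    }

    public static void main(String[] args) {
        System.out.println(decide(0.01, 0.05)); // prints REJECT_NULL
        System.out.println(decide(0.80, 0.05)); // prints FAIL_TO_REJECT_NULL
    }
}
```

Note that FAIL_TO_REJECT_NULL does not mean the data are Gaussian, only that this test could not distinguish them from a Gaussian at the chosen level.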