gov.sandia.cognition.math
Class UnivariateStatisticsUtil

java.lang.Object
  extended by gov.sandia.cognition.math.UnivariateStatisticsUtil

public class UnivariateStatisticsUtil
extends Object

Some static methods for computing generally useful univariate statistics.

Since:
2.0
Author:
Kevin R. Dixon

Constructor Summary
UnivariateStatisticsUtil()
           
 
Method Summary
static double computeCentralMoment(Iterable<? extends Number> data, double mean, int moment)
          Computes the biased estimate of the desired central moment of the given dataset.
static double computeCorrelation(Collection<? extends Number> data1, Collection<? extends Number> data2)
          Computes the correlation coefficient in a single pass.
static double computeEntropy(Iterable<? extends Number> data)
          Computes the information-theoretic entropy of the PMF in bits (base 2).
static double computeKurtosis(Collection<? extends Number> data)
          Computes the biased excess kurtosis of the given dataset.
static double computeMaximum(Iterable<? extends Number> data)
          Finds the maximum value of a data set.
static double computeMean(Iterable<? extends Number> data)
          Computes the arithmetic mean (average, expectation, first raw moment) of a dataset.
static Pair<Double,Double> computeMeanAndVariance(Iterable<? extends Number> data)
          Computes the mean and unbiased variance of a dataset using a one-pass approach.
static double computeMedian(Collection<? extends Number> data)
          Computes the median of the given data.
static Pair<Double,Double> computeMinAndMax(Iterable<? extends Number> data)
          Computes the minimum and maximum of a set of data in a single pass.
static double computeMinimum(Iterable<? extends Number> data)
          Finds the minimum value of a data set.
static double computePercentile(Collection<? extends Number> data, double percentile)
          Computes the percentile value of the given data.
static double computeRootMeanSquaredError(Collection<? extends Number> data)
          Computes the Root mean-squared (RMS) error between the data and its mean.
static double computeRootMeanSquaredError(Collection<? extends Number> data, double mean)
          Computes the root mean-squared (RMS) error between the data and the given mean.
static double computeSkewness(Collection<? extends Number> data)
          Computes the unbiased skewness of the dataset.
static double computeSum(Iterable<? extends Number> data)
          Computes the arithmetic sum of the dataset.
static double computeSumSquaredDifference(Iterable<? extends Number> data, double target)
          Computes the sum-squared difference between the data and a target.
static double computeVariance(Collection<? extends Number> data)
          Computes the unbiased variance (second central moment, squared standard deviation) of a dataset.
static double computeVariance(Collection<? extends Number> data, double mean)
          Computes the unbiased variance (second central moment, squared standard deviation) of a dataset about a pre-computed mean.
static double computeWeightedCentralMoment(Iterable<? extends WeightedValue<? extends Number>> data, double mean, int moment)
          Computes the biased estimate of the desired central moment of the given weighted dataset.
static double computeWeightedKurtosis(Collection<? extends WeightedValue<? extends Number>> data)
          Computes the biased excess kurtosis of the given dataset.
static double computeWeightedMean(Iterable<? extends WeightedValue<? extends Number>> data)
          Computes the weighted arithmetic mean (average, expectation, first raw moment) of a dataset.
static Pair<Double,Double> computeWeightedMeanAndVariance(Iterable<? extends WeightedValue<? extends Number>> data)
          Computes the weighted mean and unbiased variance of a dataset using a one-pass approach.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

UnivariateStatisticsUtil

public UnivariateStatisticsUtil()
Method Detail

computeMean

@PublicationReference(title="Algorithms for calculating variance",
                      type=WebPage,
                      year=2010,
                      author="Wikipedia",
                      url="http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance")
public static double computeMean(Iterable<? extends Number> data)
Computes the arithmetic mean (average, expectation, first raw moment) of a dataset.

Parameters:
data - Dataset of Numbers to consider
Returns:
Arithmetic mean of the given dataset

computeWeightedMean

@PublicationReference(title="Algorithms for calculating variance",
                      type=WebPage,
                      year=2010,
                      author="Wikipedia",
                      url="http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance")
public static double computeWeightedMean(Iterable<? extends WeightedValue<? extends Number>> data)
Computes the weighted arithmetic mean (average, expectation, first raw moment) of a dataset. The absolute value of each weight is used, so negative weights are handled.

Parameters:
data - Weighted values to consider.
Returns:
Arithmetic mean of the given dataset.
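
To illustrate the absolute-value treatment of weights described above, here is a minimal sketch of a weighted mean. It is not the library's implementation: the Weighted class is a hypothetical stand-in for WeightedValue, and returning 0.0 when the total weight is zero is an assumption.

```java
import java.util.Arrays;
import java.util.List;

final class WeightedMeanSketch {

    /** Hypothetical stand-in for the library's WeightedValue type. */
    static final class Weighted {
        final double value;
        final double weight;

        Weighted(double value, double weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    /** Weighted mean that uses |weight| for every entry. */
    static double weightedMean(Iterable<Weighted> data) {
        double weightedSum = 0.0;
        double weightSum = 0.0;
        for (Weighted w : data) {
            double absWeight = Math.abs(w.weight);
            weightedSum += absWeight * w.value;
            weightSum += absWeight;
        }
        // Assumption: an empty or zero-weight dataset yields 0.0.
        return (weightSum == 0.0) ? 0.0 : weightedSum / weightSum;
    }

    public static void main(String[] args) {
        List<Weighted> data = Arrays.asList(
            new Weighted(1.0, 1.0),
            new Weighted(3.0, -3.0)); // the negative weight counts as 3.0
        System.out.println(weightedMean(data));
    }
}
```

With the two entries in main, the weights contribute 1.0 and 3.0, so the result is (1.0 * 1.0 + 3.0 * 3.0) / 4.0.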

computeVariance

public static double computeVariance(Collection<? extends Number> data)
Computes the unbiased variance (second central moment, squared standard deviation) of a dataset. Computes the mean first, then computes the variance. If you already have the mean, use the two-argument computeVariance(data,mean) method to avoid duplicated effort.

Parameters:
data - Data to consider
Returns:
Unbiased variance of the given dataset

computeVariance

public static double computeVariance(Collection<? extends Number> data,
                                     double mean)
Computes the unbiased variance (second central moment, squared standard deviation) of a dataset about a pre-computed mean.

Parameters:
data - Data to consider
mean - Pre-computed mean (or central value) of the dataset
Returns:
Unbiased variance of the given dataset
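
The relationship between the sum-squared difference and the unbiased variance can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the method names are hypothetical, and returning 0.0 for fewer than two samples is an assumption.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class VarianceSketch {

    /** Sum of squared differences between each value and the target. */
    static double sumSquaredDifference(Iterable<? extends Number> data, double target) {
        double sum = 0.0;
        for (Number x : data) {
            double delta = x.doubleValue() - target;
            sum += delta * delta;
        }
        return sum;
    }

    /** Unbiased variance about a pre-computed mean: divide by (n - 1). */
    static double unbiasedVariance(Collection<? extends Number> data, double mean) {
        int n = data.size();
        if (n < 2) {
            return 0.0; // assumption: the source does not specify this case
        }
        return sumSquaredDifference(data, mean) / (n - 1);
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0);
        System.out.println(unbiasedVariance(data, 5.0)); // 32 / 7
    }
}
```

Dividing by (n - 1) instead of n is what makes the estimate unbiased when the mean is itself estimated from the same data.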

computeRootMeanSquaredError

public static double computeRootMeanSquaredError(Collection<? extends Number> data)
Computes the root mean-squared (RMS) error between the data and its mean. Computes the mean first, then computes the RMS error. If you already have the mean, use the two-argument computeRootMeanSquaredError(data,mean) method to save computation.

Parameters:
data - Dataset to consider
Returns:
RMS error of the dataset about the mean

computeRootMeanSquaredError

public static double computeRootMeanSquaredError(Collection<? extends Number> data,
                                                 double mean)
Computes the root mean-squared (RMS) error between the data and the given mean.

Parameters:
data - Dataset to consider
mean - Mean value about which to compute the sum-squared error
Returns:
RMS error of the dataset about the mean
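
A minimal sketch of an RMS error about a given mean follows. It assumes the conventional definition, the square root of the sum-squared difference divided by the sample count n; whether the library divides by n is not stated in this page, and the method name is hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class RmsErrorSketch {

    /** RMS error about the given mean: sqrt( sum((x - mean)^2) / n ). */
    static double rootMeanSquaredError(Collection<? extends Number> data, double mean) {
        if (data.isEmpty()) {
            return 0.0; // assumption: empty input yields 0.0
        }
        double sumSquared = 0.0;
        for (Number x : data) {
            double delta = x.doubleValue() - mean;
            sumSquared += delta * delta;
        }
        return Math.sqrt(sumSquared / data.size());
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0);
        System.out.println(rootMeanSquaredError(data, 5.0));
    }
}
```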

computeSum

public static double computeSum(Iterable<? extends Number> data)
Computes the arithmetic sum of the dataset.

Parameters:
data - Dataset to consider
Returns:
Arithmetic sum of the given dataset

computeSumSquaredDifference

public static double computeSumSquaredDifference(Iterable<? extends Number> data,
                                                 double target)
Computes the sum-squared difference between the data and a target.

Parameters:
data - Dataset to consider
target - Target about which to compute the difference
Returns:
Sum-squared difference between the dataset and the target

computeCorrelation

@PublicationReference(author="Wikipedia",
                      title="Pearson product-moment correlation coefficient",
                      type=WebPage,
                      year=2011,
                      url="http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient")
public static double computeCorrelation(Collection<? extends Number> data1,
                                        Collection<? extends Number> data2)
Computes the Pearson correlation coefficient in a single pass. This algorithm is about twice as fast as the two-pass method, but it can become numerically unstable.

Parameters:
data1 - First dataset to consider, must have same size as data2
data2 - Second dataset to consider
Returns:
Normalized correlation coefficient, [-1,+1]
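
For illustration, a single-pass Pearson correlation can be written by accumulating five running sums. This sketch uses the textbook sum-of-products formula, which is an assumption about the approach and not a copy of the library's code; it also shows where the numerical instability comes from, since the final step subtracts large, nearly equal quantities. Returning 0.0 for a constant dataset is an assumption.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

final class CorrelationSketch {

    /** Pearson correlation from running sums gathered in one pass. */
    static double correlation(Collection<? extends Number> data1,
                              Collection<? extends Number> data2) {
        if (data1.size() != data2.size()) {
            throw new IllegalArgumentException("Datasets must have the same size");
        }
        int n = data1.size();
        double sumX = 0.0, sumY = 0.0, sumXX = 0.0, sumYY = 0.0, sumXY = 0.0;
        Iterator<? extends Number> it2 = data2.iterator();
        for (Number a : data1) {
            double x = a.doubleValue();
            double y = it2.next().doubleValue();
            sumX += x;
            sumY += y;
            sumXX += x * x;
            sumYY += y * y;
            sumXY += x * y;
        }
        // These differences are where cancellation error can appear.
        double covariance = n * sumXY - sumX * sumY;
        double spreadX = n * sumXX - sumX * sumX;
        double spreadY = n * sumYY - sumY * sumY;
        double denominator = Math.sqrt(spreadX * spreadY);
        // Assumption: a constant (or empty) dataset yields 0.0.
        return (denominator == 0.0) ? 0.0 : covariance / denominator;
    }

    public static void main(String[] args) {
        List<Double> x = Arrays.asList(1.0, 2.0, 3.0, 4.0);
        List<Double> y = Arrays.asList(2.0, 4.0, 6.0, 8.0);
        System.out.println(correlation(x, y));
    }
}
```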

computeMedian

public static double computeMedian(Collection<? extends Number> data)
Computes the median of the given data.

Parameters:
data - Data from which to compute the median.
Returns:
Median of the sample.
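
A minimal sketch of a median computation follows. The page does not say how an even-sized dataset is handled, so averaging the two middle values is an assumption, as are the method name and the exception thrown for empty input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

final class MedianSketch {

    /** Median of the data, computed on a sorted copy so the input is untouched. */
    static double median(Collection<? extends Number> data) {
        if (data.isEmpty()) {
            throw new IllegalArgumentException("data must not be empty");
        }
        List<Double> sorted = new ArrayList<>(data.size());
        for (Number x : data) {
            sorted.add(x.doubleValue());
        }
        Collections.sort(sorted);
        int n = sorted.size();
        int mid = n / 2;
        if (n % 2 == 1) {
            return sorted.get(mid);
        }
        // Assumption: even-sized data uses the average of the two middle values.
        return 0.5 * (sorted.get(mid - 1) + sorted.get(mid));
    }

    public static void main(String[] args) {
        System.out.println(median(Arrays.asList(3.0, 1.0, 2.0)));
        System.out.println(median(Arrays.asList(4.0, 1.0, 3.0, 2.0)));
    }
}
```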

computePercentile

public static double computePercentile(Collection<? extends Number> data,
                                       double percentile)
Computes the percentile value of the given data. For example, if data has 101 values and "percentile" is 0.27, then the return value would be the ascending-sorted value in the 26th zero-based index.

Parameters:
data - Data from which to compute the percentile.
percentile - Percentile to choose, must be on the closed interval 0.0 to 1.0.
Returns:
Requested percentile from the data.

computeMinimum

public static double computeMinimum(Iterable<? extends Number> data)
Finds the minimum value of a data set.

Parameters:
data - Data set to consider
Returns:
Minimum value of the data set.

computeMaximum

public static double computeMaximum(Iterable<? extends Number> data)
Finds the maximum value of a data set.

Parameters:
data - Data set to consider
Returns:
Maximum value of the data set.

computeMinAndMax

public static Pair<Double,Double> computeMinAndMax(Iterable<? extends Number> data)
Computes the minimum and maximum of a set of data in a single pass.

Parameters:
data - Data to consider
Returns:
Pair containing the minimum and maximum values.
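
The single-pass idea can be sketched as below: both extremes are tracked in one loop instead of iterating once for the minimum and once for the maximum. The sketch returns a two-element array as a stand-in for Pair, and the infinite starting values (which are also the result for empty input) are an assumption.

```java
import java.util.Arrays;
import java.util.List;

final class MinAndMaxSketch {

    /** Returns {min, max}, found in one iteration over the data. */
    static double[] minAndMax(Iterable<? extends Number> data) {
        // Assumption: empty input leaves these sentinel values in place.
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (Number x : data) {
            double value = x.doubleValue();
            if (value < min) {
                min = value;
            }
            if (value > max) {
                max = value;
            }
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(3.0, -1.0, 7.0, 2.0);
        double[] result = minAndMax(data);
        System.out.println(result[0] + " " + result[1]);
    }
}
```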

computeSkewness

@PublicationReference(author="Wikipedia",
                      title="Skewness",
                      type=WebPage,
                      year=2009,
                      url="http://en.wikipedia.org/wiki/Skewness")
public static double computeSkewness(Collection<? extends Number> data)
Computes the unbiased skewness of the dataset.

Parameters:
data - Data from which to compute the unbiased skewness.
Returns:
Unbiased skewness.

computeCentralMoment

public static double computeCentralMoment(Iterable<? extends Number> data,
                                          double mean,
                                          int moment)
Computes the biased estimate of the desired central moment of the given dataset.

Parameters:
data - Data to compute the moment of.
mean - Mean of the data (to prevent redundant computation).
moment - Desired moment of the data, must be greater than or equal to 1.
Returns:
Biased estimate of the desired central moment.
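
As an illustration, the biased estimate of the k-th central moment is the average of (x - mean)^k over all n samples, with no (n - 1) correction. The sketch below assumes that definition; the method name and the 0.0 result for empty input are assumptions.

```java
import java.util.Arrays;
import java.util.List;

final class CentralMomentSketch {

    /** Biased central moment: (1/n) * sum((x - mean)^moment). */
    static double centralMoment(Iterable<? extends Number> data, double mean, int moment) {
        if (moment < 1) {
            throw new IllegalArgumentException("moment must be >= 1");
        }
        double sum = 0.0;
        int n = 0;
        for (Number x : data) {
            sum += Math.pow(x.doubleValue() - mean, moment);
            n++;
        }
        return (n == 0) ? 0.0 : sum / n; // assumption: empty input yields 0.0
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(1.0, 2.0, 3.0, 4.0, 5.0);
        System.out.println(centralMoment(data, 3.0, 2));
        System.out.println(centralMoment(data, 3.0, 4));
    }
}
```

Note that with moment = 2 this gives the biased variance (divided by n), which differs from the unbiased variance returned by computeVariance.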

computeWeightedCentralMoment

public static double computeWeightedCentralMoment(Iterable<? extends WeightedValue<? extends Number>> data,
                                                  double mean,
                                                  int moment)
Computes the biased estimate of the desired central moment of the given weighted dataset. The absolute value of each weight is used, so negative weights are handled.

Parameters:
data - Data to compute the moment of.
mean - Mean of the data (to prevent redundant computation).
moment - Desired moment of the data, must be greater than or equal to 1.
Returns:
Biased estimate of the desired central moment.

computeKurtosis

@PublicationReference(author="Wikipedia",
                      title="Kurtosis",
                      type=WebPage,
                      year=2009,
                      url="http://en.wikipedia.org/wiki/Kurtosis")
public static double computeKurtosis(Collection<? extends Number> data)
Computes the biased excess kurtosis of the given dataset. Intuitively, kurtosis quantifies the pointiness of the data by normalizing the fourth central moment.

Parameters:
data - Dataset from which to compute the kurtosis.
Returns:
Biased excess kurtosis of the given dataset.
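
The normalization mentioned above can be sketched as the fourth biased central moment divided by the square of the second, minus 3 so that a normal distribution scores zero. This assumes the standard definition of biased excess kurtosis; the method name is hypothetical, and a zero-variance dataset is not handled (the division yields NaN).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class KurtosisSketch {

    /** Biased excess kurtosis: m4 / (m2 * m2) - 3, using 1/n moments. */
    static double excessKurtosis(Collection<? extends Number> data) {
        int n = data.size();
        double sum = 0.0;
        for (Number x : data) {
            sum += x.doubleValue();
        }
        double mean = sum / n;

        double m2 = 0.0; // second central moment (biased)
        double m4 = 0.0; // fourth central moment (biased)
        for (Number x : data) {
            double delta = x.doubleValue() - mean;
            double delta2 = delta * delta;
            m2 += delta2;
            m4 += delta2 * delta2;
        }
        m2 /= n;
        m4 /= n;
        return m4 / (m2 * m2) - 3.0;
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(1.0, 2.0, 3.0, 4.0, 5.0);
        System.out.println(excessKurtosis(data));
    }
}
```

For the evenly spaced data in main, m2 = 2.0 and m4 = 6.8, so the result is negative, reflecting a flatter-than-normal shape.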

computeWeightedKurtosis

@PublicationReference(author="Wikipedia",
                      title="Kurtosis",
                      type=WebPage,
                      year=2009,
                      url="http://en.wikipedia.org/wiki/Kurtosis")
public static double computeWeightedKurtosis(Collection<? extends WeightedValue<? extends Number>> data)
Computes the biased excess kurtosis of the given weighted dataset. Intuitively, kurtosis quantifies the pointiness of the data by normalizing the fourth central moment. The absolute value of each weight is used, so negative weights are handled.

Parameters:
data - Dataset from which to compute the kurtosis.
Returns:
Biased excess kurtosis of the given dataset.

computeEntropy

@PublicationReference(author="Wikipedia",
                      title="Entropy (information theory)",
                      type=WebPage,
                      year=2009,
                      url="http://en.wikipedia.org/wiki/Entropy_(Information_theory)")
public static double computeEntropy(Iterable<? extends Number> data)
Computes the information-theoretic entropy of the PMF in bits (base 2).

Parameters:
data - PMF values from which to compute the entropy.
Returns:
Entropy in bits of the given PMF.
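
A minimal sketch of base-2 entropy for a PMF follows. It assumes the input values are probabilities that already sum to one (no normalization is applied) and that zero-probability entries contribute nothing, following the usual convention; neither detail is confirmed by this page, and the method name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class EntropySketch {

    private static final double LOG_2 = Math.log(2.0);

    /** Entropy in bits: -sum(p * log2(p)) over the PMF values. */
    static double entropy(Iterable<? extends Number> pmf) {
        double entropy = 0.0;
        for (Number value : pmf) {
            double p = value.doubleValue();
            if (p > 0.0) {
                // log2(p) computed as ln(p) / ln(2)
                entropy -= p * (Math.log(p) / LOG_2);
            }
        }
        return entropy;
    }

    public static void main(String[] args) {
        List<Double> fairCoin = Arrays.asList(0.5, 0.5);
        List<Double> fairFourSidedDie = Arrays.asList(0.25, 0.25, 0.25, 0.25);
        System.out.println(entropy(fairCoin));
        System.out.println(entropy(fairFourSidedDie));
    }
}
```

A fair coin carries one bit of entropy and a uniform four-outcome PMF carries two, while a PMF that puts all mass on one outcome carries none.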

computeMeanAndVariance

@PublicationReference(title="Algorithms for calculating variance",
                      type=WebPage,
                      year=2010,
                      author="Wikipedia",
                      url="http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance")
public static Pair<Double,Double> computeMeanAndVariance(Iterable<? extends Number> data)
Computes the mean and unbiased variance of a dataset using a one-pass approach.

Parameters:
data - Data to consider
Returns:
Mean and unbiased Variance Pair.
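
One well-known one-pass approach from the cited "Algorithms for calculating variance" article is Welford's online update, sketched below. Whether the library uses this exact update is an assumption; the sketch returns a two-element array as a stand-in for Pair and reports a variance of 0.0 for fewer than two samples, which is also an assumption.

```java
import java.util.Arrays;
import java.util.List;

final class MeanAndVarianceSketch {

    /** Returns {mean, unbiased variance} using Welford's one-pass update. */
    static double[] meanAndVariance(Iterable<? extends Number> data) {
        int n = 0;
        double mean = 0.0;
        double m2 = 0.0; // running sum of squared deviations from the mean
        for (Number value : data) {
            double x = value.doubleValue();
            n++;
            double delta = x - mean;  // deviation from the old mean
            mean += delta / n;        // update the running mean
            m2 += delta * (x - mean); // uses both old and new mean
        }
        // Assumption: fewer than two samples yields a variance of 0.0.
        double variance = (n < 2) ? 0.0 : m2 / (n - 1);
        return new double[] {mean, variance};
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0);
        double[] result = meanAndVariance(data);
        System.out.println("mean=" + result[0] + " variance=" + result[1]);
    }
}
```

Because each sample is visited once and no sum of raw squares is kept, this update avoids the cancellation problems of the naive sum-of-squares formula.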

computeWeightedMeanAndVariance

@PublicationReference(title="Algorithms for calculating variance",
                      type=WebPage,
                      year=2010,
                      author="Wikipedia",
                      url="http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance")
public static Pair<Double,Double> computeWeightedMeanAndVariance(Iterable<? extends WeightedValue<? extends Number>> data)
Computes the weighted mean and unbiased variance of a dataset using a one-pass approach. The absolute value of each weight is used, so negative weights are handled.

Parameters:
data - Data to consider.
Returns:
Mean and unbiased Variance Pair.