gov.sandia.cognition.learning.algorithm.bayes
Class DiscreteNaiveBayesCategorizer<InputType,CategoryType>

java.lang.Object
  extended by gov.sandia.cognition.util.AbstractCloneableSerializable
      extended by gov.sandia.cognition.learning.algorithm.bayes.DiscreteNaiveBayesCategorizer<InputType,CategoryType>
Type Parameters:
InputType - Type of inputs to the categorizer.
CategoryType - Type of the categories of the categorizer.
All Implemented Interfaces:
Evaluator<Collection<InputType>,CategoryType>, Categorizer<Collection<InputType>,CategoryType>, DiscriminantCategorizer<Collection<InputType>,CategoryType,Double>, CloneableSerializable, Serializable, Cloneable

@PublicationReferences(references={@PublicationReference(author={"Richard O. Duda","Peter E. Hart","David G. Stork"},title="Pattern Classification: Second Edition",type=Book,year=2001,pages={56,62}),@PublicationReference(author="Wikipedia",title="Naive Bayes classifier",type=WebPage,year=2009,url="http://en.wikipedia.org/wiki/Naive_bayes")})
public class DiscreteNaiveBayesCategorizer<InputType,CategoryType>
extends AbstractCloneableSerializable
implements DiscriminantCategorizer<Collection<InputType>,CategoryType,Double>

Implementation of a Naive Bayes Classifier for Discrete Data. That is, the categorizer takes a Collection of input attributes and infers the most-likely category with the assumption that each input attribute is independent of all others given the category. In other words,
Cml = arg max(c) P(C=c | X=inputs)
    = arg max(c) P(X=inputs AND C=c) / P(X=inputs)
    = arg max(c) P(X=inputs AND C=c)   (since P(X=inputs) doesn't depend on the category),
where Cml is the most-likely category and
P(X=inputs AND C=c) = P(X=inputs|C=c) * P(C=c)
                    = P(X1=x1|C=c) * P(X2=x2|C=c) * ... * P(Xn=xn|C=c) * P(C=c)   (Naive Bayes assumption).

The DiscreteNaiveBayesCategorizer class assumes that all inputs have the same dimensionality. Missing (unknown) data is supported: place a "null" at the corresponding position of the given input Collection. Beyond categorization, the class can also compute the prior, class-conditional, conjunctive, evidence, and posterior probabilities described by the methods below.
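
The arg max rule in the derivation above can be illustrated with a small stand-alone sketch. It does not use the Foundry classes: the class name NaiveBayesArgMaxSketch, the String feature values, and all probabilities are made up for illustration, and inputs are taken as a List so that feature positions are explicit.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-alone sketch; not the DiscreteNaiveBayesCategorizer implementation. */
final class NaiveBayesArgMaxSketch {

    /** P(C=c) for each category (made-up numbers). */
    static final Map<String, Double> PRIOR = new HashMap<>();

    /** P(Xi=x | C=c): category -> one value table per feature position (made-up numbers). */
    static final Map<String, List<Map<String, Double>>> CONDITIONAL = new HashMap<>();

    static {
        PRIOR.put("spam", 0.4);
        PRIOR.put("ham", 0.6);
        CONDITIONAL.put("spam", Arrays.asList(
            table("offer", 0.8, "hello", 0.2), table("link", 0.7, "none", 0.3)));
        CONDITIONAL.put("ham", Arrays.asList(
            table("offer", 0.1, "hello", 0.9), table("link", 0.2, "none", 0.8)));
    }

    /** Builds a two-entry probability table. */
    static Map<String, Double> table(String a, double pa, String b, double pb) {
        Map<String, Double> t = new HashMap<>();
        t.put(a, pa);
        t.put(b, pb);
        return t;
    }

    /** P(X=inputs AND C=c) = P(C=c) * product over i of P(Xi=xi | C=c). */
    static double conjunctive(List<String> inputs, String category) {
        Double prior = PRIOR.get(category);
        List<Map<String, Double>> tables = CONDITIONAL.get(category);
        if (prior == null || tables == null) {
            return 0.0; // Unknown category: treated as impossible in this sketch.
        }
        double p = prior;
        for (int i = 0; i < inputs.size(); i++) {
            // Values never seen for this feature get probability zero here.
            p *= tables.get(i).getOrDefault(inputs.get(i), 0.0);
        }
        return p;
    }

    /** arg max over c of P(X=inputs AND C=c); P(X=inputs) is never needed. */
    static String mostLikely(List<String> inputs) {
        String best = null;
        double bestP = -1.0;
        for (String c : PRIOR.keySet()) {
            double p = conjunctive(inputs, c);
            if (p > bestP) {
                bestP = p;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(mostLikely(Arrays.asList("offer", "link"))); // spam
        System.out.println(mostLikely(Arrays.asList("hello", "none"))); // ham
    }
}
```

For the inputs (offer, link) the conjunctive probabilities are 0.4 * 0.8 * 0.7 = 0.224 for spam and 0.6 * 0.1 * 0.2 = 0.012 for ham, so spam wins without the evidence term ever being computed.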

Since:
3.0
Author:
Kevin R. Dixon
See Also:
Serialized Form

Nested Class Summary
static class DiscreteNaiveBayesCategorizer.Learner<InputType,CategoryType>
          Learner for a DiscreteNaiveBayesCategorizer.
 
Constructor Summary
  DiscreteNaiveBayesCategorizer()
          Creates a new instance of DiscreteNaiveBayesCategorizer.
  DiscreteNaiveBayesCategorizer(int inputDimensionality)
          Creates a new instance of DiscreteNaiveBayesCategorizer.
protected DiscreteNaiveBayesCategorizer(int inputDimensionality, DefaultDataDistribution<CategoryType> priorProbabilities, Map<CategoryType,List<DefaultDataDistribution<InputType>>> conditionalProbabilities)
          Creates a new instance of DiscreteNaiveBayesCategorizer.
 
Method Summary
 DiscreteNaiveBayesCategorizer<InputType,CategoryType> clone()
          This makes public the clone method on the Object class and removes the exception that it throws.
 double computeConditionalProbability(Collection<InputType> inputs, CategoryType category)
          Computes the class conditional for the given inputs at the given category assuming that each input feature is conditionally independent of all other features.
 double computeConjuctiveProbability(Collection<InputType> inputs, CategoryType category)
          Computes the conjunctive probability of the inputs and the category.
 double computeEvidenceProbabilty(Collection<InputType> inputs)
          Computes the probability of the given inputs.
 double computePosterior(Collection<InputType> inputs, CategoryType category)
          Computes the posterior probability of the given category given the inputs.
 CategoryType evaluate(Collection<InputType> inputs)
          Evaluates the function on the given input and returns the output.
 DefaultWeightedValueDiscriminant<CategoryType> evaluateWithDiscriminant(Collection<InputType> input)
          Evaluate the categorizer on the given input to produce the expected category plus a discriminant for later producing an ordering of how well items fit into that category.
 Set<CategoryType> getCategories()
          Gets the set of possible categories that the categorizer can produce.
 double getConditionalProbability(int index, InputType input, CategoryType category)
          Gets the conditional probability for the given input and category.
 int getInputDimensionality()
          Getter for inputDimensionality.
 double getPriorProbability(CategoryType category)
          Returns the prior probability of the given category.
 void setInputDimensionality(int inputDimensionality)
          Setter for inputDimensionality.
 void update(Collection<InputType> inputs, CategoryType category)
          Updates the probability tables from observing the sample inputs and category.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DiscreteNaiveBayesCategorizer

public DiscreteNaiveBayesCategorizer()
Creates a new instance of DiscreteNaiveBayesCategorizer.


DiscreteNaiveBayesCategorizer

public DiscreteNaiveBayesCategorizer(int inputDimensionality)
Creates a new instance of DiscreteNaiveBayesCategorizer.

Parameters:
inputDimensionality - Assumed dimensionality of the inputs.

DiscreteNaiveBayesCategorizer

protected DiscreteNaiveBayesCategorizer(int inputDimensionality,
                                        DefaultDataDistribution<CategoryType> priorProbabilities,
                                        Map<CategoryType,List<DefaultDataDistribution<InputType>>> conditionalProbabilities)
Creates a new instance of DiscreteNaiveBayesCategorizer.

Parameters:
inputDimensionality - Assumed dimensionality of the inputs.
priorProbabilities - Table of category priors.
conditionalProbabilities - Class conditional probability table.
Method Detail

clone

public DiscreteNaiveBayesCategorizer<InputType,CategoryType> clone()
Description copied from class: AbstractCloneableSerializable
This makes public the clone method on the Object class and removes the exception that it throws. Its default behavior is to automatically create a clone of the exact type of object that the clone is called on and to copy all primitives but to keep all references, which means it is a shallow copy. Extensions of this class may want to override this method (but call super.clone()) to implement a "smart copy", that is, to target the most common use case for creating a copy of the object. Because the default behavior is a shallow copy, extending classes only need to handle fields that need a deeper copy (or those that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method. Note: The contract of this method is that you must use super.clone() as the basis for your implementation.

Specified by:
clone in interface CloneableSerializable
Overrides:
clone in class AbstractCloneableSerializable
Returns:
A clone of this object.

getCategories

public Set<CategoryType> getCategories()
Description copied from interface: Categorizer
Gets the set of possible categories that the categorizer can produce.

Specified by:
getCategories in interface Categorizer<Collection<InputType>,CategoryType>
Returns:
The set of possible categories.

computeEvidenceProbabilty

public double computeEvidenceProbabilty(Collection<InputType> inputs)
Computes the probability of the given inputs. In other words, P(X=inputs) = sum over all C=c ( P(X=inputs|C=c)* P(C=c) ).

Parameters:
inputs - Inputs for which to compute the probability.
Returns:
Probability of the inputs, P(X=inputs).
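
The sum over categories can be sketched in isolation as follows. This is an illustrative stand-alone example, not the Foundry code: the class EvidenceSketch, its evidence method, and the numbers are hypothetical, and the class-conditional values for one fixed input are passed in directly rather than computed from probability tables.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the law of total probability; not the Foundry implementation. */
final class EvidenceSketch {

    /**
     * @param prior P(C=c) for each category
     * @param classConditional P(X=inputs | C=c) for each category, for one fixed input
     * @return P(X=inputs) = sum over c of P(X=inputs | C=c) * P(C=c)
     */
    static double evidence(Map<String, Double> prior, Map<String, Double> classConditional) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : prior.entrySet()) {
            // A category with no class-conditional entry contributes nothing.
            sum += e.getValue() * classConditional.getOrDefault(e.getKey(), 0.0);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Double> prior = new LinkedHashMap<>();
        prior.put("a", 0.25);
        prior.put("b", 0.75);
        Map<String, Double> classConditional = new LinkedHashMap<>();
        classConditional.put("a", 0.5);
        classConditional.put("b", 0.125);
        // 0.25 * 0.5 + 0.75 * 0.125 = 0.21875
        System.out.println(evidence(prior, classConditional));
    }
}
```

Note that one term is needed per category, which is why the evidence is the expensive part of a posterior computation.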

computePosterior

public double computePosterior(Collection<InputType> inputs,
                               CategoryType category)
Computes the posterior probability of the given category given the inputs. In other words, P(C=category|X=inputs) = P(X=inputs|C=category)*P(C=category)/P(X=inputs). This is quite expensive, because the denominator of Bayes rule is computed by evaluating the numerator probability for every category and then summing the results. If you only need the most-likely category, computeConjuctiveProbability is strongly recommended instead, as it is much cheaper.

Parameters:
inputs - Inputs on which the posterior is conditioned.
category - Category for which to compute the posterior.
Returns:
Posterior probability, P(C=category|X=inputs).
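
A minimal sketch of the normalization step, assuming the conjunctive probabilities P(X=inputs AND C=c) have already been computed for every category. The class PosteriorSketch and its posteriors method are hypothetical names for illustration, and returning 0.0 when the evidence is zero is a choice made here, not documented behavior of the class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of Bayes rule normalization; not the Foundry implementation. */
final class PosteriorSketch {

    /**
     * @param conjunctive P(X=inputs AND C=c) for every category c
     * @return P(C=c | X=inputs) for every category c
     */
    static Map<String, Double> posteriors(Map<String, Double> conjunctive) {
        // The evidence P(X=inputs) is the sum of the numerators over all categories.
        double evidence = 0.0;
        for (double p : conjunctive.values()) {
            evidence += p;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : conjunctive.entrySet()) {
            // Assumption: an impossible input yields 0.0 rather than NaN.
            result.put(e.getKey(), evidence > 0.0 ? e.getValue() / evidence : 0.0);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> conjunctive = new LinkedHashMap<>();
        conjunctive.put("a", 0.125);
        conjunctive.put("b", 0.375);
        // Evidence is 0.5, so the posteriors are a=0.25 and b=0.75.
        System.out.println(posteriors(conjunctive));
    }
}
```

Dividing every numerator by the same positive evidence does not change their ordering, so the category with the largest conjunctive probability is also the one with the largest posterior.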

computeConditionalProbability

public double computeConditionalProbability(Collection<InputType> inputs,
                                            CategoryType category)
Computes the class conditional for the given inputs at the given category assuming that each input feature is conditionally independent of all other features. In other words, P(X=inputs|C=category) = P(X1=x1|C=category) * P(X2=x2|C=category) * ... * P(Xn=xn|C=category).

Parameters:
inputs - Inputs to compute the class conditional.
category - Category to compute the class conditional.
Returns:
Class conditional probability, P(X=inputs|C=category)
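
The product of per-feature conditionals can be sketched as below. This is a stand-alone illustration with hypothetical names (ClassConditionalSketch, classConditional), not the Foundry code. It also shows one plausible way to treat a null (missing) input, namely skipping that feature's factor; how the class itself treats null internally is not stated on this page, so that part is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the Naive Bayes product; not the Foundry implementation. */
final class ClassConditionalSketch {

    /**
     * @param featureTables for one fixed category, table i maps a value x to P(Xi=x | C=category)
     * @param inputs one value per feature; null marks a missing value
     * @return product over the observed features of P(Xi=xi | C=category)
     */
    static double classConditional(List<Map<String, Double>> featureTables, List<String> inputs) {
        if (inputs.size() != featureTables.size()) {
            throw new IllegalArgumentException("Expected " + featureTables.size() + " inputs");
        }
        double product = 1.0;
        for (int i = 0; i < inputs.size(); i++) {
            String x = inputs.get(i);
            if (x == null) {
                continue; // Assumption: a missing value contributes a factor of 1.
            }
            // Values never seen for this feature get probability zero here.
            product *= featureTables.get(i).getOrDefault(x, 0.0);
        }
        return product;
    }

    public static void main(String[] args) {
        Map<String, Double> color = new HashMap<>();
        color.put("red", 0.5);
        color.put("blue", 0.5);
        Map<String, Double> size = new HashMap<>();
        size.put("big", 0.25);
        size.put("small", 0.75);
        List<Map<String, Double>> tables = Arrays.asList(color, size);
        System.out.println(classConditional(tables, Arrays.asList("red", "big"))); // 0.125
        System.out.println(classConditional(tables, Arrays.asList("red", null)));  // 0.5
    }
}
```

Skipping the factor amounts to marginalizing over the unknown feature, since the conditional probabilities of all its possible values sum to one.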

update

public void update(Collection<InputType> inputs,
                   CategoryType category)
Updates the probability tables from observing the sample inputs and category. If the tables are empty, then this observation sets the assumed input dimensionality.

Parameters:
inputs - Observed input values with which to update the tables.
category - Observed category of the inputs.
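
One way to picture the update step is as incrementing count tables, from which probabilities are read off as relative frequencies. The sketch below is a stand-alone illustration under that assumption: the class CountingTablesSketch, its fields, the zero-based feature index, and the decision to leave null inputs uncounted are all hypothetical and are not taken from the Foundry implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical count-table sketch; not the Foundry implementation. */
final class CountingTablesSketch {

    private int inputDimensionality = 0;
    private int total = 0;
    private final Map<String, Integer> categoryCounts = new HashMap<>();
    /** category -> one value-count table per feature position. */
    private final Map<String, List<Map<String, Integer>>> valueCounts = new HashMap<>();

    /** Records one observation; the first observation fixes the dimensionality. */
    void update(List<String> inputs, String category) {
        if (total == 0) {
            inputDimensionality = inputs.size();
        } else if (inputs.size() != inputDimensionality) {
            throw new IllegalArgumentException("Expected " + inputDimensionality + " inputs");
        }
        total++;
        categoryCounts.merge(category, 1, Integer::sum);
        List<Map<String, Integer>> tables = valueCounts.get(category);
        if (tables == null) {
            tables = new ArrayList<Map<String, Integer>>();
            for (int i = 0; i < inputDimensionality; i++) {
                tables.add(new HashMap<String, Integer>());
            }
            valueCounts.put(category, tables);
        }
        for (int i = 0; i < inputs.size(); i++) {
            String x = inputs.get(i);
            if (x != null) { // Assumption: missing values are simply not counted.
                tables.get(i).merge(x, 1, Integer::sum);
            }
        }
    }

    int getInputDimensionality() {
        return inputDimensionality;
    }

    /** Relative-frequency estimate of P(C=category). */
    double prior(String category) {
        return total == 0 ? 0.0 : categoryCounts.getOrDefault(category, 0) / (double) total;
    }

    /** Relative-frequency estimate of P(Xindex=input | C=category); index is zero-based here. */
    double conditional(int index, String input, String category) {
        List<Map<String, Integer>> tables = valueCounts.get(category);
        if (tables == null) {
            return 0.0;
        }
        Map<String, Integer> counts = tables.get(index);
        int seen = 0;
        for (int c : counts.values()) {
            seen += c;
        }
        return seen == 0 ? 0.0 : counts.getOrDefault(input, 0) / (double) seen;
    }
}
```

After observing (red, big) and (red, small) in category "a" and (blue, big) in category "b", this sketch reports a dimensionality of 2, a prior of 2/3 for "a", and a conditional of 0.5 for the value "big" at index 1 given "a".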

computeConjuctiveProbability

public double computeConjuctiveProbability(Collection<InputType> inputs,
                                           CategoryType category)
Computes the conjunctive probability of the inputs and the category. This is the numerator of Bayes rule. In other words,
P( X=inputs AND C=category ) = P(X=inputs|C=category) * P(C=category).

Under the Naive Bayes assumption, each input feature is assumed to be independent of all others given the category. So, the class-conditional factor in the expression above is computed as
P(X=inputs|C=category) = P(X1=x1|C=category) * P(X2=x2|C=category) * ... * P(Xn=xn|C=category).

If we're just interested in finding the most-likely category, then the conjunctive probability is sufficient.

Parameters:
inputs - Inputs for which to compute the conjunctive probability.
category - Category for which to compute the conjunctive probability.
Returns:
The conjunctive probability, which is the numerator of Bayes rule.

evaluate

public CategoryType evaluate(Collection<InputType> inputs)
Description copied from interface: Evaluator
Evaluates the function on the given input and returns the output.

Specified by:
evaluate in interface Evaluator<Collection<InputType>,CategoryType>
Parameters:
inputs - The input to evaluate.
Returns:
The output produced by evaluating the input.

evaluateWithDiscriminant

public DefaultWeightedValueDiscriminant<CategoryType> evaluateWithDiscriminant(Collection<InputType> input)
Description copied from interface: DiscriminantCategorizer
Evaluate the categorizer on the given input to produce the expected category plus a discriminant for later producing an ordering of how well items fit into that category.

Specified by:
evaluateWithDiscriminant in interface DiscriminantCategorizer<Collection<InputType>,CategoryType,Double>
Parameters:
input - The input value to categorize with a discriminant.
Returns:
A pair containing the value and the discriminant value used for ordering results belonging to the same category.

getConditionalProbability

public double getConditionalProbability(int index,
                                        InputType input,
                                        CategoryType category)
Gets the conditional probability for the given input and category. In other words,
P(Xindex=input|C=category).

Parameters:
index - Index of the input feature (dimension) to look up.
input - Input value to assume.
category - Category value to assume.
Returns:
Class conditional probability of the given input and category.

getPriorProbability

public double getPriorProbability(CategoryType category)
Returns the prior probability of the given category. In other words,
P(C=category).

Parameters:
category - Category to return the prior probability of.
Returns:
Prior probability of the given category.

getInputDimensionality

public int getInputDimensionality()
Getter for inputDimensionality.

Returns:
Assumed dimensionality of the inputs.

setInputDimensionality

public void setInputDimensionality(int inputDimensionality)
Setter for inputDimensionality. Also resets the probability tables.

Parameters:
inputDimensionality - Assumed dimensionality of the inputs.