Bayesian Decision Theory is a cornerstone of statistical pattern recognition, providing a robust framework for classifying objects based on probabilistic reasoning. It leverages Bayes' Theorem to make optimal decisions under uncertainty, making it a powerful tool in machine learning applications like image recognition, spam filtering, and medical diagnosis. This blog post explores Bayesian Decision Theory as presented in the lecture notes, delving into its mathematical foundations, practical implementations, and advanced extensions like Gaussian Mixture Models (GMM) and Naïve Bayes classifiers. We'll also include Python code samples to illustrate key concepts, ensuring a clear understanding for both beginners and advanced practitioners.
Bayesian Decision Theory addresses pattern classification by modeling the probability of a sample belonging to a specific class based on observed features. The lecture notes use a fish classification example, distinguishing between sea bass ($ \omega_1 $) and salmon ($ \omega_2 $). With no measurements at hand, the best decision relies on the prior probabilities alone: choose $ \omega_1 $ if $ p(\omega_1) > p(\omega_2) $, and $ \omega_2 $ otherwise.
When additional information, such as a lightness measurement $ x $, is available, Bayesian Decision Theory incorporates this evidence to compute the posterior probability, $ p(\omega_j | x) $, which updates the likelihood of each class given the feature. The decision rule then assigns the sample to the class with the highest posterior probability.
Bayes' Theorem is the backbone of this approach, expressed as:

$$
p(\omega_j | x) = \frac{p(x | \omega_j)\, p(\omega_j)}{p(x)}
$$
where:

- $ p(\omega_j | x) $: Posterior probability of class $ \omega_j $ given feature $ x $.
- $ p(x | \omega_j) $: Class-conditional probability density, describing the distribution of feature $ x $ for class $ \omega_j $.
- $ p(\omega_j) $: Prior probability of class $ \omega_j $.
- $ p(x) $: Evidence, computed as $ p(x) = \sum_{j=1}^c p(x | \omega_j) p(\omega_j) $, where $ c $ is the number of classes.
For a two-class problem, the decision rule is:
- Choose $ \omega_1 $ if $ p(\omega_1 | x) > p(\omega_2 | x) $.
- Choose $ \omega_2 $ otherwise.
Since $ p(x) $ is a common scaling factor, the decision can be simplified to comparing $ p(x | \omega_1) p(\omega_1) $ and $ p(x | \omega_2) p(\omega_2) $.
Suppose we have prior probabilities $ p(\omega_1) = 2/3 $ (sea bass) and $ p(\omega_2) = 1/3 $ (salmon). Given a lightness measurement $ x $, we compute the posterior probabilities using the class-conditional density functions $ p(x | \omega_1) $ and $ p(x | \omega_2) $, which describe the lightness distribution for each fish type.
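Suppose, purely for illustration, that the class-conditional densities evaluate to $ p(x | \omega_1) = 0.3 $ and $ p(x | \omega_2) = 0.7 $ at the observed lightness; these numbers are assumptions, not values from the lecture notes. A minimal sketch of the posterior computation in Python:

```python
# Minimal sketch of the Bayes decision rule for the fish example.
# The likelihood values are hypothetical; in practice they come from
# the class-conditional density functions p(x | w_j).

priors = {"sea_bass": 2/3, "salmon": 1/3}       # p(w_1), p(w_2)
likelihoods = {"sea_bass": 0.3, "salmon": 0.7}  # assumed p(x | w_j) at the observed x

# Evidence: p(x) = sum_j p(x | w_j) p(w_j)
evidence = sum(likelihoods[c] * priors[c] for c in priors)

# Posterior p(w_j | x) for each class
posteriors = {c: likelihoods[c] * priors[c] / evidence for c in priors}
print(posteriors)  # {'sea_bass': ~0.462, 'salmon': ~0.538}

# Decide in favor of the class with the highest posterior
decision = max(posteriors, key=posteriors.get)
print(f"Decision: {decision}")
```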
For real-world applications, we often deal with multiple features forming a feature vector $ \mathbf{x} \in \mathbb{R}^d $. Bayes' Theorem keeps the same form:

$$
p(\omega_j | \mathbf{x}) = \frac{p(\mathbf{x} | \omega_j)\, p(\omega_j)}{p(\mathbf{x})}
$$

where $ p(\mathbf{x}) = \sum_{j=1}^c p(\mathbf{x} | \omega_j) p(\omega_j) $, and the sample is assigned to the class with the highest posterior probability.
The classifier can equivalently be written in terms of discriminant functions $ g_j(\mathbf{x}) $, assigning the sample to the class with the largest $ g_j(\mathbf{x}) $. Alternative discriminant functions include:

- $ g_j(\mathbf{x}) = p(\omega_j | \mathbf{x}) $
- $ g_j(\mathbf{x}) = p(\mathbf{x} | \omega_j)\, p(\omega_j) $
- $ g_j(\mathbf{x}) = \ln p(\mathbf{x} | \omega_j) + \ln p(\omega_j) $
These are equivalent for classification, as they preserve the order of probabilities.
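As a quick check, the sketch below evaluates all three forms on the same hypothetical likelihoods and priors (illustrative values, not from the lecture notes) and confirms they select the same class:

```python
import numpy as np

# Hypothetical class-conditional densities p(x | w_j) at some observed x, and priors p(w_j)
likelihoods = np.array([0.3, 0.7])  # assumed values, for illustration only
priors = np.array([2/3, 1/3])

evidence = np.sum(likelihoods * priors)

g_posterior = likelihoods * priors / evidence   # g_j = p(w_j | x)
g_joint = likelihoods * priors                  # g_j = p(x | w_j) p(w_j)
g_log = np.log(likelihoods) + np.log(priors)    # g_j = ln p(x | w_j) + ln p(w_j)

# All three discriminants rank the classes identically
assert np.argmax(g_posterior) == np.argmax(g_joint) == np.argmax(g_log)
print(np.argmax(g_log))  # index of the chosen class (1 here, i.e. w_2)
```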
The lecture notes emphasize the multivariate normal (Gaussian) distribution for modeling class-conditional density functions due to its mathematical tractability and prevalence in real-world data. For the univariate case:

$$
p(x | \omega_j) = \frac{1}{\sqrt{2\pi}\, \sigma_j} \exp\!\left( -\frac{(x - \mu_j)^2}{2\sigma_j^2} \right)
$$

where $ \mu_j $ and $ \sigma_j^2 $ are the mean and variance of feature $ x $ within class $ \omega_j $.
For a $ d $-dimensional feature vector $ \mathbf{x} $, the density becomes:

$$
p(\mathbf{x} | \omega_j) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_j|^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_j)^\top \boldsymbol{\Sigma}_j^{-1} (\mathbf{x} - \boldsymbol{\mu}_j) \right)
$$

where $ \boldsymbol{\mu}_j $ is the $ d $-dimensional mean vector and $ \boldsymbol{\Sigma}_j $ is the $ d \times d $ covariance matrix of class $ \omega_j $.
In practice, the parameters $ \boldsymbol{\mu}_j $ and $ \boldsymbol{\Sigma}_j $ are unknown and must be estimated from training data, typically by maximum likelihood estimation. The log-likelihood function is:

$$
\ln L(\boldsymbol{\theta}) = \sum_{k=1}^{n} \ln p(\mathbf{x}_k | \boldsymbol{\theta})
$$

where $ \boldsymbol{\theta} = (\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j) $ collects the parameters and $ \mathbf{x}_1, \dots, \mathbf{x}_n $ are the training samples from class $ \omega_j $. Maximizing this expression yields the familiar estimates: the sample mean for $ \boldsymbol{\mu}_j $ and the sample covariance for $ \boldsymbol{\Sigma}_j $.
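Here is a minimal NumPy sketch of these estimates, using synthetic data standing in for the training samples of a single class; the array names are assumptions for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Synthetic stand-in for the n x d matrix of training samples from one class
rng = np.random.default_rng(0)
samples = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))

# Maximum likelihood estimates: sample mean and (biased) sample covariance
mu_hat = samples.mean(axis=0)
sigma_hat = np.cov(samples, rowvar=False, bias=True)

# Evaluate the estimated class-conditional density p(x | w_j) at a new point
x_new = np.array([0.5, 0.5])
density = multivariate_normal(mean=mu_hat, cov=sigma_hat).pdf(x_new)
print(mu_hat, sigma_hat, density)
```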
Here's a Python implementation using scikit-learn to classify data with a Gaussian assumption:
```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Generate synthetic data: two Gaussian-distributed classes in a 2-D feature space
np.random.seed(42)
class1 = np.random.multivariate_normal(
    [-0.1055, -0.0974], [[1.0253, -0.0036], [-0.0036, 0.8880]], 100)
class2 = np.random.multivariate_normal(
    [2.0638, 3.0451], [[1.1884, -0.013], [-0.013, 1.0198]], 100)
X = np.vstack((class1, class2))
y = np.array([0] * 100 + [1] * 100)

# Fit a Gaussian Naive Bayes classifier with specified class priors
gnb = GaussianNB(priors=[1/3, 2/3])
gnb.fit(X, y)

# Classify a new sample
new_sample = np.array([[0.5, 0.5]])
prediction = gnb.predict(new_sample)
print(f"Predicted class: {prediction[0]}")
```
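Note that `GaussianNB` is a Naïve Bayes model: it treats the features as conditionally independent given the class, fitting one univariate Gaussian per feature rather than a full covariance matrix. For comparison, here is a minimal sketch (continuing from the snippet above, so `X`, `y`, `new_sample`, and `np` are already defined) of the full-covariance log-discriminant $ g_j(\mathbf{x}) = \ln p(\mathbf{x} | \omega_j) + \ln p(\omega_j) $:

```python
from scipy.stats import multivariate_normal

log_priors = np.log([1/3, 2/3])  # same priors as passed to GaussianNB above
scores = []
for j in range(2):
    X_j = X[y == j]                               # training samples of class j
    mu_j = X_j.mean(axis=0)                       # MLE mean vector
    sigma_j = np.cov(X_j, rowvar=False, bias=True)  # MLE covariance matrix
    log_likelihood = multivariate_normal(mean=mu_j, cov=sigma_j).logpdf(new_sample[0])
    scores.append(log_likelihood + log_priors[j])  # g_j(x) = ln p(x | w_j) + ln p(w_j)

print(f"Predicted class (full covariance): {int(np.argmax(scores))}")
```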