Dimensionality reduction is a technique used in machine learning to reduce the number of features in a dataset while retaining only the most important components.
- Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional
space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic
dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse
of dimensionality, and analyzing the data is usually computationally intractable. Dimensionality reduction is common in fields that deal
with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics,
and bioinformatics. Methods are commonly divided into linear and non-linear approaches.[1] Approaches can also be divided into feature
selection and feature extraction.[2] Dimensionality reduction can be used for noise reduction, data visualization, cluster analysis,
or as an intermediate step to facilitate other analyses.

- Reduces computational complexity
- Reduces overfitting
- Helps in visualization by reducing high-dimensional data to two or three dimensions.
- Dimensionality reduction is used in both supervised and unsupervised learning settings.
- PCA can be used in both supervised and unsupervised settings (it does not use class labels), whereas LDA can be used only in supervised settings (it requires class labels).
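The contrast above can be sketched in code, assuming scikit-learn is available (the Iris dataset here is just an illustrative choice): PCA fits on the features alone, while LDA additionally requires the class labels.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# PCA is unsupervised: fit uses only the feature matrix X.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA is supervised: fit requires the class labels y.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # both reduce 4 features to 2
```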
PCA -- Principal Component Analysis
- PCA exploits the correlation between features: correlated variables are combined into new components that are mutually uncorrelated.
- The principal components in PCA are linear combinations of the original variables, computed from the eigenvalues and eigenvectors of the covariance matrix.
- The principal components are orthogonal to each other.
- The first principal component represents the direction of maximum variance.
- PCA is sensitive to feature scale, so it performs best on a normalized (standardized) dataset.
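The PCA properties listed above can be verified directly with a minimal NumPy sketch (the synthetic data here is an assumption for illustration): standardize the data, eigen-decompose the covariance matrix, and project onto the leading eigenvectors. The components come out orthogonal, and the first one carries the maximum variance.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 2 * X[:, 0] + rng.normal(scale=0.1, size=200)  # correlated features

# Standardize: PCA is scale-sensitive.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigen-decompose the covariance matrix.
cov = np.cov(Xs, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)        # returned in ascending order
order = np.argsort(eigvals)[::-1]             # sort descending by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the top-2 principal components.
Z = Xs @ eigvecs[:, :2]

# The eigenvectors are orthonormal, and the variance of the first
# projected component equals the largest eigenvalue.
```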
LDA -- Linear Discriminant Analysis
- LDA tries to reduce the dimensions of the feature set while also retaining the information that discriminates between the output class labels.
- LDA tries to find a decision boundary around each class cluster.
- It then projects the data points into a new dimension such that the clusters are separated from each other as much as possible, and the individual points within a cluster lie close to that cluster's centroid.
- These new dimensions form the linear discriminants of the feature set.
- Choose PCA --> when the data distribution is highly irregular (e.g., skewed).
- Choose LDA --> when the classes are roughly uniformly distributed.
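The LDA projection described above can be sketched with scikit-learn (assumed available; the Iris dataset is an illustrative choice). Note that LDA yields at most (number of classes - 1) discriminants, so three classes allow at most two new dimensions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 4 features, 3 classes

# With 3 classes, at most 3 - 1 = 2 linear discriminants exist.
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)

# In the projected space, each class forms a cluster around its centroid.
centroids = np.array([Z[y == c].mean(axis=0) for c in np.unique(y)])
print(Z.shape, centroids.shape)
```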