From 82e0fb8b5e741a0e4297853e96adf8e51fbc391b Mon Sep 17 00:00:00 2001 From: solegalli Date: Sun, 5 Apr 2026 11:09:53 -0400 Subject: [PATCH] update scalers user guide --- .../scaling/MeanNormalizationScaler.rst | 95 +++++++------------ docs/user_guide/scaling/index.rst | 21 ++-- feature_engine/scaling/mean_normalization.py | 2 +- 3 files changed, 46 insertions(+), 72 deletions(-) diff --git a/docs/user_guide/scaling/MeanNormalizationScaler.rst b/docs/user_guide/scaling/MeanNormalizationScaler.rst index fb12b5c1b..e5413f4f8 100644 --- a/docs/user_guide/scaling/MeanNormalizationScaler.rst +++ b/docs/user_guide/scaling/MeanNormalizationScaler.rst @@ -1,20 +1,31 @@ -.. _mean_normalization_scaler: +.. _mean_normalisation_scaler: .. currentmodule:: feature_engine.scaling MeanNormalizationScaler ======================= -:class:`MeanNormalizationScaler()` scales variables using mean normalization. With mean normalization, -we center the distribution around 0, and rescale the distribution to the variable's value range, -so that its values vary between -1 and 1. This is accomplished by subtracting the mean of the feature -and then dividing by its range (i.e., the difference between the maximum and minimum values). +With mean normalisation, we center the variable distribution around 0 and rescale the +variable's values so that they vary between -1 and 1. -The :class:`MeanNormalizationScaler()` only works with non-constant numerical variables. -If the variable is constant, the scaler will raise an error. +This is accomplished by subtracting the mean of the feature and then dividing by its +range (i.e., the difference between the maximum and minimum values). -Python example --------------- +Mean normalisation is given by the following formula: + +.. math:: + + X' = (X - Mean(X)) / (Max(X) - Min(X)) + +:class:`MeanNormalizationScaler()` scales variables using mean normalisation. + +..
note:: + + :class:`MeanNormalizationScaler()` only works with non-constant numerical variables. + If the variable is constant, the scaler will raise an error. + +Python implementation +--------------------- We'll show how to use :class:`MeanNormalizationScaler()` through a toy dataset. Let's create a toy dataset: @@ -47,7 +58,7 @@ The dataset looks like this: 3 jack Bristol 18 2.00 0.6 2020-02-24 00:03:00 We see that the only numerical features in this dataset are **Age**, **Marks**, and **Height**. We want -to scale them using mean normalization. +to scale them using mean normalisation. First, let's make a list with the variable names: @@ -69,8 +80,8 @@ Now, let's set up :class:`MeanNormalizationScaler()`: # fit the scaler scaler.fit(df) -The scaler learns the mean of every column in *vars* and their respective range. -Note that we can access these values in the following way: +With the method `fit()`, the scaler learned the mean of every variable in `vars` and +their respective value range. We can access these values in the following way: .. code:: python @@ -103,7 +114,7 @@ In the following output, we can see the scaled variables: 2 krish Liverpool -0.166667 0.141304 -0.166667 2020-02-24 00:02:00 3 jack Bristol -0.500000 0.576087 -0.500000 2020-02-24 00:03:00 -We can restore the data to itsoriginal values using the inverse transformation: +We can restore the data to its original values using the inverse transformation: .. code:: python @@ -125,52 +136,12 @@ In the following data, we see the scaled variables returned to their oridinal re Additional resources -------------------- -For more details about this and other feature engineering methods check out -these resources: - - -.. figure:: ../../images/feml.png - :width: 300 - :figclass: align-center - :align: left - :target: https://www.trainindata.com/p/feature-engineering-for-machine-learning - - Feature Engineering for Machine Learning - -| -| -| -| -| -| -| -| -| -| - -Or read our book: - -.. 
figure:: ../../images/cookbook.png - :width: 200 - :figclass: align-center - :align: left - :target: https://www.packtpub.com/en-us/product/python-feature-engineering-cookbook-9781835883587 - - Python Feature Engineering Cookbook - -| -| -| -| -| -| -| -| -| -| -| -| -| - -Both our book and course are suitable for beginners and more advanced data scientists -alike. By purchasing them you are supporting Sole, the main developer of Feature-engine. \ No newline at end of file +For tutorials about this and other feature engineering methods check out these resources: + +- `Feature Engineering for Machine Learning `_, online course. +- `Feature Engineering for Time Series Forecasting `_, online course. +- `Python Feature Engineering Cookbook `_, book. + +Both our book and courses are suitable for beginners and more advanced data scientists +alike. By purchasing them you are supporting `Sole `_, +the main developer of feature-engine. \ No newline at end of file diff --git a/docs/user_guide/scaling/index.rst b/docs/user_guide/scaling/index.rst index 244746028..719e328b2 100644 --- a/docs/user_guide/scaling/index.rst +++ b/docs/user_guide/scaling/index.rst @@ -11,14 +11,14 @@ is the process of transforming the range of numerical features so that they fit specific scale, usually to improve the performance and training stability of machine learning models. -Scaling helps to normalize the input data, ensuring that each feature contributes proportionately +Scaling helps to normalise the input data, ensuring that each feature contributes proportionately to the final result, particularly in algorithms that are sensitive to the range of the data, such as gradient descent-based models (e.g., linear regression, logistic regression, neural networks) -and distance-based models (e.g., K-nearest neighbors, clustering). +and distance-based models (e.g., K-nearest neighbours, clustering). Feature-engine's scalers replace the variables' values by the scaled ones. 
In this page, we discuss the importance of scaling numerical features, and then introduce the various -scaling techniques supported by Feature-engine. +scaling techniques supported by feature-engine. Importance of scaling --------------------- @@ -30,25 +30,28 @@ and distance-based methods. Additionally, scaling can improve convergence speed accuracy, leading to more reliable predictions. -When apply scaling ------------------- +When to apply scaling +--------------------- - **Training:** Most machine learning algorithms require data to be scaled before training, especially linear models, neural networks, and distance-based models. -- **Feature Engineering:** Scaling can be essential for certain feature engineering techniques, +- **Feature engineering:** Scaling can be essential for certain feature engineering techniques, like polynomial features. - **Resampling:** Some oversampling methods like SMOTE and many of the undersampling methods - clean data based on KNN algorithms, which are distance based models. + resample data based on KNN algorithms, which are distance-based models. +- **Dimensionality reduction:** Principal component analysis (PCA) and other dimensionality reduction + methods are distance-based and, as such, sensitive to the scale of the features (more details in our + course `Clustering and Dimensionality Reduction `_). -When Scaling Is Not Necessary +When scaling is not necessary ----------------------------- Not all algorithms require scaling. For example, tree-based algorithms (like Decision Trees, Random Forests, Gradient Boosting) are generally invariant to scaling because they split data -based on the order of values, not their magnitude. 
Scalers ------- diff --git a/feature_engine/scaling/mean_normalization.py b/feature_engine/scaling/mean_normalization.py index 78f4a958c..22b104380 100644 --- a/feature_engine/scaling/mean_normalization.py +++ b/feature_engine/scaling/mean_normalization.py @@ -44,7 +44,7 @@ class MeanNormalizationScaler(BaseNumericalTransformer): Constant variables will raise an error due to division by zero. - More details in the :ref:`User Guide `. + More details in the :ref:`User Guide `. Parameters
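Reviewer note: the mean normalisation this patch documents (subtract the mean, then divide by the range) can be checked by hand with a minimal sketch in plain Python. The function names, the sample ages, and the choice of `ValueError` for constant variables are illustrative assumptions; none of this is feature-engine's own implementation.

```python
from statistics import mean


def mean_normalise(values):
    """Return (x - mean) / (max - min) for every x in values (hypothetical helper)."""
    value_range = max(values) - min(values)
    if value_range == 0:
        # Assumption: a constant variable is rejected because the range is zero.
        raise ValueError("constant variable: range is zero")
    centre = mean(values)
    return [(x - centre) / value_range for x in values]


def inverse_mean_normalise(scaled, centre, value_range):
    """Undo the scaling given the learned mean and range (hypothetical helper)."""
    return [x * value_range + centre for x in scaled]


# Illustrative ages, not taken from the library's test data.
ages = [20, 21, 19, 18]
scaled_ages = mean_normalise(ages)
restored_ages = inverse_mean_normalise(scaled_ages, mean(ages), max(ages) - min(ages))
print(scaled_ages)
print(restored_ages)
```

The scaled values are centred at 0 and span a total width of 1, so they always fall between -1 and 1. Multiplying by the range and adding the mean restores the original values.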