95 changes: 33 additions & 62 deletions docs/user_guide/scaling/MeanNormalizationScaler.rst
@@ -1,20 +1,31 @@
.. _mean_normalization_scaler:
.. _mean_normalisation_scaler:

.. currentmodule:: feature_engine.scaling

MeanNormalizationScaler
=======================

:class:`MeanNormalizationScaler()` scales variables using mean normalization. With mean normalization,
we center the distribution around 0, and rescale the distribution to the variable's value range,
so that its values vary between -1 and 1. This is accomplished by subtracting the mean of the feature
and then dividing by its range (i.e., the difference between the maximum and minimum values).
With mean normalisation, we center the variable distribution around 0 and rescale the
variable's values so that they vary between -1 and 1.

The :class:`MeanNormalizationScaler()` only works with non-constant numerical variables.
If the variable is constant, the scaler will raise an error.
This is accomplished by subtracting the mean of the feature and then dividing by its
range (i.e., the difference between the maximum and minimum values).

Python example
--------------
Mean normalisation is given by the following formula:

.. math::

    X' = \frac{X - \mathrm{mean}(X)}{\max(X) - \min(X)}
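
As a quick check of the formula, here is a minimal sketch in plain Python that does not use feature-engine. The function name ``mean_normalise`` and the sample values are illustrative assumptions, not part of the library.

```python
from statistics import mean


def mean_normalise(values):
    """Apply x' = (x - mean) / (max - min) to a list of numbers."""
    center = mean(values)
    value_range = max(values) - min(values)
    return [(x - center) / value_range for x in values]


# Illustrative values only (hypothetical ages).
ages = [20, 21, 19, 18]
scaled = mean_normalise(ages)
print(scaled)
```

The scaled values sum to zero (the distribution is centered) and the difference between the largest and smallest scaled value is exactly 1, so every value lies between -1 and 1.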

:class:`MeanNormalizationScaler()` scales variables using mean normalisation.

.. note::

:class:`MeanNormalizationScaler()` only works with non-constant numerical variables.
If the variable is constant, the scaler will raise an error.
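
To see why constant variables are a problem, note that their range (maximum minus minimum) is zero, so the division in the formula is undefined. The sketch below illustrates the kind of guard this implies. The helper ``value_range`` and its error message are hypothetical; the library's own check and wording may differ.

```python
def value_range(values):
    """Return max - min, raising if the variable is constant."""
    spread = max(values) - min(values)
    if spread == 0:
        # Hypothetical message; the library's own wording may differ.
        raise ValueError("Cannot scale a constant variable: its range is zero.")
    return spread


try:
    value_range([5, 5, 5])  # constant variable: range is 0
    raised = False
except ValueError:
    raised = True

print(raised)
```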

Python implementation
---------------------

We'll show how to use :class:`MeanNormalizationScaler()` through a toy dataset. Let's create
a toy dataset:
@@ -47,7 +58,7 @@ The dataset looks like this:
3 jack Bristol 18 2.00 0.6 2020-02-24 00:03:00

We see that the only numerical features in this dataset are **Age**, **Marks**, and **Height**. We want
to scale them using mean normalization.
to scale them using mean normalisation.

First, let's make a list with the variable names:

@@ -69,8 +80,8 @@ Now, let's set up :class:`MeanNormalizationScaler()`:
# fit the scaler
scaler.fit(df)

The scaler learns the mean of every column in *vars* and their respective range.
Note that we can access these values in the following way:
With the method `fit()`, the scaler learned the mean of every variable in `vars` and
their respective value range. We can access these values in the following way:

.. code:: python

@@ -103,7 +114,7 @@ In the following output, we can see the scaled variables:
2 krish Liverpool -0.166667 0.141304 -0.166667 2020-02-24 00:02:00
3 jack Bristol -0.500000 0.576087 -0.500000 2020-02-24 00:03:00

We can restore the data to itsoriginal values using the inverse transformation:
We can restore the data to its original values using the inverse transformation:

.. code:: python

@@ -125,52 +136,12 @@ In the following data, we see the scaled variables returned to their original re
Additional resources
--------------------

For more details about this and other feature engineering methods check out
these resources:


.. figure:: ../../images/feml.png
:width: 300
:figclass: align-center
:align: left
:target: https://www.trainindata.com/p/feature-engineering-for-machine-learning

Feature Engineering for Machine Learning

|
|
|
|
|
|
|
|
|
|

Or read our book:

.. figure:: ../../images/cookbook.png
:width: 200
:figclass: align-center
:align: left
:target: https://www.packtpub.com/en-us/product/python-feature-engineering-cookbook-9781835883587

Python Feature Engineering Cookbook

|
|
|
|
|
|
|
|
|
|
|
|
|

Both our book and course are suitable for beginners and more advanced data scientists
alike. By purchasing them you are supporting Sole, the main developer of Feature-engine.
For tutorials about this and other feature engineering methods, check out these resources:

- `Feature Engineering for Machine Learning <https://www.trainindata.com/p/feature-engineering-for-machine-learning>`_, online course.
- `Feature Engineering for Time Series Forecasting <https://www.trainindata.com/p/feature-engineering-for-forecasting>`_, online course.
- `Python Feature Engineering Cookbook <https://www.packtpub.com/en-us/product/python-feature-engineering-cookbook-9781835883587>`_, book.

Both our book and courses are suitable for beginners and more advanced data scientists
alike. By purchasing them you are supporting `Sole <https://linkedin.com/in/soledad-galli>`_,
the main developer of feature-engine.
21 changes: 12 additions & 9 deletions docs/user_guide/scaling/index.rst
@@ -11,14 +11,14 @@ is the process of transforming the range of numerical features so that they fit
specific scale, usually to improve the performance and training stability of machine learning
models.

Scaling helps to normalize the input data, ensuring that each feature contributes proportionately
Scaling helps to normalise the input data, ensuring that each feature contributes proportionately
to the final result, particularly in algorithms that are sensitive to the range of the data,
such as gradient descent-based models (e.g., linear regression, logistic regression, neural networks)
and distance-based models (e.g., K-nearest neighbors, clustering).
and distance-based models (e.g., K-nearest neighbours, clustering).
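
The effect on distance-based models can be sketched with a small, made-up example. Below, each row is a hypothetical ``(age, income)`` pair. On the raw data, the nearest neighbour of the first row is decided almost entirely by income, because its values are much larger. After mean normalisation, both features contribute on a comparable scale. The helper names and the data are assumptions for illustration only.

```python
import math
from statistics import mean


def mean_normalise_columns(rows):
    """Mean-normalise each column of a list of equal-length tuples."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        center = mean(col)
        spread = max(col) - min(col)
        scaled_columns.append([(x - center) / spread for x in col])
    return list(zip(*scaled_columns))


def nearest_to_first(rows):
    """Index of the row closest (Euclidean distance) to the first row."""
    distances = [math.dist(rows[0], row) for row in rows[1:]]
    return distances.index(min(distances)) + 1


# Hypothetical (age, income) observations.
people = [(25, 50_000), (60, 51_000), (27, 53_000), (40, 90_000)]

raw_nearest = nearest_to_first(people)
scaled_nearest = nearest_to_first(mean_normalise_columns(people))
print(raw_nearest, scaled_nearest)
```

On the raw data the closest row is the 60-year-old with a similar income. After scaling, it is the 27-year-old, who is similar in both age and income.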

Feature-engine's scalers replace the variables' values by the scaled ones. In this page, we
discuss the importance of scaling numerical features, and then introduce the various
scaling techniques supported by Feature-engine.
scaling techniques supported by feature-engine.

Importance of scaling
---------------------
@@ -30,25 +30,28 @@ and distance-based methods. Additionally, scaling can improve convergence speed
accuracy, leading to more reliable predictions.


When apply scaling
------------------
When to apply scaling
---------------------

- **Training:** Most machine learning algorithms require data to be scaled before training,
especially linear models, neural networks, and distance-based models.

- **Feature Engineering:** Scaling can be essential for certain feature engineering techniques,
- **Feature engineering:** Scaling can be essential for certain feature engineering techniques,
like polynomial features.

- **Resampling:** Some oversampling methods like SMOTE and many of the undersampling methods
clean data based on KNN algorithms, which are distance based models.
resample data based on KNN algorithms, which are distance-based models.

- **Dimensionality reduction:** Principal component analysis (PCA) and other dimensionality reduction
methods are distance-based, and as such, sensitive to the scale of the features (more details in our
course `Clustering and Dimensionality Reduction <https://www.trainindata.com/clustering-dimensionality-reduction>`_).

When Scaling Is Not Necessary
When scaling is not necessary
-----------------------------

Not all algorithms require scaling. For example, tree-based algorithms (like Decision Trees,
Random Forests, Gradient Boosting) are generally invariant to scaling because they split data
based on the order of values, not the magnitude.
based on the order of values, not their magnitude.
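
One way to see this: mean normalisation subtracts a constant and divides by a positive range, so it preserves the order of the values. The sketch below checks this on a few illustrative numbers. The helper ``sort_order`` is a hypothetical name, and the example only demonstrates order preservation; it does not train a tree.

```python
from statistics import mean


def sort_order(values):
    """Return the indices that would sort the values in ascending order."""
    return sorted(range(len(values)), key=values.__getitem__)


# Illustrative values only.
values = [18, 21, 19, 20]
center = mean(values)
spread = max(values) - min(values)
scaled = [(x - center) / spread for x in values]

# The ordering is identical before and after scaling.
same_order = sort_order(values) == sort_order(scaled)
print(same_order)
```

Because the ordering is unchanged, any split threshold on the raw variable has an equivalent threshold on the scaled variable that separates the observations in the same way.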

Scalers
-------
2 changes: 1 addition & 1 deletion feature_engine/scaling/mean_normalization.py
@@ -44,7 +44,7 @@ class MeanNormalizationScaler(BaseNumericalTransformer):

Constant variables will raise an error due to division by zero.

More details in the :ref:`User Guide <mean_normalization_scaler>`.
More details in the :ref:`User Guide <mean_normalisation_scaler>`.


Parameters