74 changes: 29 additions & 45 deletions docs/user_guide/creation/CyclicalFeatures.rst
@@ -34,7 +34,7 @@ Cyclical encoding

The trigonometric functions sine and cosine are periodic and repeat their values every
2 pi radians. Thus, to transform cyclical variables into (x, y) coordinates using these
functions, first we need to normalize them to 2 pi radians.
functions, first we need to normalise them to 2 pi radians.

We achieve this by dividing the variables' values by their maximum value. Thus, the two
new features are derived as follows:
@@ -51,9 +51,9 @@ In Python, we can encode cyclical features by using the Numpy functions `sin` an
X[f"{variable}_sin"] = np.sin(X[variable] * (2.0 * np.pi / X[variable].max()))
X[f"{variable}_cos"] = np.cos(X[variable] * (2.0 * np.pi / X[variable].max()))
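
As a minimal, self-contained sketch of the manual encoding described above, the following divides each value by the column maximum, scales it to 2 pi radians and then takes the sine and cosine. The toy column `hour` and its values are assumptions made for illustration only; they are not part of the original guide.

```python
import numpy as np
import pandas as pd

# Toy data; the column name "hour" and its values are illustrative assumptions.
X = pd.DataFrame({"hour": [0, 6, 12, 18, 23]})
variable = "hour"

# Scale the values to the range [0, 2*pi] by dividing by the column maximum,
# then project them onto the unit circle with sine and cosine.
angle = 2.0 * np.pi * X[variable] / X[variable].max()
X[f"{variable}_sin"] = np.sin(angle)
X[f"{variable}_cos"] = np.cos(angle)

print(X)
```

Each (sine, cosine) pair lies on the unit circle, so the squared values of the two new columns add up to 1 for every row.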

We can also use Feature-engine to automate this process.
We can also use feature-engine to automate this process.

Cyclical encoding with Feature-engine
Cyclical encoding with feature-engine
-------------------------------------

:class:`CyclicalFeatures()` creates two new features from numerical variables to better
@@ -69,7 +69,7 @@ Finding the max_value
~~~~~~~~~~~~~~~~~~~~~

:class:`CyclicalFeatures()` attempts to automate the process of cyclical encoding by
automatically determining the value used to normalize the feature between
automatically determining the value used to normalise the feature between
0 and 2 * pi radians, which coincides with the cycle of the periodic functions sine and
cosine.

@@ -86,7 +86,7 @@ Applying cyclical encoding
--------------------------

We'll start by applying cyclical encoding to a toy dataset to get familiar with how to
use Feature-engine for cyclical encoding.
use feature-engine for cyclical encoding.

In this example, we'll encode the cyclical features **days of the week** and **months**.
Let's create a toy dataframe with the variables "days" and "months":
@@ -131,6 +131,8 @@ The maximum values used for the transformation are stored in the attribute

cyclical.max_values_

Below we see the maximum values of each variable:

.. code:: python

{'day': 7, 'months': 12}
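
To make the learned maxima more concrete, here is a rough pandas sketch of how per-variable maximum values could be derived and then used for the encoding. This is not feature-engine's implementation. The data values and the helper names `max_values` and `encoded` are assumptions, chosen so that the maxima match the output shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data whose maxima are 7 and 12, matching the output above.
X = pd.DataFrame({
    "day": [6, 7, 5, 3, 1, 2, 4],
    "months": [3, 7, 9, 12, 4, 6, 12],
})

# "Learn" the maximum of each variable, similar in spirit to max_values_.
max_values = X.max().to_dict()

# Use the learned maxima to build the sine and cosine features.
encoded = X.copy()
for column, max_value in max_values.items():
    angle = 2.0 * np.pi * X[column] / max_value
    encoded[f"{column}_sin"] = np.sin(angle)
    encoded[f"{column}_cos"] = np.cos(angle)

print(max_values)
print(encoded.head())
```

Keeping the maxima in a dictionary separates the learning step from the transformation step, so the same values can be reused on new data.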
@@ -141,7 +143,7 @@ Let's have a look at the transformed dataframe:

print(X.head())

We see that the new variables were added at the right of our dataframe.
We see that the new variables were added at the right of our dataframe:

.. code:: python

@@ -166,7 +168,7 @@ the feature creation, we can set the parameter to `True`:
print(X.head())

The resulting dataframe contains only the cyclical encoded features; the original variables
are removed:
were removed:

.. code:: python

@@ -200,7 +202,7 @@ Understanding cyclical encoding
-------------------------------

We now know how to convert cyclical variables into (x, y) coordinates of a circle by using
the sine and cosine functions. Let’s now carry out some visualizations to better understand
the sine and cosine functions. Let’s now carry out some visualisations to better understand
the effect of this transformation.

Let's create a toy dataframe:
@@ -266,7 +268,7 @@ These are the sine and cosine features that represent the hour:


Let's now plot the hour variable against its sine transformation. We add perpendicular
lines to flag the hours 0 and 22.
lines to flag the hours 0 and 22:

.. code:: python

@@ -369,10 +371,10 @@ functions and cyclical encoding.
Feature-engine vs Scikit-learn
------------------------------

Let's compare the implementations of cyclical encoding between Feature-engine and Scikit-learn.
Let's compare the implementations of cyclical encoding between feature-engine and scikit-learn.
We'll work with the Bike sharing demand dataset, and we'll follow the implementation of
Cyclical encoding found in the `Time related features documentation <https://scikit-learn.org/stable/auto_examples/applications/plot_cyclical_feature_engineering.html#trigonometric-features>`_
from Scikit-learn.
cyclical encoding found in the `Time related features documentation <https://scikit-learn.org/stable/auto_examples/applications/plot_cyclical_feature_engineering.html#trigonometric-features>`_
from scikit-learn.

Let's load the libraries and dataset:

@@ -409,7 +411,7 @@ In the following output we see the bike sharing dataset:
3 14.395 0.75 0.0 13
4 14.395 0.75 0.0 1

To apply cyclical encoding with Scikit-learn, we can use the `FunctionTransformer`:
To apply cyclical encoding with scikit-learn, we can use the `FunctionTransformer`:

.. code:: python

@@ -476,7 +478,7 @@ and hour:

[17379 rows x 6 columns]

With Feature-engine, we can do the same as follows:
With feature-engine, we can do the same as follows:

.. code:: python

@@ -533,10 +535,10 @@ the variable hour by 23, instead of 24, because the values of these variables va
{'month': 12, 'weekday': 6, 'hour': 23}

Practically, there isn't a big difference between the values of the dataframes returned by
Scikit-learn and Feature-engine, and I doubt that this subtle difference will incur in a big
scikit-learn and feature-engine, and I doubt that this subtle difference will result in a big
change in model performance.
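
To get a feel for how small the difference is, the following NumPy-only sketch encodes the hours 0 to 23 with both divisors and measures the largest gap between the resulting features. The helper `encode` is a hypothetical name introduced here for illustration and is not part of either library.

```python
import numpy as np

hours = np.arange(24)  # hours 0..23, as in the bike sharing data


def encode(values, max_value):
    """Return the sine and cosine of the values scaled to 2*pi radians."""
    angle = 2.0 * np.pi * values / max_value
    return np.sin(angle), np.cos(angle)


# Divide by the observed maximum (23) and by the length of the cycle (24).
sin_23, cos_23 = encode(hours, 23)
sin_24, cos_24 = encode(hours, 24)

# Largest absolute difference between the two encodings.
largest_gap = max(
    np.abs(sin_23 - sin_24).max(),
    np.abs(cos_23 - cos_24).max(),
)
print(f"Largest gap between the two encodings: {largest_gap:.3f}")
```

The largest gap is about 0.26 and occurs at hour 23. One thing to keep in mind is that, when dividing by 23, hour 23 lands on the same point of the circle as hour 0, whereas dividing by 24 keeps the two hours apart.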

However, if you want to divide the varibles weekday and hour by 7 and 24 respectively, you can
However, if you want to divide the variables weekday and hour by 7 and 24 respectively, you can
do so like this:

.. code:: python
@@ -574,39 +576,21 @@ the user, with automation, we can only go that far.
Additional resources
--------------------

For tutorials on how to create cyclical features, check out the following courses:

.. figure:: ../../images/feml.png
:width: 300
:figclass: align-center
:align: left
:target: https://www.trainindata.com/p/feature-engineering-for-machine-learning

Feature Engineering for Machine Learning

.. figure:: ../../images/fetsf.png
:width: 300
:figclass: align-center
:align: right
:target: https://www.trainindata.com/p/feature-engineering-for-forecasting

Feature Engineering for Time Series Forecasting

|
|
|
|
|
|
|
|
|
|

For a comparison between one-hot encoding, ordinal encoding, cyclical encoding and spline
encoding of cyclical features check out the following
`sklearn demo <https://scikit-learn.org/stable/auto_examples/applications/plot_cyclical_feature_engineering.html>`_.

Check also this Kaggle demo on the use of cyclical encoding with neural networks:

- `Encoding Cyclical Features for Deep Learning <https://www.kaggle.com/code/avanwyk/encoding-cyclical-features-for-deep-learning>`_.


For tutorials about this and other feature engineering methods check out these resources:

- `Feature Engineering for Machine Learning <https://www.trainindata.com/p/feature-engineering-for-machine-learning>`_, online course.
- `Feature Engineering for Time Series Forecasting <https://www.trainindata.com/p/feature-engineering-for-forecasting>`_, online course.
- `Python Feature Engineering Cookbook <https://www.packtpub.com/en-us/product/python-feature-engineering-cookbook-9781835883587>`_, book.

Both our book and courses are suitable for beginners and more advanced data scientists
alike. By purchasing them you are supporting `Sole <https://linkedin.com/in/soledad-galli>`_,
the main developer of feature-engine.