Commit d87a781

feat: Add DatetimeOrdinal transformer (#818) (#874)
* feat: Add DatetimeOrdinal transformer and unit tests (#818)

  Implement the new DatetimeOrdinal transformer for converting datetime features to their ordinal representation. This commit includes the transformer class itself and a full suite of pytest unit tests to ensure its correctness and robustness.

* fix: Correct CI build failures

  This commit fixes two issues that were causing the CI checks to fail for the DatetimeOrdinal transformer.

  - Corrected the docstring format in the transform() method to resolve the sphinx-build error.
  - Removed trailing whitespace from a test file to pass the flake8 style check.

* fix: Correct CI build failures

  This commit fixes an issue that was causing the CI checks to fail for the DatetimeOrdinal transformer.

  - Removed trailing whitespace from a test file to pass the flake8 style checks.

* fix: Correct CI build failures

  This commit fixes an issue that was causing the CI checks to fail for the DatetimeOrdinal transformer.

  - Refactored long lines of code to resolve E501 errors reported by flake8.

* fix: Correct CI build failures
* fix: Correct CI build failures
* fix: Correct CI build failures
* test: Enhance error message validation per review (#874)
* docs: Add user guide for DatetimeOrdinal transformer (#874)
1 parent 5750960 commit d87a781

9 files changed

Lines changed: 722 additions & 1 deletion

README.md

Lines changed: 1 addition & 0 deletions
@@ -143,6 +143,7 @@ Please share your story by answering 1 quick question
 ### Datetime
 * DatetimeFeatures
 * DatetimeSubtraction
+* DatetimeOrdinal

 ### Time Series
 * LagFeatures
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
DatetimeOrdinal
===============

.. automodule:: feature_engine.datetime.datetime_ordinal
    :members:

docs/api_doc/datetime/index.rst

Lines changed: 1 addition & 0 deletions
@@ -11,4 +11,5 @@ features from existing datetime or object-like data.

    DatetimeFeatures
    DatetimeSubtraction
+   DatetimeOrdinal

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -256,6 +256,7 @@ extract many new features from the date and time parts of the datetime variable:

 - :doc:`api_doc/datetime/DatetimeFeatures`: extract features from datetime variables
 - :doc:`api_doc/datetime/DatetimeSubtraction`: computes subtractions between datetime variables
+- :doc:`api_doc/datetime/DatetimeOrdinal`: converts datetime variables into ordinal numbers

 Feature Selection:
 ~~~~~~~~~~~~~~~~~~
Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
.. _datetime_ordinal:

.. currentmodule:: feature_engine.datetime

DatetimeOrdinal
================

:class:`DatetimeOrdinal()` converts datetime variables into ordinal numbers, that is, a numerical representation of the date.

By default, :class:`DatetimeOrdinal()` returns the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1.
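
As a quick check of this numbering convention, the sketch below uses only the Python standard library, whose ``datetime.date.toordinal()`` follows the same proleptic Gregorian count. It is independent of Feature-engine and only illustrates how the ordinals behave.

```python
from datetime import date

# January 1 of year 1 is day 1 of the proleptic Gregorian calendar.
first_day = date(1, 1, 1).toordinal()

# Consecutive calendar days differ by exactly 1.
jan_01 = date(2023, 1, 1).toordinal()
jan_02 = date(2023, 1, 2).toordinal()

# fromordinal() is the inverse of toordinal().
round_trip = date.fromordinal(jan_01)

print(first_day, jan_01, jan_02 - jan_01, round_trip)
```

Because the ordinal is a plain day count, differences between two ordinals are differences in days.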

Optionally, :class:`DatetimeOrdinal()` can compute the number of days relative to a user-defined `start_date`.

Datetime ordinals with pandas
-----------------------------

In Python, we can get the Gregorian ordinal of a date using the `toordinal()` method from a datetime object.

.. code:: python

    import pandas as pd

    data = pd.DataFrame({"date": pd.to_datetime(["2023-01-01", "2023-01-10"])})

    data["ordinal"] = data["date"].apply(lambda x: x.toordinal())

    data

The output shows the new ordinal feature:

.. code:: python

            date  ordinal
    0 2023-01-01   738521
    1 2023-01-10   738530


Datetime ordinals with Feature-engine
-------------------------------------

:class:`DatetimeOrdinal()` automatically converts one or more datetime variables into ordinal numbers. It works with variables whose dtype is datetime, as well as with object-type variables, provided that they can be parsed into datetime format.

:class:`DatetimeOrdinal()` uses pandas `toordinal()` under the hood. The main functionalities are:

- It can convert multiple datetime variables at once.
- It can compute the ordinal number relative to a `start_date`.
- It can automatically find and select datetime variables.
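
To illustrate what "can be parsed into datetime format" means for string variables, here is a minimal plain-pandas sketch. It is not the transformer's implementation: it assumes the parsing behaves comparably to ``pd.to_datetime``, and the names ``raw``, ``parsed`` and ``ordinals`` are made up for the illustration.

```python
import pandas as pd

# A column of dates stored as month/day/year strings.
raw = pd.Series(["06/21/2012", "02/10/1998", "08/03/2010", "10/31/2020"])

# Parse the strings into datetime64 values; an explicit format avoids
# any ambiguity between day-first and month-first notation.
parsed = pd.to_datetime(raw, format="%m/%d/%Y")

# Timestamp.toordinal() returns the proleptic Gregorian ordinal of each date.
ordinals = parsed.apply(lambda ts: ts.toordinal())

print(ordinals.tolist())
```

Strings that cannot be parsed would make ``pd.to_datetime`` raise an error in this sketch.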

Example
~~~~~~~

First, let's create a toy dataframe with 2 date variables:

.. code:: python

    import pandas as pd
    from feature_engine.datetime import DatetimeOrdinal

    toy_df = pd.DataFrame({
        "var_date1": ['May-1989', 'Dec-2020', 'Jan-1999', 'Feb-2002'],
        "var_date2": ['06/21/2012', '02/10/1998', '08/03/2010', '10/31/2020'],
        "other_var": [1, 2, 3, 4]
    })

Now, we will set up the transformer to convert `var_date2` into an ordinal feature.

.. code:: python

    dtfs = DatetimeOrdinal(variables="var_date2")

    df_transf = dtfs.fit_transform(toy_df)

    df_transf

We see the new ordinal feature in the output:

.. code:: python

      var_date1  other_var  var_date2_ordinal
    0  May-1989          1             734675
    1  Dec-2020          2             729430
    2  Jan-1999          3             733987
    3  Feb-2002          4             737729

By default, :class:`DatetimeOrdinal()` drops the original datetime variable. To keep it, you can set `drop_original=False`.

Calculate days from a start date
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:class:`DatetimeOrdinal()` can also calculate the number of days elapsed since a specific `start_date`.

.. code:: python

    dtfs = DatetimeOrdinal(
        variables="var_date2",
        start_date="2010-01-01"
    )

    df_transf = dtfs.fit_transform(toy_df)

    df_transf

The new feature now represents the number of days between `var_date2` and January 1st, 2010. Note that dates before the `start_date` will result in negative numbers.

.. code:: python

      var_date1  other_var  var_date2_ordinal
    0  May-1989          1                903
    1  Dec-2020          2              -4343
    2  Jan-1999          3                215
    3  Feb-2002          4               3956
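
The idea of counting days from a reference date can also be sketched in plain pandas, by subtracting a timestamp and reading the ``.dt.days`` accessor. This is an illustration only, with made-up dates. It does not claim to reproduce how :class:`DatetimeOrdinal()` computes the offset internally, and the value the transformer assigns to the `start_date` itself may follow a different convention.

```python
import pandas as pd

# Hypothetical dates around the reference date.
dates = pd.Series(pd.to_datetime(["2009-12-31", "2010-01-01", "2010-01-31"]))
start = pd.Timestamp("2010-01-01")

# Subtracting a Timestamp yields a timedelta Series;
# .dt.days extracts the whole number of days, negative before the start.
days_from_start = (dates - start).dt.days

print(days_from_start.tolist())
```

In this sketch the reference date itself maps to 0 and earlier dates map to negative values.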


Missing timestamps
------------------

:class:`DatetimeOrdinal()` handles missing values (NaT) in datetime variables through the `missing_values` parameter, which can be set to `"raise"` or `"ignore"`.

If `missing_values="raise"`, the transformer will raise an error if NaT values are found in the datetime variables during `fit()` or `transform()`.

If `missing_values="ignore"`, the transformer will ignore NaT values, and the resulting ordinal feature will contain `NaN` (or `pd.NA`) in their place.
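
The two behaviours can be sketched in plain pandas as follows. The helper ``to_ordinal`` and its error messages are hypothetical and only mirror the `missing_values` parameter described above; this is not the transformer's code.

```python
import numpy as np
import pandas as pd


def to_ordinal(series: pd.Series, missing_values: str = "raise") -> pd.Series:
    """Hypothetical helper: convert a datetime Series to ordinal numbers."""
    if missing_values not in ("raise", "ignore"):
        raise ValueError("missing_values must be 'raise' or 'ignore'")
    if missing_values == "raise" and series.isna().any():
        raise ValueError("The variable contains NaT values.")
    # NaT has no ordinal, so it is mapped to NaN; the result is then a float Series.
    return series.apply(lambda ts: ts.toordinal() if pd.notna(ts) else np.nan)


dates = pd.Series(pd.to_datetime(["2023-01-01", None]))

ignored = to_ordinal(dates, missing_values="ignore")
print(ignored.tolist())
```

Calling ``to_ordinal(dates, missing_values="raise")`` on the same data raises a ``ValueError`` in this sketch, because the second value is NaT.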


.. autoclass:: DatetimeOrdinal
    :members:
    :undoc-members:
    :show-inheritance:


Additional resources
--------------------

For tutorials on how to create and use features from datetime columns, check the following courses:

.. figure:: ../../images/feml.png
   :width: 300
   :figclass: align-center
   :align: left
   :target: https://www.trainindata.com/p/feature-engineering-for-machine-learning

   Feature Engineering for Machine Learning

.. figure:: ../../images/fetsf.png
   :width: 300
   :figclass: align-center
   :align: right
   :target: https://www.trainindata.com/p/feature-engineering-for-forecasting

   Feature Engineering for Time Series Forecasting

|
|
|
|
|
|
|
|
|
|

Or read our book:

.. figure:: ../../images/cookbook.png
   :width: 200
   :figclass: align-center
   :align: left
   :target: https://www.packtpub.com/en-us/product/python-feature-engineering-cookbook-9781835883587

   Python Feature Engineering Cookbook

|
|
|
|
|
|
|
|
|
|
|
|
|


Both our book and courses are suitable for beginners and more advanced data scientists
alike. By purchasing them you are supporting Sole, the main developer of Feature-engine.

docs/user_guide/datetime/index.rst

Lines changed: 1 addition & 0 deletions
@@ -10,4 +10,5 @@ features from existing datetime or object-like data.
    :maxdepth: 1

    DatetimeFeatures
+   DatetimeOrdinal
    DatetimeSubtraction

feature_engine/datetime/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -2,5 +2,6 @@

 from .datetime import DatetimeFeatures
 from .datetime_subtraction import DatetimeSubtraction
+from .datetime_ordinal import DatetimeOrdinal

-__all__ = ["DatetimeFeatures", "DatetimeSubtraction"]
+__all__ = ["DatetimeFeatures", "DatetimeSubtraction", "DatetimeOrdinal"]
