Add TextFeatures transformer for text feature extraction #880
base: main
ankitlade12
commented
Jan 8, 2026
- Add TextFeatures class to extract features from text columns
- Support for features: char_count, word_count, digit_count, uppercase_count, etc.
- Add comprehensive tests with pytest parametrize
- Add user guide documentation
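As a rough illustration of the features listed above, the counts can be computed with pandas string methods. This is a minimal sketch, not the PR's implementation: the helper name `extract_text_features` and the `<variable>_<feature>` output column naming are assumptions made for this example.

```python
import pandas as pd


def extract_text_features(df: pd.DataFrame, variable: str) -> pd.DataFrame:
    """Hypothetical helper: add simple count features for one text column."""
    # Treat missing values as empty strings so every count is defined.
    text = df[variable].fillna("").astype(str)
    out = df.copy()
    out[f"{variable}_char_count"] = text.str.len()
    # Splitting on whitespace; an empty string yields zero words.
    out[f"{variable}_word_count"] = text.str.split().str.len()
    out[f"{variable}_digit_count"] = text.str.count(r"\d")
    # ASCII uppercase letters only (an assumption of this sketch).
    out[f"{variable}_uppercase_count"] = text.str.count(r"[A-Z]")
    return out


df = pd.DataFrame({"review": ["Great product 10 out of 10", "BAD", None]})
features = extract_text_features(df, "review")
```

The original column is left in place and one new column is appended per feature, so the result can be inspected next to the source text.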
solegalli
left a comment
Hi @ankitlade12
Thanks a lot!
Function-wise, I'd say this transformer is ready. I made a few suggestions on how to optimize the feature creation functions. Let me know if they make sense.
Other than that, we need the various docs files and we'll be good to go :)
Thanks again!
- Add ArcSinhTransformer class with loc and scale parameters
- Support for positive and negative values (unlike LogTransformer)
- Includes inverse_transform method
- Add comprehensive tests with pytest parametrize
- Add user guide documentation
…ide with comparison and references
We need to rebase onto main so that the 2 remaining tests pass.
solegalli
left a comment
Hi @ankitlade12
I am very sorry for the delayed review. I am travelling until the end of April, so I am a bit slower than usual.
I think, for the first version of the transformer, let's require the user to pass the names of the text variables. They can pass one or more variables in case there is more than one text column.
Other than that, we need to add the transformer to the docs/index file, the readme, and docs/api, and adjust the tests and the demo to the newer functionality. Then it is good to merge.
Thank you very much for this great addition.
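The requirement to pass the text variable names explicitly could be checked roughly as follows. This is only a sketch under assumptions: `check_text_variables` is a hypothetical helper for illustration, not a function from the library or from this PR.

```python
import pandas as pd


def check_text_variables(df: pd.DataFrame, variables) -> list:
    """Hypothetical check: require one or more explicit text column names."""
    # Accept a single column name as well as a list of names.
    if isinstance(variables, str):
        variables = [variables]
    # None or an empty list means the user did not name any column.
    if not variables:
        raise ValueError("variables must name at least one text column")
    missing = [v for v in variables if v not in df.columns]
    if missing:
        raise KeyError(f"columns not found in dataframe: {missing}")
    return list(variables)


data = pd.DataFrame({"title": ["a"], "body": ["b"], "price": [1.0]})
selected = check_text_variables(data, ["title", "body"])
```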
ba25768 to ba69e23
Hey @solegalli, I tracked down the cause of the CI failures. They are caused by pandas 2.2/3.0 breaking changes in the CI environment (specifically datetime string formatting and select_dtypes behavior). Because these are breaking the entire library (280 failures), they need to be fixed in the main branch first. My …
ba69e23 to 41f9528