@ankitlade12
Contributor

  • Add TextFeatures class to extract features from text columns
  • Support for features: char_count, word_count, digit_count, uppercase_count, etc.
  • Add comprehensive tests with pytest parametrize
  • Add user guide documentation
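
A minimal sketch of how the count features listed above could be computed with vectorized pandas string methods, assuming the text lives in a pandas DataFrame column. The function name text_features and the "<column>_<feature>" output naming are hypothetical and are not the PR's actual implementation.

```python
import pandas as pd


def text_features(df: pd.DataFrame, variables: list[str]) -> pd.DataFrame:
    """Hypothetical helper: add simple count features for each text column."""
    out = df.copy()
    for var in variables:
        # Treat missing values as empty strings so the counts are 0, not NaN.
        text = df[var].fillna("").astype(str)
        out[f"{var}_char_count"] = text.str.len()
        # Number of whitespace-separated tokens.
        out[f"{var}_word_count"] = text.str.split().str.len()
        out[f"{var}_digit_count"] = text.str.count(r"\d")
        # Counts ASCII uppercase letters only.
        out[f"{var}_uppercase_count"] = text.str.count(r"[A-Z]")
    return out


df = pd.DataFrame({"review": ["Hello World 42", "abc", None]})
print(text_features(df, ["review"]))
```

Each feature is one vectorized call over the whole column, which avoids applying a Python function row by row.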

Collaborator

@solegalli solegalli left a comment

Hi @ankitlade12

Thanks a lot!

Functionality-wise, I'd say this transformer is ready. I made a few suggestions on how to optimize the feature creation functions. Let me know if they make sense.

Other than that, we need the various docs files, and we'll be good to go :)

Thanks again!

- Add ArcSinhTransformer class with loc and scale parameters
- Support for positive and negative values (unlike LogTransformer)
- Includes inverse_transform method
- Add comprehensive tests with pytest parametrize
- Add user guide documentation
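
For reference, a minimal sketch of the arcsinh transform and its inverse, assuming loc is subtracted and the result divided by scale before applying arcsinh. The PR's exact parameterization is not shown in this thread, and ArcSinhSketch is a hypothetical name, not feature-engine's ArcSinhTransformer.

```python
import numpy as np


class ArcSinhSketch:
    """Hypothetical minimal transformer: y = arcsinh((x - loc) / scale)."""

    def __init__(self, loc: float = 0.0, scale: float = 1.0):
        if scale <= 0:
            raise ValueError("scale must be positive")
        self.loc = loc
        self.scale = scale

    def transform(self, x):
        # arcsinh is defined for every real number, so negative inputs are fine.
        z = (np.asarray(x, dtype=float) - self.loc) / self.scale
        return np.arcsinh(z)

    def inverse_transform(self, y):
        # sinh undoes arcsinh; then undo the scaling and the shift.
        return np.sinh(np.asarray(y, dtype=float)) * self.scale + self.loc


t = ArcSinhSketch(loc=2.0, scale=3.0)
x = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
print(t.transform(x))
print(t.inverse_transform(t.transform(x)))
```

Because arcsinh(x) = ln(x + sqrt(x^2 + 1)) is defined for all real x, negative values need no offset, which is the advantage over LogTransformer mentioned in the commit list. It is roughly linear near zero and grows logarithmically for large |x|.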
@ankitlade12 ankitlade12 requested a review from solegalli January 23, 2026 16:40
@solegalli
Collaborator

We need to rebase onto main so the 2 remaining tests pass.

Collaborator

@solegalli solegalli left a comment

Hi @ankitlade12

I am very sorry for the delayed review. I am travelling until the end of April, so I am a bit slower than usual.

I think that, for the first version of the transformer, we should require the user to pass the names of the text variables. They can pass one or more variables if there is more than one text column.
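
One way this requirement could be enforced is a small validation step at fit time. The helper below is hypothetical (check_text_variables is not an existing feature-engine function): it accepts a single column name or a list of names and rejects a missing, empty, or unknown selection instead of guessing which columns hold text.

```python
def check_text_variables(variables, columns):
    """Hypothetical helper: require one or more explicitly named text columns."""
    if variables is None:
        raise ValueError("Pass the name(s) of the text column(s) in `variables`.")
    # A single column name is wrapped into a list.
    if not isinstance(variables, (list, tuple)):
        variables = [variables]
    if len(variables) == 0:
        raise ValueError("`variables` must contain at least one column name.")
    missing = [v for v in variables if v not in columns]
    if missing:
        raise KeyError(f"Columns not found in the input data: {missing}")
    return list(variables)


print(check_text_variables("review", ["review", "id"]))
print(check_text_variables(["title", "body"], ["title", "body", "id"]))
```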

Other than that, we need to add the transformer to the docs/index file, the README, and docs/api, and adjust the tests and the demo to the new functionality. Then it is good to merge.

Thank you very much for this great addition.

@ankitlade12 ankitlade12 force-pushed the add-text-features branch 5 times, most recently from ba25768 to ba69e23 January 26, 2026 20:38
@ankitlade12
Contributor Author

Hey @solegalli,

I tracked down the cause of the CI failures. They are caused by breaking changes in pandas 2.2/3.0 in the CI environment (specifically, datetime string formatting and select_dtypes behavior).

Because these changes break the entire library (280 failures), they need to be fixed in the main branch first. My TextFeatures code works on the current stable releases and passes all 30 of its tests when run in isolation. I'd be happy to re-run the CI once the library is updated to handle the new pandas string/datetime standards.
