Refactor: Reduce Complexity of sklearn_serializer.py
Summary
The file openmodels/serializers/sklearn/sklearn_serializer.py has grown large and complex. It mixes serialization logic, deserialization logic, type/dtype mapping, special-case handlers, and many scikit-learn-specific workarounds in a single module, which makes the codebase harder to maintain, test, and extend.
Motivation
- Maintainability: The current file is long and handles many unrelated responsibilities, making it difficult to navigate and update.
- Testability: Isolating logic into smaller, focused modules or classes will make it easier to write targeted unit tests.
- Extensibility: Reducing complexity will make it easier to add support for new estimators, kernels, or serialization features in the future.
- Readability: A more modular structure will help new contributors understand and contribute to the codebase.
Suggested Refactoring Tasks
- Split the file into smaller modules: For example, move loss serialization, kernel serialization, tree serialization, and special-case handlers into their own files or classes.
- Group related helper functions: Consider grouping helpers (e.g., type/dtype mapping, attribute extraction) into utility modules.
- Reduce duplication: Identify and refactor repeated patterns (e.g., recursive serialization/deserialization) into reusable functions.
- Document module boundaries: Add docstrings and comments to clarify the responsibilities of each new module/class.
- Add or improve tests: Ensure that the refactored code is covered by unit tests, especially for edge cases and custom estimator support.
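One possible shape for the split described above is a small handler registry: each specialized module registers its own serializer, and a single recursive entry point replaces the repeated recursion patterns. The sketch below is illustrative only. The names `register`, `serialize`, `ToyKernel`, and `ToyEstimator` are hypothetical and do not reflect the current contents of sklearn_serializer.py.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a type to the function that serializes it.
Handler = Callable[[Any], Dict[str, Any]]
_HANDLERS: Dict[type, Handler] = {}


def register(cls: type) -> Callable[[Handler], Handler]:
    """Decorator that registers the decorated function as the handler for `cls`."""
    def decorator(func: Handler) -> Handler:
        _HANDLERS[cls] = func
        return func
    return decorator


def serialize(value: Any) -> Any:
    """Single recursive entry point shared by every handler."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, (list, tuple)):
        return [serialize(item) for item in value]
    if isinstance(value, dict):
        return {str(key): serialize(item) for key, item in value.items()}
    # Walk the MRO so a subclass falls back to its base class handler.
    for base in type(value).__mro__:
        handler = _HANDLERS.get(base)
        if handler is not None:
            return handler(value)
    raise TypeError(f"No serializer registered for {type(value).__name__}")


# Stand-in classes (assumptions) used instead of real scikit-learn objects.
class ToyKernel:
    def __init__(self, length_scale: float) -> None:
        self.length_scale = length_scale


class ToyEstimator:
    def __init__(self, kernel: ToyKernel, alphas: list) -> None:
        self.kernel = kernel
        self.alphas = alphas


# In a real split, this handler could live in a dedicated kernels module.
@register(ToyKernel)
def _serialize_kernel(kernel: ToyKernel) -> Dict[str, Any]:
    return {"type": "ToyKernel", "length_scale": kernel.length_scale}


# The orchestration layer only delegates to the shared entry point.
@register(ToyEstimator)
def _serialize_estimator(est: ToyEstimator) -> Dict[str, Any]:
    return {
        "type": "ToyEstimator",
        "kernel": serialize(est.kernel),
        "alphas": serialize(est.alphas),
    }


result = serialize(ToyEstimator(ToyKernel(1.5), [0.1, 0.2]))
```

With this layout, supporting a new estimator or kernel would mean adding one registered function in the relevant module rather than editing a central branch of conditionals. Whether a registry fits the existing code is a design question for the refactor itself.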
Acceptance Criteria
- The main sklearn_serializer.py file should be significantly shorter and focused on high-level orchestration.
- Specialized logic (losses, kernels, trees, etc.) should be moved to dedicated modules or classes.
- All existing tests should pass, and new tests should be added for any newly isolated logic.
- The public API and behavior should remain unchanged.
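To check that behavior stays unchanged during the refactor, a round-trip helper could be written before any code is moved and then reused for each newly isolated module. The following is a minimal sketch under stated assumptions: `assert_round_trip`, `ToyScaler`, `toy_serialize`, and `toy_deserialize` are hypothetical stand-ins, not part of the existing API.

```python
import json
from typing import Any, Callable, Dict


def assert_round_trip(
    obj: Any,
    serialize: Callable[[Any], Dict[str, Any]],
    deserialize: Callable[[Dict[str, Any]], Any],
    get_state: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Check that serialize -> JSON -> deserialize preserves observable state."""
    # Passing through JSON text catches values that are not JSON-compatible.
    payload = json.loads(json.dumps(serialize(obj)))
    restored = deserialize(payload)
    assert type(restored) is type(obj)
    assert get_state(restored) == get_state(obj)
    return payload


# Stand-in class (assumption) used instead of a real scikit-learn estimator.
class ToyScaler:
    def __init__(self, mean: list, scale: list) -> None:
        self.mean = mean
        self.scale = scale


def toy_serialize(obj: ToyScaler) -> Dict[str, Any]:
    return {"type": "ToyScaler", "params": dict(vars(obj))}


def toy_deserialize(payload: Dict[str, Any]) -> ToyScaler:
    return ToyScaler(**payload["params"])


payload = assert_round_trip(
    ToyScaler([0.5, 1.5], [2.0, 4.0]), toy_serialize, toy_deserialize, vars
)
```

Running the same helper against the code before and after each extraction step gives a simple signal that the public behavior has not drifted.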
Related file: openmodels/serializers/sklearn/sklearn_serializer.py