
Update XGBoost max supported version to 3.0.2#2657

Open
sid22669 wants to merge 4 commits into apple:main from sid22669:feature/xgboost-3x-support

Conversation

Contributor

@sid22669 sid22669 commented Mar 5, 2026

Motivation

Fixes #2596

Users are blocked from converting XGBoost 3.x models because _XGBOOST_MAX_VERSION is set to 1.4.2. XGBoost 1.4.2 is incompatible with newer Python versions (3.12+), leaving users with no workable dependency set.

Modifications

coremltools/_deps/__init__.py:

  • Updated _XGBOOST_MAX_VERSION from "1.4.2" to "3.0.2"

Verification

Tested all conversion paths with XGBoost 3.0.2 — the converter is fully compatible:

  • XGBRegressor → CoreML spec: OK
  • XGBClassifier (binary) → CoreML spec: OK
  • XGBClassifier (multi-class) → CoreML spec: OK
  • Raw Booster → CoreML spec: OK

The JSON tree dump format (split, split_condition, children, leaf, cover, yes, no, missing) is unchanged between 1.4.2 and 3.0.2. All APIs used by the converter (get_booster(), get_dump(), feature_names, copy(), n_classes_) work identically.
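The claim about the dump schema can be sanity-checked offline; a minimal sketch using only the standard library (the node values below are illustrative, not taken from a real model):

```python
import json

# A single split node shaped like the output of
# Booster.get_dump(dump_format="json"); the numeric values are
# illustrative only. ("cover" appears when the dump is requested
# with with_stats=True, so it is omitted here.)
sample_node = json.loads("""
{
  "nodeid": 0, "depth": 0, "split": "f0", "split_condition": 1.5,
  "yes": 1, "no": 2, "missing": 1,
  "children": [
    {"nodeid": 1, "leaf": -0.4},
    {"nodeid": 2, "leaf": 0.4}
  ]
}
""")

# Fields the converter reads from split nodes; unchanged between 1.4.2 and 3.0.2.
CONVERTER_FIELDS = {"split", "split_condition", "children", "yes", "no", "missing"}
assert CONVERTER_FIELDS <= sample_node.keys()

# Leaf nodes carry "leaf" instead of the split fields.
assert "leaf" in sample_node["children"][0]
```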

Checklist

  • Format code with pre-commit
  • Follow coremltools code style

The XGBoost converter is already compatible with 3.x — the JSON tree
dump format and API (get_booster, get_dump, feature_names, copy) are
unchanged. Tested with XGBRegressor, XGBClassifier (binary and
multi-class), and raw Booster conversion.
@TobyRoseman
Collaborator

The tests are still using the old version of XGBoost. See the link I shared in the issue.

Update the pinned XGBoost version in test requirements to match
the new max supported version.
Contributor Author

sid22669 commented Mar 7, 2026

Thanks for the review! I've updated reqs/test.pip to use XGBoost 3.0.2.

While testing, I also found two compatibility issues with XGBoost 3.x:

  1. base_score auto-estimation — XGBoost 3.x auto-estimates base_score from training data. The converter was hardcoding 0.5/0.0, causing prediction mismatches. Fixed by reading the actual value from booster.save_config().

  2. feature_names type — XGBoost 3.x requires feature_names to be a list. Added a type check before assignment.

Added a test to verify base_score propagation. Will push these changes shortly.
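Fix (1) amounts to parsing the JSON string returned by save_config(); a hedged sketch using only the standard library (the helper name and the sample payload are illustrative, though the key path matches the one save_config() uses):

```python
import json

def get_base_score(config_json: str) -> float:
    """Extract base_score from the JSON returned by Booster.save_config().

    In XGBoost 3.x base_score is auto-estimated from the training data,
    so it can no longer be assumed to be 0.5 (binary) or 0.0 (regression).
    XGBoost stores the value as a string, e.g. "6.35E-1".
    """
    config = json.loads(config_json)
    return float(config["learner"]["learner_model_param"]["base_score"])

# Illustrative fragment; a real save_config() payload has many more keys.
sample = json.dumps(
    {"learner": {"learner_model_param": {"base_score": "6.35E-1"}}}
)
print(get_base_score(sample))  # 0.635
```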

- Read base_score from booster config instead of hardcoding 0.5/0.0
- Convert feature_names to list for XGBoost 3.x compatibility
- Add test verifying base_score is correctly propagated
save_config() exists in all supported XGBoost versions (1.4.2+),
so silencing exceptions would hide real breakage.
@TobyRoseman
Collaborator

Thanks for removing the try/except.

CI: https://gitlab.com/coremltools1/coremltools/-/pipelines/2376256024


@JiwaniZakir JiwaniZakir left a comment


The new XGBoostBaseScoreTest only exercises the regressor path, but the diff also modifies base_prediction for both binary and multiclass classifiers in _tree_ensemble.py (lines ~252–265). The classifier branches—where base_score from the model config is spread across n_classes slots—have no corresponding test coverage, which is worth adding given that the semantics of base_score differ between regression and classification objectives (e.g., log-odds vs. raw margin for binary logistic).

Additionally, load_boston() was deprecated in scikit-learn 1.2 and removed in 1.4; the test class's setUpClass will fail on any modern scikit-learn installation. The existing tests in the file use it too, but introducing new test cases that rely on it compounds the problem rather than addressing it—switching to a synthetic dataset via sklearn.datasets.make_regression would be straightforward and future-proof.

Finally, the config['learner']['learner_model_param']['base_score'] key path is duplicated verbatim in both the converter (_tree_ensemble.py:203) and the test (test_boosted_trees_regression_numeric.py:332). Extracting a small helper (e.g., _get_base_score(booster)) in the converter module and importing it in the test would eliminate the duplication and make any future schema changes easier to handle in one place.
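To illustrate the first point, the classifier semantics described above could be exercised with a hypothetical helper like this (the name and the exact margin handling are assumptions for illustration, not the converter's actual code):

```python
import math

def base_prediction_for_classifier(base_score: float, n_classes: int) -> list:
    # Hypothetical illustration of the review comment: for multiclass
    # objectives the scalar base_score is replicated across n_classes
    # slots; for binary logistic the stored base_score is a probability,
    # so the raw margin is its log-odds.
    if n_classes <= 2:
        return [math.log(base_score / (1.0 - base_score))]
    return [base_score] * n_classes

print(base_prediction_for_classifier(0.5, 2))  # [0.0]
print(base_prediction_for_classifier(0.2, 3))  # [0.2, 0.2, 0.2]
```

A test along these lines for both classifier branches would close the coverage gap noted above.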



Development

Successfully merging this pull request may close these issues.

Support latest XGBoost version
