fix: cache filtered instruments during grouped loads #2237

Open: he-yufeng wants to merge 1 commit into microsoft:main from he-yufeng:fix/cache-filtered-instruments-per-load

@he-yufeng

Description

QlibDataLoader.load() now resolves filter_pipe once before grouped field loading starts, then reuses the filtered instruments for each field group. Direct load_group_df() calls keep the old behavior by resolving instruments at that entry point.

A regression test covers grouped loading with two field groups and verifies that D.instruments(..., filter_pipe=...) is called once while D.features(...) is still called per group.
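The resolve-once pattern and the call-count check described above can be sketched as follows. This is a minimal illustration, not the code in this PR: GroupedLoaderSketch, its provider argument, and the placeholder return value are hypothetical stand-ins for QlibDataLoader and D, and the only assumption carried over from the description is that a market name (a string) is what triggers instrument resolution.

```python
from unittest.mock import MagicMock


class GroupedLoaderSketch:
    """Hypothetical stand-in for a grouped loader; not Qlib's actual class."""

    def __init__(self, provider, fields_groups, filter_pipe=None):
        self.provider = provider            # plays the role of D
        self.fields_groups = fields_groups  # {group name: list of field expressions}
        self.filter_pipe = filter_pipe

    def _resolve_instruments(self, instruments):
        # Only a market name (a string) needs resolving; an already
        # resolved instrument set is passed through untouched.
        if isinstance(instruments, str):
            return self.provider.instruments(instruments, filter_pipe=self.filter_pipe)
        return instruments

    def load_group_df(self, instruments, fields):
        # Direct callers still get resolution at this entry point.
        instruments = self._resolve_instruments(instruments)
        return self.provider.features(instruments, fields)

    def load(self, instruments):
        # Resolve once, then reuse the result for every field group.
        resolved = self._resolve_instruments(instruments)
        return {
            name: self.load_group_df(resolved, fields)
            for name, fields in self.fields_groups.items()
        }


provider = MagicMock()
# Placeholder for the resolved instrument set; any non-string value works here.
provider.instruments.return_value = {"market": "csi300", "filter_pipe": ["dummy"]}

loader = GroupedLoaderSketch(
    provider,
    fields_groups={"feature": ["$close"], "label": ["Ref($close, -1)"]},
    filter_pipe=["dummy"],
)
result = loader.load("csi300")

assert provider.instruments.call_count == 1  # filter resolved once per load
assert provider.features.call_count == 2     # features still fetched per group
```

Calling loader.load_group_df("csi300", ["$close"]) directly in this sketch triggers one more provider.instruments call, which mirrors the unchanged direct-call behavior described above.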

Motivation and Context

Fixes #2236.

The previous grouped path delegated every group to load_group_df() with the original market name, so each group re-ran the same dynamic instrument filter pipeline. With multiple fields_groups, that multiplies filter work even though the filtered instrument set is identical for the load request.

How Has This Been Tested?

  • Pass the test by running pytest qlib/tests/test_all_pipeline.py from the parent directory of qlib.
  • If you are adding a new feature, test on your own test scripts.

Targeted checks run locally:

  • python -m py_compile qlib\data\dataset\loader.py tests\data_mid_layer_tests\test_dataloader.py
  • git diff --check
  • an isolated loader smoke test that stubs Qlib's C extensions and asserts one D.instruments call plus two grouped D.features calls
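
For reference, stubbing a compiled module so that an import succeeds without a build can be sketched with the standard library alone. This is a generic illustration, not the smoke test itself: the module name fake_ext_rolling and its rolling_mean attribute are hypothetical, and a real run would register stubs under the names of the missing extension modules instead.

```python
import importlib
import sys
import types
from unittest import mock

# Hypothetical placeholder for a compiled extension that is not built locally.
stub = types.ModuleType("fake_ext_rolling")
stub.rolling_mean = lambda values, window: None  # hypothetical attribute

# patch.dict registers the stub for the duration of the block and restores
# sys.modules afterwards, so the stub cannot leak into other tests.
with mock.patch.dict(sys.modules, {"fake_ext_rolling": stub}):
    imported = importlib.import_module("fake_ext_rolling")
    found_inside = imported is stub

found_after = "fake_ext_rolling" in sys.modules
```

Inside the with block the import returns the stub rather than searching the filesystem; once the block exits, the name is no longer registered.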

I also attempted the targeted pytest:

  • python -m pytest tests\data_mid_layer_tests\test_dataloader.py::TestDataLoader::test_group_loader_applies_filter_pipe_once -q

That is blocked in my Windows checkout before test collection because the local Cython extension is not built:

ModuleNotFoundError: No module named 'qlib.data._libs.rolling'

Building the extension with python setup.py build_ext --inplace also fails locally because this Windows environment cannot find the Windows SDK io.h header while compiling rolling.cpp.

Screenshots of Test Results (if appropriate):

  1. Pipeline test: not run; local C extension build is blocked as described above.
  2. Your own tests: py_compile, diff check, and isolated loader smoke test passed.

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

Successfully merging this pull request may close these issues.

[SeriesDFilter] Redundant execution of filter_pipe for each fields_group causes slow data loading
