Add FreshLILOLabelCheck transition criterion (#4994)#4994
Closed
ItsMrLin wants to merge 3 commits intofacebook:mainfrom
Closed
Add FreshLILOLabelCheck transition criterion (#4994)#4994ItsMrLin wants to merge 3 commits intofacebook:mainfrom
ItsMrLin wants to merge 3 commits intofacebook:mainfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4994 +/- ##
==========================================
- Coverage 96.84% 96.83% -0.01%
==========================================
Files 604 605 +1
Lines 65022 65235 +213
==========================================
+ Hits 62971 63172 +201
- Misses 2051 2063 +12 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ItsMrLin
added a commit
to ItsMrLin/Ax
that referenced
this pull request
Mar 9, 2026
Summary: Add a hash-aware transition criterion for LILO GS loops. Unlike plain MinTrials which counts all completed trials from a node, MinTrialsWithLILOInputHashCheck only counts trials whose LILO input hash matches the current experiment state. This ensures the GS correctly transitions from LILO labeling → MBG only when enough *fresh* labels exist (labels produced under the current experiment data + LLM messages). Trials without a LILO input hash (non-LILO trials) are always counted, preserving backward compatibility. Changes: - Add `MinTrialsWithLILOInputHashCheck` class to `transition_criterion.py` that delegates hash computation to `get_current_lilo_hash` from `hash_utils` (replacing a private `_compute_current_hash` static method) - Remove redundant pass-through `__init__` — the parent class handles all args - Register in JSON encoder/decoder registries for serialization support - Add tests verifying fresh/stale counting behavior Reviewed By: saitcakmak Differential Revision: D95284285
1c0ab8f to
2d0aa0e
Compare
ItsMrLin
added a commit
to ItsMrLin/Ax
that referenced
this pull request
Mar 9, 2026
Summary: Pull Request resolved: facebook#4994 Add a hash-aware transition criterion for LILO GS loops. Unlike plain MinTrials which counts all completed trials from a node, MinTrialsWithLILOInputHashCheck only counts trials whose LILO input hash matches the current experiment state. This ensures the GS correctly transitions from LILO labeling → MBG only when enough *fresh* labels exist (labels produced under the current experiment data + LLM messages). Trials without a LILO input hash (non-LILO trials) are always counted, preserving backward compatibility. Changes: - Add `MinTrialsWithLILOInputHashCheck` class to `transition_criterion.py` that delegates hash computation to `get_current_lilo_hash` from `hash_utils` (replacing a private `_compute_current_hash` static method) - Remove redundant pass-through `__init__` — the parent class handles all args - Register in JSON encoder/decoder registries for serialization support - Add tests verifying fresh/stale counting behavior Reviewed By: saitcakmak Differential Revision: D95284285
9facfc4 to
bfc2cc4
Compare
ItsMrLin
added a commit
to ItsMrLin/Ax
that referenced
this pull request
Mar 13, 2026
Summary: Add a hash-aware transition criterion for LILO GS loops. `FreshLILOLabelCheck` counts only trials whose LILO input hash matches the current experiment state, ensuring transitions are gated on *fresh* labels (produced under current data + LLM messages). The `require_sufficient` flag controls the transition direction: - `require_sufficient=True` (LILO → MBG): is_met when fresh count ≥ threshold. "Enough fresh labels — proceed to BO generation." - `require_sufficient=False` (MBG → LILO): is_met when fresh count < threshold. "Labels are stale — relabel before generating." Non-LILO experiments (no pairwise DerivedMetric) short-circuit: `require_sufficient=True` → always met, `require_sufficient=False` → never met. This prevents false relabeling triggers on non-LILO experiments. Renamed from `MinTrialsWithLILOInputHashCheck`. Reviewed By: saitcakmak Differential Revision: D95284285
bfc2cc4 to
da15e27
Compare
ItsMrLin
added a commit
to ItsMrLin/Ax
that referenced
this pull request
Mar 13, 2026
Summary: Pull Request resolved: facebook#4994 Add a hash-aware transition criterion for LILO GS loops. `FreshLILOLabelCheck` counts only trials whose LILO input hash matches the current experiment state, ensuring transitions are gated on *fresh* labels (produced under current data + LLM messages). The `require_sufficient` flag controls the transition direction: - `require_sufficient=True` (LILO → MBG): is_met when fresh count ≥ threshold. "Enough fresh labels — proceed to BO generation." - `require_sufficient=False` (MBG → LILO): is_met when fresh count < threshold. "Labels are stale — relabel before generating." Non-LILO experiments (no pairwise DerivedMetric) short-circuit: `require_sufficient=True` → always met, `require_sufficient=False` → never met. This prevents false relabeling triggers on non-LILO experiments. Renamed from `MinTrialsWithLILOInputHashCheck`. Reviewed By: saitcakmak Differential Revision: D95284285
da15e27 to
2f58b4b
Compare
2f58b4b to
36eef43
Compare
ItsMrLin
added a commit
to ItsMrLin/Ax
that referenced
this pull request
Mar 13, 2026
Summary: Add a hash-aware transition criterion for LILO GS loops. `FreshLILOLabelCheck` counts only trials whose LILO input hash matches the current experiment state, ensuring transitions are gated on *fresh* labels (produced under current data + LLM messages). The `require_sufficient` flag controls the transition direction: - `require_sufficient=True` (LILO_LABELING -> MBG): is_met when fresh count >= threshold. "Enough fresh labels -- proceed to BO generation." - `require_sufficient=False` (MBG -> LILO_LABELING): is_met when fresh count < threshold. "Labels are stale -- relabel before generating." Non-LILO experiments (no pairwise DerivedMetric) short-circuit: `require_sufficient=True` -> always met, `require_sufficient=False` -> never met. This prevents false relabeling triggers on non-LILO experiments. Reviewed By: saitcakmak Differential Revision: D95284285
36eef43 to
2bb65c9
Compare
Summary: Add hash-based data freshness tracking for LILO (Language-in-the-Loop) pairwise preference labels. When LILOPairwiseMetric produces labels, it now stamps a SHA-256 hash of the experiment's LILO inputs (metric data for input_metric_names + LLM messages) onto the trial's _properties. If any of these inputs change (new data arrives, data is updated, or the user modifies LLM messages), the hash changes, indicating that existing LILO labels are stale. Changes: - Add `LILO_INPUT_HASH` key to `Keys` enum in `constants.py` - Create `ax/utils/common/hash_utils.py` with `compute_lilo_input_hash` (standalone hash function) and `get_current_lilo_hash` (convenience helper that looks up the pairwise `DerivedMetric` on an experiment, extracts `input_metric_names`, and computes the hash — returns `None` if no pairwise metric is registered) - Stamp hash in `LILOPairwiseMetric._compute_derived_values` after producing labels - Add tests for hash determinism, sensitivity to data/message changes, stamping, and `get_current_lilo_hash` helper Differential Revision: D95284287
Summary: When building the RankingDataset for PairwiseGP model fitting, exclude LILO trial data whose input hash doesn't match the current experiment state. This ensures PairwiseGP is only fitted on labels that are consistent with the current metric data and LLM messages. Changes: - Add `_get_fresh_pairwise_trial_indices` helper to `adapter_utils.py`: uses `get_current_lilo_hash` from `hash_utils` to compute the current hash and returns trial indices whose stamped hash matches, or `None` if not a LILO experiment (preserving BOPE compatibility) - Filter pairwise data in `TorchAdapter._convert_experiment_data` before calling `prep_pairwise_data`, ensuring stale rows are excluded - Add tests for hash-based filtering logic Differential Revision: D95284286
2bb65c9 to
91ef649
Compare
ItsMrLin
added a commit
to ItsMrLin/Ax
that referenced
this pull request
Mar 13, 2026
Summary: Add a hash-aware transition criterion for LILO GS loops. `FreshLILOLabelCheck` counts only trials whose LILO input hash matches the current experiment state, ensuring transitions are gated on *fresh* labels (produced under current data + LLM messages). The `require_sufficient` flag controls the transition direction: - `require_sufficient=True` (LILO_LABELING -> MBG): is_met when fresh count >= threshold. "Enough fresh labels -- proceed to BO generation." - `require_sufficient=False` (MBG -> LILO_LABELING): is_met when fresh count < threshold. "Labels are stale -- relabel before generating." Non-LILO experiments (no pairwise DerivedMetric) short-circuit: `require_sufficient=True` -> always met, `require_sufficient=False` -> never met. This prevents false relabeling triggers on non-LILO experiments. Reviewed By: saitcakmak Differential Revision: D95284285
ItsMrLin
added a commit
to ItsMrLin/Ax
that referenced
this pull request
Mar 13, 2026
Summary: Add a hash-aware transition criterion for LILO GS loops. `FreshLILOLabelCheck` counts only trials whose LILO input hash matches the current experiment state, ensuring transitions are gated on *fresh* labels (produced under current data + LLM messages). The `require_sufficient` flag controls the transition direction: - `require_sufficient=True` (LILO_LABELING -> MBG): is_met when fresh count >= threshold. "Enough fresh labels -- proceed to BO generation." - `require_sufficient=False` (MBG -> LILO_LABELING): is_met when fresh count < threshold. "Labels are stale -- relabel before generating." Non-LILO experiments (no pairwise DerivedMetric) short-circuit: `require_sufficient=True` -> always met, `require_sufficient=False` -> never met. This prevents false relabeling triggers on non-LILO experiments. Reviewed By: saitcakmak Differential Revision: D95284285
91ef649 to
ac531be
Compare
Summary: Pull Request resolved: facebook#4994 Add a hash-aware transition criterion for LILO GS loops. `FreshLILOLabelCheck` counts only trials whose LILO input hash matches the current experiment state, ensuring transitions are gated on *fresh* labels (produced under current data + LLM messages). The `require_sufficient` flag controls the transition direction: - `require_sufficient=True` (LILO_LABELING -> MBG): is_met when fresh count >= threshold. "Enough fresh labels -- proceed to BO generation." - `require_sufficient=False` (MBG -> LILO_LABELING): is_met when fresh count < threshold. "Labels are stale -- relabel before generating." Non-LILO experiments (no pairwise DerivedMetric) short-circuit: `require_sufficient=True` -> always met, `require_sufficient=False` -> never met. This prevents false relabeling triggers on non-LILO experiments. Reviewed By: saitcakmak Differential Revision: D95284285
ac531be to
4d64f87
Compare
|
This pull request has been merged in e2056d2. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary:
Add a hash-aware transition criterion for LILO GS loops.
FreshLILOLabelCheckcounts only trials whose LILO input hash matches thecurrent experiment state, ensuring transitions are gated on fresh labels
(produced under current data + LLM messages).
The
require_sufficientflag controls the transition direction:require_sufficient=True(LILO_LABELING -> MBG): is_met when fresh countrequire_sufficient=False(MBG -> LILO_LABELING): is_met when fresh count< threshold. "Labels are stale -- relabel before generating."
Non-LILO experiments (no pairwise DerivedMetric) short-circuit:
require_sufficient=True-> always met,require_sufficient=False-> nevermet. This prevents false relabeling triggers on non-LILO experiments.
Reviewed By: saitcakmak
Differential Revision: D95284285