Root Cause Analysis
_cached_lookup in pricing.py resolves an unknown model string to a pricing multiplier via a prefix-match loop:
```python
best_len = 0
best_key: str | None = None
for key in _MULTIPLIERS:
    if normalized.startswith(key) and len(key) > best_len:
        best_len = len(key)
        best_key = key
```
When two or more keys in _MULTIPLIERS prefix-match the same model string with the same match length, the first one encountered (in dict insertion order, inherited from _RAW_MULTIPLIERS) wins silently.
Concrete example
All model strings beginning with gpt-5 that are not exact matches would hit multiple gpt-5.* keys (e.g. gpt-5.4, gpt-5.2, gpt-5.1, gpt-5.1-codex, gpt-5.4-mini), all sharing the 5-character common prefix gpt-5. More critically, if a future model like gpt-5.3 is logged before its entry is added to _RAW_MULTIPLIERS, the match silently resolves to whichever same-length key happens to appear first in the dict, potentially applying a wildly incorrect multiplier (e.g. 0× if it matched gpt-5.4-mini).
Why this matters
- The pricing multiplier affects cost calculations displayed to the user in copilot-usage cost and the detail views.
- A 0× multiplier (applied to some mini models) would silently zero out the cost of tokens that should be billed at a nonzero rate.
- A 1.0× multiplier when the correct rate is 0× (or vice versa) overstates or understates cost.
- The bug is entirely silent — no warning is logged, no exception is raised. The only signal would be wrong numbers in the output.
- Adding a new model to _RAW_MULTIPLIERS can retroactively change the pricing of existing unrecognized model strings if the new key happens to tie with an existing key on prefix length.
How ties arise in practice
Ties on match_len occur when:
- The unknown model string is a prefix of multiple known keys (gpt-5 matches gpt-5.4, gpt-5.2, etc., all at match_len = 5)
- Two keys share the same text up to len(key) characters (unlikely, but possible with aliased model families)
Fix
When the algorithm finds a tie (a new key matches with len(key) == best_len), fall through to the unknown-model fallback rather than silently using whichever key is first:
```python
best_len = 0
best_key: str | None = None
tied = False
for key in _MULTIPLIERS:
    if normalized.startswith(key):
        klen = len(key)
        if klen > best_len:
            best_len = klen
            best_key = key
            tied = False
        elif klen == best_len:
            tied = True  # ambiguous — will use fallback
if tied:
    best_key = None  # force unknown-model path
```
Alternatively, resolve ties by preferring the shortest matching key (the most-generic/closest ancestor in the model family hierarchy), which is typically the safest fallback. Either way, log a warning so the tie is observable.
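The shortest-key alternative plus the warning could be sketched as follows. This is a sketch under stated assumptions: the logger name, the 1.0 fallback, the exact-match shortcut, and the shared-prefix scoring are all illustrative, not the real pricing.py implementation:

```python
import logging
import os.path

log = logging.getLogger("pricing")

def resolve(normalized: str, multipliers: dict[str, float], fallback: float = 1.0) -> float:
    # Exact matches are never ambiguous; short-circuit them first.
    if normalized in multipliers:
        return multipliers[normalized]
    best_len = 0
    candidates: list[str] = []
    for key in multipliers:
        match_len = len(os.path.commonprefix([normalized, key]))
        if match_len == 0:
            continue
        if match_len > best_len:
            best_len, candidates = match_len, [key]
        elif match_len == best_len:
            candidates.append(key)
    if not candidates:
        log.warning("unknown model %r; using fallback multiplier", normalized)
        return fallback
    if len(candidates) > 1:
        # Tie: warn so the ambiguity is observable, then prefer the
        # shortest key (the most generic ancestor in the model family).
        log.warning("ambiguous prefix match for %r: %s", normalized, candidates)
    return multipliers[min(candidates, key=len)]
```

Here a tied lookup still resolves, but to the most generic family key, and the warning leaves an audit trail instead of failing silently.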
A separate, simpler guard: keep the existing loop, but change len(key) > best_len to len(key) >= best_len so the last matching key of a given length is recorded, and log a warning whenever best_len fails to increase on a match. The >= change alone only swaps which tied key wins (last instead of first); the logging is what makes the ambiguity visible.
Testing requirement
Add tests in test_pricing.py (or equivalent):
- Tie detection: construct a _MULTIPLIERS-like dict with two keys of equal length that both prefix-match a query string; assert the function logs a warning (or returns the fallback multiplier, depending on the chosen fix) rather than silently returning one of the tied entries.
- Regression: assert that known exact-match lookups (e.g. gpt-5.4) are unaffected by the tie-detection logic.
- Unknown model fallback: assert that a model string matching no key at all returns the fallback multiplier (1.0×) and logs a warning.
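A sketch of what those tests might look like, pytest-style. resolve_multiplier is a minimal stand-in for the fixed _cached_lookup so the snippet runs on its own; in the real suite it would be imported from pricing.py. The shared-prefix scoring, the exact-match shortcut, and the 1.0 fallback are assumptions about the eventual implementation:

```python
import os.path

FALLBACK = 1.0  # assumed unknown-model multiplier

def resolve_multiplier(normalized: str, multipliers: dict[str, float]) -> float:
    if normalized in multipliers:  # exact matches are never ambiguous
        return multipliers[normalized]
    best_len, best_key, tied = 0, None, False
    for key in multipliers:
        match_len = len(os.path.commonprefix([normalized, key]))
        if match_len == 0:
            continue
        if match_len > best_len:
            best_len, best_key, tied = match_len, key, False
        elif match_len == best_len:
            tied = True
    if tied or best_key is None:
        return FALLBACK  # ambiguous or unknown model: take the fallback path
    return multipliers[best_key]

def test_tie_returns_fallback():
    table = {"fam-a1": 0.0, "fam-a2": 2.0}  # equal length, both share "fam-a"
    assert resolve_multiplier("fam-a9", table) == FALLBACK

def test_exact_match_unaffected():
    table = {"gpt-5.4": 1.0, "gpt-5.4-mini": 0.0}
    assert resolve_multiplier("gpt-5.4", table) == 1.0

def test_unknown_model_falls_back():
    assert resolve_multiplier("totally-new-model", {"gpt-5.4": 1.0}) == FALLBACK
```

Against the real function, the tie-detection test could additionally assert the warning via pytest's caplog fixture.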
Generated by Code Health Analysis