fix(tests): align 3 stale OSS test assertions with intentional defaults by yilu331 · Pull Request #164 · ReflexioAI/reflexio

yilu331 · 2026-06-16T10:42:21Z

Summary

Nightly /check-and-test run found 3 OSS unit-test failures that were stale assertions lagging intentional behavior changes on main. The application code is correct; the tests are fixed to match.

Test	Was	Now	Why
`test_unified_search_floor::test_floor_applied_per_arm`	`RetrievalFloorConfig(user_playbook_floor=-5.0)`	adds `enabled=True`	The `enabled` default flipped `True`→`False`; the floor-applied test must opt in.
`test_litellm_client::test_config_defaults`	`max_retries == 1`	`max_retries == 3`	#124 adopted native litellm `num_retries`; default is now 3.
`test_embedding_service_concurrency::test_embed_texts_caps_and_queues`	`peak >= 2`	`peak == 1`	#153 added `_MODEL_ENCODE_LOCK` serializing inference (tensor-race fix); encode peak is always 1.

Also includes pre-existing ruff-format drift in billing_meter.py and extraction/outcome.py (formatting only).

Verification

Full OSS unit + integration suite: 3061 passed, 0 failed (pytest tests/ --ignore=tests/e2e_tests/, mocked LLM).

Summary by CodeRabbit

Tests
- Updated test assertions for concurrency behavior and configuration defaults
Refactor
- Improved internal code structure and formatting for maintainability

coderabbitai · 2026-06-16T10:42:29Z

Warning

Review limit reached

@yilu331, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 25 minutes and 31 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 918f93a5-2ff6-4c25-a2c0-181d39658f1b

📥 Commits

Reviewing files that changed from the base of the PR and between 7eda9b5 and 49cb14d.

📒 Files selected for processing (1)

tests/server/llm/test_litellm_client.py

📝 Walkthrough

Walkthrough

Five small, independent edits: _INTERNAL in billing_meter.py is changed from a str to a tuple; ExtractionOutcome.completed is reformatted with no logic change; three test assertions are updated to match current runtime behavior (recorder.peak == 1, max_retries == 3, enabled=True).

Changes

Miscellaneous Fixes and Test Alignment

Layer / File(s)	Summary
Source constant and classmethod reformat `reflexio/server/billing_meter.py`, `reflexio/server/services/extraction/outcome.py`	`_INTERNAL` is redefined as a tuple wrapping `"internal"` instead of a plain string; `ExtractionOutcome.completed` `return cls(...)` is split across multiple lines with identical arguments.
Test assertion alignment `tests/server/llm/test_embedding_service_concurrency.py`, `tests/server/llm/test_litellm_client.py`, `tests/server/services/test_unified_search_floor.py`	Concurrency test expects `recorder.peak == 1` to match `_MODEL_ENCODE_LOCK` serialization; `LiteLLMConfig.max_retries` default expectation changed from `1` to `3`; `RetrievalFloorConfig` in search floor test gains explicit `enabled=True`.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

ReflexioAI/reflexio#153: Introduced _MODEL_ENCODE_LOCK to serialize model inference in embedding_service.py, directly motivating the recorder.peak == 1 assertion update in this PR.

Poem

🐇 A tuple here, a peak of one,
Three retries where there once was one,
enabled=True makes the floor run,
Multi-line returns, neatly done —
Small hops forward, every one! 🌿

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately captures the main objective of fixing three stale test assertions to align with intentional defaults.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch ci/check-and-test-nightly-20260616

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…rmat Three OSS test failures were stale assertions that lagged intentional behavior changes on main; fix the tests (the application behavior is correct): - test_unified_search_floor: RetrievalFloorConfig.enabled default flipped True->False, so the floor-applied test must opt in with enabled=True. - test_litellm_client::test_config_defaults: max_retries default is now 3 (native litellm num_retries adoption, #124), not 1. - test_embedding_service_concurrency: model inference is now serialized by _MODEL_ENCODE_LOCK (#153, fixes a tensor race), so recorder.peak is always 1; the old `peak >= 2` assertion contradicted the new design. Also applies pre-existing ruff-format drift to billing_meter.py and extraction/outcome.py (formatting only, no behavior change).

yilu331 marked this pull request as ready for review June 16, 2026 16:28

yilu331 force-pushed the ci/check-and-test-nightly-20260616 branch from 7eda9b5 to 49cb14d Compare June 16, 2026 16:32

yilu331 merged commit 6e85810 into main Jun 16, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(tests): align 3 stale OSS test assertions with intentional defaults#164

fix(tests): align 3 stale OSS test assertions with intentional defaults#164
yilu331 merged 1 commit into
mainfrom
ci/check-and-test-nightly-20260616

yilu331 commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

yilu331 commented Jun 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

yilu331 commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading