test: fix stale/environment-fragile tests (nightly check-and-test) by yilu331 · Pull Request #163 · ReflexioAI/reflexio

yilu331 · 2026-06-16T09:14:59Z

Companion to ReflexioAI/reflexio-enterprise#240, opened by the nightly check-and-test run.

These tests passed in isolation but failed in the full run because they were stale relative to intentional behavior changes, or fragile to the CI environment:

- `test_env_loader_mode`: isolate the ambient `BACKEND_PORT`. The loader runs with `override=False`, so a `BACKEND_PORT` exported by the suite's shell wins over the file value.
- `test_unified_search_floor`: enable the relevance floor explicitly; its default was flipped to off in #161 (fix(server): harden runtime defaults).
- `test_embedding_service_concurrency`: assert serialized model inference (`peak == 1`). #153 (fix(embedding): serialize model inference to stop concurrent-encode tensor race) added `_MODEL_ENCODE_LOCK`, so encodes no longer overlap.
- `test_recycle_smoke_integration`: raise the `/healthz` readiness timeouts. A recycled worker reloads the embedder and cross-encoder, which exceeds the old budget on a contended box.
- `test_interaction_workflows`: add a `timeout(300)` marker so cold embedder and reranker loads under the default `-n auto` run don't spuriously hit the 120s per-test timeout.
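
For readers unfamiliar with the `override=False` contract behind the `test_env_loader_mode` fix, the failure mode can be reproduced in a few lines. This is only a sketch: `load_env_file` is a hypothetical stand-in for the project's loader, and the port values `9999` and `8081` are made up for illustration.

```python
import os
import tempfile
from pathlib import Path
from unittest import mock


def load_env_file(path: Path, override: bool = False) -> None:
    """Hypothetical stand-in for the real loader: parses KEY=VALUE lines only."""
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # With override=False, a variable already present in the process wins.
        if override or key not in os.environ:
            os.environ[key] = value


def backend_port_after_load(ambient: dict, isolate: bool) -> str:
    """Load a file that sets BACKEND_PORT=8081 under a simulated ambient env."""
    with tempfile.TemporaryDirectory() as tmp:
        env_file = Path(tmp) / ".env.test"
        env_file.write_text("BACKEND_PORT=8081\n")
        # patch.dict restores os.environ on exit, so nothing leaks to other tests.
        with mock.patch.dict(os.environ, ambient, clear=True):
            if isolate:
                # The fix: drop the ambient value before calling the loader.
                os.environ.pop("BACKEND_PORT", None)
            load_env_file(env_file, override=False)
            return os.environ["BACKEND_PORT"]


# Simulate a shell that exported BACKEND_PORT before the suite started.
leaked = backend_port_after_load({"BACKEND_PORT": "9999"}, isolate=False)
isolated = backend_port_after_load({"BACKEND_PORT": "9999"}, isolate=True)
```

Without isolation the ambient value survives the load; with it, the file value is what the test observes. In a pytest test, `monkeypatch.delenv("BACKEND_PORT", raising=False)` would play the same role as the `pop` call here.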

Also includes ruff-format normalization of two source files.

Verified: the fixed tests pass with and without the ambient env that originally broke them.

Summary by CodeRabbit

Tests
- Enhanced test robustness by isolating environment variables and strengthening assertions for concurrency control and unified search configuration.
- Increased timeout budgets for end-to-end workflows and integration tests to accommodate model loading under resource constraints.

Refactor
- Restructured internal billing constant and reformatted code for improved maintainability.

…nd-test
- test_env_loader_mode: isolate ambient BACKEND_PORT (override=False contract)
- test_unified_search_floor: enable floor explicitly (default flipped to off in #161)
- test_embedding_service_concurrency: assert serialized encode (peak==1) after _MODEL_ENCODE_LOCK serialization in #153
- test_recycle_smoke_integration: raise readiness timeouts for ML model (re)load under CI contention
- test_interaction_workflows: add per-test timeout(300) marker so cold embedder + reranker loads under -n auto don't spuriously time out

Also includes ruff-format normalization of two source files.

coderabbitai · 2026-06-16T09:15:08Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 01a0a3e9-c10a-46a0-9140-af52354552a9

📥 Commits

Reviewing files that changed from the base of the PR and between 424649c and 7235e73.

📒 Files selected for processing (7)

reflexio/server/billing_meter.py
reflexio/server/services/extraction/outcome.py
tests/cli/test_env_loader_mode.py
tests/e2e_tests/test_interaction_workflows.py
tests/server/llm/test_embedding_service_concurrency.py
tests/server/services/test_unified_search_floor.py
tests/server/test_recycle_smoke_integration.py

📝 Walkthrough

The _INTERNAL billing caller constant is changed from a string to a tuple. ExtractionOutcome.completed is cosmetically reformatted. Three tests are hardened for correctness: env isolation before loading a mode env file, explicit enabled=True in a floor config, and a stricter peak-encode-concurrency assertion of exactly 1. Four timeout budgets are extended to accommodate slow model loading in CI.
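
The stricter peak-concurrency assertion follows directly from holding a single lock around inference. Below is a minimal sketch of how such a peak could be measured. Only the name `_MODEL_ENCODE_LOCK` comes from the PR; `PeakTracker`, `fake_encode`, and `run_batch` are hypothetical helpers, and the sleep merely stands in for model inference.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Name taken from the PR description; everything else here is illustrative.
_MODEL_ENCODE_LOCK = threading.Lock()


class PeakTracker:
    """Counts how many threads are inside the tracked region at once."""

    def __init__(self):
        self._guard = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._guard:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc):
        with self._guard:
            self.active -= 1
        return False


def fake_encode(text, tracker):
    # Holding the lock for the whole encode means at most one thread is active.
    with _MODEL_ENCODE_LOCK:
        with tracker:
            time.sleep(0.01)  # stand-in for model inference
            return [float(len(text))]


def run_batch(texts, workers=4):
    tracker = PeakTracker()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        vectors = list(pool.map(lambda t: fake_encode(t, tracker), texts))
    return vectors, tracker.peak


vectors, peak = run_batch(["a", "bb", "ccc", "dddd"])
```

Because every encode runs under the lock, `peak` can never exceed 1 however many workers are submitted, which is why an assertion written for overlapping encodes became stale once the lock landed.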

Changes

Billing, extraction, and test fixes

| Layer / File(s) | Summary |
| --- | --- |
| Production: billing constant and extraction outcome formatting (`reflexio/server/billing_meter.py`, `reflexio/server/services/extraction/outcome.py`) | `_INTERNAL` is changed from a string literal to a single-element tuple while its usage as `caller_type` is unchanged. `ExtractionOutcome.completed` return is reformatted to multi-line with no behavioral change. |
| Test correctness: env isolation, floor config, embedding lock assertion (`tests/cli/test_env_loader_mode.py`, `tests/server/services/test_unified_search_floor.py`, `tests/server/llm/test_embedding_service_concurrency.py`) | `test_mode_filename` explicitly removes `BACKEND_PORT` from `os.environ` before the env-load call. `test_floor_applied_per_arm` sets `enabled=True` explicitly in `RetrievalFloorConfig`. `test_embed_texts_caps_and_queues` tightens the peak-concurrency assertion to exactly `1` to match model encode lock serialization semantics. |
| CI timeout increases for model-loading tests (`tests/e2e_tests/test_interaction_workflows.py`, `tests/server/test_recycle_smoke_integration.py`) | E2e interaction workflow `pytestmark` adds `pytest.mark.timeout(300)` for cold model-load scenarios. The recycle smoke test increases initial startup `/healthz` polling from 45 s to 90 s and post-recycle polling from 15 s to 90 s. |
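
To make the readiness-budget change concrete, here is a sketch of deadline-based polling of the kind a `/healthz` check typically uses. It is not the test's actual code: `wait_until_ready` and `FakeWorker` are hypothetical, the clock is simulated so the example runs instantly, and the 60 s reload time is an assumed figure chosen only to fall between the old 15 s and new 90 s post-recycle budgets.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` has elapsed."""
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # server not accepting connections yet; keep polling
        if clock() >= deadline:
            return False
        sleep(interval_s)


class FakeWorker:
    """Simulated worker that becomes healthy after `ready_after_s` seconds."""

    def __init__(self, ready_after_s: float):
        self.now = 0.0
        self.ready_after_s = ready_after_s

    def clock(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds  # advance simulated time instead of blocking

    def probe(self) -> bool:
        return self.now >= self.ready_after_s


def recycled_worker_ready(timeout_s: float) -> bool:
    worker = FakeWorker(ready_after_s=60.0)  # assumed reload time, for illustration
    return wait_until_ready(
        worker.probe, timeout_s, interval_s=1.0,
        clock=worker.clock, sleep=worker.sleep,
    )


old_budget_ok = recycled_worker_ready(15.0)
new_budget_ok = recycled_worker_ready(90.0)
```

Under these assumptions the old budget gives up before the worker is healthy, while the larger one succeeds. In a real test the probe would issue an HTTP GET against `/healthz` instead of consulting a fake clock.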

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

ReflexioAI/reflexio#153: Introduced the shared model encode lock that serializes embedding inference, directly motivating this PR's change of the peak-concurrency assertion to exactly 1.
ReflexioAI/reflexio#126: Introduced _INTERNAL = "internal" in billing_meter.py; this PR changes that constant to a tuple.
ReflexioAI/reflexio#60: Introduced the daemon-mode recycling smoke test and /healthz-based readiness polling whose timeout values are extended in this PR.

Poem

🐇 A tuple instead of a string, what a curious thing,
The tests now know to clear the port before they sing,
Peak concurrency locked to one — fair and square,
Ninety seconds for models to load, with room to spare,
Small fixes stitched together with a rabbit's care! 🎉


yilu331 marked this pull request as ready for review June 16, 2026 16:28

yilu331 merged commit b56a541 into main Jun 16, 2026
1 check was pending

yilu331 merged 1 commit into main from ci/check-and-test-nightly-fixes
