test: fix stale/environment-fragile tests (nightly check-and-test) #163

Merged: yilu331 merged 1 commit into main from ci/check-and-test-nightly-fixes on Jun 16, 2026.

yilu331 (Collaborator) commented on Jun 16, 2026

Companion to ReflexioAI/reflexio-enterprise#240, opened by the nightly check-and-test run.

These tests passed in isolation but failed in the full run, either because they were stale relative to intentional behavior changes or because they were fragile in the CI environment:

  • test_env_loader_mode: isolate the ambient BACKEND_PORT. The loader uses override=False, so a BACKEND_PORT exported by the suite's shell takes precedence over the value in the env file.
  • test_unified_search_floor: enable the relevance floor explicitly; its default was flipped to off in #161 (fix(server): harden runtime defaults).
  • test_embedding_service_concurrency: assert serialized model inference (peak == 1). #153 (fix(embedding): serialize model inference to stop concurrent-encode tensor race) added _MODEL_ENCODE_LOCK, so encodes no longer overlap.
  • test_recycle_smoke_integration: raise the /healthz readiness timeouts. A recycled worker reloads the embedder and cross-encoder, which takes longer than the old budget allowed on a contended box.
  • test_interaction_workflows: add a timeout(300) marker so cold embedder + reranker loads under the default -n auto run don't spuriously hit the 120s per-test timeout.
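Because #161 flipped the floor's default to off, a test that only builds a default config no longer exercises the floor at all; setting enabled=True pins the behavior under test. A minimal sketch of that failure mode follows. Only the RetrievalFloorConfig name and its enabled field come from this PR; min_score and apply_floor are hypothetical and are not reflexio's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalFloorConfig:
    # Hypothetical mirror of the config named in this PR. Only `enabled` is
    # taken from the PR; its default mirrors the flip to off in #161.
    enabled: bool = False
    min_score: float = 0.5  # assumed field: hits scoring below this are dropped


def apply_floor(
    hits: list[tuple[str, float]], cfg: RetrievalFloorConfig
) -> list[tuple[str, float]]:
    """Drop (doc_id, score) hits under cfg.min_score, only when the floor is on."""
    if not cfg.enabled:
        return list(hits)
    return [(doc, score) for doc, score in hits if score >= cfg.min_score]


hits = [("a", 0.9), ("b", 0.3), ("c", 0.6)]

# With the default now off, nothing is filtered...
print(apply_floor(hits, RetrievalFloorConfig()))  # [('a', 0.9), ('b', 0.3), ('c', 0.6)]
# ...so a test of the floor has to opt in explicitly.
print(apply_floor(hits, RetrievalFloorConfig(enabled=True)))  # [('a', 0.9), ('c', 0.6)]
```

Pinning the flag in the test, rather than relying on the default, keeps the test meaningful whichever way the default points in the future.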

Also includes ruff-format normalization of two source files.

Verified: the fixed tests pass with and without the ambient env that originally broke them.
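The ambient-env failure comes from the override=False contract: a key that already exists in the environment is never replaced by the file's value (python-dotenv's load_dotenv(override=False) follows the same rule for existing variables). The stdlib sketch below reproduces it. load_env_file is a hypothetical stand-in for the project's loader, and plain dicts stand in for os.environ so the demo has no side effects.

```python
import os
import tempfile
from collections.abc import MutableMapping


def load_env_file(path: str, environ: MutableMapping[str, str], override: bool = False) -> None:
    """Hypothetical loader: read KEY=VALUE lines from `path` into `environ`.

    With override=False, keys already present in `environ` are left untouched,
    so an ambient value exported by the shell wins over the file.
    """
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, value = (part.strip() for part in line.split("=", 1))
            if override or key not in environ:
                environ[key] = value


def demo() -> tuple[str, str]:
    with tempfile.TemporaryDirectory() as tmp:
        env_path = os.path.join(tmp, ".env.test")
        with open(env_path, "w", encoding="utf-8") as fh:
            fh.write("BACKEND_PORT=8081\n")

        # Ambient value leaks in from the shell, so the file value is ignored.
        leaky = {"BACKEND_PORT": "9999"}
        load_env_file(env_path, leaky)

        # Isolated: drop the ambient key before loading, as the fixed test does.
        isolated = {"BACKEND_PORT": "9999"}
        isolated.pop("BACKEND_PORT", None)
        load_env_file(env_path, isolated)
        return leaky["BACKEND_PORT"], isolated["BACKEND_PORT"]


print(demo())  # ('9999', '8081')
```

In a pytest test, monkeypatch.delenv("BACKEND_PORT", raising=False) performs the same isolation against the real os.environ and restores the variable when the test ends.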

Summary by CodeRabbit

  • Tests

    • Enhanced test robustness by isolating environment variables and strengthening assertions for concurrency control and unified search configuration.
    • Increased timeout budgets for end-to-end workflows and integration tests to accommodate model loading under resource constraints.
  • Refactor

    • Restructured internal billing constant and reformatted code for improved maintainability.

…nd-test

- test_env_loader_mode: isolate ambient BACKEND_PORT (override=False contract)
- test_unified_search_floor: enable floor explicitly (default flipped to off in #161)
- test_embedding_service_concurrency: assert serialized encode (peak==1) after
  _MODEL_ENCODE_LOCK serialization in #153
- test_recycle_smoke_integration: raise readiness timeouts for ML model (re)load
  under CI contention
- test_interaction_workflows: add per-test timeout(300) marker so cold embedder +
  reranker loads under -n auto don't spuriously time out

Also includes ruff-format normalization of two source files.

coderabbitai (bot) commented on Jun 16, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 01a0a3e9-c10a-46a0-9140-af52354552a9

📥 Commits

Reviewing files that changed from the base of the PR, between commits 424649c and 7235e73.

📒 Files selected for processing (7)
  • reflexio/server/billing_meter.py
  • reflexio/server/services/extraction/outcome.py
  • tests/cli/test_env_loader_mode.py
  • tests/e2e_tests/test_interaction_workflows.py
  • tests/server/llm/test_embedding_service_concurrency.py
  • tests/server/services/test_unified_search_floor.py
  • tests/server/test_recycle_smoke_integration.py

📝 Walkthrough

The _INTERNAL billing caller constant is changed from a string to a tuple. ExtractionOutcome.completed is cosmetically reformatted. Three tests are hardened for correctness: env isolation before loading a mode env file, explicit enabled=True in a floor config, and a stricter peak-encode-concurrency assertion of exactly 1. Four timeout budgets are extended to accommodate slow model loading in CI.
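For readers unfamiliar with the Python detail behind the _INTERNAL change: the trailing comma, not the parentheses, is what makes a one-element tuple, and a tuple never compares equal to the string it wraps. The snippet below only illustrates language semantics; the variable names are illustrative and are not taken from billing_meter.py.

```python
# A trailing comma builds a one-element tuple; parentheses alone do not.
as_string = "internal"
as_paren = ("internal")  # still a str: the parentheses are just grouping
as_tuple = "internal",   # one-element tuple because of the trailing comma

print(type(as_string).__name__, type(as_paren).__name__, type(as_tuple).__name__)
# str str tuple

# A tuple is not equal to the string it contains, so equality checks and
# dict lookups keyed on the string do not match it.
print(as_tuple == "internal")         # False
print(as_tuple[0] == "internal")      # True
print({"internal": 1}.get(as_tuple))  # None
```

Any consumer that compares caller_type against the plain string would therefore observe a different value after this change.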

Changes

Billing, extraction, and test fixes

  • Production: billing constant and extraction outcome formatting
    Files: reflexio/server/billing_meter.py, reflexio/server/services/extraction/outcome.py
    _INTERNAL is changed from a string literal to a single-element tuple, while its usage as caller_type is unchanged. The ExtractionOutcome.completed return is reformatted to multi-line with no behavioral change.
  • Test correctness: env isolation, floor config, embedding lock assertion
    Files: tests/cli/test_env_loader_mode.py, tests/server/services/test_unified_search_floor.py, tests/server/llm/test_embedding_service_concurrency.py
    test_mode_filename explicitly removes BACKEND_PORT from os.environ before the env-load call. test_floor_applied_per_arm sets enabled=True explicitly in RetrievalFloorConfig. test_embed_texts_caps_and_queues tightens the peak-concurrency assertion to exactly 1 to match the serialization enforced by the model encode lock.
  • CI timeout increases for model-loading tests
    Files: tests/e2e_tests/test_interaction_workflows.py, tests/server/test_recycle_smoke_integration.py
    The e2e interaction workflow pytestmark adds pytest.mark.timeout(300) for cold model-load scenarios. The recycle smoke test increases initial startup /healthz polling from 45 s to 90 s and post-recycle polling from 15 s to 90 s.
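The recycle smoke test fix amounts to a longer deadline on a poll-until-healthy loop. A minimal sketch of that pattern, assuming a deadline measured with time.monotonic() and an injected probe so the loop can be exercised without a server; wait_until_ready and make_probe are hypothetical names. A real probe might call urllib.request.urlopen on the worker's /healthz URL and return True for status 200; urllib's URLError is an OSError subclass, so connection failures during model loading count as "not ready yet".

```python
import time
from collections.abc import Callable


def wait_until_ready(probe: Callable[[], bool], timeout_s: float, interval_s: float = 0.5) -> bool:
    """Poll `probe` until it returns True or `timeout_s` seconds elapse.

    Returns True if the probe succeeded before the deadline, else False.
    OSError from the probe (e.g. connection refused while the worker is
    still loading models) is treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # server not accepting connections yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def make_probe(ready_after: int) -> Callable[[], bool]:
    """Fake /healthz probe that reports ready from its `ready_after`-th call."""
    calls = 0

    def probe() -> bool:
        nonlocal calls
        calls += 1
        return calls >= ready_after

    return probe


print(wait_until_ready(make_probe(3), timeout_s=1.0, interval_s=0.01))  # True
print(wait_until_ready(make_probe(10**9), timeout_s=0.05, interval_s=0.01))  # False
```

The final sleep can overshoot the deadline by up to one interval; with budgets of tens of seconds and sub-second intervals, that slack is negligible.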

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • ReflexioAI/reflexio#153: Introduced the shared model encode lock that serializes embedding inference, directly motivating this PR's change of the peak-concurrency assertion to exactly 1.
  • ReflexioAI/reflexio#126: Introduced _INTERNAL = "internal" in billing_meter.py; this PR changes that constant to a tuple.
  • ReflexioAI/reflexio#60: Introduced the daemon-mode recycling smoke test and /healthz-based readiness polling whose timeout values are extended in this PR.
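The peak == 1 assertion follows from #153's lock: however many callers the service admits, the lock lets only one into model inference at a time. A minimal sketch that measures peak overlap with a guarded counter; MODEL_ENCODE_LOCK, fake_encode, and peak_concurrency are hypothetical stand-ins for the real _MODEL_ENCODE_LOCK and embedder, and the thread pool stands in for the service's own capping and queueing.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the module-level _MODEL_ENCODE_LOCK added in #153.
MODEL_ENCODE_LOCK = threading.Lock()

_counter_lock = threading.Lock()  # protects the two counters below
_active = 0  # encodes currently "inside the model"
_peak = 0  # highest value _active has reached


def _run_model(text: str) -> int:
    """Simulated inference that records how many calls overlap."""
    global _active, _peak
    with _counter_lock:
        _active += 1
        _peak = max(_peak, _active)
    time.sleep(0.02)  # long enough for unserialized calls to overlap
    with _counter_lock:
        _active -= 1
    return len(text)


def fake_encode(text: str, serialize: bool) -> int:
    if serialize:
        with MODEL_ENCODE_LOCK:  # one encode at a time, as described for #153
            return _run_model(text)
    return _run_model(text)


def peak_concurrency(serialize: bool, workers: int = 4, jobs: int = 8) -> int:
    """Run `jobs` encodes on `workers` threads and return the peak overlap."""
    global _peak
    _peak = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda t: fake_encode(t, serialize), ["x"] * jobs))
    return _peak


print(peak_concurrency(serialize=True))   # 1
print(peak_concurrency(serialize=False))  # typically > 1, scheduling-dependent
```

Without the lock the same harness usually reports a peak above 1, which is why a looser assertion used to pass and the serialized version can assert exactly 1.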

Poem

🐇 A tuple instead of a string, what a curious thing,
The tests now know to clear the port before they sing,
Peak concurrency locked to one — fair and square,
Ninety seconds for models to load, with room to spare,
Small fixes stitched together with a rabbit's care! 🎉

yilu331 marked this pull request as ready for review on June 16, 2026 at 16:28.
yilu331 merged commit b56a541 into main on Jun 16, 2026 (1 check was pending).