Skip to content

[MLI-6891] feat: wire forwarder_max_concurrency through K8s forwarder template#836

Open
diazagasatya wants to merge 2 commits into
mainfrom
feat/MLI-6891-forwarder-max-concurrency-template-wiring
Open

[MLI-6891] feat: wire forwarder_max_concurrency through K8s forwarder template#836
diazagasatya wants to merge 2 commits into
mainfrom
feat/MLI-6891-forwarder-max-concurrency-template-wiring

Conversation

@diazagasatya
Copy link
Copy Markdown
Collaborator

@diazagasatya diazagasatya commented May 29, 2026

Linear: MLI-6891
Follow-up to: #835 (MLI-6876)

Why

MLI-6876 added forwarder_max_concurrency to the API schema and plumbed it through to ResourceArguments (11 injection sites). But the K8s manifest templates never referenced ${FORWARDER_MAX_CONCURRENCY} — so the field was silently dropped at the rendering layer. API callers could set the field with no actual effect on the running forwarder.

This PR is the load-bearing piece that makes the field actually work end-to-end on the llm-engine API path.

What changes

  • Adds --set "max_concurrency=${FORWARDER_MAX_CONCURRENCY}" to all 6 http-forwarder containers in service_template_config_map_circleci.yaml and the corresponding 4 forwarder containers in charts/model-engine/templates/service_template_config_map.yaml
  • Adds a _strip_optional_set_pairs preprocessor in k8s_endpoint_resource_delegate.py that drops any - --set / - "...${KEY}..." arg pair from the template when KEY's value is None. Generic — usable by any future optional kwarg using the same pattern
  • Updates forwarder_max_concurrency's description in the DTOs to reflect the actual fallback (config-file default, not per_worker — earlier docstring was inaccurate)

Backward compatibility

Zero behavior change for any existing endpoint that doesn't explicitly set forwarder_max_concurrency:

Caller state Rendered manifest Forwarder behavior
forwarder_max_concurrency=None (current default) No --set max_concurrency line Uses config-file default (100) — same as today
forwarder_max_concurrency=5 --set "max_concurrency=5" Forwarder admits 5 in-flight requests

Tests

Added three focused unit tests for _strip_optional_set_pairs:

  • test_strip_optional_set_pairs_drops_when_none
  • test_strip_optional_set_pairs_keeps_when_set
  • test_strip_optional_set_pairs_handles_unknown_key

Also ran the full test_k8s_endpoint_resource_delegate.py suite locally — 59 passed, 0 failed, including the cross-cutting test_resource_arguments_type_and_add_datadog_env_to_main_container that renders every forwarder container with full substitution.

Out of scope

  • OpenAPI spec regen: this PR only changes a description string on existing fields (no schema-level changes), so I skipped the regen step. Happy to add a follow-up commit with regen if reviewers prefer.

Rollout chain status

Ticket Repo State
MLI-5366 scaleapi/models Merged
MLI-6876 scaleapi/llm-engine Merged (#835)
🟡 MLI-6875 scaleapi/launch-python-client PR #180 open
MLI-6874 scaleapi/models Not started
🟡 MLI-6891 scaleapi/llm-engine this PR

After MLI-6891 + MLI-6876 are both in the next llm-engine release, forwarder_max_concurrency is functional end-to-end on the API path.

🤖 Generated with Claude Code

Greptile Summary

This PR completes the forwarder_max_concurrency end-to-end wire-up by adding --set "max_concurrency=${FORWARDER_MAX_CONCURRENCY}" to forwarder container templates and introducing a _strip_optional_set_pairs preprocessor that cleanly removes the arg-pair from the rendered manifest when the value is None, preserving backward compatibility.

  • Adds the max_concurrency arg to 6 of 7 http-forwarder containers in the CircleCI template and all 4 in the Helm chart; the LWS streaming GPU container in the CircleCI template was missed (see comment).
  • Introduces _strip_optional_set_pairs as a generic preprocessor for optional --set kwargs — correctly placed before safe_substitute and covered by three new unit tests.
  • Corrects the forwarder_max_concurrency docstring on both Create and Update DTOs to reflect the actual config-file fallback default (100) rather than the previously incorrect per_worker description.

Confidence Score: 4/5

Safe to merge for all production traffic; only the CircleCI test template for LWS streaming GPU endpoints would render an incorrect manifest when forwarder_max_concurrency is explicitly set.

The Helm chart (production) is correctly updated for all four forwarder containers. The one gap is the LWS streaming http-forwarder in the CircleCI template, which was not updated to include max_concurrency. Any CI test that renders an LWS streaming GPU deployment with a non-None forwarder_max_concurrency would produce a manifest that diverges from the production Helm output, potentially masking the omission during review.

model-engine/model_engine_server/infra/gateways/resources/templates/service_template_config_map_circleci.yaml — the LWS streaming GPU forwarder block (~line 2797) is missing the two-line max_concurrency addition present in every other updated container.

Important Files Changed

Filename Overview
model-engine/model_engine_server/infra/gateways/resources/k8s_endpoint_resource_delegate.py Adds _strip_optional_set_pairs preprocessor that removes --set "...${KEY}..." arg-pairs for None-valued kwargs before template substitution; correctly placed before safe_substitute. The generic nature of the preprocessor also silently changes behavior for other existing Optional kwargs (e.g. FORWARDER_TYPE=None), though that prior behavior was already broken.
model-engine/model_engine_server/infra/gateways/resources/templates/service_template_config_map_circleci.yaml Adds max_concurrency=${FORWARDER_MAX_CONCURRENCY} to 6 http-forwarder containers, but misses the LWS streaming GPU forwarder container (~line 2797), leaving it inconsistent with the Helm chart which correctly updated all 4 equivalent containers including its LWS streaming path.
charts/model-engine/templates/service_template_config_map.yaml Adds max_concurrency=${FORWARDER_MAX_CONCURRENCY} to all 4 forwarder containers (sync, streaming, LWS sync, LWS streaming), matching the expected coverage.
model-engine/model_engine_server/common/dtos/model_endpoints.py Corrects the docstring for forwarder_max_concurrency on both Create and Update DTOs to say the fallback is the config-file default (100), not per_worker.
model-engine/tests/unit/infra/gateways/resources/test_k8s_endpoint_resource_delegate.py Adds three focused unit tests for _strip_optional_set_pairs covering the drop-when-None, keep-when-set, and unknown-key no-op cases.

Comments Outside Diff (1)

  1. model-engine/model_engine_server/infra/gateways/resources/templates/service_template_config_map_circleci.yaml, line 2797-2800 (link)

    P1 Missing max_concurrency in LWS streaming GPU forwarder

    The LeaderWorkerSet streaming GPU http-forwarder container (starting ~line 2769) was not updated, while the Helm chart's equivalent path ({{- else if eq $mode "streaming" }} in the LWS section, line 677) was correctly updated. LeaderWorkerSetRunnableImageStreamingGpuArguments extends _BaseDeploymentArguments, which declares FORWARDER_MAX_CONCURRENCY: Optional[int], so when a non-None value is passed the rendered CI manifest will silently omit the --set max_concurrency arg while the production manifest includes it. Add the two lines below after line 2799 to match every other forwarder container in this file:

                      - --set
                      - "max_concurrency=${FORWARDER_MAX_CONCURRENCY}"
    
    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: model-engine/model_engine_server/infra/gateways/resources/templates/service_template_config_map_circleci.yaml
    Line: 2797-2800
    
    Comment:
    **Missing `max_concurrency` in LWS streaming GPU forwarder**
    
    The LeaderWorkerSet streaming GPU http-forwarder container (starting ~line 2769) was not updated, while the Helm chart's equivalent path (`{{- else if eq $mode "streaming" }}` in the LWS section, line 677) was correctly updated. `LeaderWorkerSetRunnableImageStreamingGpuArguments` extends `_BaseDeploymentArguments`, which declares `FORWARDER_MAX_CONCURRENCY: Optional[int]`, so when a non-None value is passed the rendered CI manifest will silently omit the `--set max_concurrency` arg while the production manifest includes it. Add the two lines below after line 2799 to match every other forwarder container in this file:
    
    ```
                      - --set
                      - "max_concurrency=${FORWARDER_MAX_CONCURRENCY}"
    ```
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Cursor Fix in Claude Code Fix in Codex

Fix All in Cursor Fix All in Claude Code Fix All in Codex

Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
model-engine/model_engine_server/infra/gateways/resources/templates/service_template_config_map_circleci.yaml:2797-2800
**Missing `max_concurrency` in LWS streaming GPU forwarder**

The LeaderWorkerSet streaming GPU http-forwarder container (starting ~line 2769) was not updated, while the Helm chart's equivalent path (`{{- else if eq $mode "streaming" }}` in the LWS section, line 677) was correctly updated. `LeaderWorkerSetRunnableImageStreamingGpuArguments` extends `_BaseDeploymentArguments`, which declares `FORWARDER_MAX_CONCURRENCY: Optional[int]`, so when a non-None value is passed the rendered CI manifest will silently omit the `--set max_concurrency` arg while the production manifest includes it. Add the two lines below after line 2799 to match every other forwarder container in this file:

```
                  - --set
                  - "max_concurrency=${FORWARDER_MAX_CONCURRENCY}"
```

Reviews (2): Last reviewed commit: "[MLI-6891] style: apply black 24.8.0 for..." | Re-trigger Greptile

Adds `--set "max_concurrency=${FORWARDER_MAX_CONCURRENCY}"` to all 6
http-forwarder containers in the service template (CircleCI + chart
variants). The flag is rendered conditionally — when the kwarg is
None, a new `_strip_optional_set_pairs` preprocessor in
`load_k8s_yaml` drops the `--set` arg-pair entirely, so existing
endpoints fall back to the forwarder's config-file default (100)
with zero behavior change.

Also fixes the description on `forwarder_max_concurrency` to reflect
the actual fallback (config-file default, not per_worker).

Follow-up to MLI-6876 (#835). Without this, the API field is
silently dropped at the manifest-rendering layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@diazagasatya diazagasatya requested a review from a team May 29, 2026 00:51
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants