Add service-mode FastMCP integration for local and remote agents by jdye64 · Pull Request #2263 · NVIDIA/NeMo-Retriever

jdye64 · 2026-06-24T14:13:05Z

Adds a FastMCP integration for NeMo Retriever service mode so agents can use the service locally over stdio or remotely over HTTP.

Changes

Mounts a FastMCP HTTP endpoint at /mcp from retriever service start
Adds retriever service mcp-stdio for local stdio-based agent configs
Adds MCP tools for:
- health checks
- pipeline config inspection
- document ingestion
- job/document status lookup
- VectorDB query
- answer generation
Supports remote document upload through inline/base64 content
Keeps write tools enabled by default
Reuses the existing service bearer-token middleware for /mcp
Documents local stdio and remote HTTP MCP usage
Adds focused MCP tests

upload-artifact flattens nemo_retriever/dist into ./dist on download; use find instead of ./dist/nemo_retriever/dist/*. Re-run failed jobs does not pick up workflow changes — dispatch a new run after merging.

copy-pr-bot · 2026-06-24T14:13:09Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-06-24T14:18:08Z

Greptile Summary

This PR adds a FastMCP integration to the NeMo Retriever service mode, mounting a /mcp HTTP endpoint on service start and adding a retriever service mcp-stdio CLI command for local stdio-based agents. The new mcp_server.py module is a thin facade over the existing service HTTP API, sharing the bearer-token auth middleware and exposing tools for health checks, pipeline inspection, document ingestion, job/document status, VectorDB query, and answer generation.

MCPConfig is added to ServiceConfig with sensible defaults and path validation; mcp.enabled: true by default means the endpoint activates automatically on upgrade for all existing deployments.
The ingest_documents MCP tool accepts either a content_base64 payload or a server-side path; for the HTTP /mcp endpoint the path route is not restricted to any safe directory, allowing any authenticated agent to read arbitrary server-visible files (see security finding).
Three focused unit tests cover settings derivation, query auth header forwarding, and the base64 ingest flow using httpx.MockTransport; the auth-protection assertion is weaker than it could be.

Confidence Score: 3/5

The core feature is well-structured and the auth middleware is correctly reused, but the unrestricted server-side file read in _document_bytes is a real data-exfiltration risk that should be addressed before merging.

The new mcp_server.py deliberately allows agents to specify arbitrary server filesystem paths via the ingest_documents path parameter. When accessed over the HTTP /mcp endpoint, any valid bearer-token holder can read files outside the intended document scope — config files, TLS certificates, or any file the service process can open. The documentation advises remote agents to use base64 instead, but nothing in the code enforces that boundary. Until that restriction is added (or path is blocked entirely on the HTTP transport), the endpoint creates an intentional but unconstrained file-read surface.

nemo_retriever/src/nemo_retriever/service/mcp_server.py — specifically _document_bytes and the lack of any path allowlist for the HTTP-mounted variant of the tool.

Security Review

Server-side file read / data exfiltration (mcp_server.py _document_bytes): The ingest_documents MCP tool accepts a path parameter that is resolved against the server filesystem without any directory restriction. When the /mcp HTTP endpoint is mounted and auth is enabled, any bearer-token holder can supply an arbitrary server path (e.g. /etc/shadow, ~/.config/..., TLS keys) and have its bytes uploaded as a document, exfiltrating the file contents through the normal ingestion API response. The documentation advises remote agents to use content_base64 instead, but this is not enforced at the code level.

Important Files Changed

Filename	Overview
nemo_retriever/src/nemo_retriever/service/mcp_server.py	New FastMCP facade over the service HTTP API; contains an unrestricted server-side file read via the `path` parameter accessible to any authenticated remote agent, no file-size guard before `read_bytes()`, redundant `_json_or_text` logic, and missing return type on `build_mcp_app`.
nemo_retriever/src/nemo_retriever/service/app.py	Mounts the FastMCP ASGI app at `/mcp` during `create_app`; the broad `except Exception:` catch during MCP init will silently degrade programming errors rather than surfacing them at startup.
nemo_retriever/src/nemo_retriever/service/config.py	Adds well-structured `MCPConfig` Pydantic model with path validation; defaults to `enabled: true`, which auto-enables the MCP endpoint for all existing deployments on upgrade.
nemo_retriever/src/nemo_retriever/service/cli.py	Adds `mcp-stdio` Typer command that wires CLI flags into `ServiceMCPSettings` and runs the FastMCP server over stdio; clean and straightforward.
nemo_retriever/tests/test_service_mcp.py	Adds three focused MCP tests using `httpx.MockTransport`; the auth-protection test assertion is too permissive (`!= 401`) and could pass on a broken mount.
nemo_retriever/src/nemo_retriever/service/retriever-service.yaml	Adds the `mcp:` config block to the default YAML; values match `MCPConfig` defaults and are clearly commented.
docs/docs/extraction/workflow-agentic-retrieval.md	Adds MCP access documentation covering stdio and HTTP usage; correctly advises remote agents to use base64 content rather than server paths.
.github/workflows/reusable-pypi-publish.yml	Minor comment improvement clarifying the `upload-artifact` flat-layout behavior; no logic change.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Agent as Agent (local/remote)
    participant MCP as FastMCP at /mcp
    participant Auth as BearerAuthMiddleware
    participant Client as ServiceMCPClient
    participant Service as Retriever Service HTTP

    Note over Agent,Service: HTTP mode (remote agent)
    Agent->>Auth: GET/POST /mcp (Bearer token)
    Auth-->>Agent: 401 if token invalid
    Auth->>MCP: Forward request
    MCP->>Client: dispatch tool call
    Client->>Service: HTTP /v1/ingest/job, /v1/query, /v1/answer, etc.
    Service-->>Client: JSON response
    Client-->>MCP: result dict
    MCP-->>Agent: MCP tool result

    Note over Agent,Service: stdio mode (local agent)
    Agent->>MCP: stdin JSON-RPC
    MCP->>Client: dispatch tool call
    Client->>Service: HTTP calls to base_url
    Service-->>Client: JSON response
    Client-->>MCP: result dict
    MCP-->>Agent: stdout JSON-RPC result

    Note over Agent,Client: ingest_documents with path parameter
    Agent->>MCP: "ingest_documents(path='/etc/passwd')"
    MCP->>Client: _document_bytes reads path.read_bytes()
    Client->>Service: "POST /v1/ingest/job/{id}/document (file bytes)"

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Agent as Agent (local/remote)
    participant MCP as FastMCP at /mcp
    participant Auth as BearerAuthMiddleware
    participant Client as ServiceMCPClient
    participant Service as Retriever Service HTTP

    Note over Agent,Service: HTTP mode (remote agent)
    Agent->>Auth: GET/POST /mcp (Bearer token)
    Auth-->>Agent: 401 if token invalid
    Auth->>MCP: Forward request
    MCP->>Client: dispatch tool call
    Client->>Service: HTTP /v1/ingest/job, /v1/query, /v1/answer, etc.
    Service-->>Client: JSON response
    Client-->>MCP: result dict
    MCP-->>Agent: MCP tool result

    Note over Agent,Service: stdio mode (local agent)
    Agent->>MCP: stdin JSON-RPC
    MCP->>Client: dispatch tool call
    Client->>Service: HTTP calls to base_url
    Service-->>Client: JSON response
    Client-->>MCP: result dict
    MCP-->>Agent: stdout JSON-RPC result

    Note over Agent,Client: ingest_documents with path parameter
    Agent->>MCP: "ingest_documents(path='/etc/passwd')"
    MCP->>Client: _document_bytes reads path.read_bytes()
    Client->>Service: "POST /v1/ingest/job/{id}/document (file bytes)"

Comments Outside Diff (3)

nemo_retriever/src/nemo_retriever/service/mcp_server.py, line 609-624 (link)

Unrestricted server-side file read via HTTP /mcp endpoint

When the MCP app is mounted at /mcp, authenticated remote agents can pass path: "/etc/passwd" (or any other server-visible path) in an ingest_documents request, and the server will read and upload that file's contents. There is no directory restriction or allowlist — Path(doc.path).expanduser() resolves the path freely. The documentation advises remote agents to use content_base64 instead, but nothing in the code enforces this. An attacker who obtains the bearer token, or any authorized-but-malicious agent, can exfiltrate arbitrary files accessible to the service process (config files, TLS keys, credentials under ~/.config, etc.).

Prompt To Fix With AI

This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/service/mcp_server.py
Line: 609-624

Comment:
**Unrestricted server-side file read via HTTP `/mcp` endpoint**

When the MCP app is mounted at `/mcp`, authenticated remote agents can pass `path: "/etc/passwd"` (or any other server-visible path) in an `ingest_documents` request, and the server will read and upload that file's contents. There is no directory restriction or allowlist — `Path(doc.path).expanduser()` resolves the path freely. The documentation advises remote agents to use `content_base64` instead, but nothing in the code enforces this. An attacker who obtains the bearer token, or any authorized-but-malicious agent, can exfiltrate arbitrary files accessible to the service process (config files, TLS keys, credentials under `~/.config`, etc.).

How can I resolve this? If you propose a fix, please make it concise.

nemo_retriever/src/nemo_retriever/service/mcp_server.py, line 610-614 (link)

No file size limit before reading into memory

path.read_bytes() loads the entire file into memory without any size check. The security rules for this codebase require validating file sizes at every entry point to prevent OOM. A multi-GB file (e.g. a large PDF) submitted by an agent would be read in full before the HTTP upload even starts, potentially exhausting the server's heap.

Prompt To Fix With AI

This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/service/mcp_server.py
Line: 610-614

Comment:
**No file size limit before reading into memory**

`path.read_bytes()` loads the entire file into memory without any size check. The security rules for this codebase require validating file sizes at every entry point to prevent OOM. A multi-GB file (e.g. a large PDF) submitted by an agent would be read in full before the HTTP upload even starts, potentially exhausting the server's heap.

How can I resolve this? If you propose a fix, please make it concise.

nemo_retriever/tests/test_service_mcp.py, line 940-944 (link)

Weak auth-protection assertion

assert authorized.status_code != 401 only confirms the MCP endpoint doesn't return a 401, but would silently pass for a 500, 404, or 405 returned by a broken mount. Asserting a specific expected status code (e.g. 200 or 307) would catch regressions where the endpoint mounts but fails for an unrelated reason.
Prompt To Fix With AI
```
This is a comment left during a code review.
Path: nemo_retriever/tests/test_service_mcp.py
Line: 940-944

Comment:
**Weak auth-protection assertion**

`assert authorized.status_code != 401` only confirms the MCP endpoint doesn't return a 401, but would silently pass for a 500, 404, or 405 returned by a broken mount. Asserting a specific expected status code (e.g. 200 or 307) would catch regressions where the endpoint mounts but fails for an unrelated reason.

How can I resolve this? If you propose a fix, please make it concise.
```
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix All With AI

Fix the following 6 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 6
nemo_retriever/src/nemo_retriever/service/mcp_server.py:609-624
**Unrestricted server-side file read via HTTP `/mcp` endpoint**

When the MCP app is mounted at `/mcp`, authenticated remote agents can pass `path: "/etc/passwd"` (or any other server-visible path) in an `ingest_documents` request, and the server will read and upload that file's contents. There is no directory restriction or allowlist — `Path(doc.path).expanduser()` resolves the path freely. The documentation advises remote agents to use `content_base64` instead, but nothing in the code enforces this. An attacker who obtains the bearer token, or any authorized-but-malicious agent, can exfiltrate arbitrary files accessible to the service process (config files, TLS keys, credentials under `~/.config`, etc.).

### Issue 2 of 6
nemo_retriever/src/nemo_retriever/service/mcp_server.py:521-523
`build_mcp_app` is a public function but is missing a return type annotation, violating the `type-hints-public-api` rule.

```suggestion
def build_mcp_app(settings: ServiceMCPSettings | None = None) -> Any:
    """Create the ASGI app mounted by ``retriever service start``."""
    return build_mcp(settings).http_app(path="/")
```

### Issue 3 of 6
nemo_retriever/src/nemo_retriever/service/mcp_server.py:140-148
The first branch of `_json_or_text` is redundant. When `content-type` contains `"application/json"` the code calls `resp.json()` immediately; when it doesn't, the `try` block still calls `resp.json()` and only falls back to `{"text": resp.text}` on failure. The `if "application/json"` guard therefore never changes behaviour and can be removed.

```suggestion
    @staticmethod
    def _json_or_text(resp: httpx.Response) -> Any:
        try:
            return resp.json()
        except ValueError:
            return {"text": resp.text}
```

### Issue 4 of 6
nemo_retriever/src/nemo_retriever/service/mcp_server.py:610-614
**No file size limit before reading into memory**

`path.read_bytes()` loads the entire file into memory without any size check. The security rules for this codebase require validating file sizes at every entry point to prevent OOM. A multi-GB file (e.g. a large PDF) submitted by an agent would be read in full before the HTTP upload even starts, potentially exhausting the server's heap.

### Issue 5 of 6
nemo_retriever/tests/test_service_mcp.py:940-944
**Weak auth-protection assertion**

`assert authorized.status_code != 401` only confirms the MCP endpoint doesn't return a 401, but would silently pass for a 500, 404, or 405 returned by a broken mount. Asserting a specific expected status code (e.g. 200 or 307) would catch regressions where the endpoint mounts but fails for an unrelated reason.

### Issue 6 of 6
nemo_retriever/src/nemo_retriever/service/app.py:257-259
**Bare `except Exception:` silently masks MCP init bugs**

Catching every exception here (including programming errors thrown by `build_mcp_app` or `combine_lifespans`) means a coding mistake introduced in `mcp_server.py` will degrade silently to "MCP not mounted" rather than surfacing at startup. Consider narrowing to `ImportError` for the optional-dependency case and letting other errors propagate, or at minimum catching as `except Exception as exc:` so the warning can include the exception message inline rather than relying solely on `exc_info=True`.

_{Reviews (1): Last reviewed commit: "Add service-mode FastMCP integration" | Re-trigger Greptile}

greptile-apps · 2026-06-24T14:18:13Z

+def build_mcp_app(settings: ServiceMCPSettings | None = None):
+    """Create the ASGI app mounted by ``retriever service start``."""
+    return build_mcp(settings).http_app(path="/")


build_mcp_app is a public function but is missing a return type annotation, violating the type-hints-public-api rule.

Suggested change

def build_mcp_app(settings: ServiceMCPSettings | None = None):

"""Create the ASGI app mounted by ``retriever service start``."""

return build_mcp(settings).http_app(path="/")

def build_mcp_app(settings: ServiceMCPSettings | None = None) -> Any:

"""Create the ASGI app mounted by ``retriever service start``."""

return build_mcp(settings).http_app(path="/")

Prompt To Fix With AI

This is a comment left during a code review. Path: nemo_retriever/src/nemo_retriever/service/mcp_server.py Line: 521-523 Comment: `build_mcp_app` is a public function but is missing a return type annotation, violating the `type-hints-public-api` rule. ```suggestion def build_mcp_app(settings: ServiceMCPSettings | None = None) -> Any: """Create the ASGI app mounted by ``retriever service start``.""" return build_mcp(settings).http_app(path="/") ``` How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-06-24T14:18:14Z

+    @staticmethod
+    def _json_or_text(resp: httpx.Response) -> Any:
+        content_type = resp.headers.get("content-type", "")
+        if "application/json" in content_type:
+            return resp.json()
+        try:
+            return resp.json()
+        except ValueError:
+            return {"text": resp.text}


The first branch of _json_or_text is redundant. When content-type contains "application/json" the code calls resp.json() immediately; when it doesn't, the try block still calls resp.json() and only falls back to {"text": resp.text} on failure. The if "application/json" guard therefore never changes behaviour and can be removed.

Suggested change

@staticmethod

def _json_or_text(resp: httpx.Response) -> Any:

content_type = resp.headers.get("content-type", "")

if "application/json" in content_type:

return resp.json()

try:

return resp.json()

except ValueError:

return {"text": resp.text}

@staticmethod

def _json_or_text(resp: httpx.Response) -> Any:

try:

return resp.json()

except ValueError:

return {"text": resp.text}

Prompt To Fix With AI

This is a comment left during a code review. Path: nemo_retriever/src/nemo_retriever/service/mcp_server.py Line: 140-148 Comment: The first branch of `_json_or_text` is redundant. When `content-type` contains `"application/json"` the code calls `resp.json()` immediately; when it doesn't, the `try` block still calls `resp.json()` and only falls back to `{"text": resp.text}` on failure. The `if "application/json"` guard therefore never changes behaviour and can be removed. ```suggestion @staticmethod def _json_or_text(resp: httpx.Response) -> Any: try: return resp.json() except ValueError: return {"text": resp.text} ``` How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-06-24T14:18:17Z

+        except Exception:
+            mcp_asgi_app = None
+            logger.warning("FastMCP service integration failed to initialise; /mcp will not be mounted", exc_info=True)


Bare except Exception: silently masks MCP init bugs

Catching every exception here (including programming errors thrown by build_mcp_app or combine_lifespans) means a coding mistake introduced in mcp_server.py will degrade silently to "MCP not mounted" rather than surfacing at startup. Consider narrowing to ImportError for the optional-dependency case and letting other errors propagate, or at minimum catching as except Exception as exc: so the warning can include the exception message inline rather than relying solely on exc_info=True.

Prompt To Fix With AI

This is a comment left during a code review. Path: nemo_retriever/src/nemo_retriever/service/app.py Line: 257-259 Comment: **Bare `except Exception:` silently masks MCP init bugs** Catching every exception here (including programming errors thrown by `build_mcp_app` or `combine_lifespans`) means a coding mistake introduced in `mcp_server.py` will degrade silently to "MCP not mounted" rather than surfacing at startup. Consider narrowing to `ImportError` for the optional-dependency case and letting other errors propagate, or at minimum catching as `except Exception as exc:` so the warning can include the exception message inline rather than relying solely on `exc_info=True`. How can I resolve this? If you propose a fix, please make it concise.

jdye64 added 14 commits May 6, 2026 11:03

Adjust location that uv creates venv

b6b9e1e

Fix PyPI publish wheel path and add artifact listing step

3412827

upload-artifact flattens nemo_retriever/dist into ./dist on download; use find instead of ./dist/nemo_retriever/dist/*. Re-run failed jobs does not pick up workflow changes — dispatch a new run after merging.

Merge remote-tracking branch 'upstream/main'

9233869

Merge remote-tracking branch 'upstream/main'

c2743c8

upmerge

68b2402

upmerge

8b443b2

Merge remote-tracking branch 'upstream/main'

63391f0

Merge remote-tracking branch 'upstream/main'

126feed

Merge remote-tracking branch 'upstream/main'

79e05eb

Merge remote-tracking branch 'nvidia/main'

b4c6369

Merge branch 'main' of github.com:jdye64/nv-ingest

69e9ab1

Merge remote-tracking branch 'upstream/main'

13298c9

Merge remote-tracking branch 'upstream/main'

aa783a7

Add service-mode FastMCP integration

89c8fe3

jdye64 requested review from a team as code owners June 24, 2026 14:13

jdye64 requested a review from drobison00 June 24, 2026 14:13

greptile-apps Bot reviewed Jun 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add service-mode FastMCP integration for local and remote agents#2263

Add service-mode FastMCP integration for local and remote agents#2263
jdye64 wants to merge 14 commits into
NVIDIA:mainfrom
jdye64:feature/service-mode-mcp

jdye64 commented Jun 24, 2026

Uh oh!

copy-pr-bot Bot commented Jun 24, 2026

Uh oh!

greptile-apps Bot commented Jun 24, 2026 •

edited

Loading

Confidence Score: 3/5

Security Review

Important Files Changed

Sequence Diagram

Comments Outside Diff (3)

Uh oh!

greptile-apps Bot Jun 24, 2026

Uh oh!

greptile-apps Bot Jun 24, 2026

Uh oh!

greptile-apps Bot Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

jdye64 commented Jun 24, 2026

Uh oh!

copy-pr-bot Bot commented Jun 24, 2026

Uh oh!

greptile-apps Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 3/5

Security Review

Important Files Changed

Sequence Diagram

Comments Outside Diff (3)

Uh oh!

greptile-apps Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps Bot commented Jun 24, 2026 •

edited

Loading