Skip to content

Add service-mode FastMCP integration for local and remote agents#2263

Open
jdye64 wants to merge 14 commits into
NVIDIA:mainfrom
jdye64:feature/service-mode-mcp
Open

Add service-mode FastMCP integration for local and remote agents#2263
jdye64 wants to merge 14 commits into
NVIDIA:mainfrom
jdye64:feature/service-mode-mcp

Conversation

@jdye64

@jdye64 jdye64 commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Adds a FastMCP integration for NeMo Retriever service mode so agents can use the service locally over stdio or remotely over HTTP.

Changes

  • Mounts a FastMCP HTTP endpoint at /mcp from retriever service start

  • Adds retriever service mcp-stdio for local stdio-based agent configs

  • Adds MCP tools for:

    • health checks
    • pipeline config inspection
    • document ingestion
    • job/document status lookup
    • VectorDB query
    • answer generation
  • Supports remote document upload through inline/base64 content

  • Keeps write tools enabled by default

  • Reuses the existing service bearer-token middleware for /mcp

  • Documents local stdio and remote HTTP MCP usage

  • Adds focused MCP tests

@jdye64 jdye64 requested review from a team as code owners June 24, 2026 14:13
@jdye64 jdye64 requested a review from drobison00 June 24, 2026 14:13
@copy-pr-bot

copy-pr-bot Bot commented Jun 24, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps

greptile-apps Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds a FastMCP integration to the NeMo Retriever service mode, mounting a /mcp HTTP endpoint on service start and adding a retriever service mcp-stdio CLI command for local stdio-based agents. The new mcp_server.py module is a thin facade over the existing service HTTP API, sharing the bearer-token auth middleware and exposing tools for health checks, pipeline inspection, document ingestion, job/document status, VectorDB query, and answer generation.

  • MCPConfig is added to ServiceConfig with sensible defaults and path validation; mcp.enabled: true by default means the endpoint activates automatically on upgrade for all existing deployments.
  • The ingest_documents MCP tool accepts either a content_base64 payload or a server-side path; for the HTTP /mcp endpoint the path route is not restricted to any safe directory, allowing any authenticated agent to read arbitrary server-visible files (see security finding).
  • Three focused unit tests cover settings derivation, query auth header forwarding, and the base64 ingest flow using httpx.MockTransport; the auth-protection assertion is weaker than it could be.

Confidence Score: 3/5

The core feature is well-structured and the auth middleware is correctly reused, but the unrestricted server-side file read in _document_bytes is a real data-exfiltration risk that should be addressed before merging.

The new mcp_server.py deliberately allows agents to specify arbitrary server filesystem paths via the ingest_documents path parameter. When accessed over the HTTP /mcp endpoint, any valid bearer-token holder can read files outside the intended document scope — config files, TLS certificates, or any file the service process can open. The documentation advises remote agents to use base64 instead, but nothing in the code enforces that boundary. Until that restriction is added (or path is blocked entirely on the HTTP transport), the endpoint creates an intentional but unconstrained file-read surface.

nemo_retriever/src/nemo_retriever/service/mcp_server.py — specifically _document_bytes and the lack of any path allowlist for the HTTP-mounted variant of the tool.

Security Review

  • Server-side file read / data exfiltration (mcp_server.py _document_bytes): The ingest_documents MCP tool accepts a path parameter that is resolved against the server filesystem without any directory restriction. When the /mcp HTTP endpoint is mounted and auth is enabled, any bearer-token holder can supply an arbitrary server path (e.g. /etc/shadow, ~/.config/..., TLS keys) and have its bytes uploaded as a document, exfiltrating the file contents through the normal ingestion API response. The documentation advises remote agents to use content_base64 instead, but this is not enforced at the code level.

Important Files Changed

Filename Overview
nemo_retriever/src/nemo_retriever/service/mcp_server.py New FastMCP facade over the service HTTP API; contains an unrestricted server-side file read via the path parameter accessible to any authenticated remote agent, no file-size guard before read_bytes(), redundant _json_or_text logic, and missing return type on build_mcp_app.
nemo_retriever/src/nemo_retriever/service/app.py Mounts the FastMCP ASGI app at /mcp during create_app; the broad except Exception: catch during MCP init will silently degrade programming errors rather than surfacing them at startup.
nemo_retriever/src/nemo_retriever/service/config.py Adds well-structured MCPConfig Pydantic model with path validation; defaults to enabled: true, which auto-enables the MCP endpoint for all existing deployments on upgrade.
nemo_retriever/src/nemo_retriever/service/cli.py Adds mcp-stdio Typer command that wires CLI flags into ServiceMCPSettings and runs the FastMCP server over stdio; clean and straightforward.
nemo_retriever/tests/test_service_mcp.py Adds three focused MCP tests using httpx.MockTransport; the auth-protection test assertion is too permissive (!= 401) and could pass on a broken mount.
nemo_retriever/src/nemo_retriever/service/retriever-service.yaml Adds the mcp: config block to the default YAML; values match MCPConfig defaults and are clearly commented.
docs/docs/extraction/workflow-agentic-retrieval.md Adds MCP access documentation covering stdio and HTTP usage; correctly advises remote agents to use base64 content rather than server paths.
.github/workflows/reusable-pypi-publish.yml Minor comment improvement clarifying the upload-artifact flat-layout behavior; no logic change.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Agent as Agent (local/remote)
    participant MCP as FastMCP at /mcp
    participant Auth as BearerAuthMiddleware
    participant Client as ServiceMCPClient
    participant Service as Retriever Service HTTP

    Note over Agent,Service: HTTP mode (remote agent)
    Agent->>Auth: GET/POST /mcp (Bearer token)
    Auth-->>Agent: 401 if token invalid
    Auth->>MCP: Forward request
    MCP->>Client: dispatch tool call
    Client->>Service: HTTP /v1/ingest/job, /v1/query, /v1/answer, etc.
    Service-->>Client: JSON response
    Client-->>MCP: result dict
    MCP-->>Agent: MCP tool result

    Note over Agent,Service: stdio mode (local agent)
    Agent->>MCP: stdin JSON-RPC
    MCP->>Client: dispatch tool call
    Client->>Service: HTTP calls to base_url
    Service-->>Client: JSON response
    Client-->>MCP: result dict
    MCP-->>Agent: stdout JSON-RPC result

    Note over Agent,Client: ingest_documents with path parameter
    Agent->>MCP: "ingest_documents(path='/etc/passwd')"
    MCP->>Client: _document_bytes reads path.read_bytes()
    Client->>Service: "POST /v1/ingest/job/{id}/document (file bytes)"
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Agent as Agent (local/remote)
    participant MCP as FastMCP at /mcp
    participant Auth as BearerAuthMiddleware
    participant Client as ServiceMCPClient
    participant Service as Retriever Service HTTP

    Note over Agent,Service: HTTP mode (remote agent)
    Agent->>Auth: GET/POST /mcp (Bearer token)
    Auth-->>Agent: 401 if token invalid
    Auth->>MCP: Forward request
    MCP->>Client: dispatch tool call
    Client->>Service: HTTP /v1/ingest/job, /v1/query, /v1/answer, etc.
    Service-->>Client: JSON response
    Client-->>MCP: result dict
    MCP-->>Agent: MCP tool result

    Note over Agent,Service: stdio mode (local agent)
    Agent->>MCP: stdin JSON-RPC
    MCP->>Client: dispatch tool call
    Client->>Service: HTTP calls to base_url
    Service-->>Client: JSON response
    Client-->>MCP: result dict
    MCP-->>Agent: stdout JSON-RPC result

    Note over Agent,Client: ingest_documents with path parameter
    Agent->>MCP: "ingest_documents(path='/etc/passwd')"
    MCP->>Client: _document_bytes reads path.read_bytes()
    Client->>Service: "POST /v1/ingest/job/{id}/document (file bytes)"
Loading

Comments Outside Diff (3)

  1. nemo_retriever/src/nemo_retriever/service/mcp_server.py, line 609-624 (link)

    P1 security Unrestricted server-side file read via HTTP /mcp endpoint

    When the MCP app is mounted at /mcp, authenticated remote agents can pass path: "/etc/passwd" (or any other server-visible path) in an ingest_documents request, and the server will read and upload that file's contents. There is no directory restriction or allowlist — Path(doc.path).expanduser() resolves the path freely. The documentation advises remote agents to use content_base64 instead, but nothing in the code enforces this. An attacker who obtains the bearer token, or any authorized-but-malicious agent, can exfiltrate arbitrary files accessible to the service process (config files, TLS keys, credentials under ~/.config, etc.).

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: nemo_retriever/src/nemo_retriever/service/mcp_server.py
    Line: 609-624
    
    Comment:
    **Unrestricted server-side file read via HTTP `/mcp` endpoint**
    
    When the MCP app is mounted at `/mcp`, authenticated remote agents can pass `path: "/etc/passwd"` (or any other server-visible path) in an `ingest_documents` request, and the server will read and upload that file's contents. There is no directory restriction or allowlist — `Path(doc.path).expanduser()` resolves the path freely. The documentation advises remote agents to use `content_base64` instead, but nothing in the code enforces this. An attacker who obtains the bearer token, or any authorized-but-malicious agent, can exfiltrate arbitrary files accessible to the service process (config files, TLS keys, credentials under `~/.config`, etc.).
    
    How can I resolve this? If you propose a fix, please make it concise.
  2. nemo_retriever/src/nemo_retriever/service/mcp_server.py, line 610-614 (link)

    P2 No file size limit before reading into memory

    path.read_bytes() loads the entire file into memory without any size check. The security rules for this codebase require validating file sizes at every entry point to prevent OOM. A multi-GB file (e.g. a large PDF) submitted by an agent would be read in full before the HTTP upload even starts, potentially exhausting the server's heap.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: nemo_retriever/src/nemo_retriever/service/mcp_server.py
    Line: 610-614
    
    Comment:
    **No file size limit before reading into memory**
    
    `path.read_bytes()` loads the entire file into memory without any size check. The security rules for this codebase require validating file sizes at every entry point to prevent OOM. A multi-GB file (e.g. a large PDF) submitted by an agent would be read in full before the HTTP upload even starts, potentially exhausting the server's heap.
    
    How can I resolve this? If you propose a fix, please make it concise.
  3. nemo_retriever/tests/test_service_mcp.py, line 940-944 (link)

    P2 Weak auth-protection assertion

    assert authorized.status_code != 401 only confirms the MCP endpoint doesn't return a 401, but would silently pass for a 500, 404, or 405 returned by a broken mount. Asserting a specific expected status code (e.g. 200 or 307) would catch regressions where the endpoint mounts but fails for an unrelated reason.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: nemo_retriever/tests/test_service_mcp.py
    Line: 940-944
    
    Comment:
    **Weak auth-protection assertion**
    
    `assert authorized.status_code != 401` only confirms the MCP endpoint doesn't return a 401, but would silently pass for a 500, 404, or 405 returned by a broken mount. Asserting a specific expected status code (e.g. 200 or 307) would catch regressions where the endpoint mounts but fails for an unrelated reason.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix All With AI
Fix the following 6 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 6
nemo_retriever/src/nemo_retriever/service/mcp_server.py:609-624
**Unrestricted server-side file read via HTTP `/mcp` endpoint**

When the MCP app is mounted at `/mcp`, authenticated remote agents can pass `path: "/etc/passwd"` (or any other server-visible path) in an `ingest_documents` request, and the server will read and upload that file's contents. There is no directory restriction or allowlist — `Path(doc.path).expanduser()` resolves the path freely. The documentation advises remote agents to use `content_base64` instead, but nothing in the code enforces this. An attacker who obtains the bearer token, or any authorized-but-malicious agent, can exfiltrate arbitrary files accessible to the service process (config files, TLS keys, credentials under `~/.config`, etc.).

### Issue 2 of 6
nemo_retriever/src/nemo_retriever/service/mcp_server.py:521-523
`build_mcp_app` is a public function but is missing a return type annotation, violating the `type-hints-public-api` rule.

```suggestion
def build_mcp_app(settings: ServiceMCPSettings | None = None) -> Any:
    """Create the ASGI app mounted by ``retriever service start``."""
    return build_mcp(settings).http_app(path="/")
```

### Issue 3 of 6
nemo_retriever/src/nemo_retriever/service/mcp_server.py:140-148
The first branch of `_json_or_text` is redundant. When `content-type` contains `"application/json"` the code calls `resp.json()` immediately; when it doesn't, the `try` block still calls `resp.json()` and only falls back to `{"text": resp.text}` on failure. The `if "application/json"` guard therefore never changes behaviour and can be removed.

```suggestion
    @staticmethod
    def _json_or_text(resp: httpx.Response) -> Any:
        try:
            return resp.json()
        except ValueError:
            return {"text": resp.text}
```

### Issue 4 of 6
nemo_retriever/src/nemo_retriever/service/mcp_server.py:610-614
**No file size limit before reading into memory**

`path.read_bytes()` loads the entire file into memory without any size check. The security rules for this codebase require validating file sizes at every entry point to prevent OOM. A multi-GB file (e.g. a large PDF) submitted by an agent would be read in full before the HTTP upload even starts, potentially exhausting the server's heap.

### Issue 5 of 6
nemo_retriever/tests/test_service_mcp.py:940-944
**Weak auth-protection assertion**

`assert authorized.status_code != 401` only confirms the MCP endpoint doesn't return a 401, but would silently pass for a 500, 404, or 405 returned by a broken mount. Asserting a specific expected status code (e.g. 200 or 307) would catch regressions where the endpoint mounts but fails for an unrelated reason.

### Issue 6 of 6
nemo_retriever/src/nemo_retriever/service/app.py:257-259
**Bare `except Exception:` silently masks MCP init bugs**

Catching every exception here (including programming errors thrown by `build_mcp_app` or `combine_lifespans`) means a coding mistake introduced in `mcp_server.py` will degrade silently to "MCP not mounted" rather than surfacing at startup. Consider narrowing to `ImportError` for the optional-dependency case and letting other errors propagate, or at minimum catching as `except Exception as exc:` so the warning can include the exception message inline rather than relying solely on `exc_info=True`.

Reviews (1): Last reviewed commit: "Add service-mode FastMCP integration" | Re-trigger Greptile

Comment on lines +521 to +523
def build_mcp_app(settings: ServiceMCPSettings | None = None):
"""Create the ASGI app mounted by ``retriever service start``."""
return build_mcp(settings).http_app(path="/")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 build_mcp_app is a public function but is missing a return type annotation, violating the type-hints-public-api rule.

Suggested change
def build_mcp_app(settings: ServiceMCPSettings | None = None):
"""Create the ASGI app mounted by ``retriever service start``."""
return build_mcp(settings).http_app(path="/")
def build_mcp_app(settings: ServiceMCPSettings | None = None) -> Any:
"""Create the ASGI app mounted by ``retriever service start``."""
return build_mcp(settings).http_app(path="/")
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/service/mcp_server.py
Line: 521-523

Comment:
`build_mcp_app` is a public function but is missing a return type annotation, violating the `type-hints-public-api` rule.

```suggestion
def build_mcp_app(settings: ServiceMCPSettings | None = None) -> Any:
    """Create the ASGI app mounted by ``retriever service start``."""
    return build_mcp(settings).http_app(path="/")
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +140 to +148
@staticmethod
def _json_or_text(resp: httpx.Response) -> Any:
content_type = resp.headers.get("content-type", "")
if "application/json" in content_type:
return resp.json()
try:
return resp.json()
except ValueError:
return {"text": resp.text}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The first branch of _json_or_text is redundant. When content-type contains "application/json" the code calls resp.json() immediately; when it doesn't, the try block still calls resp.json() and only falls back to {"text": resp.text} on failure. The if "application/json" guard therefore never changes behaviour and can be removed.

Suggested change
@staticmethod
def _json_or_text(resp: httpx.Response) -> Any:
content_type = resp.headers.get("content-type", "")
if "application/json" in content_type:
return resp.json()
try:
return resp.json()
except ValueError:
return {"text": resp.text}
@staticmethod
def _json_or_text(resp: httpx.Response) -> Any:
try:
return resp.json()
except ValueError:
return {"text": resp.text}
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/service/mcp_server.py
Line: 140-148

Comment:
The first branch of `_json_or_text` is redundant. When `content-type` contains `"application/json"` the code calls `resp.json()` immediately; when it doesn't, the `try` block still calls `resp.json()` and only falls back to `{"text": resp.text}` on failure. The `if "application/json"` guard therefore never changes behaviour and can be removed.

```suggestion
    @staticmethod
    def _json_or_text(resp: httpx.Response) -> Any:
        try:
            return resp.json()
        except ValueError:
            return {"text": resp.text}
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +257 to +259
except Exception:
mcp_asgi_app = None
logger.warning("FastMCP service integration failed to initialise; /mcp will not be mounted", exc_info=True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Bare except Exception: silently masks MCP init bugs

Catching every exception here (including programming errors thrown by build_mcp_app or combine_lifespans) means a coding mistake introduced in mcp_server.py will degrade silently to "MCP not mounted" rather than surfacing at startup. Consider narrowing to ImportError for the optional-dependency case and letting other errors propagate, or at minimum catching as except Exception as exc: so the warning can include the exception message inline rather than relying solely on exc_info=True.

Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/service/app.py
Line: 257-259

Comment:
**Bare `except Exception:` silently masks MCP init bugs**

Catching every exception here (including programming errors thrown by `build_mcp_app` or `combine_lifespans`) means a coding mistake introduced in `mcp_server.py` will degrade silently to "MCP not mounted" rather than surfacing at startup. Consider narrowing to `ImportError` for the optional-dependency case and letting other errors propagate, or at minimum catching as `except Exception as exc:` so the warning can include the exception message inline rather than relying solely on `exc_info=True`.

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant