Skip to content

LCORE-1617 Centralize Llama Stack Vector Store#1473

Open
JslYoon wants to merge 9 commits intolightspeed-core:mainfrom
JslYoon:LCORE-1617-vector-stor-operations
Open

LCORE-1617 Centralize Llama Stack Vector Store#1473
JslYoon wants to merge 9 commits intolightspeed-core:mainfrom
JslYoon:LCORE-1617-vector-stor-operations

Conversation

@JslYoon
Copy link
Copy Markdown
Contributor

@JslYoon JslYoon commented Apr 7, 2026

Description

The proposed architectural update aims to decouple the Red Hat Developer Hub (RHDH) plugin from its current tight dependency on the Llama Stack by centralizing all vector store operations within lightspeed-core. Currently, the RHDH plugin directly manages sessions and documents through multiple Llama Stack API calls, making it difficult to switch to alternative vector stores like ChromaDB or Pinecone without significant refactoring. By implementing a proxy/wrapper pattern, lightspeed-core will host domain-focused REST endpoints and handle all business logic, while the RHDH plugin is reduced to a thin HTTP client. This transition ensures a single source of truth for notebook logic, improves maintainability, and allows for backend infrastructure changes without impacting the frontend plugin.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude code
  • Generated by: Claude code

Related Tickets & Documents

  • Related Issue #1617
  • Closes #1617

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Added authenticated REST endpoints to manage vector stores and their files (create, list, retrieve, update, delete; upload and attach files), with retry for transient attachment errors.
  • API / Schemas

    • Added request/response schemas for vector stores, files, and list endpoints; integrated into v1 API.
    • Added new authorization actions for vector store and file operations.
  • Tests

    • Added comprehensive unit tests covering success/error flows, retries, validation, and router registration.
  • Chores

    • Added a 100MB default max file upload limit.

@JslYoon JslYoon changed the title [LCORE-1617] Centralize Llama Stack Vector Store/File Operations in Lightspeed-Core [LCORE-1617] Centralize Llama Stack Vector Store Apr 7, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a new FastAPI router vector_stores with authenticated endpoints for vector store and file lifecycle management, new request/response models and RBAC actions, router registration, a file size constant, retry/backoff for lock errors, and comprehensive unit tests for handlers and models.

Changes

Cohort / File(s) Summary
Vector Store REST API
src/app/endpoints/vector_stores.py
New APIRouter exposing CRUD endpoints for vector stores, file upload/attachment endpoints, auth checks, configuration gating, client calls via AsyncLlamaStackClientHolder().get_client(), retry/backoff for lock errors, and standardized error-to-HTTP mapping.
Router Integration
src/app/routers.py
Imports and registers vector_stores.router under the "/v1" prefix.
API Models & Enums
src/models/config.py, src/models/requests.py, src/models/responses.py
Added RBAC actions MANAGE_VECTOR_STORES, READ_VECTOR_STORES, MANAGE_FILES. Added request models VectorStoreCreateRequest, VectorStoreUpdateRequest, VectorStoreFileCreateRequest with validation (including empty-update and attributes validators). Added response models VectorStoreResponse, VectorStoresListResponse, FileResponse, VectorStoreFileResponse, VectorStoreFilesListResponse.
Constants
src/constants.py
Added DEFAULT_MAX_FILE_UPLOAD_SIZE = 100 * 1024 * 1024 (100 MB) for upload size enforcement.
Unit Tests — endpoints
tests/unit/app/endpoints/test_vector_stores.py
New comprehensive async tests covering vector store CRUD and file upload/attach/list/get/delete flows, error mappings (APIConnectionError→503, BadRequestError→404/400), configuration-not-loaded case, and retry/backoff behaviors for lock errors.
Unit Tests — routers
tests/unit/app/test_routers.py
Updated router tests to include vector_stores.router registration and adjusted expected router counts/prefix assertions.
Unit Tests — models
tests/unit/models/requests/test_vector_store_requests.py
New tests for VectorStoreUpdateRequest and VectorStoreFileCreateRequest validating field constraints and validators (empty-update rejection, attribute limits).

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client (HTTP)
    participant Auth as Auth Dependency
    participant Config as Configuration
    participant API as Vector Stores API
    participant Llama as AsyncLlamaStackClient

    Client->>API: HTTP request (e.g., POST /v1/vector-stores or /files)
    API->>Auth: authorize(Action.*) / get_auth_dependency()
    Auth-->>API: auth tuple
    API->>Config: check_configuration_loaded(configuration)
    Config-->>API: ok / error
    API->>Llama: AsyncLlamaStackClientHolder().get_client()
    Llama-->>API: client
    API->>Llama: client.vector_stores.* or client.files.* (create/add/list/get/delete)
    alt add_file lock error
        Llama-->>API: LockError
        API->>Llama: retry (exponential backoff, up to 3 attempts)
    end
    Llama-->>API: success / other error
    API-->>Client: HTTP response (mapped model or error)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title references LCORE-1617 and mentions 'Centralize Llama Stack Vector Store', which directly aligns with the PR's primary objective of centralizing vector store operations within lightspeed-core as a proxy/wrapper pattern.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/vector_stores.py`:
- Around line 129-144: Remove the ad-hoc debugging call to await
client.models.list() and any plain print() statements in vector store request
handling (including the print at the start of the create flow and the other
print occurrences noted) and either delete them or convert them to structured
logger.debug() calls; additionally, change the logger.info that currently logs
the full request payload (the logger.info referencing "Creating vector store -
body_dict: %s, extra_body: %s") to either logger.debug or redact
sensitive/session/document fields before logging so no full request body is
emitted at info level; locate the code by the body.model_dump(exclude_none=True)
usage and the extra_body construction to apply these changes.
- Around line 366-369: Update both delete route decorators (e.g.,
`@router.delete`("/vector-stores/{vector_store_id}") and the other delete route
for entries) to include status_code=status.HTTP_204_NO_CONTENT so the actual
response matches the OpenAPI responses={"204": ...}; also fix the imports by
adding status and importing Depends directly from fastapi (e.g., from fastapi
import Depends, status) instead of importing Depends from fastapi.params and
ensure any existing references to status.HTTP_204_NO_CONTENT resolve.
- Around line 65-73: file_responses currently advertises a 503 via
ServiceUnavailableResponse for client-side upload validation failures; replace
that with a 400 Bad Request mapping and ensure BadRequestError is translated to
an HTTP 400 response. Update the file_responses dict to include a
BadRequestResponse/OpenAPI 400 entry instead of ServiceUnavailableResponse, and
in the handler where BadRequestError is caught (reference BadRequestError and
any exception handling code around the upload endpoint) raise or return FastAPI
HTTPException(status_code=400) so the OpenAPI and runtime behavior consistently
reflect a 400 Bad Request for malformed uploads.

In `@src/models/requests.py`:
- Around line 1009-1050: The VectorStoreUpdateRequest model currently allows an
empty payload because all fields are optional; add a Pydantic `@model_validator`
(e.g., on VectorStoreUpdateRequest) that runs after parsing and checks that at
least one of name, expires_at, or metadata is not None (and treat an empty
metadata dict as “empty”); if all are absent/empty, raise a ValueError so
requests return 422. Use `@field_validator` only if you need per-field
normalization (e.g., converting empty metadata to None) and reference the class
name VectorStoreUpdateRequest and the field names name, expires_at, metadata
when implementing the validators.
- Around line 1053-1097: The VectorStoreFileCreateRequest model does not enforce
the documented limits on the attributes field (max 16 pairs, keys max 64 chars,
string values max 512 chars, values may also be bool/number), so add validation:
implement a `@field_validator`("attributes", mode="before" or "after") to ensure
the field is a dict when present, has at most 16 entries, that each key is a
string <=64 chars, and each value is either a string (<=512 chars), bool, or
number; then implement a `@model_validator` to enforce any cross-field invariants
if needed and raise clear ValueError messages on violations. Reference the
VectorStoreFileCreateRequest class and the attributes field when adding these
Pydantic validators; keep model_config as-is (extra: forbid).

In `@tests/unit/app/endpoints/test_vector_stores.py`:
- Around line 33-52: The test fakes lack the optional fields the real handler
expects; update the VectorStore __init__ to accept metadata: Optional[dict]=None
and attributes: Optional[dict]=None (defaulting to None) and assign them to
self.metadata and self.attributes, and likewise add those optional
parameters/attributes to the other mock class in the same file (the file handler
fake around lines 78-90) so create_vector_store() and file handlers can read
metadata/attributes without error.
- Around line 169-172: The assertions are inside the pytest.raises context and
never run after the awaited create_vector_store raises; move the assertions so
they execute after the with block. Specifically, in
tests/unit/app/endpoints/test_vector_stores.py surrounding the call to
create_vector_store(request=request, auth=auth, body=body), keep the "with
pytest.raises(HTTPException) as e:" only wrapping the await, then after exiting
that block assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
and assert e.value.detail["response"] == "Configuration is not loaded" (and make
the same change for the other occurrence around lines 224-227).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b8a0a1c8-d00b-4162-a0c2-b9dac912b21d

📥 Commits

Reviewing files that changed from the base of the PR and between 76fd4de and e28fc79.

📒 Files selected for processing (6)
  • src/app/endpoints/vector_stores.py
  • src/app/routers.py
  • src/models/config.py
  • src/models/requests.py
  • src/models/responses.py
  • tests/unit/app/endpoints/test_vector_stores.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-pr
🧰 Additional context used
📓 Path-based instructions (6)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed

Files:

  • src/app/routers.py
  • src/models/config.py
  • src/models/requests.py
  • src/models/responses.py
  • src/app/endpoints/vector_stores.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: Use from fastapi import APIRouter, HTTPException, Request, status, Depends for FastAPI dependencies
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/routers.py
  • src/app/endpoints/vector_stores.py
src/models/config.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/config.py: All config uses Pydantic models extending ConfigurationBase
Base class sets extra="forbid" to reject unknown fields in Pydantic configuration models
Use type hints: Optional[FilePath], PositiveInt, SecretStr for configuration fields
Pydantic configuration models should extend ConfigurationBase

Files:

  • src/models/config.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Use typing_extensions.Self for model validators in type annotations
Pydantic data models should extend BaseModel
Include Attributes: section in Pydantic model docstrings

Files:

  • src/models/config.py
  • src/models/requests.py
  • src/models/responses.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests

Files:

  • tests/unit/app/endpoints/test_vector_stores.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/endpoints/test_vector_stores.py
🧠 Learnings (3)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
  • src/models/requests.py
  • src/models/responses.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/config.py
  • src/models/requests.py
  • src/models/responses.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/vector_stores.py
🪛 GitHub Actions: Unit tests
src/app/endpoints/vector_stores.py

[error] 159-159: Test_create_vector_store_success failed: AttributeError in create_vector_store when building VectorStoreResponse: 'VectorStore' object has no attribute 'metadata'


[error] 214-214: Test_list_vector_stores_success failed: AttributeError in list_vector_stores when building VectorStoreResponse: 'VectorStore' object has no attribute 'metadata'


[error] 275-275: Test_get_vector_store_success failed: AttributeError in get_vector_store when building VectorStoreResponse: 'VectorStore' object has no attribute 'metadata'


[error] 345-345: Test_update_vector_store_success failed: AttributeError in update_vector_store when building VectorStoreResponse: 'VectorStore' object has no attribute 'metadata'


[error] 590-590: Test_add_file_to_vector_store_success failed: AttributeError in add_file_to_vector_store when building VectorStoreFileResponse: 'VectorStoreFile' object has no attribute 'attributes'


[error] 661-661: Test_list_vector_store_files_success failed: AttributeError in list_vector_store_files when building VectorStoreFileResponse: 'VectorStoreFile' object has no attribute 'attributes'


[error] 736-736: Test_get_vector_store_file_success failed: AttributeError in get_vector_store_file when building VectorStoreFileResponse: 'VectorStoreFile' object has no attribute 'attributes'

Signed-off-by: Lucas <lyoon@redhat.com>
@JslYoon JslYoon changed the title [LCORE-1617] Centralize Llama Stack Vector Store LCORE-1617 Centralize Llama Stack Vector Store Apr 8, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (4)
tests/unit/app/endpoints/test_vector_stores.py (1)

173-176: ⚠️ Potential issue | 🟠 Major

Move the assertions outside pytest.raises.

Once the awaited call raises, the remaining lines inside the context manager never run. These tests currently verify only that some HTTPException was raised.

Suggested fix
     with pytest.raises(HTTPException) as e:
         await create_vector_store(request=request, auth=auth, body=body)
-        assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
-        assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
+    assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
+    assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
@@
     with pytest.raises(HTTPException) as e:
         await create_vector_store(request=request, auth=auth, body=body)
-        assert e.value.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
-        assert e.value.detail["response"] == "Unable to connect to Llama Stack"  # type: ignore
+    assert e.value.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
+    assert e.value.detail["response"] == "Unable to connect to Llama Stack"  # type: ignore

Also applies to: 228-231

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/app/endpoints/test_vector_stores.py` around lines 173 - 176, The
assertions are inside the pytest.raises context so they never run after await
create_vector_store(...) raises; move the two assert lines out of the with
pytest.raises(HTTPException) as e: block so you capture the exception into e and
then assert e.value.status_code and e.value.detail["response"] after the
context. Apply the same change for the second occurrence around lines
referencing the same pattern (the other test block at 228-231) and keep
references to create_vector_store, pytest.raises, e.value.status_code and
e.value.detail["response"] when updating the tests.
src/app/endpoints/vector_stores.py (3)

139-143: ⚠️ Potential issue | 🟠 Major

Stop logging the full create payload at info.

body_dict and extra_body can carry notebook/session metadata. Emitting the whole request at info level is noisy and leaks more context than this endpoint needs. Log stable identifiers only, and keep payload-level diagnostics at debug.

Suggested fix
-        logger.info(
-            "Creating vector store - body_dict: %s, extra_body: %s",
-            body_dict,
-            extra_body,
-        )
+        logger.debug(
+            "Creating vector store '%s' with provider '%s'",
+            body.name,
+            extra_body.get("provider_id"),
+        )

As per coding guidelines, use logger.debug() for detailed diagnostic information.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/vector_stores.py` around lines 139 - 143, Change the noisy
info-level logging that prints the full payload to log only stable identifiers
at info and move payload diagnostics to debug: replace the current logger.info
call that references body_dict and extra_body with an info log that emits only
stable identifiers (e.g., request id, user_id, notebook_id or session_id if
available) and use logger.debug to record the full body_dict/extra_body for
detailed diagnostics; update references in the surrounding function (the logger
usage in this module, e.g., the create vector store endpoint in
vector_stores.py) so sensitive/session metadata is not logged at info level.

65-73: ⚠️ Potential issue | 🟠 Major

Return 400 for upload bad requests, not 503.

BadRequestError here is a client-side upload failure, but the route currently advertises and returns ServiceUnavailableResponse. That makes malformed uploads look like transient backend outages and encourages retries instead of request fixes. The matching bad-request unit test will need to flip with this change.

Based on learnings, use FastAPI HTTPException with appropriate status codes for API endpoint error handling.

Also applies to: 487-490

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/vector_stores.py` around lines 65 - 73, The route currently
advertises and returns ServiceUnavailableResponse (503) for client-side
BadRequestError; change the response mapping to return a 400 BadRequestResponse
and raise FastAPI HTTPException(status_code=400) when catching BadRequestError
so uploads are treated as client errors; update file_responses to include
BadRequestResponse.openapi_response() instead of
ServiceUnavailableResponse.openapi_response() and adjust the error handling in
the same handler (and the similar handler referenced around the code marked
487-490) so unit tests expect 400 instead of 503.

9-10: ⚠️ Potential issue | 🟠 Major

Declare 204 on both delete routes and fix the FastAPI imports.

responses={"204": ...} only changes the OpenAPI schema. Without status_code=status.HTTP_204_NO_CONTENT, both handlers still return 200 at runtime. The import also bypasses the repo’s required fastapi import form for Depends/status.

Suggested fix
-from fastapi import APIRouter, File, HTTPException, Request, UploadFile
-from fastapi.params import Depends
+from fastapi import APIRouter, Depends, File, HTTPException, Request, UploadFile, status
@@
 `@router.delete`(
     "/vector-stores/{vector_store_id}",
+    status_code=status.HTTP_204_NO_CONTENT,
     responses={"204": {"description": "Vector store deleted"}},
 )
@@
 `@router.delete`(
     "/vector-stores/{vector_store_id}/files/{file_id}",
+    status_code=status.HTTP_204_NO_CONTENT,
     responses={"204": {"description": "File deleted from vector store"}},
 )

As per coding guidelines, use from fastapi import APIRouter, HTTPException, Request, status, Depends for FastAPI dependencies.

Also applies to: 365-368, 760-763

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/vector_stores.py` around lines 9 - 10, The FastAPI import
and delete handlers are incorrect: change the import to use the repo style "from
fastapi import APIRouter, HTTPException, Request, status, Depends" (remove the
separate fastapi.params import) and update both delete route handlers that
currently only set responses={"204": ...} to also set
status_code=status.HTTP_204_NO_CONTENT so they return 204 at runtime; locate the
routes in src/app/endpoints/vector_stores.py that declare a 204 response (the
two DELETE handlers) and add status_code=status.HTTP_204_NO_CONTENT to their
decorator arguments.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/vector_stores.py`:
- Around line 86-94: The list_vector_store_files() handler currently only maps
APIConnectionError and generic Exception to responses; update
vector_store_files_list_responses to include a 404 entry (use the appropriate
NotFound/BadRequest response helper used elsewhere) and add a specific except
BadRequestError branch in list_vector_store_files() that returns the 404
response, mirroring how get_vector_store() handles BadRequestError; also update
the function docstring to mention the 404/BadRequest response.
- Around line 450-468: The code double-buffers uploaded files by calling await
file.read() then wrapping into BytesIO(content); change to pass the payload
directly to the Llama client as a tuple (filename, content) or (filename,
file.file) to avoid copying (use client.files.create(file=(filename, content),
purpose="assistants") or client.files.create(file=(filename, file.file),
purpose="assistants")), remove the BytesIO usage (file_bytes and its .name
assignment), and update logging to use the filename and size without forcing a
second copy; also add a pre-read size check using the request's Content-Length
header (and raise HTTPException when over the allowed limit) so you validate
large uploads before materializing them into memory.

In `@tests/unit/app/endpoints/test_vector_stores.py`:
- Around line 184-205: Add a test case that verifies the adapter moves
provider_id, embedding_model, and embedding_dimension from the main request into
extra_body when calling the client: use the existing test setup
(get_test_request, get_test_auth, VectorStoreCreateRequest) but include
provider_id, embedding_model, and embedding_dimension in the body; patch
AsyncLlamaStackClientHolder.get_client to return mock_client and assert
mock_client.vector_stores.create was called with extra_body containing those
three keys (and that the top-level body forwarded to create does not contain
them) while still returning a VectorStore("vs_123","test_store") and asserting
the response fields as in the current test.

---

Duplicate comments:
In `@src/app/endpoints/vector_stores.py`:
- Around line 139-143: Change the noisy info-level logging that prints the full
payload to log only stable identifiers at info and move payload diagnostics to
debug: replace the current logger.info call that references body_dict and
extra_body with an info log that emits only stable identifiers (e.g., request
id, user_id, notebook_id or session_id if available) and use logger.debug to
record the full body_dict/extra_body for detailed diagnostics; update references
in the surrounding function (the logger usage in this module, e.g., the create
vector store endpoint in vector_stores.py) so sensitive/session metadata is not
logged at info level.
- Around line 65-73: The route currently advertises and returns
ServiceUnavailableResponse (503) for client-side BadRequestError; change the
response mapping to return a 400 BadRequestResponse and raise FastAPI
HTTPException(status_code=400) when catching BadRequestError so uploads are
treated as client errors; update file_responses to include
BadRequestResponse.openapi_response() instead of
ServiceUnavailableResponse.openapi_response() and adjust the error handling in
the same handler (and the similar handler referenced around the code marked
487-490) so unit tests expect 400 instead of 503.
- Around line 9-10: The FastAPI import and delete handlers are incorrect: change
the import to use the repo style "from fastapi import APIRouter, HTTPException,
Request, status, Depends" (remove the separate fastapi.params import) and update
both delete route handlers that currently only set responses={"204": ...} to
also set status_code=status.HTTP_204_NO_CONTENT so they return 204 at runtime;
locate the routes in src/app/endpoints/vector_stores.py that declare a 204
response (the two DELETE handlers) and add
status_code=status.HTTP_204_NO_CONTENT to their decorator arguments.

In `@tests/unit/app/endpoints/test_vector_stores.py`:
- Around line 173-176: The assertions are inside the pytest.raises context so
they never run after await create_vector_store(...) raises; move the two assert
lines out of the with pytest.raises(HTTPException) as e: block so you capture
the exception into e and then assert e.value.status_code and
e.value.detail["response"] after the context. Apply the same change for the
second occurrence around lines referencing the same pattern (the other test
block at 228-231) and keep references to create_vector_store, pytest.raises,
e.value.status_code and e.value.detail["response"] when updating the tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: eea96eaa-fe50-41d6-b442-dd1b7c8bd69a

📥 Commits

Reviewing files that changed from the base of the PR and between e28fc79 and 3fe63aa.

📒 Files selected for processing (3)
  • src/app/endpoints/vector_stores.py
  • tests/unit/app/endpoints/test_vector_stores.py
  • tests/unit/app/test_routers.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: server mode / ci
🧰 Additional context used
📓 Path-based instructions (4)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests

Files:

  • tests/unit/app/test_routers.py
  • tests/unit/app/endpoints/test_vector_stores.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/test_routers.py
  • tests/unit/app/endpoints/test_vector_stores.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed

Files:

  • src/app/endpoints/vector_stores.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: Use from fastapi import APIRouter, HTTPException, Request, status, Depends for FastAPI dependencies
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/vector_stores.py
🧠 Learnings (3)
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/app/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling

Applied to files:

  • src/app/endpoints/vector_stores.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/app/**/*.py : Use `from fastapi import APIRouter, HTTPException, Request, status, Depends` for FastAPI dependencies

Applied to files:

  • src/app/endpoints/vector_stores.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/vector_stores.py

Comment on lines +450 to +468
# Read file content
content = await file.read()
filename = file.filename or "uploaded_file"

# Add .txt extension if no extension present
# (since parsed PDFs/URLs are sent as plain text)
if not os.path.splitext(filename)[1]:
filename = f"{filename}.txt"

logger.info(
"Uploading file - filename: %s, size: %d bytes",
filename,
len(content),
)

# Convert to BytesIO for Llama Stack client
# The client expects bytes, io.IOBase, PathLike, or a tuple
file_bytes = BytesIO(content)
file_bytes.name = filename # Set the filename attribute
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the full function context around lines 450-468
cat -n src/app/endpoints/vector_stores.py | sed -n '420,480p'

Repository: lightspeed-core/lightspeed-stack

Length of output: 2378


🏁 Script executed:

# Check if there's a size limit check or any validation before reading
cat -n src/app/endpoints/vector_stores.py | sed -n '410,475p'

Repository: lightspeed-core/lightspeed-stack

Length of output: 2563


🏁 Script executed:

# Look for any size limit constants or config
rg -n "size|SIZE|limit|LIMIT" src/app/endpoints/vector_stores.py | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 140


🏁 Script executed:

# Check if UploadFile.file is used anywhere else in the codebase
rg -n "UploadFile" src/ -A 5 -B 2

Repository: lightspeed-core/lightspeed-stack

Length of output: 1217


🏁 Script executed:

# Check if there are other file upload endpoints in the codebase to see patterns
find src -name "*.py" -type f | xargs grep -l "UploadFile" | head -5

Repository: lightspeed-core/lightspeed-stack

Length of output: 110


🏁 Script executed:

# Look for how file.file is used or if it's mentioned anywhere
rg -n "file\.file" src/ --type py

Repository: lightspeed-core/lightspeed-stack

Length of output: 278


🏁 Script executed:

# Check the imports and FastAPI version to understand UploadFile better
cat src/app/endpoints/vector_stores.py | head -30

Repository: lightspeed-core/lightspeed-stack

Length of output: 992


🏁 Script executed:

# Check the Llama Stack client usage - what does files.create() expect?
rg -n "client.files.create\|files\.create" src/ -B 2 -A 3 --type py

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

# Check if there's any documentation or type hints about what file parameter accepts
rg -n "files\.create" src/ -B 5 -A 5

Repository: lightspeed-core/lightspeed-stack

Length of output: 1763


🏁 Script executed:

# Search for any Python documentation or comments about UploadFile usage
rg -n "SpooledTemporaryFile\|\.file\|file object" src/ --type py | head -10

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

# Check if there's a requirements.txt to see llama-stack-client version
cat requirements.txt | grep -i llama

Repository: lightspeed-core/lightspeed-stack

Length of output: 124


🏁 Script executed:

# Look for requirements files with different names
find . -maxdepth 2 -name "*.txt" -o -name "pyproject.toml" -o -name "setup.py" | grep -E "req|setup|pyproject"

Repository: lightspeed-core/lightspeed-stack

Length of output: 239


🏁 Script executed:

# Check for pyproject.toml or setup.py with dependencies
ls -la | grep -E "requirements|setup|pyproject"

Repository: lightspeed-core/lightspeed-stack

Length of output: 539


🏁 Script executed:

# Look for any configuration about file size limits
rg -n "max.*size\|MAX.*SIZE\|file.*limit\|upload.*limit" src/ --type py -i | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

# Check pyproject.toml for llama-stack-client version
cat pyproject.toml | grep -A 5 -B 5 "llama"

Repository: lightspeed-core/lightspeed-stack

Length of output: 813


🏁 Script executed:

# Check if the UploadFile's underlying file object is accessible
# FastAPI's UploadFile has a .file property that is SpooledTemporaryFile
# Let's see if there are any patterns in the code
rg -n "SpooledTemporaryFile\|BinaryIO\|AsyncBinaryIO" src/ --type py

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

# Check the full function signature and see what happens with file after the read
cat -n src/app/endpoints/vector_stores.py | sed -n '417,490p'

Repository: lightspeed-core/lightspeed-stack

Length of output: 3117


🏁 Script executed:

# Look for the error handling to see what status code is returned for large files
cat -n src/app/endpoints/vector_stores.py | sed -n '490,510p'

Repository: lightspeed-core/lightspeed-stack

Length of output: 982


🏁 Script executed:

# Search for any MAX_FILE_SIZE or similar constants
rg -n "MAX\|LIMIT\|SIZE" src/ --type py | grep -i "file\|upload\|size" | head -15

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🌐 Web query:

llama-stack-client files.create() file parameter accept bytes BytesIO SpooledTemporaryFile

💡 Result:

client.files.create(file=...) accepts FileTypes, where FileContent = IO[bytes] | bytes | PathLike[...]. So yes: bytes, io.BytesIO, and tempfile.SpooledTemporaryFile (opened in binary mode) are valid inputs. [1] [2]

What you typically want in practice is to also provide a filename (and optionally a MIME type), because that’s how the client/server can infer file type (especially for BytesIO, which has no .name). The supported “rich” forms are tuples like: (filename, contents) or (filename, contents, content_type) (and optionally headers). [1]

Examples:

from io import BytesIO
from llama_stack_client import LlamaStackClient

client = LlamaStackClient()

bio = BytesIO(b"%PDF-...")  # or any bytes
client.files.create(
    file=("doc.pdf", bio, "application/pdf"),
    purpose="assistants",
)
import tempfile
from llama_stack_client import LlamaStackClient

client = LlamaStackClient()

with tempfile.SpooledTemporaryFile(mode="w+b") as f:
    f.write(b"hello")
    f.seek(0)
    client.files.create(
        file=("hello.txt", f, "text/plain"),
        purpose="assistants",
    )

If you already have raw bytes and don’t care about name/type, you can pass them directly:

client.files.create(file=b"hello", purpose="assistants")

But supplying a filename is safer for type detection. [1]

Sources:
[1] FileTypes / FileContent definitions in _types.py (union includes IO[bytes], bytes, tuple variants) (github.com)
[2] FileCreateParams.file: Required[FileTypes] (github.com)
[3] README “File uploads” (bytes, PathLike, or (filename, contents, media type) tuple) (github.com)

Citations:


Avoid double-buffering the file into memory.

await file.read() loads the full payload into bytes, and BytesIO(content) copies it again. For large files this doubles memory consumption. Instead, pass the bytes directly using a tuple with the filename: await client.files.create(file=(filename, content), purpose="assistants"). Alternatively, pass the underlying spooled file object directly: await client.files.create(file=(filename, file.file), purpose="assistants").

Additionally, the docstring claims to raise HTTPException for large files, but there is no size validation before reading. Add a check using the Content-Length header before materializing the payload, or configure a file size limit upstream.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/vector_stores.py` around lines 450 - 468, The code
double-buffers uploaded files by calling await file.read() then wrapping into
BytesIO(content); change to pass the payload directly to the Llama client as a
tuple (filename, content) or (filename, file.file) to avoid copying (use
client.files.create(file=(filename, content), purpose="assistants") or
client.files.create(file=(filename, file.file), purpose="assistants")), remove
the BytesIO usage (file_bytes and its .name assignment), and update logging to
use the filename and size without forcing a second copy; also add a pre-read
size check using the request's Content-Length header (and raise HTTPException
when over the allowed limit) so you validate large uploads before materializing
them into memory.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (2)
src/models/requests.py (2)

1029-1070: ⚠️ Potential issue | 🟠 Major

Add @model_validator to reject empty update payloads.

VectorStoreUpdateRequest accepts {} because all fields are optional. This allows no-op updates to reach the backend instead of failing fast with a 422.

🛠️ Proposed fix
 class VectorStoreUpdateRequest(BaseModel):
     ...
     model_config = {
         "extra": "forbid",
         ...
     }
+
+    `@model_validator`(mode="after")
+    def validate_non_empty_update(self) -> Self:
+        """Ensure at least one updatable field is provided."""
+        if self.name is None and self.expires_at is None and self.metadata is None:
+            raise ValueError(
+                "At least one of 'name', 'expires_at', or 'metadata' must be provided"
+            )
+        return self

As per coding guidelines, src/models/**/*.py should use @model_validator for custom validation in Pydantic models.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/models/requests.py` around lines 1029 - 1070, VectorStoreUpdateRequest
currently allows an empty payload because all fields are optional; add a
Pydantic v2 model validator to reject no-op updates by raising a validation
error when every updatable field is missing or None. Implement an
`@model_validator`(mode="after") on VectorStoreUpdateRequest that inspects the
instance (checking name, expires_at, metadata) and raises a ValueError (or
ValidationError) with a clear message if all three are None/absent, otherwise
returns the model.

1089-1098: ⚠️ Potential issue | 🟠 Major

Enforce documented attributes limits with a @field_validator.

The docstring specifies max 16 pairs, keys max 64 chars, and string values max 512 chars, but there's no validation. Invalid payloads will pass through to the backend.

🛠️ Proposed fix
+    `@field_validator`("attributes")
+    `@classmethod`
+    def validate_attributes(
+        cls, value: Optional[dict[str, str | float | bool]]
+    ) -> Optional[dict[str, str | float | bool]]:
+        """Validate attribute count and key/value sizes."""
+        if value is None:
+            return value
+        if len(value) > 16:
+            raise ValueError("A maximum of 16 attributes is allowed")
+        for key, attr_value in value.items():
+            if len(key) > 64:
+                raise ValueError(f"Attribute key '{key}' exceeds 64 characters")
+            if isinstance(attr_value, str) and len(attr_value) > 512:
+                raise ValueError(
+                    f"Attribute value for '{key}' exceeds 512 characters"
+                )
+        return value

As per coding guidelines, src/models/**/*.py should use @field_validator for custom validation in Pydantic models.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/models/requests.py` around lines 1089 - 1098, The attributes Field lacks
validation against the documented limits; add a Pydantic `@field_validator` for
"attributes" (on the model class containing the attributes field) that enforces:
at most 16 key/value pairs, each key is a string ≤64 chars, values are only
str|float|bool, and if a value is a string it is ≤512 chars; raise ValueError
with a clear message on violation. Implement the validator using
`@field_validator`("attributes", mode="after") (or appropriate mode for your
Pydantic version) and return the validated dict unchanged when all checks pass.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/models/requests.py`:
- Around line 1100-1104: The chunking_strategy Field in the model currently uses
dict[str, Any] which bypasses validation; replace it with a shared Pydantic
model (e.g., ChunkingStrategy) and reuse that same model used by
VectorStoreCreateRequest so validation and docs are consistent. Update the Field
type from Optional[dict[str, Any]] to Optional[ChunkingStrategy], import or move
the ChunkingStrategy class into a common module, and adjust any example/defaults
to match the ChunkingStrategy schema (keeping same description and examples).
- Around line 962-1026: VectorStoreCreateRequest uses dict[str, Any] for
chunking_strategy and metadata which violates the no-Any rule; replace these
with explicit Pydantic models/typed structures (e.g., a ChunkingStrategy model
with fields like type, chunk_size, chunk_overlap and a Metadata model or
TypedDict for allowed keys) and update the VectorStoreCreateRequest definitions
to use those types (chunking_strategy: Optional[ChunkingStrategy], metadata:
Optional[Metadata]) and adjust model_config json_schema_extra examples to match
the new schemas; ensure the new models live in the same module (or imported) and
include validation constraints (types, ranges) instead of Any.

In `@src/models/responses.py`:
- Around line 2659-2668: Move the metadata field declaration so it appears
before the model_config attribute to match the file's convention of declaring
model fields first and class config last; specifically, relocate the
Optional[dict[str, Any]] Field(...) named metadata (with its description and
examples) above the model_config = {"extra": "forbid"} line in the same class,
leaving the metadata Field signature and contents unchanged and retaining
model_config as the final class attribute.

In `@tests/unit/app/test_routers.py`:
- Line 118: Replace the hard-coded router count literal with a single shared
constant (e.g., ROUTER_COUNT) and use it wherever the literal 23 is asserted;
update the assertions that reference app.routers (the one using assert
len(app.routers) == 23 and the other at the second occurrence) to assert
len(app.routers) == ROUTER_COUNT and define ROUTER_COUNT once at the top of the
test module or local to the test file so future router additions only require
one change.

---

Duplicate comments:
In `@src/models/requests.py`:
- Around line 1029-1070: VectorStoreUpdateRequest currently allows an empty
payload because all fields are optional; add a Pydantic v2 model validator to
reject no-op updates by raising a validation error when every updatable field is
missing or None. Implement an `@model_validator`(mode="after") on
VectorStoreUpdateRequest that inspects the instance (checking name, expires_at,
metadata) and raises a ValueError (or ValidationError) with a clear message if
all three are None/absent, otherwise returns the model.
- Around line 1089-1098: The attributes Field lacks validation against the
documented limits; add a Pydantic `@field_validator` for "attributes" (on the
model class containing the attributes field) that enforces: at most 16 key/value
pairs, each key is a string ≤64 chars, values are only str|float|bool, and if a
value is a string it is ≤512 chars; raise ValueError with a clear message on
violation. Implement the validator using `@field_validator`("attributes",
mode="after") (or appropriate mode for your Pydantic version) and return the
validated dict unchanged when all checks pass.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1b09d28c-01cc-4b1d-93ea-0b3dfb0bde6f

📥 Commits

Reviewing files that changed from the base of the PR and between 3fe63aa and ca0313c.

📒 Files selected for processing (5)
  • src/app/routers.py
  • src/models/config.py
  • src/models/requests.py
  • src/models/responses.py
  • tests/unit/app/test_routers.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (6)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed

Files:

  • src/app/routers.py
  • src/models/config.py
  • src/models/requests.py
  • src/models/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: Use from fastapi import APIRouter, HTTPException, Request, status, Depends for FastAPI dependencies
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/routers.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests

Files:

  • tests/unit/app/test_routers.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/test_routers.py
src/models/config.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/config.py: All config uses Pydantic models extending ConfigurationBase
Base class sets extra="forbid" to reject unknown fields in Pydantic configuration models
Use type hints: Optional[FilePath], PositiveInt, SecretStr for configuration fields
Pydantic configuration models should extend ConfigurationBase

Files:

  • src/models/config.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Use typing_extensions.Self for model validators in type annotations
Pydantic data models should extend BaseModel
Include Attributes: section in Pydantic model docstrings

Files:

  • src/models/config.py
  • src/models/requests.py
  • src/models/responses.py
🧠 Learnings (3)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
  • src/models/requests.py
  • src/models/responses.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/config.py
  • src/models/requests.py
  • src/models/responses.py
📚 Learning: 2026-02-23T14:11:46.950Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-188
Timestamp: 2026-02-23T14:11:46.950Z
Learning: The query request validator in the Responses API flow requires that `query_request.model` and `query_request.provider` must either both be specified or both be absent. The concatenated format (e.g., `model="provider/model"` with `provider=None`) is not permitted by the validator.

Applied to files:

  • src/models/requests.py
🔇 Additional comments (8)
src/app/routers.py (2)

31-31: Import update is consistent with router composition.

Adding vector_stores to the endpoint import list is correct and matches its later registration.


58-58: Vector stores router is mounted correctly under /v1.

This keeps the new endpoints aligned with existing v1 routing conventions.

tests/unit/app/test_routers.py (1)

141-141: New router assertions are accurate and valuable.

The added checks validate both registration and /v1 prefixing for vector_stores.router.

Also applies to: 182-182

src/models/config.py (1)

1036-1039: RBAC action additions are consistent and correctly integrated.

These enum values align with the new vector store/file authorization decorators and fit the existing Action pattern.

src/models/responses.py (4)

2686-2714: LGTM!

Model structure, docstrings, and type annotations are correct. The openapi_response method signature is consistent with other new models.


2717-2751: LGTM!

Model fields align correctly with the explicit field mapping in src/app/endpoints/vector_stores.py (context snippet 3). Type annotations and docstrings are complete.


2754-2796: LGTM!

Model structure is consistent with the endpoint construction pattern shown in context snippet 2. The attributes field type annotation correctly uses modern union syntax (str | float | bool).


2799-2827: LGTM!

Model follows the established list response pattern consistent with VectorStoresListResponse. Type annotations and docstrings are complete.

Copy link
Copy Markdown
Contributor

@jrobertboos jrobertboos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, but code rabbit does bring up some good points that should be addressed.

@tisnik
Copy link
Copy Markdown
Contributor

tisnik commented Apr 9, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 Security concerns

Denial of Service:
The create_file endpoint reads the entire upload into memory (await file.read() and BytesIO) without size limits, allowing an attacker to exhaust server memory by uploading large files. Consider streaming or enforcing a maximum file size.

⚡ Recommended focus areas for review

Missing 204 Status Code

Delete endpoints are documented as returning HTTP 204 but default to 200, which could confuse clients expecting a no‐content response. Decorate these handlers with status_code=204 or explicitly return a 204 Response.

@router.delete(
    "/vector-stores/{vector_store_id}",
    responses={"204": {"description": "Vector store deleted"}},
)
@authorize(Action.MANAGE_VECTOR_STORES)
async def delete_vector_store(
    request: Request,
    vector_store_id: str,
    auth: Annotated[AuthTuple, Depends(get_auth_dependency())],
) -> None:
    """Delete a vector store.
Incorrect Error Mapping

The create_file endpoint maps a BadRequestError (e.g., file too large) to a 503 Service Unavailable instead of a 400 Bad Request, making it hard for clients to distinguish client‐side errors from backend unavailability.

except BadRequestError as e:
    logger.error("Bad request for file upload: %s", e)
    response = ServiceUnavailableResponse(backend_name="Llama Stack", cause=str(e))
    raise HTTPException(**response.model_dump()) from e
Inconsistent Error Response

In add_file_to_vector_store, if all retry attempts fail the code raises a bare HTTPException with a plain string detail, bypassing the standard error response model and resulting in an inconsistent JSON error structure.

if not vs_file:
    raise HTTPException(
        status_code=500, detail="Failed to create vector store file"
    )

Signed-off-by: Lucas <lyoon@redhat.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (4)
src/app/endpoints/vector_stores.py (2)

64-71: ⚠️ Potential issue | 🟠 Major

Return HTTP 400 for upload validation failures.

BadRequestError is a client-side bad request, but this branch turns it into 503 Service Unavailable and file_responses doesn't document a 400 response. Callers will misclassify malformed uploads as transient backend outages.

Based on learnings, src/app/**/*.py should use FastAPI HTTPException with appropriate status codes for API endpoint error handling.

Also applies to: 488-491

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/vector_stores.py` around lines 64 - 71, The endpoint
currently maps client validation failures to ServiceUnavailableResponse and
doesn't document HTTP 400; update the file_responses dict (the file_responses
variable in src/app/endpoints/vector_stores.py) to include a 400 entry (e.g.,
BadRequest/BadRequestResponse.openapi_response()) and change the error handling
where BadRequestError is converted into a 503 to instead raise a FastAPI
HTTPException(status_code=400) or return the 400 response; apply the same fix to
the other similar branch(s) in this file that currently return 503 for
BadRequestError (the branches around the other ServiceUnavailableResponse
usages).

451-473: ⚠️ Potential issue | 🟠 Major

Reject oversized uploads before materializing them in memory.

This path loads the full body into content and then copies it again into BytesIO, so a single large upload can spike worker memory twice before the backend rejects it. Enforce a max upload size before await file.read(), and pass (filename, file.file) or (filename, content) directly to the client instead of allocating a second buffer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/vector_stores.py` around lines 451 - 473, The code
currently reads the entire upload into memory via await file.read() and then
copies it into BytesIO, which can double memory usage for large uploads; instead
enforce a MAX_UPLOAD_SIZE constant before materializing the body (e.g., check
file.spooled and/or file.headers/content-length or read in chunks up to the
limit and abort if exceeded) and reject with an error if the size is over the
limit, and when calling client.files.create use a zero-copy API by passing a
tuple with the filename and the underlying file stream (e.g., (filename,
file.file)) or the already-read content only once (e.g., (filename, content)) so
you don't allocate a second buffer; update the logic around filename, BytesIO,
and client.files.create to avoid creating file_bytes when using the stream tuple
and ensure the error path returns before any large allocation.
tests/unit/app/endpoints/test_vector_stores.py (1)

180-206: 🧹 Nitpick | 🔵 Trivial

Add coverage for the extra_body adapter path.

This success case only exercises name, so it never verifies the new wrapper logic that moves provider_id, embedding_model, and embedding_dimension into extra_body. A regression there would still leave the core behavior of this PR untested.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/app/endpoints/test_vector_stores.py` around lines 180 - 206,
Update the success test to exercise the extra_body adapter path by providing
provider_id, embedding_model, and embedding_dimension on the incoming
VectorStoreCreateRequest so the wrapper moves them into extra_body; e.g., in
test_create_vector_store_success (or a new test) set body =
VectorStoreCreateRequest(name="test_store", provider_id="p1",
embedding_model="m1", embedding_dimension=128), then assert the mocked
mock_client.vector_stores.create was called with an extra_body containing those
keys/values and that the returned VectorStore is still handled
(response.id/name/status remain correct), keeping the existing mock setup for
AsyncLlamaStackClientHolder.get_client and configuration.
src/models/requests.py (1)

995-1007: 🛠️ Refactor suggestion | 🟠 Major

Replace Any-typed request bags with typed models.

chunking_strategy and metadata still accept arbitrary payloads here, so invalid shapes bypass validation at the API boundary and fail later against the backend. Reuse a shared ChunkingStrategy model and a bounded metadata type/dedicated model across these request schemas.

As per coding guidelines, src/models/**/*.py should "use complete type annotations for all class attributes; avoid using Any".

Also applies to: 1053-1057, 1116-1120

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/models/requests.py` around lines 995 - 1007, The chunking_strategy and
metadata fields are typed as Optional[dict[str, Any]] which bypasses validation;
replace them with concrete Pydantic models and bounded types: use the shared
ChunkingStrategy model (e.g., replace chunking_strategy: Optional[dict[str,
Any]] with Optional[ChunkingStrategy]) and introduce or reuse a Metadata
model/typed mapping (e.g., Metadata or Dict[str, str] / a Pydantic model) for
the metadata field so shapes are validated at the API boundary; update all
occurrences of these fields (including the other similar definitions referenced
around the later ranges) to reference the typed models and adjust imports
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/app/endpoints/vector_stores.py`:
- Around line 64-71: The endpoint currently maps client validation failures to
ServiceUnavailableResponse and doesn't document HTTP 400; update the
file_responses dict (the file_responses variable in
src/app/endpoints/vector_stores.py) to include a 400 entry (e.g.,
BadRequest/BadRequestResponse.openapi_response()) and change the error handling
where BadRequestError is converted into a 503 to instead raise a FastAPI
HTTPException(status_code=400) or return the 400 response; apply the same fix to
the other similar branch(s) in this file that currently return 503 for
BadRequestError (the branches around the other ServiceUnavailableResponse
usages).
- Around line 451-473: The code currently reads the entire upload into memory
via await file.read() and then copies it into BytesIO, which can double memory
usage for large uploads; instead enforce a MAX_UPLOAD_SIZE constant before
materializing the body (e.g., check file.spooled and/or
file.headers/content-length or read in chunks up to the limit and abort if
exceeded) and reject with an error if the size is over the limit, and when
calling client.files.create use a zero-copy API by passing a tuple with the
filename and the underlying file stream (e.g., (filename, file.file)) or the
already-read content only once (e.g., (filename, content)) so you don't allocate
a second buffer; update the logic around filename, BytesIO, and
client.files.create to avoid creating file_bytes when using the stream tuple and
ensure the error path returns before any large allocation.

In `@src/models/requests.py`:
- Around line 995-1007: The chunking_strategy and metadata fields are typed as
Optional[dict[str, Any]] which bypasses validation; replace them with concrete
Pydantic models and bounded types: use the shared ChunkingStrategy model (e.g.,
replace chunking_strategy: Optional[dict[str, Any]] with
Optional[ChunkingStrategy]) and introduce or reuse a Metadata model/typed
mapping (e.g., Metadata or Dict[str, str] / a Pydantic model) for the metadata
field so shapes are validated at the API boundary; update all occurrences of
these fields (including the other similar definitions referenced around the
later ranges) to reference the typed models and adjust imports accordingly.

In `@tests/unit/app/endpoints/test_vector_stores.py`:
- Around line 180-206: Update the success test to exercise the extra_body
adapter path by providing provider_id, embedding_model, and embedding_dimension
on the incoming VectorStoreCreateRequest so the wrapper moves them into
extra_body; e.g., in test_create_vector_store_success (or a new test) set body =
VectorStoreCreateRequest(name="test_store", provider_id="p1",
embedding_model="m1", embedding_dimension=128), then assert the mocked
mock_client.vector_stores.create was called with an extra_body containing those
keys/values and that the returned VectorStore is still handled
(response.id/name/status remain correct), keeping the existing mock setup for
AsyncLlamaStackClientHolder.get_client and configuration.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ed9adc5f-0cdf-4dc7-93e2-1881d261d61e

📥 Commits

Reviewing files that changed from the base of the PR and between ca0313c and 03470cb.

📒 Files selected for processing (4)
  • src/app/endpoints/vector_stores.py
  • src/models/requests.py
  • tests/unit/app/endpoints/test_vector_stores.py
  • tests/unit/models/requests/test_vector_store_requests.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci
🧰 Additional context used
📓 Path-based instructions (5)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests

Files:

  • tests/unit/models/requests/test_vector_store_requests.py
  • tests/unit/app/endpoints/test_vector_stores.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/models/requests/test_vector_store_requests.py
  • tests/unit/app/endpoints/test_vector_stores.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed

Files:

  • src/app/endpoints/vector_stores.py
  • src/models/requests.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: Use from fastapi import APIRouter, HTTPException, Request, status, Depends for FastAPI dependencies
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/vector_stores.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Use typing_extensions.Self for model validators in type annotations
Pydantic data models should extend BaseModel
Include Attributes: section in Pydantic model docstrings

Files:

  • src/models/requests.py
🧠 Learnings (8)
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/app/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling

Applied to files:

  • src/app/endpoints/vector_stores.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/app/**/*.py : Use `from fastapi import APIRouter, HTTPException, Request, status, Depends` for FastAPI dependencies

Applied to files:

  • src/app/endpoints/vector_stores.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/vector_stores.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/models/**/*.py : Use `field_validator` and `model_validator` for custom validation in Pydantic models

Applied to files:

  • src/models/requests.py
📚 Learning: 2026-02-23T14:11:46.950Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-188
Timestamp: 2026-02-23T14:11:46.950Z
Learning: The query request validator in the Responses API flow requires that `query_request.model` and `query_request.provider` must either both be specified or both be absent. The concatenated format (e.g., `model="provider/model"` with `provider=None`) is not permitted by the validator.

Applied to files:

  • src/models/requests.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Use complete type annotations for all class attributes; avoid using `Any`

Applied to files:

  • src/models/requests.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/requests.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/requests.py
🔇 Additional comments (1)
tests/unit/models/requests/test_vector_store_requests.py (1)

45-52: No changes needed—the current exception handling is correct.

pytest.raises(ValueError) will successfully catch Pydantic's ValidationError exceptions because in Pydantic v2, ValidationError is a subclass of ValueError. The validators in both VectorStoreUpdateRequest and VectorStoreFileCreateRequest raise ValueError, which Pydantic wraps in ValidationError—and since ValidationError inherits from ValueError, the tests will pass as written.

While using the more explicit pytest.raises(ValidationError) would be consistent with other tests in the file (lines 152, 157) and already-imported ValidationError class, the current implementation is functionally correct and will not cause test failures.

@JslYoon JslYoon requested a review from tisnik April 9, 2026 21:57
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/vector_stores.py`:
- Around line 605-637: The retry loop around client.vector_stores.files.create
can re-raise a lock-related exception which gets caught by an outer
BadRequestError handler and incorrectly mapped to a 404; instead, detect when a
lock error occurs on the final attempt (use the is_lock_error and
is_last_attempt flags inside the except block for the create call) and raise a
proper HTTPException with your InternalServerErrorResponse (e.g., build
InternalServerErrorResponse(response="Failed to create vector store file",
cause="All retry attempts failed to create the vector store file") and raise
HTTPException(**response.model_dump())) rather than re-raising the original
exception; apply the same pattern to the similar block later (lines handling the
vector store file creation result around vs_file and the outer BadRequestError
mapping).
- Around line 338-346: The update response currently sets
metadata=vector_store.metadata or None which converts an empty dict to null;
change this to pass the metadata through unchanged
(metadata=vector_store.metadata) in the VectorStoreResponse returned by the PUT
handler in vector_stores.py so empty dicts remain {} and the response shape
matches other vector-store handlers.
- Around line 515-517: The local import of BytesIO inside the request path
causes pylint C0415; move the import to module scope by adding "from io import
BytesIO" at the top of the file (module imports) and remove the inline import
inside the handler; update any references to BytesIO in the function (e.g., in
the upload/handler function where the comment and inline import appear) to use
the module-level symbol so the request path no longer performs a local import.

In `@tests/unit/app/endpoints/test_vector_stores.py`:
- Around line 882-886: The test currently allocates a 200MB bytes object for
mock_file.read.return_value which is unnecessary; keep mock_file.size = 200 *
1024 * 1024 to trigger the size guard but change mock_file.read to raise if
invoked (e.g., set mock_file.read.side_effect to an exception or AssertionError)
so the test verifies the guard without allocating the large payload and fails if
the code incorrectly calls read().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d5b2653e-3339-4655-9839-47fea3d81e43

📥 Commits

Reviewing files that changed from the base of the PR and between 03470cb and a14ac62.

📒 Files selected for processing (3)
  • src/app/endpoints/vector_stores.py
  • src/constants.py
  • tests/unit/app/endpoints/test_vector_stores.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed

Files:

  • src/constants.py
  • src/app/endpoints/vector_stores.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: Use from fastapi import APIRouter, HTTPException, Request, status, Depends for FastAPI dependencies
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/vector_stores.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async unit tests

Files:

  • tests/unit/app/endpoints/test_vector_stores.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/endpoints/test_vector_stores.py
🧠 Learnings (3)
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Central `constants.py` for shared constants with descriptive comments

Applied to files:

  • src/constants.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/app/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling

Applied to files:

  • src/app/endpoints/vector_stores.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/vector_stores.py
🪛 GitHub Actions: Python linter
src/app/endpoints/vector_stores.py

[error] 517-517: pylint failed with C0415: Import outside toplevel (io.BytesIO) (import-outside-toplevel).

Comment on lines +338 to +346
return VectorStoreResponse(
id=vector_store.id,
name=vector_store.name,
created_at=vector_store.created_at,
last_active_at=vector_store.last_active_at,
expires_at=vector_store.expires_at,
status=vector_store.status or "unknown",
usage_bytes=vector_store.usage_bytes or 0,
metadata=vector_store.metadata or None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Don't collapse empty metadata to null in the update response.

metadata=vector_store.metadata or None turns a legitimate empty dict into None. The other vector-store handlers return metadata unchanged, so PUT /vector-stores/{id} ends up with a different response shape for the same value.

Proposed fix
         return VectorStoreResponse(
             id=vector_store.id,
             name=vector_store.name,
             created_at=vector_store.created_at,
             last_active_at=vector_store.last_active_at,
             expires_at=vector_store.expires_at,
             status=vector_store.status or "unknown",
             usage_bytes=vector_store.usage_bytes or 0,
-            metadata=vector_store.metadata or None,
+            metadata=vector_store.metadata,
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return VectorStoreResponse(
id=vector_store.id,
name=vector_store.name,
created_at=vector_store.created_at,
last_active_at=vector_store.last_active_at,
expires_at=vector_store.expires_at,
status=vector_store.status or "unknown",
usage_bytes=vector_store.usage_bytes or 0,
metadata=vector_store.metadata or None,
return VectorStoreResponse(
id=vector_store.id,
name=vector_store.name,
created_at=vector_store.created_at,
last_active_at=vector_store.last_active_at,
expires_at=vector_store.expires_at,
status=vector_store.status or "unknown",
usage_bytes=vector_store.usage_bytes or 0,
metadata=vector_store.metadata,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/vector_stores.py` around lines 338 - 346, The update
response currently sets metadata=vector_store.metadata or None which converts an
empty dict to null; change this to pass the metadata through unchanged
(metadata=vector_store.metadata) in the VectorStoreResponse returned by the PUT
handler in vector_stores.py so empty dicts remain {} and the response shape
matches other vector-store handlers.

Signed-off-by: Lucas <lyoon@redhat.com>
@JslYoon JslYoon force-pushed the LCORE-1617-vector-stor-operations branch from a14ac62 to 2bd58de Compare April 9, 2026 22:05
Signed-off-by: Lucas <lyoon@redhat.com>
@JslYoon JslYoon force-pushed the LCORE-1617-vector-stor-operations branch from f684423 to ffcc389 Compare April 9, 2026 22:31
Signed-off-by: Lucas <lyoon@redhat.com>
Copy link
Copy Markdown
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good in overall, thank you. We need to have a task to add integration tests and another to add end to end tests, and yet another to add documentation. I'll take care of it.

@tisnik tisnik requested a review from asimurka April 10, 2026 09:16

@router.delete(
"/vector-stores/{vector_store_id}",
responses={"204": {"description": "Vector store deleted"}},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer this endpoint’s behavior to align with the existing DELETE v1/conversations/{conversation_id}. Namely:

  • Do not return 404 at all,
  • Return 200 with body suggesting whether the resource was actually deleted or not (see ConversationDeleteResponse
  • Continue returning standard error responses 401,403,500,503 as usual (update responses dict accordingly).

Signed-off-by: Lucas <lyoon@redhat.com>
@JslYoon JslYoon requested a review from asimurka April 10, 2026 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants