fix(lightspeed): pre-create /rag-content/vector_db/notebooks in init …#449

Open
JslYoon wants to merge 1 commit into redhat-developer:release-1.10 from JslYoon:fix/RHDHBUGS-3371-lightspeed-notebooks-permissions

@JslYoon JslYoon commented Jun 22, 2026

…container

On EKS/AKS, the RAG init container populates /rag-content/ but never creates the notebooks subdirectory. At runtime, llama-stack tries to write /rag-content/vector_db/notebooks/faiss_store.db and fails with PermissionError because it cannot create the directory on a volume it doesn't own. OCP avoids this via fsGroup/supplemental group defaults.

The fix pre-creates the directory and widens permissions before the sidecar starts, matching the fix the operator already applies via chmod -R 777 for the rest of vector_db.

Fixes: RHDHBUGS-3371
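
A minimal sketch of the two added steps, run against a scratch directory rather than the real volume. `RAG_ROOT` and `existing.db` are hypothetical stand-ins for `/rag-content` and its copied content. The sketch succeeds because the script owns the scratch directory, which is not guaranteed for the real mount.

```shell
#!/bin/sh
# Sketch only: RAG_ROOT is a hypothetical stand-in for /rag-content.
set -eu
umask 022
RAG_ROOT="$(mktemp -d)"

# Simulate content the init container has already copied into place.
mkdir -p "$RAG_ROOT/vector_db"
: > "$RAG_ROOT/vector_db/existing.db"

# The two steps described above: pre-create the directory, then widen permissions.
mkdir -p "$RAG_ROOT/vector_db/notebooks" &&
chmod -R a+rwX "$RAG_ROOT"

# Record the resulting mode string of the notebooks directory (first 10 chars of ls -ld).
notebooks_mode="$(ls -ld "$RAG_ROOT/vector_db/notebooks" | cut -c1-10)"
echo "notebooks: $notebooks_mode"
```

With the directory already present and writable by everyone, a sidecar running under a different UID no longer needs to create it at runtime.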

Description of the change

Which issue(s) does this PR fix or relate to

  • JIRA_issue_link

How to test changes / Special notes to the reviewer

Checklist

  • For each Chart updated, version bumped in the corresponding Chart.yaml according to Semantic Versioning.
  • For each Chart updated, variables are documented in the values.yaml and added to the corresponding README.md. The pre-commit utility can be used to generate the necessary content. Run pre-commit run --all-files to run the hooks and then push any resulting changes. The pre-commit Workflow will enforce this and warn you if needed.
  • JSON Schema template updated and re-generated the raw schema via the pre-commit hook.
  • Tests pass using the Chart Testing tool and the ct lint command.
  • If you updated the orchestrator-infra chart, make sure the versions of the Knative CRDs are aligned with the versions of the CRDs installed by the OpenShift Serverless operators declared in the values.yaml file. See Installing Knative Eventing and Knative Serving CRDs for more details.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@JslYoon JslYoon requested a review from a team as a code owner June 22, 2026 21:15
@openshift-ci openshift-ci Bot requested review from Fortune-Ndlovu and rm3l June 22, 2026 21:15
@rhdh-qodo-merge

PR Summary by Qodo

Fix Lightspeed RAG init to pre-create notebooks dir and relax /rag-content perms
🐞 Bug fix ⚙️ Configuration changes 🕐 Less than 10 minutes

Description

• Pre-create /rag-content/vector_db/notebooks during RAG bootstrap init to avoid runtime mkdir
 failures.
• Widen /rag-content permissions so the llama-stack sidecar can write FAISS notebook storage on
 EKS/AKS.
Diagram

graph TD
  A["RAG bootstrap init"] --> B["Copy vector_db"] --> C[("/rag-content PV")]
  A --> D["mkdir notebooks"] --> E["/vector_db/notebooks"]
  A --> F["chmod a+rwX"] --> C
  G["llama-stack sidecar"] --> H["write faiss_store.db"] --> E
  subgraph Legend
    direction LR
    _job["Init/Runtime step"] ~~~ _pv[("Persistent Volume")]
  end
High-Level Assessment

The following are alternative approaches to this PR:

1. Set pod-level fsGroup/supplementalGroups for the volume
  • ➕ Aligns with OpenShift-style permission handling without chmod
  • ➕ Avoids broad world-writable permissions
  • ➖ May not work consistently across all storage classes/CSI drivers
  • ➖ Requires chart/pod spec changes beyond the init script and might impact other containers
2. Chown/chmod only the notebooks path (minimal scope)
  • ➕ Limits permission broadening to the exact directory needed
  • ➕ Reduces security exposure compared to recursive /rag-content chmod
  • ➖ May miss other write paths under /rag-content on some deployments
  • ➖ Requires careful auditing of all runtime write locations

Recommendation: Current approach is pragmatic for EKS/AKS: pre-creating the directory removes the failing mkdir path, and recursive a+rwX matches the existing operator behavior for vector_db. If security posture is a concern, consider narrowing the chmod scope to vector_db/notebooks in a follow-up.
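
The PR description cites the operator's `chmod -R 777`, while the change itself uses `a+rwX`. Under standard `chmod` semantics the two differ only in the execute bit on regular files. The sketch below illustrates that on a scratch directory; the directory names `with_X` and `with_777` are hypothetical.

```shell
#!/bin/sh
# Sketch only: compares symbolic a+rwX with numeric 777 on throwaway files.
set -eu
umask 022
scratch="$(mktemp -d)"
mkdir "$scratch/with_X" "$scratch/with_777"
: > "$scratch/with_X/faiss_store.db"
: > "$scratch/with_777/faiss_store.db"

# Capital X adds execute only to directories and to files that already
# have an execute bit set; 777 sets every bit unconditionally.
chmod -R a+rwX "$scratch/with_X"
chmod -R 777 "$scratch/with_777"

mode_X="$(ls -l "$scratch/with_X/faiss_store.db" | cut -c1-10)"
mode_777="$(ls -l "$scratch/with_777/faiss_store.db" | cut -c1-10)"
echo "a+rwX: $mode_X"
echo "777:   $mode_777"
```

Both forms leave directories fully open, so they are equivalent for the purpose of letting another UID create files, but `a+rwX` avoids marking data files executable.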

Files changed (1) +2 / -0

Bug fix (1) +2 / -0
values.yaml: Create notebooks dir and relax /rag-content permissions in RAG init script (+2/-0)

• Adds a 'mkdir -p /rag-content/vector_db/notebooks' step during Lightspeed RAG bootstrap. Applies 'chmod -R a+rwX /rag-content' so the runtime sidecar can write the FAISS notebook store on volumes it does not own (notably EKS/AKS).

charts/backstage/values.yaml

@rhdh-qodo-merge

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0) 📎 Requirement gaps (0) 📜 Skill insights (0)

Context used
✅ Tickets: RHDHBUGS-3371

Action required

1. Non-root chmod breaks init 🐞 Bug ☼ Reliability
Description
The Lightspeed RAG initContainer runs as non-root with all capabilities dropped, but the new `chmod -R a+rwX /rag-content` can fail when the `/rag-content` mount root isn’t owned by the container user; because the script is chained with `&&`, this aborts the initContainer and blocks the pod in init.
Code

charts/backstage/values.yaml[141]

+          chmod -R a+rwX /rag-content &&
Evidence
The initContainer explicitly runs as non-root and drops all capabilities; the new chmod is part of
an &&-chained shell script, so any chmod failure aborts the init. /rag-content is mounted from
an emptyDir volume, and the upstream chart defaults podSecurityContext to {} (no
fsGroup/ownership configuration), making chmod on the mount root a likely failure mode for non-root
containers.

charts/backstage/values.yaml[126-160]
charts/backstage/vendor/backstage/charts/backstage/templates/backstage-deployment.yaml[98-110]
charts/backstage/vendor/backstage/charts/backstage/templates/backstage-deployment.yaml[135-160]
charts/backstage/vendor/backstage/charts/backstage/values.yaml[272-279]
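
The `&&` short-circuit described in the evidence can be reproduced in isolation. In this sketch, `failing_chmod` is a hypothetical stand-in for a `chmod` that is refused on a directory the user does not own; it is not the real command.

```shell
#!/bin/sh
# Sketch only: shows how one failing step in an &&-chain stops the rest.
scratch="$(mktemp -d)"
later_step_ran="no"

# Hypothetical stand-in for a chmod that fails with "Operation not permitted".
failing_chmod() { return 1; }

mkdir -p "$scratch/vector_db/notebooks" &&
failing_chmod "$scratch" &&
later_step_ran="yes"
chain_status=$?

echo "chain exit status: $chain_status, later step ran: $later_step_ran"
```

The directory is created, but every step after the failing one is skipped and the chain's status is non-zero, which is what would cause an init container to exit with an error.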

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The init script runs `chmod -R a+rwX /rag-content` while the initContainer is `runAsNonRoot` with capabilities dropped. If the `/rag-content` mount root isn’t owned by the initContainer user, `chmod` returns non-zero and, due to `&&` chaining, the initContainer exits and the pod never starts.

### Issue Context
`/rag-content` is mounted from an `emptyDir` volume, and this chart doesn’t configure a pod-level security context (e.g., `fsGroup`) that would make ownership predictable for non-root chmod.

### Fix Focus Areas
- charts/backstage/values.yaml[126-142]
- charts/backstage/vendor/backstage/charts/backstage/templates/backstage-deployment.yaml[98-110]
- charts/backstage/vendor/backstage/charts/backstage/templates/backstage-deployment.yaml[135-160]
- charts/backstage/vendor/backstage/charts/backstage/values.yaml[272-279]

### Suggested fix
Change the chmod to target only paths created by the initContainer (and/or avoid touching the mount root), e.g.:
- `chmod -R a+rwX /rag-content/vector_db /rag-content/embeddings_model`
 or
- `chmod -R a+rwX /rag-content/*`
Additionally, ensure the notebooks directory itself is writable by the runtime UID (e.g., `mkdir -p -m 0777 /rag-content/vector_db/notebooks`).
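
A sketch of the first suggested variant, assuming the init script created `vector_db` and `embeddings_model` itself. `RAG_ROOT` is a hypothetical stand-in for `/rag-content`, and the mount root's mode is captured before and after to show that it is never touched.

```shell
#!/bin/sh
# Sketch only: widen permissions on subdirectories, leaving the mount root alone.
set -eu
umask 022
RAG_ROOT="$(mktemp -d)"   # hypothetical stand-in for /rag-content
mkdir -p "$RAG_ROOT/vector_db" "$RAG_ROOT/embeddings_model"

root_before="$(ls -ld "$RAG_ROOT" | cut -c1-10)"

# -m applies to the final directory only; parents created by -p keep default modes.
mkdir -p -m 0777 "$RAG_ROOT/vector_db/notebooks" &&
chmod -R a+rwX "$RAG_ROOT/vector_db" "$RAG_ROOT/embeddings_model"

root_after="$(ls -ld "$RAG_ROOT" | cut -c1-10)"
notebooks_mode="$(ls -ld "$RAG_ROOT/vector_db/notebooks" | cut -c1-10)"
echo "root: $root_before -> $root_after, notebooks: $notebooks_mode"
```

Because `chmod` only operates on paths the script created, it does not depend on who owns the mount root.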


Remediation recommended

2. Overbroad permissions change 🐞 Bug ⚙ Maintainability
Description
`chmod -R a+rwX /rag-content` makes the entire RAG content tree writable, even though the configured write target is specifically the notebooks sqlite DB under `/rag-content/vector_db/notebooks`; this is broader than necessary and increases the blast radius of accidental writes.
Code

charts/backstage/values.yaml[141]

+          chmod -R a+rwX /rag-content &&
Evidence
The Lightspeed configuration declares a sqlite DB under /rag-content/vector_db/notebooks, but the
init script applies recursive write permissions to the entire /rag-content mount, not just the DB
directory subtree.

charts/backstage/files/lightspeed/config.yaml[153-166]
charts/backstage/values.yaml[135-142]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The initContainer recursively grants write permissions to all of `/rag-content`, including data that is expected to be static (e.g., the embeddings model), while the runtime write path in config is the notebooks sqlite DB under `/rag-content/vector_db/notebooks`.

### Issue Context
The Lightspeed config explicitly points the notebooks store to `/rag-content/vector_db/notebooks/faiss_store.db`, so write permissions only need to cover that subtree (and potentially other vector_db DB locations), not the whole mount.

### Fix Focus Areas
- charts/backstage/values.yaml[135-142]
- charts/backstage/files/lightspeed/config.yaml[153-166]

### Suggested fix
Restrict permission widening to the minimal required subtree, e.g.:
- `mkdir -p -m 0777 /rag-content/vector_db/notebooks`
- `chmod -R a+rwX /rag-content/vector_db`
Avoid granting write permissions across all of `/rag-content` unless there is a documented runtime need for it.
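
One way to check the resulting scope is to list which paths end up world-writable after the narrower commands run. This is a sketch on a scratch tree; `RAG_ROOT` and `model.bin` are hypothetical names, and `find -perm -0002` matches entries with the "other" write bit set.

```shell
#!/bin/sh
# Sketch only: apply the narrower fix, then audit what became world-writable.
set -eu
umask 022
RAG_ROOT="$(mktemp -d)"   # hypothetical stand-in for /rag-content
mkdir -p "$RAG_ROOT/vector_db" "$RAG_ROOT/embeddings_model"
: > "$RAG_ROOT/embeddings_model/model.bin"   # hypothetical static file

mkdir -p -m 0777 "$RAG_ROOT/vector_db/notebooks" &&
chmod -R a+rwX "$RAG_ROOT/vector_db"

# Paths writable by "other", relative to the root of the tree.
world_writable="$(cd "$RAG_ROOT" && find . -perm -0002 | sort)"
echo "$world_writable"
```

Only entries under `vector_db` should appear in the listing; the embeddings model keeps its original, read-only-to-others mode.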

@rm3l rm3l left a comment

Member

@JslYoon You'll need to also bump the chart version, run the pre-commit hooks and push the resulting changes. See the checklist on the PR description. Thanks.
