Feat: add wiki-memory-tool MCP service for multi-agent wiki knowledge#548

Open
aslom wants to merge 1 commit into kagenti:main from aslom:wiki-memory-service

@aslom aslom commented Jun 4, 2026

Summary

MVP wiki service to validate multi-tenant multi-agent collaboration using shared MCP services with Kagenti sandbox and OpenShift.

Implements git-backed wiki with SPIFFE workload identity for agents, GitHub OAuth for humans, per-topic ACL with team-based access control, and GitHub Pages rendering with light/dark mode.

Related issue(s)

Relates-to: kagenti/kagenti#1461

(Optional) Testing Instructions

cd mcp/wiki_memory_tool
uv sync
uv run python run_local.py --clean
# In another terminal:
uv run python test_agents.py
uv run python test_user_skills.py

Contributor

@mrsabath mrsabath left a comment

Summary

MVP wiki service for multi-agent collaboration — solid scope, has assertive tests (test_user_skills.py, test_agents.py, test_user_access.py), uses yaml.safe_load, HMAC-SHA256 for JWTs, and a non-root Dockerfile USER.
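
For readers less familiar with HS256, the sketch below shows what HMAC-SHA256 JWT signing and verification involve, using only the Python standard library. It is an illustration, not the service's code: the helper names sign_hs256 and verify_hs256 are hypothetical, and a production verifier would also check claims such as exp (a maintained library like PyJWT is usually the better choice).

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding stripped by _b64url before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    """Build a compact JWT signed with HMAC-SHA256 (alg=HS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        _b64url(json.dumps(header, separators=(",", ":")).encode())
        + "."
        + _b64url(json.dumps(payload, separators=(",", ":")).encode())
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def verify_hs256(token: str, secret: bytes) -> dict:
    """Return the payload if the signature is valid, else raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    # Only accept HS256 so a forged "alg": "none" header cannot skip the check.
    if header.get("alg") != "HS256":
        raise ValueError("unexpected alg")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # compare_digest avoids leaking how many bytes matched through timing.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    return json.loads(_b64url_decode(payload_b64))
```

The security of this scheme rests entirely on the secret, which is why the hardcoded fallback secret flagged below matters.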

Three must-fix items block merge:

  1. DCO failing — commit is Signed-off-by: Aleksander Slominski <aslom@apache.org> but authored by aslom@us.ibm.com. DCO requires the sign-off email to match the author's email. Fix with git commit --amend -s --reset-author (or edit the trailer to use the IBM address) and force-push.
  2. verify=False in production MCP client (mcp_server.py:44) — ships MITM-vulnerable by default.
  3. Hardcoded JWT fallback secret (wiki_service.py:47) — "dev-secret-change-me" is now in the public diff; if JWT_SECRET_KEY is unset in prod, the service silently signs with this. Fail closed instead.
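
The DCO failure in item 1 comes down to comparing the commit author's email with the Signed-off-by trailer. A rough local pre-push check could look like the sketch below. It is an approximation for illustration only: the helper names are hypothetical, and the actual DCO check may apply further rules (for example around name matching).

```python
import re

# Matches trailers such as "Signed-off-by: Jane Doe <jane@example.com>".
SIGNOFF_RE = re.compile(
    r"^Signed-off-by:[ \t]*(?P<name>[^<\n]+?)[ \t]*<(?P<email>[^>\n]+)>[ \t]*$",
    re.MULTILINE,
)


def signoff_emails(message: str) -> list[str]:
    """Return every email found in Signed-off-by trailers, lowercased."""
    return [m.group("email").strip().lower() for m in SIGNOFF_RE.finditer(message)]


def dco_ok(message: str, author_email: str) -> bool:
    """Approximate DCO rule: some sign-off email must equal the author email."""
    return author_email.strip().lower() in signoff_emails(message)
```

The inputs can come from git itself: `git log -1 --format=%B` prints the full commit message and `git log -1 --format=%ae` prints the author email.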

The K8s manifest also needs a securityContext, and the Dockerfile/deployment image should be pinned rather than pulled as :latest from a personal registry namespace.

Areas reviewed: Python, Dockerfile, K8s manifests, CI, commit format
Commits: 1 commit (DCO failing — email mismatch)
CI status: failing (DCO)

Assisted-By: Claude Code

Comment thread mcp/wiki_memory_tool/wiki_service.py Outdated
Comment thread mcp/wiki_memory_tool/mcp_server.py Outdated
Comment thread mcp/wiki_memory_tool/k8s/deployment.yaml
Comment thread mcp/wiki_memory_tool/Dockerfile Outdated
Comment thread mcp/wiki_memory_tool/k8s/deployment.yaml Outdated
Comment thread mcp/wiki_memory_tool/wiki_service.py
Comment thread mcp/wiki_memory_tool/Dockerfile Outdated
@aslom aslom force-pushed the wiki-memory-service branch from 43e27b9 to 4385b91 Compare June 4, 2026 21:51
aslom added a commit to aslom/agent-examples that referenced this pull request Jun 4, 2026
Resolved issues:
1. verify=False in MCP client — TLS verification now enabled by default,
   disabled only with explicit WIKI_INSECURE_TLS=1 env var + warning log
2. Hardcoded JWT fallback secret — service now fails with RuntimeError if
   JWT_SECRET_KEY env var is unset; run_local.py sets a dev-only default
3. :latest image tag — Dockerfile base pinned to sha256 digest, deployment
   manifest uses semver 0.0.1 tag
4. Missing securityContext — added runAsNonRoot, runAsUser 1001,
   readOnlyRootFilesystem, allowPrivilegeEscalation false, drop ALL caps
5. print() calls — replaced all 9 with structured logging module calls
6. chmod 777 — reduced to 755 for .venv and uv-cache in Dockerfile

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Aleksander Slominski <aslom@us.ibm.com>
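
The first two resolved items follow a common fail-safe configuration pattern: refuse to start without a signing secret, and keep TLS verification on unless an explicit opt-out is set. A minimal sketch, assuming configuration is read from environment variables (the env var names come from the commit message; the function names and logger name are hypothetical):

```python
import logging
import os
from collections.abc import Mapping

log = logging.getLogger("wiki_memory_tool")


def load_jwt_secret(env: Mapping[str, str] = os.environ) -> str:
    """Fail closed: refuse to start without an explicit signing secret."""
    secret = env.get("JWT_SECRET_KEY")
    if not secret:
        raise RuntimeError("JWT_SECRET_KEY is not set; refusing to start")
    return secret


def tls_verify(env: Mapping[str, str] = os.environ) -> bool:
    """Keep TLS verification on unless WIKI_INSECURE_TLS=1 is set explicitly."""
    if env.get("WIKI_INSECURE_TLS") == "1":
        log.warning("WIKI_INSECURE_TLS=1: TLS certificate verification is DISABLED (dev only)")
        return False
    return True
```

The boolean can be passed to an HTTP client's verify argument (for example `httpx.Client(verify=tls_verify())`), and a local launcher can call `os.environ.setdefault("JWT_SECRET_KEY", ...)` before import so only local runs get a dev secret.
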
@aslom aslom force-pushed the wiki-memory-service branch from b25cebb to e0177f1 Compare June 4, 2026 22:54
@aslom aslom requested a review from mrsabath June 4, 2026 23:01
Contributor

@mrsabath mrsabath left a comment

Summary

Round 2 review. All three round-1 must-fix items are addressed in commit 7de6149:

  • verify=False in the MCP client is now gated by WIKI_INSECURE_TLS with a startup warning (mcp_server.py:47-50)
  • JWT secret fails fast at import time if JWT_SECRET_KEY is unset (wiki_service.py:50-52)
  • K8s deployment now has a proper securityContext (runAsNonRoot, runAsUser 1001, readOnlyRootFilesystem, drop ALL caps)
  • Dockerfile base image pinned by SHA256 digest
  • Deployment image tag bumped from :latest to :0.2.0
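
The securityContext items above lend themselves to an automated check. A minimal sketch, assuming the Deployment manifest has already been parsed into a dict (for example with yaml.safe_load) and that settings live on the container rather than the pod; the function name is hypothetical:

```python
# Boolean settings every hardened container is expected to carry.
REQUIRED = {
    "runAsNonRoot": True,
    "readOnlyRootFilesystem": True,
    "allowPrivilegeEscalation": False,
}


def security_context_problems(container: dict) -> list[str]:
    """List hardening gaps in one container spec from a Deployment manifest."""
    ctx = container.get("securityContext") or {}
    problems = [
        f"{key} should be {want}"
        for key, want in REQUIRED.items()
        if ctx.get(key) is not want
    ]
    drop = (ctx.get("capabilities") or {}).get("drop") or []
    if "ALL" not in drop:
        problems.append("capabilities.drop should include ALL")
    if ctx.get("runAsUser") == 0:
        problems.append("runAsUser must not be 0 (root)")
    return problems
```

A check like this would not see settings inherited from a pod-level securityContext, so it is deliberately strict.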

DCO is also passing (sign-off email now matches author).

Approving. Remaining items below are non-blocking: a possible version-drift between pyproject.toml and deployment.yaml, an autouse test fixture that could leak mutation state across tests, a missing env-var doc entry for WIKI_INSECURE_TLS, and a brittle pyproject parser in deploy.py.

Areas reviewed: Python (delta), Dockerfile, K8s manifests, tests, README, CI
Commits: 4 (all signed-off, DCO passing)
CI status: ✅ passing

Assisted-By: Claude Code

Comment thread mcp/wiki_memory_tool/k8s/deployment.yaml
Comment thread mcp/wiki_memory_tool/test_user_skills.py
Comment thread mcp/wiki_memory_tool/README.md
Comment thread mcp/wiki_memory_tool/deploy.py Outdated

@github-advanced-security github-advanced-security AI left a comment

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Contributor

mrsabath commented Jun 5, 2026

@aslom — gave the wiki-memory-tool a proper end-to-end walkthrough today. Quick report.

Tested

  • Local mode (run_local.py --clean + test_agents.py) on Apple Silicon — clean install, all 4 ACL/discovery/query scenarios pass.
  • Live HyperShift instance (https://wiki-memory-service-team1.apps.ykt1.hcp.res.ibm.com, v0.2.0) via wiki_cli.py. All query subcommands, discover novelty, discover write --draft, kwiki login (GitHub device flow), whoami — all working. TF-IDF search ranks reasonably, ACL enforces consistently, draft commits land on disk and show up in query activity.
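
For context on "TF-IDF search ranks reasonably", the sketch below shows a generic TF-IDF scorer over wiki pages. It is not the service's implementation: the tokenizer and the smoothed IDF (the same ln((1+n)/(1+df))+1 form scikit-learn uses by default) are assumptions, and the page names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric runs; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query: str, pages: dict[str, str]) -> list[tuple[str, float]]:
    """Score pages by summed TF-IDF of the query terms, highest first."""
    docs = {name: Counter(tokenize(body)) for name, body in pages.items()}
    n = len(docs)
    scores: dict[str, float] = {}
    for name, counts in docs.items():
        total = sum(counts.values()) or 1
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for c in docs.values() if term in c)
            if df == 0:
                continue
            tf = counts[term] / total                 # term frequency in this page
            idf = math.log((1 + n) / (1 + df)) + 1    # rarer terms weigh more
            score += tf * idf
        if score > 0:
            scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Pages sharing no query term are left out, and terms that occur on many pages (like "for") contribute less than rare ones.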

I skipped the kind deploy — you already had a live instance, and deploy.py hard-requires WIKI_REMOTE_URL with a GitHub PAT, so it would've added friction without adding signal.

Shortcomings worth flagging

  1. OAuth identity has no write path on the deployed instance. whoami warned "No kaslomorg teams in token" — the live ACL is wired to kaslomorg/* GitHub teams, so github:mrsabath (and any non-kaslomorg viewer) gets read-only on ai and 403 everywhere else. For demos this means viewers will only see SPIFFE-pseudo-identity writes; the GitHub OAuth write story is invisible.
  2. The "SPIFFE identity" is just an HTTP header. wiki_cli.py sends X-Spiffe-Id: ... and the service trusts it. Fine for an MVP but the README says "SPIFFE workload identity" — worth a banner clarifying header-based identity is dev-only.
  3. No Kagenti integration yet. No operator CR, no SPIRE wiring, no Keycloak/AuthBridge. Running this on kind doesn't exercise the Kagenti stack — the title and README imply more integration than this PR delivers.
  4. deploy.py hard-requires WIKI_REMOTE_URL even though wiki_service.py treats it as optional. Blocks the simplest "kick the tires on kind" path.
  5. Image is linux/amd64 only — emulates on Apple Silicon, but a multi-arch manifest would make local kind testing painless.
  6. No cleanup story. My draft-write left ai/_drafts/mrsabath-demo.md on the live instance. Repeated demos will accumulate test pages without a kwiki admin clean or similar.
  7. (Carry-over from round-2 review) WIKI_INSECURE_TLS still missing from the README env vars table; image-tag drift between pyproject.toml and k8s/deployment.yaml.
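
Shortcoming 1 is a property of the ACL data rather than the code: a user outside the configured org teams falls through to read-only or deny. The sketch below models a per-topic, team-based ACL to make that behavior concrete. The data layout, the "team:" and "*" principal syntax, and the org name exampleorg are all hypothetical; the team-demo topic mirrors the first suggestion below.

```python
from dataclasses import dataclass, field


@dataclass
class Identity:
    subject: str                                   # e.g. "github:alice" or a SPIFFE ID
    teams: set[str] = field(default_factory=set)   # e.g. {"exampleorg/ai-writers"}


# Hypothetical ACL: topic -> permission -> allowed principals.
# "team:<org>/<team>" matches team membership; "*" matches any authenticated caller.
ACL = {
    "ai": {"read": {"*"}, "write": {"team:exampleorg/ai-writers"}},
    "team-demo": {"read": {"*"}, "write": {"*"}},
}


def allowed(identity: Identity, topic: str, perm: str) -> bool:
    """Deny by default: unknown topics or permissions grant nothing (HTTP 403)."""
    principals = ACL.get(topic, {}).get(perm, set())
    if "*" in principals or identity.subject in principals:
        return True
    return any(f"team:{t}" in principals for t in identity.teams)
```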

Suggestions, in priority order

  1. Pre-seed an ACL that demos both identity paths convincingly — a team-demo topic where any authenticated GitHub user can write, and an agents-demo topic for SPIFFE writes. Then any viewer can sign in and write within 30s.
  2. Ship a Kagenti Tool CR that brings this up via the Kagenti operator. Without it, the Kagenti angle is still aspirational.
  3. Pin SPIRE in the demo path — replace X-Spiffe-Id headers with real SVIDs so the security story is defensible.
  4. kwiki admin clean --topic <t> --pattern '_drafts/*' + a make demo-reset target.
  5. 60-second README quick-start that doesn't require a PAT or OAuth app.
  6. Multi-arch image for Apple Silicon kind clusters.
  7. Top-of-README "what this is / what this isn't" — set expectations that Kagenti glue is follow-up work (#1461).
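
Shortcoming 2 and suggestion 3 hinge on the difference between a well-formed SPIFFE ID and an authenticated one. The sketch below validates SPIFFE ID syntax (lowercase trust domain; path segments of letters, digits, ".", "-", "_"; no empty, "." or ".." segments). It deliberately proves nothing about the caller: any client can send a syntactically valid X-Spiffe-Id header, so real identity has to come from a verified SVID. The example trust domain and path are hypothetical, and length limits from the spec are omitted.

```python
import re

# Trust domain: lowercase letters, digits, '.', '-', '_'.
_TRUST_DOMAIN = re.compile(r"[a-z0-9._-]+")
# Path segment: letters, digits, '.', '-', '_'.
_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")


def parse_spiffe_id(value: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, path) or raise ValueError.

    Syntax check only: a header value that parses is still unauthenticated.
    """
    prefix = "spiffe://"
    if not value.startswith(prefix):
        raise ValueError("scheme must be spiffe://")
    trust_domain, sep, path = value[len(prefix):].partition("/")
    if not _TRUST_DOMAIN.fullmatch(trust_domain):
        raise ValueError("invalid trust domain")  # also rejects ports and uppercase
    if not sep:
        return trust_domain, ""
    for seg in path.split("/"):
        # Empty segments (e.g. a trailing slash) and '.'/'..' are rejected.
        if seg in (".", "..") or not _SEGMENT.fullmatch(seg):
            raise ValueError(f"invalid path segment: {seg!r}")
    return trust_domain, "/" + path
```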

Service itself is solid — ACL model is real, search ranks well, OAuth flow is smooth. Biggest demo unlock is the Kagenti integration glue.

Tested by: @mrsabath
Assisted-By: Claude Code

Member

@esnible esnible left a comment

Can you get make test-docker to work?

Rebase onto main and, if the image is too hard to build, follow the existing pattern of adding it to TEST_A2A_SKIP to skip the build.

The goal is to ensure that PRs, especially dependabot PRs, don't break the ability to do a release, which requires building the images.

@aslom aslom force-pushed the wiki-memory-service branch from 593df0d to 5e059f0 Compare June 5, 2026 18:27
aslom added a commit to aslom/agent-examples that referenced this pull request Jun 5, 2026
Resolved issues:
1. verify=False in MCP client — TLS verification now enabled by default,
   disabled only with explicit WIKI_INSECURE_TLS=1 env var + warning log
2. Hardcoded JWT fallback secret — service now fails with RuntimeError if
   JWT_SECRET_KEY env var is unset; run_local.py sets a dev-only default
3. :latest image tag — Dockerfile base pinned to sha256 digest, deployment
   manifest uses semver 0.0.1 tag
4. Missing securityContext — added runAsNonRoot, runAsUser 1001,
   readOnlyRootFilesystem, allowPrivilegeEscalation false, drop ALL caps
5. print() calls — replaced all 9 with structured logging module calls
6. chmod 777 — reduced to 755 for .venv and uv-cache in Dockerfile

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Aleksander Slominski <aslom@us.ibm.com>
Contributor

@mrsabath mrsabath left a comment

Re-reviewing the force-pushed commit 5e059f0 only (since my previous APPROVED on this PR).

Most of the changes are clean wins — tomllib for version parsing, kubectl set image instead of rollout restart, session-scoped pytest fixture, useful clean-test-pages admin command, and a clear Status section in the README. Nice work addressing the earlier feedback.

One blocker on the Dockerfile change, plus two smaller notes inline.

Areas reviewed: Dockerfile (security), Python (deploy.py, wiki_cli.py, test_user_skills.py), Markdown
DCO: ✅ pass • CI: ✅ pass • Sign-off: ✅ present

Comment thread mcp/wiki_memory_tool/Dockerfile Outdated
Comment thread mcp/wiki_memory_tool/deploy.py Outdated
Comment thread mcp/wiki_memory_tool/wiki_cli.py
Git-backed wiki service implementing persistent shared memory for AI agents
in the Kagenti platform. Validates multi-tenant multi-agent collaboration
using MCP with SPIFFE workload identity, GitHub OAuth for humans, and
per-topic ACL with team-based access control.

Key capabilities:
- SPIFFE-authenticated agent endpoints (discovery write, query read)
- GitHub OAuth device flow for human users (CLI + JWT)
- Full-text search, backlinks, tag graph, activity log
- Git-backed storage with optional remote push
- Kubernetes deployment with security hardening
- MCP server for Claude Code integration
- CLI (kwiki) for human interaction

Security:
- JWT_SECRET_KEY required (no hardcoded fallback)
- TLS verification on by default (WIKI_INSECURE_TLS opt-out for dev)
- Dockerfile pinned to digest, non-root USER 1001
- K8s securityContext: readOnlyRootFilesystem, drop ALL caps
- PAT tokens redacted in error output

Testing:
- 16 pytest unit tests (test_user_skills.py via TestClient)
- Live agent integration tests (test_agents.py)
- Live user access tests (test_user_access.py)

Relates-to: kagenti/kagenti#1461

Signed-off-by: Aleksander Slominski <aslom@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
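
The commit message lists the GitHub OAuth device flow for the kwiki CLI. The sketch below shows the client side of that flow in general terms, not kwiki's code: the HTTP transport is injected as a post callable (assumed to send form data with `Accept: application/json` and return the decoded JSON), the function names are hypothetical, and the read:org scope is an assumption about what team-based ACLs would need.

```python
import time
from collections.abc import Callable

DEVICE_CODE_URL = "https://github.com/login/device/code"
TOKEN_URL = "https://github.com/login/oauth/access_token"
GRANT_TYPE = "urn:ietf:params:oauth:grant-type:device_code"

Post = Callable[[str, dict], dict]


def start_device_flow(post: Post, client_id: str, scope: str = "read:org") -> dict:
    """Step 1: obtain device_code, user_code and verification_uri."""
    return post(DEVICE_CODE_URL, {"client_id": client_id, "scope": scope})


def poll_for_token(post: Post, client_id: str, device_code: str, interval: int = 5,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Step 2: poll until the user approves the code in the browser."""
    while True:
        sleep(interval)
        resp = post(TOKEN_URL, {"client_id": client_id,
                                "device_code": device_code,
                                "grant_type": GRANT_TYPE})
        if "access_token" in resp:
            return resp["access_token"]
        error = resp.get("error")
        if error == "authorization_pending":
            continue  # user has not finished approving yet
        if error == "slow_down":
            # GitHub asks clients to add 5 seconds to the polling interval.
            interval = max(interval + 5, int(resp.get("interval", 0)))
            continue
        raise RuntimeError(f"device flow failed: {error}")
```

Injecting post and sleep keeps the polling logic testable without network access.
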
@aslom aslom force-pushed the wiki-memory-service branch from 5e059f0 to 0914738 Compare June 5, 2026 20:06
Contributor

@mrsabath mrsabath left a comment

Round 4 — verified the three round-3 items in 09147384:

  • Dockerfile is back to digest-pinned quay.io/fedora/python-314@sha256:381b2b... — supply-chain pin restored.
  • deploy.py print_summary now branches on repo_url; local-only mode prints Git remote: (local-only, no remote configured) and skips the misleading Secret: wiki-github-pat line.
  • wiki_cli.py clean-test-pages predicate is split into path_prefixes (full-path match) and basename_prefixes (basename match) — the dead clause is gone.
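
The split predicate can be illustrated with a small sketch: directory-style prefixes are matched against the full path relative to the topic, and name-style prefixes against the basename only. The default prefix values and the function name are hypothetical, not taken from wiki_cli.py. One way a clause goes dead is testing a prefix like "_drafts/" against a basename, which never contains a slash.

```python
import posixpath

# Hypothetical defaults: anything under _drafts/, or files named test-*.
PATH_PREFIXES = ("_drafts/",)
BASENAME_PREFIXES = ("test-",)


def is_test_page(rel_path: str,
                 path_prefixes: tuple[str, ...] = PATH_PREFIXES,
                 basename_prefixes: tuple[str, ...] = BASENAME_PREFIXES) -> bool:
    """rel_path is relative to the topic dir, e.g. '_drafts/mrsabath-demo.md'."""
    if rel_path.startswith(path_prefixes):     # full-path match
        return True
    base = posixpath.basename(rel_path)        # basename match
    return base.startswith(basename_prefixes)
```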

No new regressions in the force-push. Earlier security gates still hold: JWT_SECRET_KEY fail-fast on import (wiki_service.py), TLS-insecure gated by WIKI_INSECURE_TLS=1 with logged warning, K8s securityContext (runAsNonRoot, readOnlyRootFilesystem, drop ALL caps), image pinned to :0.2.0. The new run_local.py:setdefault('JWT_SECRET_KEY', 'local-dev-secret-do-not-use-in-production') is correct local-dev behavior — production still fails fast.

Areas re-reviewed: Dockerfile, deploy.py, wiki_cli.py, regression scan across security-critical paths
CI status: ✅ DCO passing

Approving.

Labels

None yet

Projects

Status: New /:ToDo

5 participants