
Migrate documentation from Jekyll to Astro Starlight #897

Open
jordanrburger wants to merge 44 commits into main from feature/astro-migration

@jordanrburger
Contributor

Summary

Migrates the entire Keboola documentation site from Jekyll (Ruby/Docker) to Astro Starlight (Node.js), a modern documentation framework.

  • 264 markdown files and 1,218 images migrated to Starlight format
  • Sidebar auto-generated from existing _data/navigation.yml
  • Redirect pages generated from redirect_from frontmatter for URL compatibility
  • CI workflows updated from Docker/Jekyll to Node.js/Astro
  • Deploys to the same S3 bucket (help.keboola.com) with the same caching strategy
  • Client-side navigation with prefetch for fast, flicker-free page loads
  • Custom 404 page

Steps to go live

  1. Review & approve this PR — CI runs npm ci && npm run build on PRs to validate the build
  2. Verify redirects — Spot-check that existing URLs (especially linked from external sources) still work via the redirect_from frontmatter mappings
  3. Confirm AWS secrets — The deploy workflow uses AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (same secrets as today). Verify these are still configured in the repo's GitHub Actions secrets
  4. Merge to main — On merge, the main.yml workflow will:
    • npm ci + npm run build to produce a static dist/ folder
    • aws s3 sync dist s3://help.keboola.com --delete --acl "public-read" to deploy
  5. Verify production — After deploy, check https://help.keboola.com and a few key pages:
    • Homepage loads correctly
    • Sidebar navigation works
    • Search works
    • Old URLs redirect properly
  6. DNS / CDN — No changes needed; the S3 bucket and domain remain the same

What changes in the deploy pipeline

| | Before (Jekyll) | After (Astro) |
|---|---|---|
| Runtime | Ruby via Docker | Node.js 22 |
| Build command | jekyll build → _site/ | npm run build → dist/ |
| Deploy | s3 rm --recursive then s3 cp | s3 sync --delete (atomic, safer) |
| CI on PRs | Build only | Build only (same) |

Test plan

  • CI build passes on this PR
  • Local npm run build produces a clean dist/ output
  • Spot-check 10+ pages for content accuracy vs current live site
  • Verify sidebar navigation matches existing structure
  • Test redirect URLs (pick 5 from redirect_from frontmatter entries)
  • Confirm search functionality works
  • Check mobile responsiveness

🤖 Generated with Claude Code

jordanrburger and others added 22 commits March 18, 2026 16:58
Preparing for Jekyll-to-Astro migration: ignore worktree directory,
Node.js dependencies, and Astro build output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Set up the core Starlight project structure for migrating Keboola docs
from Jekyll: package.json with Astro/Starlight dependencies, astro config
with GTM tracking, Keboola brand CSS overrides, custom Head component for
GTM noscript, content config, and TypeScript config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create scripts/convert-nav.mjs that parses _data/navigation.yml and
produces src/sidebar.mjs with the full Starlight sidebar tree (~200
entries, up to 4 levels deep). Update astro.config.mjs to import and
use the generated sidebar. Add gen:sidebar npm script for easy
regeneration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Transforms all markdown files from Jekyll format to Astro Starlight:
- permalink → slug frontmatter conversion
- Remove layout/showBreadcrumbs, convert sitemap→pagefind
- Convert highlight blocks to fenced code blocks
- Convert tip/warning/beta includes to Starlight admonitions
- Remove TOC markers and Kramdown attribute lists
- Copy images and special assets to correct locations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
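
For reviewers who want a feel for the highlight-block transform listed in the commit above, here is a minimal sketch. It is not the code in the migration script: the function name `convertHighlightBlocks` and the regex are assumptions, and it only covers the plain `{% highlight lang %} ... {% endhighlight %}` form.

```typescript
// Sketch only: regex-based conversion of Jekyll highlight blocks to fenced code.
// The name and pattern are illustrative, not taken from the migration script.
const FENCE = "`".repeat(3);

function convertHighlightBlocks(markdown: string): string {
  // Matches {% highlight lang [options] %} ... {% endhighlight %}, tolerating CRLF
  // and dropping the newline that directly follows / precedes the Liquid tags.
  const pattern =
    /\{%-?\s*highlight\s+(\w+)[^%]*%\}\r?\n?([\s\S]*?)\r?\n?\{%-?\s*endhighlight\s*-?%\}/g;
  return markdown.replace(
    pattern,
    (_match: string, lang: string, body: string) => `${FENCE}${lang}\n${body}\n${FENCE}`,
  );
}
```

Any options after the language (for example `linenos`) are discarded in this sketch, since a fenced block has no direct equivalent.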
Replace Docker-based Jekyll builds with Node.js 22 + npm ci + npm run build.
Deploy step now uses pip-installed awscli directly and targets dist/ instead of _site/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create src/pages/404.astro with branded error page content and copy
keboola-kolecko.png to public/ for the 404 page image.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix \r\n line ending bug in permalink→slug conversion that caused 34
files to retain their permalink: frontmatter. All Jekyll patterns
(includes, TOC markers, highlight blocks, kramdown attrs, image-popup)
are now fully transformed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
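
The line-ending bug described above is easy to reproduce with a pattern that only expects `\n`. A hedged sketch of a CRLF-tolerant permalink-to-slug conversion follows; the function name `permalinkToSlug` is hypothetical, and stripping the leading and trailing slashes is an assumption about the slug shape Starlight expects.

```typescript
// Sketch only: convert `permalink:` to `slug:` inside YAML frontmatter,
// accepting both LF and CRLF line endings and preserving whichever is used.
function permalinkToSlug(source: string): string {
  const frontmatter = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/.exec(source);
  if (frontmatter === null) return source; // no frontmatter block: leave untouched
  const body = frontmatter[1];
  const converted = body.replace(
    /^permalink:[ \t]*(.*?)[ \t]*(\r?)$/m,
    (_line: string, value: string, cr: string) => {
      // Drop optional quotes, then the leading and trailing slashes.
      const slug = value.replace(/^["']|["']$/g, "").replace(/^\/+|\/+$/g, "");
      return `slug: ${slug}${cr}`;
    },
  );
  const start = source.indexOf("\n") + 1; // first character after the opening delimiter line
  return source.slice(0, start) + converted + source.slice(start + body.length);
}
```

The optional `\r` is captured and written back so a file keeps its original line endings after conversion.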
…ssues

- Upgrade from Astro 5/Starlight 0.34 to Astro 6/Starlight 0.38 to resolve
  Zod v3/v4 schema validation conflict that caused build failures
- Add docsLoader() to content config (required by Starlight 0.38)
- Extend docs schema with redirect_from field for Jekyll redirect compatibility
- Copy 4 missing .webp images that were skipped during migration
- Add .webp to IMAGE_EXTS in migration script to prevent future omissions
- Fix CSS @import ordering (must precede other statements)
- Fix sidebar slug: components/data-apps/backend-versions -> data-apps/backend-versions
- Remove src/pages/404.astro that conflicted with Starlight's built-in 404 route
  (404 content is served from src/content/docs/404.md instead)
- Add .astro/ to .gitignore
- Update starlight-image-zoom to v0.14 for Starlight 0.38 compatibility

Build produces 264 pages successfully with Pagefind search index.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tter

Create a custom Astro integration that reads redirect_from arrays from
content frontmatter and generates static HTML redirect pages (using meta
refresh) during the build. This ensures old URLs continue to work by
redirecting visitors to the correct page. Generates 162 redirect pages
across 152 content files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
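
As a rough illustration of the redirect generation described above, the sketch below builds a meta-refresh page per `redirect_from` alias. It is not the integration's actual code: `DocEntry`, `redirectHtml`, and `buildRedirectPages` are hypothetical names, the output-path rules and the first-alias-wins handling of collisions are assumptions, and the Astro build hook that would write the files is omitted.

```typescript
// Sketch only: the data shape and function names are illustrative.
interface DocEntry {
  slug: string; // e.g. "data-apps/authentication"
  redirectFrom: string[]; // e.g. ["/data-apps/oidc/"]
}

// Escape characters that are unsafe inside HTML text and attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function redirectHtml(targetUrl: string): string {
  const safe = escapeHtml(targetUrl);
  return [
    "<!doctype html>",
    '<html lang="en">',
    "<head>",
    '<meta charset="utf-8">',
    `<meta http-equiv="refresh" content="0; url=${safe}">`,
    `<link rel="canonical" href="${safe}">`,
    "<title>Redirecting…</title>",
    "</head>",
    `<body><a href="${safe}">Redirecting to ${safe}</a></body>`,
    "</html>",
  ].join("\n");
}

// Returns a map of output file path (relative to dist/) to redirect HTML.
function buildRedirectPages(entries: DocEntry[]): Map<string, string> {
  const pages = new Map<string, string>();
  for (const entry of entries) {
    const clean = entry.slug.replace(/^\/+|\/+$/g, "");
    const target = clean === "" || clean === "index" ? "/" : `/${clean}/`;
    for (const alias of entry.redirectFrom) {
      const dir = alias.replace(/^\/+|\/+$/g, "");
      const file = dir.endsWith(".html") ? dir : dir === "" ? "index.html" : `${dir}/index.html`;
      // Assumption: on a collision the first alias wins and later ones are skipped.
      if (!pages.has(file)) pages.set(file, redirectHtml(target));
    }
  }
  return pages;
}
```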
…aping

- Strip {% raw %}/{% endraw %} and convert {% comment %}/{% endcomment %}
  to HTML comments in migration script (C1: 10 affected files)
- Add favicon: '/favicon.ico' to Starlight config to match public/ asset (C2)
- Replace pip install awscli + rm/cp with aws s3 sync --delete in CI (I2)
- Add escapeHtml() to redirect-from integration for safe URL interpolation (I4)
- Re-run migration to apply Liquid tag fixes to all content files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
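
The Liquid cleanup in the first bullet of this commit could look roughly like the sketch below. `convertLiquidTags` is a hypothetical name, and the patterns are an assumption about which tag spellings appear in the content (including the `{%-` / `-%}` whitespace-trim variants).

```typescript
// Sketch only: strip {% raw %}/{% endraw %} and turn Liquid comments into HTML comments.
function convertLiquidTags(markdown: string): string {
  return markdown
    // The raw/endraw markers are only needed by Jekyll, so they are simply removed.
    .replace(/\{%-?\s*(?:raw|endraw)\s*-?%\}/g, "")
    // The comment body is kept, wrapped in an HTML comment so it stays hidden.
    .replace(
      /\{%-?\s*comment\s*-?%\}([\s\S]*?)\{%-?\s*endcomment\s*-?%\}/g,
      (_match: string, body: string) => `<!--${body}-->`,
    );
}
```

A comment body that itself contains `--` would need extra handling, which this sketch leaves out.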
- Fix light mode by scoping CSS color overrides to dark/light themes
- Fix sidebar: rename parent page entries to "Overview", collapse all groups
- Fix images: copy content images to public/ for absolute path serving
- Remove duplicate ## Overview headings from 9 content pages
- Add "Copy as Markdown" button to every page for AI agent workflows
- Make GTM conditional on help.keboola.com hostname (no cookie modal on localhost)
- Make logo visible in dark mode with CSS filter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace Jekyll/Docker instructions with Node.js/Astro setup,
document project structure, content authoring, build process,
sidebar regeneration, and key features.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds <ClientRouter /> to the Head component for SPA-style page
transitions instead of full page reloads on every navigation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ClientRouter view transitions caused visible flickering. Replaced with
Astro's prefetch (hover strategy) which preloads pages in the background
so they load near-instantly on click without any visual artifacts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…w Transitions

The render-blocking Google Fonts @import in CSS caused a flash on every page
load. Self-hosting the Lato WOFF2 files eliminates the external request entirely.
Re-enabling Astro ClientRouter (View Transitions) provides SPA-style navigation
so pages swap without full reloads.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… fallback

Three-part fix for page navigation flicker:
1. Inline anti-FOUC styles set correct background color immediately before
   any CSS loads, preventing white flash during theme initialization
2. Disable all View Transition crossfade animations via CSS for instant swap
3. Use "swap" fallback mode for non-VT browsers (instant DOM swap)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When navigating via View Transitions, the new page's sidebar would reset
group expand states. This script preserves the user's sidebar state while
ensuring the group containing the active page is always open.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Starlight renders group labels (items with children) as bold by default,
creating visual inconsistency with leaf links at the same nesting level.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Top-level groups keep bold/large styling, but nested sub-groups now use
the same font-size and weight as regular leaf links for consistency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Starlight does not accept slug: "" for sidebar links. Changed to
slug: "index" which correctly references the root docs page.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jordanrburger

Contributor Author

This is a draft of the docs migration. It SHOULD NOT be merged until further notice.

@jordanrburger jordanrburger requested review from odinuv and removed request for odinuv March 24, 2026 22:45
@jordanrburger jordanrburger marked this pull request as ready for review May 12, 2026 12:21
jordanrburger and others added 4 commits May 12, 2026 15:02
- Restore Google Search Console verification: copy
  google9cde6c6b9250e5a4.html into public/ so it's served at the root.
  The file was at the repo root for Jekyll (picked up automatically) but
  wasn't copied during the Astro migration, which would break GSC.

- Replicate Jekyll's redirect_to behavior for /data-apps/oidc/. Jekyll's
  redirect-from plugin treated the page as a server-side redirect to
  /data-apps/authentication/; the Astro redirect-from integration only
  honors redirect_from. Move the alias into the target page's
  redirect_from list and delete the now-empty oidc/index.md stub.

Verified with astro build: sitemap content matches Jekyll exactly,
163 redirect pages generated (was 162 + 1 skipped due to the oidc
collision), /data-apps/oidc/ now serves a meta-refresh redirect.
… components

Reskins the Astro/Starlight docs with the "Beacon" direction on Keboola
design system tokens (blue links, green primary, light code blocks). The
design is fully data-driven: every page picks up the new chrome and
auto-upgrades content through a remark plugin, with no per-page edits.

New / changed files:

- src/styles/custom.css — full Keboola token set mapped onto Starlight's
  CSS variables. Restyles sidebar (with "Ask Kai" gradient button injected
  via ::before), pill/eyebrow header, page-icon chip, gradient Ask-Kai
  hero, light code blocks, asides in Keboola palette, right-rail TOC,
  prev/next pagination cards, and feedback widget. Dark theme included.

- src/components/PageTitle.astro — auto-derives an eyebrow section pill
  ("TRANSFORMATIONS") and a page icon (snowflake, python, sparkles…) from
  the page slug, renders a lede from description: frontmatter, and shows
  an "Ask Kai about this page" hero with shortcut button.

- src/integrations/beacon-transforms.mjs — new remark plugin that
  auto-upgrades plain markdown on every page:
    1. First short bullet list before any H2 -> green-check advantage grid
    2. Ordered lists with title-then-detail items -> numbered step cards
    3. Two consecutive code blocks in the same language -> side-by-side
       compare/contrast pair grid
    4. **Note:** / **Tip:** / **Important:** bold-prefixed paragraphs ->
       matched Starlight aside callouts
  Per-page opt-out via beacon: false in frontmatter.

- astro.config.mjs — wires the remark plugin into markdown.remarkPlugins
  and enables pagination so prev/next nav appears in the page footer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Starlight 0.30+ moved page context from Astro.props.entry to
Astro.locals.starlightRoute. The previous PageTitle override was reading
from props, where `entry` is undefined — so every page fell back to the
"Keboola Docs" pill with no icon. Switching the source restores the
correct section pill ("Transformations", "Kai · AI Assistant", "Data
Apps", …) and the page-icon chip (❄️, ✨, 🚀, …) across the whole site.

Also expanded the icon map to cover tutorial sub-sections, Kai feature
pages, AI MCP, and the three Data Apps variants so popular pages get
distinct chips.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Acts on review feedback from a 4-agent sweep of tutorial/, transformations/,
components/, kai/, ai/, data-apps/, storage/, management/, flows/, workspace/,
and catalog/. Six fixes here, all zero per-page edits.

1. Auto hero lede (PageTitle.astro):
   No page in the corpus declares `description:` in frontmatter, so the hero
   was bare site-wide. PageTitle now derives a lede from the raw markdown
   body of `route.entry.body` — picks the first prose paragraph, skips
   editorial italics / images / blockquotes / lists / HTML blocks, strips
   inline syntax, prefers a sentence boundary up to ~220 chars. Tested on
   /transformations/snowflake-plain/, /kai/, /tutorial/.

2. Italic-bold callouts (beacon-transforms.mjs):
   `transformBoldedCallouts` now also recognizes `***Note:** body*`
   (emphasis > strong) and bold-prefixed blockquotes. Unwraps the leading
   strong, trims trailing colon/dashes, and replaces the source paragraph
   with a Starlight aside.

3. Legacy `<div class="alert alert-warning">` HTML blocks → Starlight asides:
   New `transformLegacyAlerts` pass scans top-level html nodes for the
   alert-info / alert-warning / alert-danger / alert-success variants,
   collects siblings up to the matching `</div>`, and replaces the range
   with an aside. Confirmed working on /tutorial/ad-hoc/ and
   /tutorial/onboarding/usage-blueprint/ (2 asides each, zero raw HTML).

4. Pseudo-H4: `**Bold ending colon:**` + immediate list → real `<h4>` with
   beacon-pseudo-h4 chrome and a slugified id. Lifts subsection labels into
   the right-rail TOC and gives them an uppercase-tracking subhead style.
   /tutorial/onboarding/architecture-guide/ now has 13 of these (was 0).

5. Tightened ordered-step heuristic: requires `(title-shape AND
   (imperative-verbs OR cued))` instead of just title-shape. Cue =
   preceding paragraph matching /follow these steps:/ or nearest heading
   matching /^(steps|how to|setup|example|walkthrough|…)/. Drops
   /tutorial/manipulate/ false positives from 7 to 0; arch-guide from 9 to
   1; snowflake stays at 1 (the real Example walkthrough).

6. Imperative unordered → step cards: when a UL is preceded by a step cue
   AND ≥50% of items start with an imperative verb, the plugin flips
   `ordered = true` and applies `.beacon-steps`. Mirrors fix #5 for the
   "Follow these steps:" + bullets idiom common on transformation pages.

Also extends content.config.ts schema with `ask`, `icon`, `section`,
`beacon` so per-page overrides validate cleanly. Adds CSS for
.beacon-pseudo-h4 (uppercase tracking + bottom rule).
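
To make fix 1 (the auto hero lede) easier to review, here is a simplified sketch of the selection rules it describes, working on raw markdown strings. This is not the PageTitle.astro code: all function names are hypothetical, the block detection is deliberately crude, only asterisk emphasis is stripped, and the exact 220-character cutoff is an assumption based on the "~220 chars" in the commit message.

```typescript
// Sketch only: derive a lede from a raw markdown body.
const MAX_LEDE = 220; // assumed cutoff

function isProse(block: string): boolean {
  const text = block.trim();
  if (text === "") return false;
  // Headings, blockquotes, HTML blocks, images, tables, unclosed fences.
  if (/^(#{1,6}\s|>|<|!\[|\||`{3}|~{3})/.test(text)) return false;
  // Bullet or numbered list items.
  if (/^([-*+]|\d+[.)])\s/.test(text)) return false;
  // Starts and ends with a single emphasis marker: treated as an editorial note.
  if (/^[*_][^*_\s][\s\S]*[*_]$/.test(text)) return false;
  return true;
}

function stripInline(text: string): string {
  return text
    .replace(/!\[[^\]]*\]\([^)]*\)/g, "") // inline images
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1") // links become their link text
    .replace(/`([^`]+)`/g, "$1") // inline code
    .replace(/(\*\*|\*)(.+?)\1/g, "$2") // asterisk bold / italic
    .replace(/\s+/g, " ")
    .trim();
}

function clipToSentence(text: string, max: number): string {
  if (text.length <= max) return text;
  const head = text.slice(0, max);
  const lastStop = Math.max(head.lastIndexOf(". "), head.lastIndexOf("! "), head.lastIndexOf("? "));
  if (lastStop > 0) return head.slice(0, lastStop + 1); // prefer a sentence boundary
  const lastSpace = head.lastIndexOf(" ");
  return (lastSpace > 0 ? head.slice(0, lastSpace) : head) + "…";
}

function deriveLede(body: string, max: number = MAX_LEDE): string | undefined {
  // Remove closed fenced code blocks first so their contents are never picked.
  const withoutFences = body.replace(/^(`{3,}|~{3,})[\s\S]*?^\1[^\n]*$/gm, "");
  for (const block of withoutFences.split(/\r?\n\s*\r?\n/)) {
    if (!isProse(block)) continue;
    const text = stripInline(block);
    if (text !== "") return clipToSentence(text, max);
  }
  return undefined;
}
```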

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
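
Fixes 5 and 6 both hinge on a step cue plus imperative verbs. A minimal sketch of the fix 6 decision follows, assuming the list items and their context are already available as strings. The verb list is hypothetical, the heading pattern contains only the alternatives quoted in the commit message (the original lists more), and the case-insensitive flag is an assumption.

```typescript
// Sketch only: decide whether an unordered list should be rendered as step cards.
// Hypothetical verb list; the plugin's real list is not shown in this PR.
const IMPERATIVE_VERBS = new Set<string>([
  "click", "select", "open", "create", "enter", "go", "choose", "run", "add", "set",
]);
const PARAGRAPH_CUE = /follow these steps:/i;
// Only the alternatives quoted in the commit message; the original pattern has more.
const HEADING_CUE = /^(steps|how to|setup|example|walkthrough)/i;

function startsWithImperative(item: string): boolean {
  const firstWord = item.trim().split(/\s+/)[0] ?? "";
  // Ignore markdown markers and punctuation around the first word.
  return IMPERATIVE_VERBS.has(firstWord.toLowerCase().replace(/[^a-z]/g, ""));
}

function hasStepCue(precedingParagraph: string, nearestHeading: string): boolean {
  return PARAGRAPH_CUE.test(precedingParagraph) || HEADING_CUE.test(nearestHeading.trim());
}

function shouldRenderAsSteps(
  items: string[],
  precedingParagraph: string,
  nearestHeading: string,
): boolean {
  if (items.length === 0 || !hasStepCue(precedingParagraph, nearestHeading)) return false;
  const imperativeCount = items.filter(startsWithImperative).length;
  return imperativeCount / items.length >= 0.5; // the ">=50% of items" rule from fix 6
}
```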
jordanrburger and others added 18 commits May 13, 2026 16:31
Adds the second batch of zero-edit markdown upgrades surfaced by the
4-agent doc review.

1. Backend-size pills — UL of XSmall / Small (default) / Medium / Large
   becomes a horizontal pill row with the default item highlighted green
   and DEFAULT badge. Triggers automatically on transformations/{snowflake,
   bigquery,oracle,python-plain,r-plain}/index.md.

2. Term–definition glossary lists — UL where 66%+ of items start with
   "**Term** — definition" becomes a 2-column .beacon-glossary grid with
   left-aligned terms. Picks up catalog/, storage/buckets/, management/
   project/limits/.

3. Label-value spec tables — 2-col GFM tables where the first column is
   bold across rows become borderless .beacon-spec-table layouts (label
   left, mono-formatted value right). Catches management/jobs/ (3 tables)
   and similar reference pages.

4. Status-emoji table cells — cells starting with ✅/❌/🚧/⚠️/🟢/🔴/🟡 get
   .beacon-status pill chrome, table gets zebra rows. flows/conditional-
   flows/ shows 9 such cells.

5. Prompt bubbles — `Label:` paragraph + fenced code block (no language,
   quoted body) becomes a chat-bubble container with sans-serif italic
   prompt text. kai/use-cases/ converts 16 of these.

6. Figure + caption — `<div align="center">caption</div>` immediately
   after an image paragraph becomes a real <figure>+<figcaption>.
   Architecture-guide converts 4.

7. Long-table scroll wrapper — every non-spec table gets wrapped in a
   .beacon-table-scroll container so wide reference tables (Postgres
   extractor data-types, limits credit tables, etc.) get horizontal
   overflow without breaking layout.
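
The threshold in item 2 (glossary lists) can be sketched as below, again on plain strings rather than the mdast nodes the plugin works with. `splitTerm` and `isGlossaryList` are hypothetical names, and the sketch assumes the separator is always the dash character shown in the commit message (written as `\u2014` in the pattern).

```typescript
// Sketch only: detect "**Term** <dash> definition" lists.
interface GlossaryEntry {
  term: string;
  definition: string;
}

// \u2014 is the dash used in the commit message's example.
const TERM_DEFINITION = /^\*\*(.+?)\*\*\s*\u2014\s*([\s\S]+)$/;

function splitTerm(item: string): GlossaryEntry | null {
  const match = TERM_DEFINITION.exec(item.trim());
  return match === null ? null : { term: match[1], definition: match[2] };
}

function isGlossaryList(items: string[], threshold: number = 0.66): boolean {
  if (items.length === 0) return false;
  const hits = items.filter((item) => splitTerm(item) !== null).length;
  return hits / items.length >= threshold;
}
```

With the 0.66 threshold, a three-item list qualifies when two items match the pattern, while a two-item list needs both.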

Hit a content-cache wrinkle while iterating: Astro/Starlight stores the
parsed mdast in .astro/data-store.json, so changes to a remark plugin
don't take effect on already-cached entries until the cache is cleared.
Documented in passing — relevant when iterating on the plugin locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously these were applied by hand after running migrate.mjs, and
re-running the script silently undid them. Both are now part of the
script so the output stays idempotent against the Jekyll source.

- Strip a leading '## Overview' H2. The Jekyll-era organizational
  wrapper is redundant under Starlight's title + lede convention.
  Only strips when it's the first H2 in the body, so mid-page Overview
  sections (e.g. catalog/index.md) are preserved.

- Honor Jekyll's redirect_to frontmatter. The Astro redirect-from
  integration only honors redirect_from, so a 'redirect_to: /x/' stub
  used to land as a duplicate page. The pre-pass now skips writing
  those stubs and folds the stub's permalink + its own redirect_from
  list into the target page's redirect_from. Matches the manual fix
  from bead520 for data-apps/oidc -> data-apps/authentication.
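
The redirect_to pre-pass described in the second bullet can be sketched as a pure function over page metadata. The `Page` shape and `foldRedirectStubs` name are hypothetical, and skipping stubs whose target is missing is an assumption; the actual script works on files rather than in-memory objects.

```typescript
// Sketch only: fold redirect_to stubs into their target page's redirect_from list.
interface Page {
  permalink: string;
  redirectFrom: string[];
  redirectTo?: string; // present only on stub pages
}

function foldRedirectStubs(pages: Page[]): Page[] {
  const byPermalink = new Map<string, Page>();
  // Real pages are kept (copied so the input is not mutated); stubs are not written.
  for (const page of pages) {
    if (page.redirectTo === undefined) {
      byPermalink.set(page.permalink, { ...page, redirectFrom: [...page.redirectFrom] });
    }
  }
  for (const stub of pages) {
    if (stub.redirectTo === undefined) continue;
    const target = byPermalink.get(stub.redirectTo);
    if (target === undefined) continue; // assumption: a stub with no target is dropped
    // The stub's own URL and its aliases all become aliases of the target.
    for (const alias of [stub.permalink, ...stub.redirectFrom]) {
      if (!target.redirectFrom.includes(alias)) target.redirectFrom.push(alias);
    }
  }
  return Array.from(byPermalink.values());
}
```

Because the result depends only on the Jekyll source metadata, re-running it yields the same output, which is the idempotency property the commit is after.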
Propagates the changes pulled in from the preceding merge of main:
- New Kai Settings page + Kai navigation entries (settings, security & privacy)
- Branched storage GA copy + new screenshot
- Removal of deprecated backend references (Teradata, Exasol, Synapse, Redshift)
  and Julia references from overview/transformations/limits pages
- Beta-warning deletion + multi-project, mappings, tutorial copy edits
- Fixed sidebar path for data-apps/backend-versions
_data/navigation.yml pointed at /components/data-apps/backend-versions/,
which is a redirect alias rather than the page's real permalink
(/data-apps/backend-versions/). Jekyll tolerated this because
jekyll-redirect-from served the alias, but Starlight requires the entry
to map to a real content slug and the astro build was failing with:
  "The slug 'components/data-apps/backend-versions' specified in the
   Starlight sidebar config does not exist."

Point the nav at the canonical URL and regenerate src/sidebar.mjs.
The Telemetry Data connector page (5520a76) drives its table sections
from `_data/telemetry_tables.yml` rendered via `_includes/telemetry-table.html`
inside Jekyll Liquid loops. The migration script previously passed the
Liquid through untouched, so the page was effectively blank on Astro and
new entries like `kbc_mcp_event` never appeared.

- Add expandTelemetryTableIncludes(): loads the YAML, sorts by id, and
  resolves the three mode-filtered `{% for %}{% if %}{% include %}` loops
  to concatenated markdown using a faithful JS port of the include
  template. Also resolves `{{ site.data.telemetry_tables | jsonify }}`
  for the interactive diagram viewer.
- Add copyJekyllAssetsJs(): mirror `assets/js/*.js` into `public/assets/js/`
  so pages embedding `<script src="/assets/js/...">` (like the diagram
  viewer on telemetry-data) still load on Astro. `assets/` is in
  SKIP_DIRS for the markdown walk, so these files weren't being copied.
Brings in everything merged on main since 2026-03-20 (PRs #888 through
#920), including:

- Telemetry: dashboards rewrite (#900-class), connector docs migrated to
  YAML+include (#896-class), new kbc_mcp_event table, alphabetical
  ordering, ER diagram viewer
- Data Apps: storage-access docs (#914, #910), backend-versions index
  rename, streamlit config.toml docs (#917), nav restructure
- Custom Python component docs (#888), Salesforce External Client App
  Spring '26 updates (#920), Governed Change Management (#918)
- Public-beta-warning refactor (now 2 instances total)

Sidebar regenerated with new entries. Telemetry-data Liquid loops are
now expanded via the new transform (50 tables rendered, 0 Liquid tags
left in the page).
…h refs

Markdown pages reference images with a mix of absolute paths
(`/management/telemetry/.../foo.png`) and relative paths (`imgs/foo.png`).
Astro routes absolute paths through `public/`, while relative paths resolve
against the page folder under `src/content/docs/`. The migration script
was only copying images to the latter, so every absolute-path ref required
a manual one-time copy into `public/` (commit 474b96b). New or renamed
images added on `main` afterwards (e.g. the telemetry-dashboards rewrite)
were 404ing in production because nobody re-ran that manual step.

Copy each content image to both locations so absolute and relative refs
both work, and re-running migrate.mjs keeps `public/` in sync with the
Jekyll source going forward.

Includes the 23 new images this run picks up from main: Custom Python,
PowerBI/Tableau triggers, telemetry-dashboards rewrite, etc.
…ion, font flicker

1) H2 whitespace gap (e.g. overview/#extending-the-platform)
   .sl-markdown-content h2 had `display: flex` — that made the h2 a
   block-level flex row, pushing the .sl-anchor-link sibling (inline-flex
   with negative inline-start margin to overlay) onto its own "line" below
   the heading. The intended behavior was for the anchor to sit inside the
   heading's line box. Dropping the unused flex/align-items/gap restores
   normal flow and the anchor-link overlay works as designed.

2) Advantage-list (.beacon-check-grid) firing on regular content lists
   The transform promoted any short pre-H2 unordered list to a two-column
   green-check grid, which caught content lists like flows/orchestrator/
   tasks/nesting/ ("Adform data source connector with the `Campaigns`
   configuration", …) and made them wrap mid-word. Added three filters
   that pin the heuristic to feature/benefit lists:
     - items must start with uppercase or digit (rejects sentence
       continuations like "vary in terms and conditions.")
     - items must contain a space (rejects entity-name lists like
       "Activities / Customers / Subscriptions")
     - no inline code or links anywhere in the list
   Goes from 18 promoted lists to 3 legitimate ones (snowflake-plain,
   bigquery, ai/mcp-server).

3) View-transition style flicker
   The Beacon redesign reintroduced render-blocking Google Fonts via
   `@import` inside custom.css for Inter + JetBrains Mono. On every view
   transition the cached CSS file is re-applied to the swapped DOM, and
   the @import-fetched stylesheet briefly delays font/style application,
   producing the unstyled-content flash the user reported. (The earlier
   abdc3bd fix had self-hosted Lato to solve the exact same class of
   issue.) Moved both font families to a single preconnected `<link>` in
   Head.astro so font loading is non-blocking and parallel.
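
The three filters added in fix 2 can be expressed compactly. The sketch below applies them to raw markdown item strings, which is an assumption (the plugin inspects mdast nodes), and it leaves out the existing "short list before the first H2" condition. `isBenefitList` is a hypothetical name.

```typescript
// Sketch only: the three filters from fix 2, applied to raw list-item text.
function isBenefitList(items: string[]): boolean {
  if (items.length === 0) return false;
  return items.every((raw) => {
    const item = raw.trim();
    const startsUpperOrDigit = /^[A-Z0-9]/.test(item); // rejects sentence continuations
    const hasSpace = /\s/.test(item); // rejects bare entity names
    const hasCodeOrLink = /`|\[[^\]]*\]\([^)]*\)/.test(item); // inline code or a markdown link
    return startsUpperOrDigit && hasSpace && !hasCodeOrLink;
  });
}
```

Using `every` means a single offending item disqualifies the whole list, which matches the "anywhere in the list" wording for code and links.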
End-to-end widget plumbing, with the agent runtime deliberately stubbed
until the docs MCP server is live and the managed agent is registered.

- src/components/AskKaiDrawer.astro: sliding `<dialog>` drawer with
  message stream rendering, session-id memory, suggestion chips, ⌘+/
  shortcut. Hooks any `.b-ask` widget click (per-page) and posts
  user turns to /api/chat. Minimal markdown rendering for inline
  links so cited pages are clickable.
- src/components/Footer.astro: thin Starlight Footer override so the
  drawer renders once in <body> on every page (Head.astro is in <head>
  and can't host a <dialog>).
- astro.config.mjs: register the Footer override.
- src/styles/custom.css: Beacon-token drawer styling, suggestion chips,
  user/assistant bubbles, typing indicator. Cursor + hover on the
  page-level .b-ask widget so it reads as clickable.
- api/chat.ts: Vercel Node serverless function. Two modes:
    • STUB MODE (no KAI_AGENT_ID env): canned reply streamed via SSE
      so the UI plumbing is fully exercisable before the agent exists.
    • LIVE MODE (KAI_AGENT_ID + KAI_VAULT_ID + ANTHROPIC_API_KEY set):
      creates/reuses a Managed Agents session, posts the user event,
      streams agent.message text deltas back as SSE. Surfaces session
      id to the client so multi-turn within one tab works.
- scripts/register-managed-agent.mjs: one-off script that creates a
  vault holding the docs MCP bearer + creates the Managed Agent
  pointing at the MCP server URL. Idempotent — re-runs update.
  Writes ids to .agent.json (gitignored) and prints the env vars to
  set on the Vercel project.
- package.json: add @anthropic-ai/sdk dependency (0.98.x for
  beta.agents / beta.sessions / beta.vaults support).
- /api/chat: live mode no longer requires KAI_VAULT_ID. The vault only
  exists to hold a bearer credential; for an open MCP server we skip
  the vault entirely and the session is created without vault_ids.
  Fixes the deployed instance falling back to stub mode after the
  agent was registered without a vault.
- register-managed-agent.mjs: dropped the dotenv import (use Node's
  built-in --env-file=.env.local instead) and made DOCS_MCP_BEARER
  optional; the script now skips vault creation entirely when no
  bearer is provided. dotenv is added to devDependencies anyway (it
  was used transitively earlier in debugging).
Sessions.create() requires environment_id; the first live call returned
400 invalid_request_error / "environment_id: Field required". Created
env_017gCp7Mi2kpH1Ch4vKqEySJ on the workspace and:

- /api/chat: include environment_id when creating sessions; live mode
  check now also requires KAI_ENVIRONMENT_ID env var to be set.
- register-managed-agent.mjs: create/reuse the environment as step 0/2.
  Persist its id in .agent.json so repeat runs reuse instead of
  spawning new environments.
Three pain points addressed:

Slowness — agent.message arrives as one chunk (no token-level streaming
from Managed Agents), so previously the drawer just sat on typing dots
for 6-10s. Now /api/chat forwards intermediate agent state as 'progress'
SSE events ("Thinking…", "Searching docs for X…", "Writing answer…")
and the drawer swaps the dots for those labels until the final
agent.message arrives. Plus:
- Switched the agent to claude-haiku-4-5 (was Sonnet) — ~2-3x faster
  end-to-end for the synthesize-from-search pattern. Quality is fine
  for docs Q&A.
- Tightened the system prompt: 60-120 word default, no H1/H2/H3
  headings (they look wrong in a narrow card), citations inline only
  using URLs the tool returned, no padding/apologies.

Markdown rendering — replaced the toy regex renderer with marked
(GFM + breaks) and a custom link renderer that opens http(s) links in
a new tab and refuses non-safe URL schemes. Added scoped CSS in the
drawer for lists, headings, code blocks, bold so a model response
actually looks like prose instead of raw markdown source.
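
For the link handling mentioned above, a scheme allow-list is the usual approach. The sketch below is not the drawer's renderer and is not wired into marked: `safeHref` and `renderLink` are hypothetical helpers, the allow-list of http and https is an assumption about what counts as "safe", and `text` is assumed to be HTML-escaped already.

```typescript
// Sketch only: allow-list check for link targets in model output.
const SAFE_PROTOCOLS = new Set<string>(["http:", "https:"]); // assumed allow-list
const BASE_URL = "https://help.keboola.com/"; // used to resolve relative links

function safeHref(href: string): string | null {
  let url: URL;
  try {
    url = new URL(href, BASE_URL);
  } catch {
    return null; // unparseable target: treat as unsafe
  }
  return SAFE_PROTOCOLS.has(url.protocol) ? url.href : null;
}

function renderLink(href: string, text: string): string {
  const safe = safeHref(href);
  if (safe === null) return text; // drop the link but keep its text
  return `<a href="${safe}" target="_blank" rel="noopener noreferrer">${text}</a>`;
}
```

Parsing with `URL` rather than matching the string prefix avoids being fooled by leading whitespace or mixed-case schemes such as `JavaScript:`.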

Local dev — vercel dev keeps throwing `spawn EBADF` on this macOS
setup, so during `astro dev` the drawer now points /api/chat at the
deployed Vercel function (via import.meta.env.DEV). The function
serves CORS headers + handles OPTIONS preflight so cross-origin calls
from localhost:4321 work. Frontend iteration is now instant; backend
changes still need `vercel deploy --prod --force`.

Also: added `npm run register-agent` shortcut; gitignored .env.local.bak
and .env.production from prior debugging; bumped @vercel/node so the
function's request/response types match Vercel's runtime.
…iter

Three fixes for the issues reported after the first round:

1) Drawer only opened on the home page. Astro view transitions re-render
   the Footer (and therefore the <dialog>), but the bundled drawer
   script's `drawer`/`form`/`input` references were captured once at
   first page load — after a nav they pointed to detached DOM and clicks
   were no-ops. Added `transition:persist` to the dialog so the same
   element survives navigation and the cached refs stay valid.

2) MCP server cold-start (503) was bubbling up as a user-facing error.
   The docs MCP is a Keboola data app that scales to zero when idle, so
   the first call after a quiet period reliably fails with
   `MCP server 'docs' initialize failed: upstream server error (HTTP 503)`.
   /api/chat now detects that specific error shape on either
   `agent.mcp_tool_result` or `session.error`, abandons the dead session,
   and retries with a fresh one — up to 3 retries within a 15s budget,
   with a "Docs service is waking up, retrying…" progress event so the
   user sees what's happening. If the budget elapses, we surface a
   friendly message instead of the raw error.

3) Even when the agent answers, Managed Agents delivers `agent.message`
   in a single chunk — there are no token-level deltas to forward. The
   user perceives this as "16s of silence then a wall of text". Added a
   client-side typewriter that paints the text in word-by-word at
   ~2-4 tokens per animation frame, so a 200-word answer reads in
   ~1.5s of smooth motion instead of landing as a blob.
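
The typewriter from fix 3 boils down to splitting the reply into word tokens and revealing a few per frame. Here is a sketch of the frame computation only; `typewriterFrames` is a hypothetical name, the default of 3 tokens per frame is an assumption within the "~2-4" range above, and the browser side (painting one frame per `requestAnimationFrame` callback) is left out.

```typescript
// Sketch only: compute the successive strings a typewriter effect would paint.
function typewriterFrames(text: string, tokensPerFrame: number = 3): string[] {
  const step = Math.max(1, Math.floor(tokensPerFrame));
  // Keep trailing whitespace attached to each word so the last frame equals the input.
  const tokens = text.match(/\S+\s*|\s+/g) ?? [];
  const frames: string[] = [];
  let shown = "";
  for (let i = 0; i < tokens.length; i += step) {
    shown += tokens.slice(i, i + step).join("");
    frames.push(shown);
  }
  return frames;
}
```

Each frame is cumulative, so the caller can assign it directly to the bubble's text without tracking what was already painted.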
Managed Agents was adding ~10s of wrapping LLM time around a tool call
that already returns a fully-formed answer. New default for /api/chat:
call docs_query on the MCP server directly, stream its text back, append
a Sources list from source_urls. Two Haiku calls eliminated; observed
total time drops from ~16-20s to ~6-8s in early tests.

Implementation:
- mcpToolCall(): minimal Streamable-HTTP MCP client. initialize → notify
  → tools/call, parsing SSE-wrapped JSON. No SDK dependency, ~80 LOC.
- handleDirectMcp(): runs mcpToolCall with the same cold-start retry
  behaviour as the agent path (up to 3 retries, 15s budget) and the
  same progress events ("Searching docs for …", "Writing answer…")
  so the drawer UI doesn't need to change.
- labelForUrl(): turns a docs URL into a short slug-derived label for
  the Sources list (e.g. https://help.keboola.com/storage/ → "storage").
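
The "parsing SSE-wrapped JSON" step in mcpToolCall() can be sketched as follows. This is an illustration of the technique rather than the code in api/chat.ts: `parseSseJson` is a hypothetical name, and it assumes the whole response body has already been read into a string.

```typescript
// Sketch only: extract JSON payloads from a text/event-stream body.
function parseSseJson(body: string): unknown[] {
  const messages: unknown[] = [];
  // Events are separated by a blank line.
  for (const rawEvent of body.split(/\r?\n\r?\n/)) {
    const dataLines = rawEvent
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      // Per the SSE format, one optional space after "data:" is not part of the value.
      .map((line) => line.slice(5).replace(/^ /, ""));
    // Multiple data lines within one event are joined with newlines before parsing.
    if (dataLines.length > 0) messages.push(JSON.parse(dataLines.join("\n")));
  }
  return messages;
}
```

A caller would then pick the message whose `id` matches the tools/call request it sent.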

Routing:
- KAI_USE_AGENT=1 in env keeps the old managed-agent path available as
  a fallback. Default behaviour is direct-MCP.
- DOCS_MCP_URL env can override the hard-coded MCP endpoint.

The managed-agent code stays in the file behind the flag; if we later
need agent-side decisioning or multi-step tool use, flip the env var
and redeploy.
The managed-agent path was already gated behind KAI_USE_AGENT and never
the default after the direct-MCP switch. With no plan to use it again
soon, removing it cuts code, dependencies, and a class of failure modes.

/api/chat is now a single straight line: POST /api/chat → call docs_query
on the MCP server → stream the text → emit a `sources` event with the
canonical docs URLs → done.

Code:
- api/chat.ts: removed handleLive, runOneSession, isMcpColdStartError,
  RunResult, the Anthropic SDK import, and the USE_AGENT routing. About
  150 LOC gone. handleDirectMcp is the only live path; emits a new
  `sources` event instead of appending a bulleted list to the body.
- scripts/register-managed-agent.mjs: deleted. The agent on Anthropic's
  side stays archived for now (zero cost while no sessions are running).
- package.json: dropped @anthropic-ai/sdk and dotenv (no longer used);
  removed the `register-agent` npm script.

Drawer chips:
- New `sources` SSE event carries [{ url, label }] items where label is
  the URL's slug rendered Title Case (e.g. /storage/ → "Storage").
- AskKaiDrawer.astro renders chips inside `.ak-sources` directly under
  the assistant bubble, after the typewriter finishes. Chips are pill-
  shaped anchor tags with an external-link glyph; hover lifts them to
  the Beacon blue accent.
- Styles use the existing Beacon tokens (--border-1, --kbc-blue-*),
  truncate long labels with ellipsis, and wrap to multiple rows.
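
A sketch of how the `sources` event payload might be assembled on the server side follows. The helper names are hypothetical, taking the last path segment as the label source is an assumption (the commit only shows a single-segment example), and the "Docs" fallback for the site root is invented for illustration.

```typescript
// Sketch only: build the SSE frame for a `sources` event.
interface SourceChip {
  url: string;
  label: string;
}

// "data-apps" becomes "Data Apps".
function titleCase(slug: string): string {
  return slug
    .split(/[-_]+/)
    .filter((word) => word !== "")
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(" ");
}

function sseEvent(event: string, data: unknown): string {
  // JSON.stringify emits no raw newlines, so a single data line is enough.
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

function sourcesEvent(urls: string[]): string {
  const chips: SourceChip[] = urls.map((url) => {
    const segments = new URL(url).pathname.split("/").filter((s) => s !== "");
    // Assumption: the label comes from the last path segment.
    const slug = segments.length > 0 ? segments[segments.length - 1] : "";
    return { url, label: slug === "" ? "Docs" : titleCase(slug) }; // "Docs" is a made-up fallback
  });
  return sseEvent("sources", chips);
}
```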
The "Ask Kai about these docs" pill in the sidebar was a CSS
::before pseudo-element on .sidebar-content — visually a button,
but pseudo-elements aren't part of the event tree so clicking it
did nothing. That's the "doesn't open on non-home pages" bug; the
in-page .b-ask widget always worked, but new visitors were trying
the more prominent sidebar pill.

Replaced the pseudo-element with a real <button class=
"ak-sidebar-trigger" data-ask-kai-open> injected into the sidebar
by the drawer script. The injection runs once on initial page
load AND on every astro:page-load (sidebar re-renders with each
view transition, so we re-inject if our button is missing). Same
gradient styling, plus :active feedback. Click is wired through
the existing [data-ask-kai-open] handler so the same open()
function fires for both the sidebar pill and the in-page widget.