feat: expand eval coverage for backend SDKs and SvelteKit by nicknisi · Pull Request #78 · workos/cli

nicknisi · 2026-03-04T18:03:27Z

Summary

Expand backend SDK eval coverage from 2 to 4 states each (8 SDKs)
Expand SvelteKit eval coverage from 1 to 5 states
Standardize skill content and patterns across all 17 skills

Problem

Backend SDK skills (Node, Python, Ruby, Go, PHP, PHP-Laravel, Kotlin, Elixir) had only 2 eval test states each (example and example-auth0), while frontend skills had 4-6 states covering partial installs, conflicting middleware, auth migrations, and strict TypeScript. As a result, backend edge cases went untested: partial installs, conflicting auth systems, and framework-specific gotchas.

SvelteKit had just 1 test state (example), far behind all other frontend frameworks.

Additionally, skill content quality was inconsistent: README fetch URLs mixed HTML and plaintext formats, verification checklist structures varied across skills, and backend skills lacked the edge case coverage that makes frontend skills robust.

Changes

Eval Coverage Expansion (24 new scenarios)

Backend SDKs — 16 new fixtures (2 per SDK):

SDK	`partial-install`	`conflicting-auth`
Node	Express + WorkOS SDK, incomplete login route	Passport.js local auth
Python	Flask + WorkOS SDK, commented-out login	Flask-Login auth
Ruby	Sinatra + WorkOS gem, TODO login route	Warden auth
Go	Gin + WorkOS SDK, 501 login handler	Custom JWT middleware (stdlib)
PHP	WorkOS SDK in composer, empty login.php	Native PHP session auth
PHP-Laravel	SDK + published config, no middleware	Laravel Breeze scaffolding
Kotlin	SDK in Gradle, imported but no controller	Spring Security + SecurityFilterChain
Elixir	WorkOS in mix.exs, stub AuthController	Ueberauth + ueberauth_identity
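
The table above is a product of 8 SDKs and 2 new states. A minimal sketch of how that matrix expands into 16 scenarios follows. The `Scenario` shape, the `buildBackendScenarios` helper, and the `tests/fixtures/<skill>/<state>` layout are assumptions for illustration, not the actual contents of runner.ts.

```typescript
// Hypothetical types: the real runner.ts may model scenarios differently.
type BackendSdk =
  | "node" | "python" | "ruby" | "go"
  | "php" | "php-laravel" | "kotlin" | "elixir";
type NewState = "partial-install" | "conflicting-auth";

interface Scenario {
  skill: BackendSdk;
  state: NewState;
  fixturePath: string; // assumed layout: tests/fixtures/<skill>/<state>
}

const BACKEND_SDKS: BackendSdk[] = [
  "node", "python", "ruby", "go", "php", "php-laravel", "kotlin", "elixir",
];
const NEW_STATES: NewState[] = ["partial-install", "conflicting-auth"];

// Expand every SDK x state pair into one scenario entry.
function buildBackendScenarios(): Scenario[] {
  const scenarios: Scenario[] = [];
  for (const skill of BACKEND_SDKS) {
    for (const state of NEW_STATES) {
      scenarios.push({
        skill,
        state,
        fixturePath: "tests/fixtures/" + skill + "/" + state,
      });
    }
  }
  return scenarios;
}

const backendScenarios = buildBackendScenarios();
```

Generating entries from the two lists keeps the SDK and state names in one place, so a ninth SDK or a third state would not require hand-written entries.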

SvelteKit — 4 new fixtures:

example-auth0 — Auth0 SPA auth with hooks and callback route
partial-install — SDK installed but hooks.server.ts exports passthrough, no callback
conflicting-auth — Full Lucia v3 auth with login, logout, dashboard, session cookies
typescript-strict — Strict TS with noImplicitAny, strictNullChecks, exactOptionalPropertyTypes
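
To make the partial-install state concrete, here is a rough sketch of how a half-finished SvelteKit integration could be described as a list of gaps. The `ProjectSnapshot` type, the `findGaps` helper, and the callback route path are assumptions for illustration. The package name is the one given in this PR's SvelteKit fix commit.

```typescript
// Hypothetical in-memory view of a fixture project.
interface ProjectSnapshot {
  dependencies: string[];        // package names from package.json
  files: Record<string, string>; // relative path -> file contents
}

type Gap = "sdk-missing" | "hooks-not-wired" | "callback-route-missing";

const SDK_PACKAGE = "@workos/authkit-sveltekit";
const HOOKS_FILE = "src/hooks.server.ts";
// Assumed location: the real callback route may live elsewhere.
const CALLBACK_ROUTE = "src/routes/callback/+server.ts";

// Report which parts of the integration are still missing.
function findGaps(project: ProjectSnapshot): Gap[] {
  const gaps: Gap[] = [];
  if (project.dependencies.indexOf(SDK_PACKAGE) === -1) {
    gaps.push("sdk-missing");
  }
  const hasHooks = Object.prototype.hasOwnProperty.call(project.files, HOOKS_FILE);
  const hooksSource = hasHooks ? project.files[HOOKS_FILE] : "";
  // A passthrough hook never imports the SDK, so a missing import is the signal.
  if (hooksSource.indexOf(SDK_PACKAGE) === -1) {
    gaps.push("hooks-not-wired");
  }
  if (!Object.prototype.hasOwnProperty.call(project.files, CALLBACK_ROUTE)) {
    gaps.push("callback-route-missing");
  }
  return gaps;
}

// Mirrors the partial-install fixture: SDK installed, passthrough hook, no callback.
const partialInstall: ProjectSnapshot = {
  dependencies: ["@sveltejs/kit", SDK_PACKAGE],
  files: {
    "src/hooks.server.ts":
      "export const handle = async ({ event, resolve }) => resolve(event);",
  },
};
const partialInstallGaps = findGaps(partialInstall);
```

For this snapshot the SDK check passes and the two remaining gaps are the unwired hook and the missing callback route.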

Skill Content Improvements (14 skills updated)

9 skills (8 backend SDKs plus SvelteKit), each with two new sections:

Partial Install Recovery — detect half-completed AuthKit, complete gaps instead of starting over
Existing Auth System Detection — SDK-specific patterns (Passport, Flask-Login, Devise, Spring Security, etc.) with instructions to add WorkOS alongside existing auth
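
One way to picture the existing-auth detection is a lookup from dependency names to known auth systems. The sketch below is illustrative only: the `AuthSignature` type, the `detectExistingAuth` helper, and the signature list are assumptions and cover just a few Node-ecosystem packages. The skills themselves describe their patterns in prose per SDK.

```typescript
// Hypothetical signature table: one entry per auth system to look for.
interface AuthSignature {
  system: string;
  packages: string[]; // any one of these in the manifest counts as a match
}

// Illustrative subset, not the list used by the skills.
const NODE_AUTH_SIGNATURES: AuthSignature[] = [
  { system: "Passport.js", packages: ["passport", "passport-local"] },
  { system: "Lucia", packages: ["lucia"] },
  { system: "Auth0", packages: ["@auth0/auth0-spa-js"] },
];

// Return the names of auth systems whose packages appear in the dependency list.
function detectExistingAuth(
  dependencies: string[],
  signatures: AuthSignature[] = NODE_AUTH_SIGNATURES,
): string[] {
  return signatures
    .filter((sig) => sig.packages.some((pkg) => dependencies.indexOf(pkg) !== -1))
    .map((sig) => sig.system);
}

const detected = detectExistingAuth(["express", "passport", "passport-local"]);
```

A non-empty result would mean WorkOS gets added alongside the detected system, not in place of it, which matches the instruction in the section above.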

5 frontend skills — standardized verification checklist format to match Next.js pattern (numbered bash commands, "(ALL MUST PASS)" header, recovery guidance)

Grader Updates (9 graders)

Added bonus checks (non-blocking) for:

Existing routes preserved after integration
Conflicting auth config preserved (SDK-specific patterns)
SvelteKit: sequence() composition, WORKOS_COOKIE_PASSWORD presence
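
The "non-blocking" property of these bonus checks can be sketched as follows: required checks decide pass or fail, and bonus checks are only tallied. The `Check` and `Grade` types and the `grade` function are hypothetical and do not reflect the actual grader interface in this repository.

```typescript
// Hypothetical check record: bonus checks never affect the pass/fail outcome.
interface Check {
  name: string;
  passed: boolean;
  bonus: boolean;
}

interface Grade {
  passed: boolean;          // true only if every required check passed
  failedRequired: string[]; // names of required checks that failed
  bonusEarned: number;
  bonusTotal: number;
}

function grade(checks: Check[]): Grade {
  const required = checks.filter((c) => !c.bonus);
  const bonus = checks.filter((c) => c.bonus);
  const failedRequired = required.filter((c) => !c.passed).map((c) => c.name);
  return {
    passed: failedRequired.length === 0,
    failedRequired,
    bonusEarned: bonus.filter((c) => c.passed).length,
    bonusTotal: bonus.length,
  };
}

// A run where one bonus check fails but the scenario still passes.
const exampleGrade = grade([
  { name: "SDK dependency installed", passed: true, bonus: false },
  { name: "existing routes preserved", passed: true, bonus: true },
  { name: "hooks composed with sequence()", passed: false, bonus: true },
]);
```

Here `exampleGrade.passed` is `true` with 1 of 2 bonus checks earned, so a missed bonus shows up in the report without failing the scenario.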

Infrastructure

Standardized all 9 SKILL.md README fetch URLs from github.com/blob/ (HTML) to raw.githubusercontent.com (plaintext)
Created tests/fixtures/README.md documenting fixture state conventions and per-language guidance
Registered 24 new scenarios in runner.ts
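
The URL change in the first item is a mechanical rewrite: drop the `/blob/` segment and switch the host. A small sketch of that mapping follows. The `toRawUrl` helper and the example repository are made up for illustration, and in the PR the URLs were edited directly in each SKILL.md.

```typescript
// Convert a github.com "blob" URL (HTML page) to its raw.githubusercontent.com
// equivalent (plain text). URLs that do not match are returned unchanged.
function toRawUrl(blobUrl: string): string {
  const match = /^https:\/\/github\.com\/([^/]+)\/([^/]+)\/blob\/(.+)$/.exec(blobUrl);
  if (match === null) {
    return blobUrl;
  }
  const owner = match[1];
  const repo = match[2];
  const refAndPath = match[3]; // e.g. "main/README.md"
  return "https://raw.githubusercontent.com/" + owner + "/" + repo + "/" + refAndPath;
}

// Hypothetical repository, used only to show the shape of the rewrite.
const rawReadme = toRawUrl("https://github.com/example-org/example-sdk/blob/main/README.md");
```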

Results

Full eval suite: 62/62 passing (98.4% first-attempt, 100% with-correction)

First-attempt:    80.6% (required: 80%)
With-correction:  98.4% (required: 90%)
With-retry:       98.4% (required: 95%)
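
As a rough check on the three gates above, the sketch below recomputes each rate against its required threshold. The pass counts (50 and 61 of 62) are not stated in the PR. They are inferred here as the only whole numbers that round to 80.6% and 98.4% of 62 scenarios. The `Gate` type and helper names are likewise illustrative.

```typescript
// Percentage rounded to one decimal place, the precision used in the report.
function passRate(passes: number, total: number): number {
  return Math.round((passes / total) * 1000) / 10;
}

interface Gate {
  label: string;
  passes: number;          // inferred from the reported percentage, not from the PR
  requiredPercent: number; // threshold as listed in the report
}

const TOTAL_SCENARIOS = 62;
const GATES: Gate[] = [
  { label: "first-attempt", passes: 50, requiredPercent: 80 },
  { label: "with-correction", passes: 61, requiredPercent: 90 },
  { label: "with-retry", passes: 61, requiredPercent: 95 },
];

// Labels of gates whose pass rate falls below the required threshold.
function failingGates(gates: Gate[], total: number): string[] {
  return gates
    .filter((g) => passRate(g.passes, total) < g.requiredPercent)
    .map((g) => g.label);
}

const failing = failingGates(GATES, TOTAL_SCENARIOS);
```

With these inferred counts no gate fails, and the first-attempt rate clears its 80% threshold by less than one percentage point.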

Framework	Base	Auth0	Partial	Strict	Conflict MW	Existing MW	Conflict Auth
nextjs	+	+	+	+	+	+	-
react	+	+	+	+	-	-	+
react-router	+	+	+	+	+	-	-
tanstack-start	+	+	+	+	+	-	-
vanilla-js	+	+	+	-	-	-	+
sveltekit	+	+	+	+	-	-	+
node	+	+	+	-	-	-	+
python	+	+	+	-	-	-	+
ruby	+	+	+	-	-	-	+
go	+	+	+	-	-	-	+
php	+	+	+	-	-	-	+
php-laravel	+	+	+	-	-	-	+
kotlin	+	+	+	-	-	-	+
elixir	+	+	+	-	-	-	+
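
The matrix above can be cross-checked against the suite size by counting the `+` cells. The sketch below encodes each row as a string in column order (Base, Auth0, Partial, Strict, Conflict MW, Existing MW, Conflict Auth). The encoding and helper names are just a convenient illustration.

```typescript
// Each row copies the table: "+" means the state exists, "-" means it does not.
const COVERAGE: Array<[string, string]> = [
  ["nextjs", "++++++-"],
  ["react", "++++--+"],
  ["react-router", "+++++--"],
  ["tanstack-start", "+++++--"],
  ["vanilla-js", "+++---+"],
  ["sveltekit", "++++--+"],
  ["node", "+++---+"],
  ["python", "+++---+"],
  ["ruby", "+++---+"],
  ["go", "+++---+"],
  ["php", "+++---+"],
  ["php-laravel", "+++---+"],
  ["kotlin", "+++---+"],
  ["elixir", "+++---+"],
];

// Number of covered states in one row.
function countStates(row: string): number {
  return row.split("").filter((cell) => cell === "+").length;
}

let totalStates = 0;
for (const [, row] of COVERAGE) {
  totalStates += countStates(row);
}
```

The frontend rows sum to 30 and the eight backend rows contribute 4 each, so `totalStates` comes to 62, matching the size of the full eval suite.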

Switch all SKILL.md WebFetch URLs from github.com/blob/ (HTML) to raw.githubusercontent.com (plain text) for cleaner parsing. Add tests/fixtures/README.md documenting fixture state conventions and per-language guidance for upcoming backend SDK eval expansion.

Add partial-install and conflicting-auth fixtures for all 8 backend SDKs (Node, Python, Ruby, Go, PHP, PHP-Laravel, Kotlin, Elixir) and expand SvelteKit from 1 to 5 test states. Each backend SDK now has 4 eval states (up from 2), matching frontend skill coverage. Includes:
- 20 new fixture directories with validated, buildable projects
- SKILL.md updates with partial install recovery and conflicting auth detection sections for 9 skills
- Grader bonus checks for preserved routes and conflicting auth
- 24 new eval scenarios registered in runner.ts

socket-security · 2026-03-04T18:06:54Z

Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts.

Action	Severity	Alert
Warn	High	CVE: HTTP/2 rapid reset can cause excessive work in net/http, in golang `golang.org/x/net`

CVE: GHSA-4374-p667-p6c8 HTTP/2 rapid reset can cause excessive work in net/http (HIGH)
Affected versions: < 0.17.0
Patched version: 0.17.0
From: `?` → `golang/github.com/gin-gonic/gin@v1.9.1` → `golang/golang.org/x/net@v0.10.0`

Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at `support@socket.dev`.

Suggestion: Remove or replace dependencies that include known high severity CVEs. Consumers can use dependency overrides or npm audit fix --force to remove vulnerable dependencies.

Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment `@SocketSecurity ignore golang/golang.org/x/net@v0.10.0`. You can also ignore all packages with `@SocketSecurity ignore-all`. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

nicknisi added 5 commits March 4, 2026 10:29

chore: standardize frontend skill verification checklists

ed2c71b

Align React, React Router, TanStack Start, and Vanilla JS skills to match the Next.js verification checklist pattern: numbered bash commands with comments, "(ALL MUST PASS)" header, and recovery guidance for critical checks.

fix: bump workos-php-laravel to ^5.0 in partial-install fixture

5e6c104

v1.x-v2.x require illuminate/support ^5-9 via workos-php, which conflicts with Laravel 11. v5.x requires workos-php ^4.29 with no illuminate/support constraint, resolving the conflict.

chore: formatting

26b3a16

fix: correct SvelteKit SDK package name to @workos/authkit-sveltekit

609632c

Steps 3, 4, and error recovery still referenced the old @workos-inc namespace. The npm package is @workos/authkit-sveltekit.

nicknisi wants to merge 6 commits into main from nicknisi/evals
