feat: expand eval coverage for backend SDKs and SvelteKit #78

Open
nicknisi wants to merge 6 commits into main from nicknisi/evals

@nicknisi (Member) commented Mar 4, 2026

Summary

  • Expand backend SDK eval coverage from 2 to 4 states each (8 SDKs)
  • Expand SvelteKit eval coverage from 1 to 5 states
  • Standardize skill content and patterns across all 17 skills

Problem

Backend SDK skills (Node, Python, Ruby, Go, PHP, PHP-Laravel, Kotlin, Elixir) had only 2 eval test states each (example and example-auth0), while frontend skills had 4-6 states covering partial installs, conflicting middleware, auth migrations, and strict TypeScript. As a result, backend edge cases went untested: partial installs, conflicting auth systems, and framework-specific gotchas.

SvelteKit had just 1 test state (example), far behind all other frontend frameworks.

Additionally, skill content quality was inconsistent: README fetch URLs mixed HTML and plaintext formats, verification checklist structures varied across skills, and backend skills lacked the edge case coverage that makes frontend skills robust.

Changes

Eval Coverage Expansion (24 new scenarios)

Backend SDKs — 16 new fixtures (2 per SDK):

| SDK | partial-install | conflicting-auth |
| --- | --- | --- |
| Node | Express + WorkOS SDK, incomplete login route | Passport.js local auth |
| Python | Flask + WorkOS SDK, commented-out login | Flask-Login auth |
| Ruby | Sinatra + WorkOS gem, TODO login route | Warden auth |
| Go | Gin + WorkOS SDK, 501 login handler | Custom JWT middleware (stdlib) |
| PHP | WorkOS SDK in composer, empty login.php | Native PHP session auth |
| PHP-Laravel | SDK + published config, no middleware | Laravel Breeze scaffolding |
| Kotlin | SDK in Gradle, imported but no controller | Spring Security + SecurityFilterChain |
| Elixir | WorkOS in mix.exs, stub AuthController | Ueberauth + ueberauth_identity |

SvelteKit — 4 new fixtures:

  • example-auth0 — Auth0 SPA auth with hooks and callback route
  • partial-install — SDK installed but hooks.server.ts exports passthrough, no callback
  • conflicting-auth — Full Lucia v3 auth with login, logout, dashboard, session cookies
  • typescript-strict — Strict TS with noImplicitAny, strictNullChecks, exactOptionalPropertyTypes

Skill Content Improvements (14 skills updated)

9 skills (8 backend SDKs + SvelteKit) gained two new sections each:

  • Partial Install Recovery — detect half-completed AuthKit, complete gaps instead of starting over
  • Existing Auth System Detection — SDK-specific patterns (Passport, Flask-Login, Devise, Spring Security, etc.) with instructions to add WorkOS alongside existing auth
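
To make the partial-install idea concrete, here is a minimal sketch of how a half-completed SvelteKit integration might be detected so that only the gaps get filled. It is not the skill's actual logic: the `ProjectSnapshot` shape, the `detectPartialInstall` name, and both file heuristics are assumptions for illustration. Only the package name `@workos/authkit-sveltekit` comes from this PR.

```typescript
// Hypothetical snapshot of a project, gathered by the caller.
interface ProjectSnapshot {
  dependencies: Record<string, string>; // from package.json
  files: Record<string, string>; // relative path -> file contents
}

interface InstallStatus {
  sdkInstalled: boolean;
  missing: string[]; // gaps that still need to be completed
}

const SDK_PACKAGE = "@workos/authkit-sveltekit";

function detectPartialInstall(project: ProjectSnapshot): InstallStatus {
  const sdkInstalled = SDK_PACKAGE in project.dependencies;
  const missing: string[] = [];
  if (!sdkInstalled) missing.push(`dependency ${SDK_PACKAGE}`);

  // Assumed heuristic: a hooks file that never mentions the SDK is a passthrough.
  const hooks = project.files["src/hooks.server.ts"];
  if (hooks === undefined || !hooks.includes(SDK_PACKAGE)) {
    missing.push("auth handle in src/hooks.server.ts");
  }

  // Assumed heuristic: a +server.ts file under a path containing "callback".
  const hasCallback = Object.keys(project.files).some(
    (path) => path.includes("callback") && path.endsWith("+server.ts"),
  );
  if (!hasCallback) missing.push("callback route");

  return { sdkInstalled, missing };
}
```

A caller would complete only the entries in `missing` rather than reinstalling from scratch, which matches the "complete gaps instead of starting over" intent.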

5 frontend skills — standardized verification checklist format to match Next.js pattern (numbered bash commands, "(ALL MUST PASS)" header, recovery guidance)

Grader Updates (9 graders)

Added bonus checks (non-blocking) for:

  • Existing routes preserved after integration
  • Conflicting auth config preserved (SDK-specific patterns)
  • SvelteKit: sequence() composition, WORKOS_COOKIE_PASSWORD presence
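
The distinction between blocking and bonus checks can be sketched as follows. This is an illustrative model only, assuming a hypothetical `Check` shape and `grade` function; the real graders in this repository may be structured differently.

```typescript
// Hypothetical check record; `bonus` marks a non-blocking check.
interface Check {
  name: string;
  bonus: boolean;
  passed: boolean;
}

interface GradeResult {
  passed: boolean; // true when every required (non-bonus) check passed
  failedRequired: string[];
  bonusEarned: number;
  bonusTotal: number;
}

function grade(checks: Check[]): GradeResult {
  const required = checks.filter((c) => !c.bonus);
  const bonus = checks.filter((c) => c.bonus);
  const failedRequired = required.filter((c) => !c.passed).map((c) => c.name);
  return {
    passed: failedRequired.length === 0,
    failedRequired,
    bonusEarned: bonus.filter((c) => c.passed).length,
    bonusTotal: bonus.length,
  };
}

// A failed bonus check is reported but does not fail the scenario.
const result = grade([
  { name: "callback route exists", bonus: false, passed: true }, // hypothetical required check
  { name: "existing routes preserved", bonus: true, passed: false },
]);
console.log(result.passed, result.bonusEarned, result.bonusTotal); // prints: true 0 1
```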

Infrastructure

  • Standardized all 9 SKILL.md README fetch URLs from github.com/blob/ (HTML) to raw.githubusercontent.com (plaintext)
  • Created tests/fixtures/README.md documenting fixture state conventions and per-language guidance
  • Registered 24 new scenarios in runner.ts
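
For reference, the blob-to-raw URL mapping behind the first bullet is mechanical. The helper below is a sketch, not code from this PR; `toRawUrl` is a hypothetical name and the example repository in the usage line is a placeholder.

```typescript
// Convert https://github.com/<owner>/<repo>/blob/<ref>/<path>
// into    https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>
function toRawUrl(blobUrl: string): string {
  const url = new URL(blobUrl);
  const [owner, repo, kind, ...rest] = url.pathname.split("/").filter(Boolean);
  // `rest` must hold the ref plus at least one path segment.
  if (url.hostname !== "github.com" || kind !== "blob" || rest.length < 2) {
    throw new Error(`Not a github.com blob URL: ${blobUrl}`);
  }
  return `https://raw.githubusercontent.com/${owner}/${repo}/${rest.join("/")}`;
}

console.log(toRawUrl("https://github.com/example-org/example-repo/blob/main/README.md"));
// prints: https://raw.githubusercontent.com/example-org/example-repo/main/README.md
```

The raw host returns the file as plain text, so a fetch step gets the README itself rather than the rendered HTML page around it.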

Results

Full eval suite: 62/62 passing (98.4% first-attempt, 100% with-correction)

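First-attempt:    80.6% (required: 80%)
With-correction:  98.4% (required: 90%)
With-retry:       98.4% (required: 95%)

| Framework | Base | Auth0 | Partial | Strict | Conflict MW | Existing MW | Conflict Auth |
| --- | --- | --- | --- | --- | --- | --- | --- |
| nextjs | + | + | + | + | + | + | - |
| react | + | + | + | + | - | - | + |
| react-router | + | + | + | + | + | - | - |
| tanstack-start | + | + | + | + | + | - | - |
| vanilla-js | + | + | + | - | - | - | + |
| sveltekit | + | + | + | + | - | - | + |
| node | + | + | + | - | - | - | + |
| python | + | + | + | - | - | - | + |
| ruby | + | + | + | - | - | - | + |
| go | + | + | + | - | - | - | + |
| php | + | + | + | - | - | - | + |
| php-laravel | + | + | + | - | - | - | + |
| kotlin | + | + | + | - | - | - | + |
| elixir | + | + | + | - | - | - | + |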
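
The pass-rate thresholds reported above amount to a simple gate: each rate must meet or exceed its required minimum. The sketch below uses the percentages from this PR as inputs; the `Gate` type and `failingGates` function are hypothetical and do not reflect how the eval runner actually enforces thresholds.

```typescript
// Hypothetical gate record: a measured pass rate and its required minimum (percent).
interface Gate {
  label: string;
  actual: number;
  required: number;
}

// Return the labels of every gate whose measured rate is below its minimum.
function failingGates(gates: Gate[]): string[] {
  return gates.filter((g) => g.actual < g.required).map((g) => g.label);
}

const reported: Gate[] = [
  { label: "first-attempt", actual: 80.6, required: 80 },
  { label: "with-correction", actual: 98.4, required: 90 },
  { label: "with-retry", actual: 98.4, required: 95 },
];

console.log(failingGates(reported)); // prints: []
```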

nicknisi added 5 commits March 4, 2026 10:29

Switch all SKILL.md WebFetch URLs from github.com/blob/ (HTML) to
raw.githubusercontent.com (plain text) for cleaner parsing. Add
tests/fixtures/README.md documenting fixture state conventions and
per-language guidance for upcoming backend SDK eval expansion.

Add partial-install and conflicting-auth fixtures for all 8 backend
SDKs (Node, Python, Ruby, Go, PHP, PHP-Laravel, Kotlin, Elixir) and
expand SvelteKit from 1 to 5 test states. Each backend SDK now has
4 eval states (up from 2), matching frontend skill coverage.

Includes:
- 20 new fixture directories with validated, buildable projects
- SKILL.md updates with partial install recovery and conflicting
  auth detection sections for 9 skills
- Grader bonus checks for preserved routes and conflicting auth
- 24 new eval scenarios registered in runner.ts

Align React, React Router, TanStack Start, and Vanilla JS skills
to match the Next.js verification checklist pattern: numbered bash
commands with comments, "(ALL MUST PASS)" header, and recovery
guidance for critical checks.

v1.x-v2.x require illuminate/support ^5-9 via workos-php, which
conflicts with Laravel 11. v5.x requires workos-php ^4.29 with no
illuminate/support constraint, resolving the conflict.

socket-security bot commented Mar 4, 2026

Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.

| Action | Severity | Alert |
| --- | --- | --- |
| Warn | High | CVE: HTTP/2 rapid reset can cause excessive work in net/http in golang golang.org/x/net |

CVE: GHSA-4374-p667-p6c8 HTTP/2 rapid reset can cause excessive work in net/http (HIGH)

Affected versions: < 0.17.0

Patched version: 0.17.0

From: ? → golang/github.com/gin-gonic/gin@v1.9.1 → golang/golang.org/x/net@v0.10.0

Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at support@socket.dev.

Suggestion: Remove or replace dependencies that include known high severity CVEs. Consumers can use dependency overrides or npm audit fix --force to remove vulnerable dependencies.

Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment @SocketSecurity ignore golang/golang.org/x/net@v0.10.0. You can also ignore all packages with @SocketSecurity ignore-all. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

Steps 3, 4, and error recovery still referenced the old @workos-inc
namespace. The npm package is @workos/authkit-sveltekit.