@cubic-dev-ai please review
@ericallam I have started the AI code review. It will take a few minutes to complete.
## Summary 12 new features, 59 improvements, 17 bug fixes. ## Highlights - Add support for setting TTL (time-to-live) defaults at the task level and globally in trigger.config.ts, with per-trigger overrides still taking precedence ([#3196](#3196)) - Large run outputs can use the new API which allows switching object storage providers. ([#3275](#3275)) ## Improvements - Add platform notifications support to the CLI. The `trigger dev` and `trigger login` commands now fetch and display platform notifications (info, warn, error, success) from the server. Includes discovery-based filtering to conditionally show notifications based on project file patterns, color markup rendering for styled terminal output, and a non-blocking display flow with a spinner fallback for slow fetches. Use `--skip-platform-notifications` flag with `trigger dev` to disable the notification check. ([#3254](#3254)) - Add `get_span_details` MCP tool for inspecting individual spans within a run trace. ([#3255](#3255)) - New `get_span_details` tool returns full span attributes, timing, events, and AI enrichment (model, tokens, cost, speed) - Span IDs now shown in `get_run_details` trace output for easy discovery - New API endpoint `GET /api/v1/runs/:runId/spans/:spanId` - New `retrieveSpan()` method on the API client - `get_query_schema` — discover available TRQL tables and columns - `query` — execute TRQL queries against your data - `list_dashboards` — list built-in dashboards and their widgets - `run_dashboard_query` — execute a single dashboard widget query - `whoami` — show current profile, user, and API URL - `list_profiles` — list all configured CLI profiles - `switch_profile` — switch active profile for the MCP session - `start_dev_server` — start `trigger dev` in the background and stream output - `stop_dev_server` — stop the running dev server - `dev_server_status` — check dev server status and view recent logs - `GET /api/v1/query/schema` — query table schema discovery - `GET 
/api/v1/query/dashboards` — list built-in dashboards - `--readonly` flag hides write tools (`deploy`, `trigger_task`, `cancel_run`) so the AI cannot make changes - `read:query` JWT scope for query endpoint authorization - `get_run_details` trace output is now paginated with cursor support - MCP tool annotations (`readOnlyHint`, `destructiveHint`) for all tools - `get_query_schema` now requires a table name and returns only one table's schema (was returning all tables) - `get_current_worker` no longer inlines payload schemas; use new `get_task_schema` tool instead - Query results formatted as text tables instead of JSON (~50% fewer tokens) - `cancel_run`, `list_deploys`, `list_preview_branches` formatted as text instead of raw JSON - Schema and dashboard API responses cached to avoid redundant fetches - Adapted the CLI API client to propagate the trigger source via http headers. ([#3241](#3241)) - Propagate run tags to span attributes so they can be extracted server-side for LLM cost attribution metadata. 
([#3213](#3213)) - Add optional `hasPrivateLink` field to the dequeue message organization object for private networking support ([#3264](#3264)) - Define and manage AI prompts with `prompts.define()`. 
Create typesafe prompt templates with variables, resolve them at runtime, and manage versions and overrides from the dashboard without redeploying. ([#3244](#3244)) ## Bug fixes - Fix dev CLI leaking build directories on rebuild, causing disk space accumulation. Deprecated workers are now pruned (capped at 2 retained) when no active runs reference them. The watchdog process also cleans up `.trigger/tmp/` when the dev CLI is killed ungracefully (e.g. SIGKILL from pnpm). ([#3224](#3224)) - Fix `--load` flag being silently ignored on local/self-hosted builds. ([#3114](#3114)) - Fixed `search_docs` tool failing due to renamed upstream Mintlify tool (`SearchTriggerDev` → `search_trigger_dev`) - Fixed `list_deploys` failing when deployments have null `runtime`/`runtimeVersion` fields (#3139) - Fixed `list_preview_branches` crashing due to incorrect response shape access - Fixed `metrics` table column documented as `value` instead of `metric_value` in query docs - Fixed dev CLI leaking build directories on rebuild; deprecated workers now clean up their build dirs when their last run completes ## Server changes These changes affect the self-hosted Docker image and Trigger.dev Cloud: - Add admin UI for viewing and editing feature flags (org-level overrides and global defaults). ([#3291](#3291)) - AI prompt management dashboard and enhanced span inspectors. 
**Prompt management:** - Prompts list page with version status, model, override indicators, and 24h usage sparklines - Prompt detail page with template viewer, variable preview, version history timeline, and override editor - Create, edit, and remove overrides to change prompt content or model without redeploying - Promote any code-deployed version to current - Generations tab with infinite scroll, live polling, and inline span inspector - Per-prompt metrics: total generations, avg tokens, avg cost, latency, with version-level breakdowns **AI span inspectors:** - Custom inspectors for `ai.generateText`, `ai.streamText`, `ai.generateObject`, `ai.streamObject` parent spans - `ai.toolCall` inspector showing tool name, call ID, and input arguments - `ai.embed` inspector showing model, provider, and input text - Prompt tab on AI spans linking to prompt version with template and input variables - Compact timestamp and duration header on all AI span inspectors **AI metrics dashboard:** - Operations, Providers, and Prompts filters on the AI Metrics dashboard - Cost by prompt widget - "AI" section in the sidebar with Prompts and AI Metrics links **Other improvements:** - Resizable panel sizes now persist across page refreshes - Fixed `<div>` inside `<p>` DOM nesting warnings in span titles and chat messages ([#3244](#3244)) - Add allowRollbacks query param to the promote deployment API to enable version downgrades ([#3214](#3214)) - Pre-warm compute templates on deploy for orgs with compute access. Required for projects using a compute region, background-only for others. ([#3114](#3114)) - Add automatic LLM cost calculation for spans with GenAI semantic conventions. 
When a span arrives with `gen_ai.response.model` and token usage data, costs are calculated from an in-memory pricing registry backed by Postgres and dual-written to both span attributes (`trigger.llm.*`) and a new `llm_metrics_v1` ClickHouse table that captures usage, cost, performance (TTFC, tokens/sec), and behavioral (finish reason, operation type) metrics. ([#3213](#3213)) - Add API endpoint `GET /api/v1/runs/:runId/spans/:spanId` that returns detailed span information including properties, events, AI enrichment (model, tokens, cost), and triggered child runs. ([#3255](#3255)) - Multi-provider object storage with protocol-based routing for zero-downtime migration ([#3275](#3275)) - Add IAM role-based auth support for object stores (no access keys required). ([#3275](#3275)) - Add platform notifications to inform users about new features, changelogs, and platform events directly in the dashboard. ([#3254](#3254)) - Add private networking support via AWS PrivateLink. Includes BillingClient methods for managing private connections, org settings UI pages for connection management, and supervisor changes to apply `privatelink` pod labels for CiliumNetworkPolicy matching. ([#3264](#3264)) - Reduce run start latency by skipping the intermediate queue when concurrency is available. This optimization is rolled out per-region and enabled automatically for development environments. ([#3299](#3299)) - Extended the search filter on the environment variables page to match on environment type (production, staging, development, preview) and branch name, not just variable name and value. ([#3302](#3302)) - Set `application_name` on Prisma connections from SERVICE_NAME so DB load can be attributed by service ([#3348](#3348)) - Fix transient R2/object store upload failures during batchTrigger() item streaming. 
- Added p-retry (3 attempts, 500ms–2s exponential backoff) around `uploadPacketToObjectStore` in `BatchPayloadProcessor.process()` so transient network errors self-heal server-side rather than aborting the entire batch stream. - Removed `x-should-retry: false` from the 500 response on the batch items route so the SDK's existing 5xx retry path can recover if server-side retries are exhausted. Item deduplication by index makes full-stream retries safe. ([#3331](#3331)) - Concurrency-keyed queues now use a single master queue entry per base queue instead of one entry per key. Prevents high-CK-count tenants from consuming the entire parentQueueLimit window and starving other tenants on the same shard. ([#3219](#3219)) - Reduce lock contention when processing large `batchTriggerAndWait` batches. Previously, each batch item acquired a Redis lock on the parent run to insert a `TaskRunWaitpoint` row, causing `LockAcquisitionTimeoutError` with high concurrency (880 errors/24h in prod). Since `blockRunWithCreatedBatch` already transitions the parent to `EXECUTING_WITH_WAITPOINTS` before items are processed, the per-item lock is unnecessary. The new `blockRunWithWaitpointLockless` method performs only the idempotent CTE insert without acquiring the lock. ([#3232](#3232)) - Strip `secure` query parameter from QUERY_CLICKHOUSE_URL before passing to ClickHouse client. This was already done for the main and logs ClickHouse clients but was missing for the query client, causing a startup crash with `Error: Unknown URL parameters: secure`. ([#3204](#3204)) - Fix `OrganizationsPresenter.#getEnvironment` matching the wrong development environment on teams with multiple members. All dev environments share the slug `"dev"`, so the previous `find` by slug alone could return another member's environment. Now filters DEVELOPMENT environments by `orgMember.userId` to ensure the logged-in user's dev environment is selected. 
([#3273](#3273)) <details> <summary>Raw changeset output</summary> # Releases ## @trigger.dev/build@4.4.4 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.4.4` ## trigger.dev@4.4.4 ### Patch Changes - Add platform notifications support to the CLI. The `trigger dev` and `trigger login` commands now fetch and display platform notifications (info, warn, error, success) from the server. Includes discovery-based filtering to conditionally show notifications based on project file patterns, color markup rendering for styled terminal output, and a non-blocking display flow with a spinner fallback for slow fetches. Use `--skip-platform-notifications` flag with `trigger dev` to disable the notification check. ([#3254](#3254)) - Fix dev CLI leaking build directories on rebuild, causing disk space accumulation. Deprecated workers are now pruned (capped at 2 retained) when no active runs reference them. The watchdog process also cleans up `.trigger/tmp/` when the dev CLI is killed ungracefully (e.g. SIGKILL from pnpm). ([#3224](#3224)) - Fix `--load` flag being silently ignored on local/self-hosted builds. ([#3114](#3114)) - Add `get_span_details` MCP tool for inspecting individual spans within a run trace. ([#3255](#3255)) - New `get_span_details` tool returns full span attributes, timing, events, and AI enrichment (model, tokens, cost, speed) - Span IDs now shown in `get_run_details` trace output for easy discovery - New API endpoint `GET /api/v1/runs/:runId/spans/:spanId` - New `retrieveSpan()` method on the API client - MCP server improvements: new tools, bug fixes, and new flags. 
([#3224](#3224)) **New tools:** - `get_query_schema` — discover available TRQL tables and columns - `query` — execute TRQL queries against your data - `list_dashboards` — list built-in dashboards and their widgets - `run_dashboard_query` — execute a single dashboard widget query - `whoami` — show current profile, user, and API URL - `list_profiles` — list all configured CLI profiles - `switch_profile` — switch active profile for the MCP session - `start_dev_server` — start `trigger dev` in the background and stream output - `stop_dev_server` — stop the running dev server - `dev_server_status` — check dev server status and view recent logs **New API endpoints:** - `GET /api/v1/query/schema` — query table schema discovery - `GET /api/v1/query/dashboards` — list built-in dashboards **New features:** - `--readonly` flag hides write tools (`deploy`, `trigger_task`, `cancel_run`) so the AI cannot make changes - `read:query` JWT scope for query endpoint authorization - `get_run_details` trace output is now paginated with cursor support - MCP tool annotations (`readOnlyHint`, `destructiveHint`) for all tools **Bug fixes:** - Fixed `search_docs` tool failing due to renamed upstream Mintlify tool (`SearchTriggerDev` → `search_trigger_dev`) - Fixed `list_deploys` failing when deployments have null `runtime`/`runtimeVersion` fields (#3139) - Fixed `list_preview_branches` crashing due to incorrect response shape access - Fixed `metrics` table column documented as `value` instead of `metric_value` in query docs - Fixed dev CLI leaking build directories on rebuild — deprecated workers now clean up their build dirs when their last run completes **Context optimizations:** - `get_query_schema` now requires a table name and returns only one table's schema (was returning all tables) - `get_current_worker` no longer inlines payload schemas; use new `get_task_schema` tool instead - Query results formatted as text tables instead of JSON (~50% fewer tokens) - `cancel_run`, `list_deploys`, 
`list_preview_branches` formatted as text instead of raw JSON - Schema and dashboard API responses cached to avoid redundant fetches - Add support for setting TTL (time-to-live) defaults at the task level and globally in trigger.config.ts, with per-trigger overrides still taking precedence ([#3196](#3196)) - Adapted the CLI API client to propagate the trigger source via http headers. ([#3241](#3241)) - Updated dependencies: - `@trigger.dev/core@4.4.4` - `@trigger.dev/build@4.4.4` - `@trigger.dev/schema-to-json@4.4.4` ## @trigger.dev/core@4.4.4 ### Patch Changes - Fix `list_deploys` MCP tool failing when deployments have null `runtime` or `runtimeVersion` fields. ([#3224](#3224)) - Propagate run tags to span attributes so they can be extracted server-side for LLM cost attribution metadata. ([#3213](#3213)) - Add `get_span_details` MCP tool for inspecting individual spans within a run trace. ([#3255](#3255)) - New `get_span_details` tool returns full span attributes, timing, events, and AI enrichment (model, tokens, cost, speed) - Span IDs now shown in `get_run_details` trace output for easy discovery - New API endpoint `GET /api/v1/runs/:runId/spans/:spanId` - New `retrieveSpan()` method on the API client - MCP server improvements: new tools, bug fixes, and new flags. 
([#3224](#3224)) **New tools:** - `get_query_schema` — discover available TRQL tables and columns - `query` — execute TRQL queries against your data - `list_dashboards` — list built-in dashboards and their widgets - `run_dashboard_query` — execute a single dashboard widget query - `whoami` — show current profile, user, and API URL - `list_profiles` — list all configured CLI profiles - `switch_profile` — switch active profile for the MCP session - `start_dev_server` — start `trigger dev` in the background and stream output - `stop_dev_server` — stop the running dev server - `dev_server_status` — check dev server status and view recent logs **New API endpoints:** - `GET /api/v1/query/schema` — query table schema discovery - `GET /api/v1/query/dashboards` — list built-in dashboards **New features:** - `--readonly` flag hides write tools (`deploy`, `trigger_task`, `cancel_run`) so the AI cannot make changes - `read:query` JWT scope for query endpoint authorization - `get_run_details` trace output is now paginated with cursor support - MCP tool annotations (`readOnlyHint`, `destructiveHint`) for all tools **Bug fixes:** - Fixed `search_docs` tool failing due to renamed upstream Mintlify tool (`SearchTriggerDev` → `search_trigger_dev`) - Fixed `list_deploys` failing when deployments have null `runtime`/`runtimeVersion` fields (#3139) - Fixed `list_preview_branches` crashing due to incorrect response shape access - Fixed `metrics` table column documented as `value` instead of `metric_value` in query docs - Fixed dev CLI leaking build directories on rebuild — deprecated workers now clean up their build dirs when their last run completes **Context optimizations:** - `get_query_schema` now requires a table name and returns only one table's schema (was returning all tables) - `get_current_worker` no longer inlines payload schemas; use new `get_task_schema` tool instead - Query results formatted as text tables instead of JSON (~50% fewer tokens) - `cancel_run`, `list_deploys`, 
`list_preview_branches` formatted as text instead of raw JSON - Schema and dashboard API responses cached to avoid redundant fetches - Large run outputs can use the new API which allows switching object storage providers. ([#3275](#3275)) - Add optional `hasPrivateLink` field to the dequeue message organization object for private networking support ([#3264](#3264)) - Add support for setting TTL (time-to-live) defaults at the task level and globally in trigger.config.ts, with per-trigger overrides still taking precedence ([#3196](#3196)) - Adapted the CLI API client to propagate the trigger source via http headers. ([#3241](#3241)) ## @trigger.dev/python@4.4.4 ### Patch Changes - Updated dependencies: - `@trigger.dev/sdk@4.4.4` - `@trigger.dev/core@4.4.4` - `@trigger.dev/build@4.4.4` ## @trigger.dev/react-hooks@4.4.4 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.4.4` ## @trigger.dev/redis-worker@4.4.4 ### Patch Changes - Adapted the CLI API client to propagate the trigger source via http headers. ([#3241](#3241)) - Updated dependencies: - `@trigger.dev/core@4.4.4` ## @trigger.dev/rsc@4.4.4 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.4.4` ## @trigger.dev/schema-to-json@4.4.4 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.4.4` ## @trigger.dev/sdk@4.4.4 ### Patch Changes - Define and manage AI prompts with `prompts.define()`. Create typesafe prompt templates with variables, resolve them at runtime, and manage versions and overrides from the dashboard without redeploying. ([#3244](#3244)) - Add support for setting TTL (time-to-live) defaults at the task level and globally in trigger.config.ts, with per-trigger overrides still taking precedence ([#3196](#3196)) - Adapted the CLI API client to propagate the trigger source via http headers. 
([#3241](#3241)) - Updated dependencies: - `@trigger.dev/core@4.4.4` </details> --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Summary
Currently, every triggered run follows a two-step path through Redis:
1. The run's message is enqueued into the queue sorted set.
2. A `processQueueForWorkerQueue` job fires ~500ms later, checks concurrency limits, removes the message from the sorted set, and pushes it to a worker queue (Redis list) where workers pick it up via `BLPOP`.

This means every run pays at least ~500ms of latency between being triggered and being available for a worker to execute, even when the queue is empty and concurrency is wide open.
What changed
The enqueue Lua scripts now atomically decide whether to skip the queue sorted set entirely and push directly to the worker queue. This happens inside the same Lua script that handles normal enqueue, so the decision is atomic with respect to concurrency bookkeeping.
A run takes the fast path when all of these are true:
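- The fast path is enabled (set on the `WorkerInstanceGroup`)
- The queue has no available messages (`ZRANGEBYSCORE` finds nothing with score ≤ now). This respects priority ordering and allows the fast path even when the queue has future-scored messages (e.g. nacked retries with a delay)
- Concurrency is available

When the fast path is taken:
- The message is pushed directly to the worker queue (`RPUSH`)
- Concurrency slots are claimed (`SADD` to the same sets used by the normal dequeue path)
- The `processQueueForWorkerQueue` job is not scheduled (no work to do)
- [...] (the `expireRun` worker job handles TTL independently)

When any condition fails, the existing slow path runs unchanged.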
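The decision described above can be sketched in TypeScript with in-memory stand-ins for the Redis structures. This is a simplified illustration, not the Lua script from this PR: the names `QueueState`, `enqueue`, and `fastPathEnabled` are hypothetical, and a single concurrency limit stands in for whichever limits the real script checks.

```typescript
// Hypothetical in-memory model of the structures named in the description.
interface QueuedMessage {
  runId: string;
  score: number; // timestamp (ms) at which the message becomes available
}

interface QueueState {
  fastPathEnabled: boolean;        // stands in for enableFastPath on the WorkerInstanceGroup
  sortedSet: QueuedMessage[];      // stands in for the queue sorted set
  workerQueue: string[];           // stands in for the worker queue (Redis list)
  currentConcurrency: Set<string>; // stands in for a concurrency set
  concurrencyLimit: number;        // assumed single limit, for illustration only
}

type EnqueueResult = "fast-path" | "slow-path";

function enqueue(state: QueueState, message: QueuedMessage, now: number): EnqueueResult {
  // Analogue of ZRANGEBYSCORE up to `now`: is any message already due?
  const hasAvailableMessages = state.sortedSet.some((m) => m.score <= now);
  const hasCapacity = state.currentConcurrency.size < state.concurrencyLimit;

  if (state.fastPathEnabled && !hasAvailableMessages && hasCapacity) {
    state.workerQueue.push(message.runId);       // analogue of RPUSH
    state.currentConcurrency.add(message.runId); // analogue of SADD
    return "fast-path";
  }

  // Slow path: the message waits in the sorted set for the periodic job.
  state.sortedSet.push(message);
  state.sortedSet.sort((a, b) => a.score - b.score);
  return "slow-path";
}

// Usage: a queue holding only a future-scored message still allows the fast path.
const demo: QueueState = {
  fastPathEnabled: true,
  sortedSet: [{ runId: "run_delayed", score: 5_000 }],
  workerQueue: [],
  currentConcurrency: new Set<string>(),
  concurrencyLimit: 1,
};
console.log(enqueue(demo, { runId: "run_a", score: 1_000 }, 1_000));
console.log(enqueue(demo, { runId: "run_b", score: 1_000 }, 1_000));
```

In this model the first call takes the fast path and the second falls back to the slow path because the only concurrency slot is already claimed. In the real system both checks and both writes happen inside one Lua script, which is what makes the decision atomic.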
Rollout gating
- New `enableFastPath` boolean on `WorkerInstanceGroup` (defaults to `false`), allowing region-by-region rollout

Rolling deploy safety
Each process registers its own Lua scripts via `defineCommand` (identified by SHA hash). Old and new processes never share scripts. The Redis data structures are fully compatible in both directions: ack, nack, and release operations work identically regardless of which path a message took.

Test plan
- [...] `enableFastPath` is false
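As a small illustration of the rolling deploy safety note above: Redis addresses cached scripts by the SHA-1 hex digest of their source, so two processes that register different script bodies end up with different script identities. The sketch below only computes those digests. The two Lua strings are hypothetical placeholders, not the scripts from this PR, and `scriptSha` is a helper name invented for the example.

```typescript
import { createHash } from "node:crypto";

// SHA-1 hex digest of a script's source, the form Redis uses to identify it.
function scriptSha(lua: string): string {
  return createHash("sha1").update(lua).digest("hex");
}

// Hypothetical script bodies standing in for an old and a new enqueue script.
const oldEnqueueScript = 'redis.call("ZADD", KEYS[1], ARGV[1], ARGV[2])';
const newEnqueueScript =
  'if ARGV[3] == "1" then redis.call("RPUSH", KEYS[2], ARGV[2]) ' +
  'else redis.call("ZADD", KEYS[1], ARGV[1], ARGV[2]) end';

// Different sources give different digests, so an old process never runs the new script.
console.log(scriptSha(oldEnqueueScript) !== scriptSha(newEnqueueScript));
```

Because the identity is derived from the source text alone, no coordination between old and new processes is needed during a rolling deploy. Only the shared data structures have to stay compatible.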