[Optimus] [Runtime] Complex app-facing script generation still fails across engines; Windows manifest EPERM under load #520

@cloga

Summary

v2.16.13 fixed the simple HTTP runtime smoke tests for Trinity, but the real Storyboard / script-generation workload is still not reliable.

This is a different problem from the earlier resolver/autopilot issue.

What works

On a fresh v2.16.13 runtime, simple structured requests like a minimal ping can succeed via github-copilot.

What still fails

The failures occur with the actual Trinity script-generation request shape built from build_script_generation_context(...):

  • payload keys: ticker, system_prompt, user_prompt, optional report_markdown, optional metadata
  • stringified input payload length in my minimal repro: about 9971 chars
  • orchestrator prompt sizes observed in runtime logs: roughly 24k-26k chars after role/skill/template wrapping
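
For anyone reproducing the size figures, a small script along these lines can measure the stringified payload. This is only a sketch under assumptions: build_payload is a hypothetical stand-in for build_script_generation_context(...), whose signature is not shown in this issue, all values are placeholders, and json.dumps may not match the exact stringification the runtime applies, so the character count can differ slightly.

```python
import json


def build_payload(ticker, system_prompt, user_prompt,
                  report_markdown=None, metadata=None):
    """Hypothetical helper: assemble a payload with the key set listed above."""
    payload = {
        "ticker": ticker,
        "system_prompt": system_prompt,
        "user_prompt": user_prompt,
    }
    # The two optional keys are only included when a value is supplied.
    if report_markdown is not None:
        payload["report_markdown"] = report_markdown
    if metadata is not None:
        payload["metadata"] = metadata
    return payload


def stringified_length(payload):
    """Character count of the JSON-serialized payload."""
    return len(json.dumps(payload, ensure_ascii=False))


# Placeholder values only; the real prompts come from the Trinity context builder.
payload = build_payload("EXAMPLE", "system " * 100, "user " * 100,
                        report_markdown="# Report\n")
print(sorted(payload), stringified_length(payload))
```

Logging this number next to the orchestrator prompt size from the runtime logs makes it easier to see how much the role/skill/template wrapping adds.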

Repro 1 — script-writer + script-generation skill, no output_schema

Request:

  • role = script-writer
  • skill = script-generation
  • engine = github-copilot
  • model = gemini-3-pro-preview

Observed result:

  • HTTP status 200, runtime status completed
  • but result is only a human-readable summary like:
    • "The Douyin video script for ... has been generated and saved to ..."
  • and the file at runtime_metadata.output_path also contains only that summary text, not the JSON script object

So from an application-facing API perspective, this is not usable as a structured script response.

Repro 2 — same request, but with output_schema

Request:

  • role = script-writer
  • skill = script-generation
  • engine = github-copilot
  • model = gemini-3-pro-preview
  • output_schema = Trinity script schema

Observed result:

  • runtime status failed
  • error_code = invalid_structured_output
  • error_message = Expected JSON output but failed to parse result...
  • result is again a prose summary / output-artifact note, not JSON

Repro 3 — generic role with same complex input

Request:

  • role = dev
  • engine = github-copilot
  • model = gemini-3-pro-preview or gpt-5.4
  • same script-generation input payload
  • same output_schema

Observed result:

  • runtime status failed
  • result = Error: Execution failed: CAPIError: 400 400 Bad Request
  • surfaced as invalid_structured_output

So the issue is not limited to the script-writer role template alone; the complex script-generation prompt shape seems to trigger failures on the GitHub Copilot path even though simple structured pings now work.

Repro 4 — claude-code comparison

Fresh runtime, single request only:

  • role = script-writer
  • skill = script-generation
  • engine = claude-code
  • model = claude-opus-4.6-1m
  • same complex input + output_schema

Observed result:

  • runtime status completed
  • but result = {}
  • and output_path file content is also just {}

So this path also does not return a usable structured script object in the application-facing API.

Additional Windows stability issue

While running mixed requests on Windows, the HTTP runtime process also crashed with:

  • EPERM: operation not permitted, rename '.optimus/state/task-manifest.json.tmp' -> '.optimus/state/task-manifest.json'

The stack trace pointed to saveManifest(...) inside http-runtime.js.

So there also appears to be a Windows file-lock / manifest-write race under load.
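
One common mitigation for this kind of rename failure is to write to a uniquely named temp file and retry the rename with backoff when it fails with a permission error. The sketch below shows that pattern in Python purely for illustration; the actual saveManifest(...) is JavaScript and its internals are not shown in this issue, so save_manifest, its parameters, and the retry counts here are all assumptions, not the runtime's implementation.

```python
import json
import os
import tempfile
import time


def save_manifest(path, manifest, retries=5, base_delay=0.05, replace=os.replace):
    """Write JSON to a temp file, then rename it over `path`.

    Retries the rename on PermissionError with exponential backoff.
    Returns the number of rename attempts that were needed.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # A unique temp name per call avoids two writers sharing one ".tmp" file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(manifest, handle)
        for attempt in range(retries):
            try:
                replace(tmp_path, path)
                return attempt + 1
            except PermissionError:
                if attempt == retries - 1:
                    raise  # give up after the last attempt
                time.sleep(base_delay * (2 ** attempt))
    finally:
        # Remove the temp file if the rename never succeeded.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Retrying only hides short-lived locks (for example from antivirus scanners or a concurrent reader); if several requests write the manifest at once, serializing writes through a single queue or lock would still be needed.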

Why this matters

This means:

  • v2.16.13 simple runtime validation is good news,
  • but Trinity's actual Storyboard Agent Runtime integration is still not production-safe,
  • because real script-generation calls either:
    • return prose instead of the script JSON,
    • fail with Copilot CAPIError: 400,
    • or complete with an empty object.
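
Until this is fixed, a client-side guard along the following lines can tell the three failure modes apart before a result reaches the application. It is a rough sketch: classify_result and its category names are hypothetical, the "Error:" prefix check is based only on the single error string quoted in Repro 3, and the "title" key in the usage example is a placeholder rather than a field of the real Trinity script schema.

```python
import json


def classify_result(result):
    """Classify a runtime `result` value.

    Returns one of: 'structured', 'empty_object', 'upstream_error',
    'prose', or 'unexpected'.
    """
    if isinstance(result, str):
        # Matches the error string format observed in Repro 3.
        if result.startswith("Error:"):
            return "upstream_error"
        try:
            result = json.loads(result)
        except json.JSONDecodeError:
            return "prose"  # human-readable summary or artifact note
    if isinstance(result, dict):
        return "structured" if result else "empty_object"
    return "unexpected"  # valid JSON, but not an object


samples = [
    "The Douyin video script for ... has been generated and saved to ...",
    "Error: Execution failed: CAPIError: 400 400 Bad Request",
    "{}",
    '{"title": "placeholder"}',
]
for sample in samples:
    print(classify_result(sample))
```

A non-empty object still needs validating against the output_schema; this check only separates the failure shapes described above.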

Requested fixes

  1. Make application-facing /api/v1/agent/run reliable for larger structured prompt payloads, not just minimal smoke tests.
  2. When output_schema is provided, ensure role/skill responses do not degrade into human-readable summaries or artifact notes.
  3. Fix Windows manifest persistence so concurrent or overlapping runs do not crash the HTTP runtime with EPERM on rename.

If helpful, I can also provide the exact Trinity request body shape and the specific run IDs used in this retest.


🤖 Created by master via Optimus Spartan Swarm



Labels

  • P1 (High priority)
  • bug (Something isn't working)
  • optimus-bot
  • runtime
  • system-maintained (Created or processed by autonomous system agents)
