ScriptaAI follows a monorepo structure with a clear separation between the frontend user interface and the backend AI orchestration layer.
Built with React 19 and Vite, the frontend is designed for high-performance state management and cinematic visuals.
- Routing: `react-router-dom` manages transitions between the Landing, Storyboard, Scene Review, and Editor pages.
- State Management: Primary state (scenes, project data) is passed between pages via `location.state` and managed locally within page components, so it persists throughout the generation flow.
- Visual Engine:
  - GSAP: Drives all UI animations, targeting smooth 60 fps transitions and complex timeline scrubbing.
  - Three.js: Powers the `InteractiveBackground`, providing a premium 3D particle environment.
- Custom Editor: The `Editor.jsx` component implements a non-linear video editor, using HTML5 Canvas for real-time preview and multi-track rendering.
The backend is a Node.js/Express server that acts as an intelligent gateway to various AI inference providers.
- Orchestration: The backend manages the sequential dependencies between models (e.g., LLM → FLUX → Wan-AI).
- Concurrency: Long-running generation jobs run asynchronously under a job ID system, so the frontend can poll for status without blocking.
- Key Management: A `siliconKeys.js` utility, designed to support API key rotation, handles high-volume video generation requests.
- Retry Logic: An `axiosWithRetry` wrapper handles transient network errors and rate limits from AI providers.
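The Concurrency item above can be illustrated with an in-memory job store. This is a minimal sketch, not the project's code. `JobStore`, its `start`/`status` methods, and the `pending`/`completed`/`failed` status values are hypothetical names.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical in-memory job store illustrating the job ID pattern.
// Jobs are lost on restart; a production setup might persist them instead.
class JobStore {
  constructor() {
    this.jobs = new Map(); // jobId -> { status, result, error }
  }

  // Records the job as pending, starts the work without awaiting it,
  // and returns the ID immediately so the HTTP response is not blocked.
  start(work) {
    const id = randomUUID();
    this.jobs.set(id, { status: "pending", result: null, error: null });
    Promise.resolve()
      .then(work)
      .then(
        (result) => this.jobs.set(id, { status: "completed", result, error: null }),
        (err) =>
          this.jobs.set(id, {
            status: "failed",
            result: null,
            error: err instanceof Error ? err.message : String(err),
          })
      );
    return id;
  }

  // The payload a status endpoint could return; null for unknown IDs.
  status(id) {
    return this.jobs.get(id) ?? null;
  }
}
```

In an Express app, the POST route could respond with the ID from `store.start(...)` right away. The `GET /api/generation-status/:id` route could then respond with `store.status(req.params.id)`.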
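Key rotation and retry logic can be sketched together. `KeyPool`, `withRetry`, and `isRetryable` are hypothetical stand-ins for `siliconKeys.js` and `axiosWithRetry`, whose actual code is not shown here. The retry policy is also an assumption: exponential backoff, retrying on HTTP 429, on 5xx, and on errors that carry no status.

```javascript
// Hypothetical round-robin key pool (stand-in for siliconKeys.js).
class KeyPool {
  constructor(keys) {
    if (keys.length === 0) throw new Error("KeyPool needs at least one key");
    this.keys = [...keys];
    this.index = 0;
  }

  // Returns the next key and advances the cursor, wrapping around at the end.
  next() {
    const key = this.keys[this.index];
    this.index = (this.index + 1) % this.keys.length;
    return key;
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumed policy: retry rate limits (429), server errors (5xx), and errors with
// no HTTP status at all (e.g. connection resets). axios puts the status on err.response.status.
function isRetryable(err) {
  const status = err?.response?.status ?? err?.status;
  return status === undefined || status === 429 || status >= 500;
}

// Calls request(apiKey) up to retries + 1 times, doubling the delay after each failure.
async function withRetry(request, { retries = 3, baseDelayMs = 500, keyPool } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    const apiKey = keyPool ? keyPool.next() : undefined; // fresh key on every attempt
    try {
      return await request(apiKey);
    } catch (err) {
      lastError = err;
      if (attempt === retries || !isRetryable(err)) break;
      await sleep(baseDelayMs * 2 ** attempt); // 500 ms, 1000 ms, 2000 ms, ...
    }
  }
  throw lastError;
}
```

Rotating the key on every attempt means a 429 on one key does not have to delay the retry on the next one. Non-retryable errors, such as a 400 from a malformed request, fail immediately.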
- Model: `Qwen/Qwen2.5-72B-Instruct` via Hugging Face.
- Logic: Transforms unstructured prompts into cinematic narratives, using dedicated system prompts to enforce professional screenwriting standards.
- PDF Parsing: Uses `pdf-parse` to extract text, which the LLM then summarizes into a structured "Academic-to-Cinematic" storyboard.
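A request to the LLM pairs a system prompt with the user's material. The sketch below builds an OpenAI-style `messages` array, the role/content shape used by chat-completion APIs. It handles either a free-text prompt or text extracted by `pdf-parse`. The prompt wording, `buildStoryboardMessages`, and the `MAX_SOURCE_CHARS` budget are illustrative assumptions, not the project's actual prompts.

```javascript
// Placeholder system prompt; the real wording used by ScriptaAI is not shown here.
const SCREENWRITER_SYSTEM_PROMPT =
  "You are a professional screenwriter. Turn the user's material into a cinematic " +
  "narrative with clear scene headings, visual action lines, and concise dialogue.";

// Assumed character budget to keep long PDFs within the model's context window.
const MAX_SOURCE_CHARS = 20000;

function buildStoryboardMessages(sourceText, { fromPdf = false, maxChars = MAX_SOURCE_CHARS } = {}) {
  // Collapse the runs of whitespace and line breaks that PDF text extraction often leaves.
  const cleaned = sourceText.replace(/\s+/g, " ").trim();
  const clipped = cleaned.length > maxChars ? cleaned.slice(0, maxChars) : cleaned;
  const instruction = fromPdf
    ? "Summarize this academic text into a structured cinematic storyboard:\n\n"
    : "Develop this idea into a cinematic script:\n\n";
  return [
    { role: "system", content: SCREENWRITER_SYSTEM_PROMPT },
    { role: "user", content: instruction + clipped },
  ];
}
```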
The `/api/generate-scene-visuals` endpoint implements a "Context Chain":
- Scene N-1 generates a visual description.
- Scene N receives Scene N-1's visual description as its "Consistency Reference."
- LLM Refinement: Before an image is generated, a dedicated "Visual Director" prompt refines the scene's prompt using the previous scene's context.
- I2V (Image-to-Video): The generated FLUX image is converted to base64 and sent to Wan2.2-I2V-A14B, so the video clip begins from that exact image.
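The Context Chain can be sketched as a sequential loop. The three model calls are injected as functions (`refinePrompt`, `generateImage`, `generateVideo`), so the flow runs without real providers. These names are hypothetical. So is the assumption that `refinePrompt` returns both an image prompt and a visual description.

```javascript
// Hypothetical Context Chain; `models` supplies the three provider calls.
async function generateSceneVisuals(scenes, models) {
  const results = [];
  let previousDescription = null; // the "Consistency Reference" handed to the next scene
  for (const scene of scenes) {
    // LLM refinement: rewrite this scene's prompt in light of the previous scene.
    const refined = await models.refinePrompt(scene.prompt, previousDescription);
    // Image generation, then base64 encoding of the raw bytes for the I2V request.
    const imageBytes = await models.generateImage(refined.imagePrompt);
    const imageBase64 = Buffer.from(imageBytes).toString("base64");
    const video = await models.generateVideo(imageBase64, refined.imagePrompt);
    results.push({ sceneId: scene.id, reference: previousDescription, imageBase64, video });
    previousDescription = refined.visualDescription;
  }
  return results;
}
```

The loop is deliberately sequential. Running scenes in parallel would be faster but would break the chain, because scene N needs scene N-1's description before it can start.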
The `Editor.jsx` component is the heart of the post-production suite.
- Pixels Per Second (PPS): Timeline zoom is expressed in pixels per second and adjusted on a logarithmic scale, so users can view the entire project or zoom in on specific frames.
- Snap Logic: Clips automatically snap to the start/end of other clips or the playhead to prevent unintentional gaps.
- Multi-Track: Supports independent tracks for Video (primary assets), Text (overlays), and Audio.
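The zoom and snapping behavior can be sketched as plain timeline math. The PPS bounds, the 8-pixel snap threshold, and the function names are assumptions for illustration. `Editor.jsx` may structure this differently.

```javascript
// Assumed zoom bounds, in pixels per second of timeline.
const MIN_PPS = 2;
const MAX_PPS = 400;

// Maps a zoom slider value in [0, 1] to PPS on a logarithmic scale,
// so each equal slider step multiplies the zoom by the same factor.
function sliderToPps(t) {
  const clamped = Math.min(1, Math.max(0, t));
  return MIN_PPS * Math.pow(MAX_PPS / MIN_PPS, clamped);
}

// Inverse of sliderToPps.
function ppsToSlider(pps) {
  return Math.log(pps / MIN_PPS) / Math.log(MAX_PPS / MIN_PPS);
}

const timeToX = (seconds, pps) => seconds * pps;
const xToTime = (x, pps) => x / pps;

// Snaps a proposed clip start so that either the clip's start or its end lands on
// the nearest edge (other clips' start/end, or the playhead) within thresholdPx.
function snapStart(proposedStart, duration, otherClips, playhead, pps, thresholdPx = 8) {
  const edges = [playhead];
  for (const c of otherClips) edges.push(c.start, c.start + c.duration);
  let best = proposedStart;
  let bestDistPx = thresholdPx;
  for (const edge of edges) {
    for (const candidate of [edge, edge - duration]) {
      const distPx = Math.abs(candidate - proposedStart) * pps; // distance on screen
      if (distPx <= bestDistPx) {
        best = candidate;
        bestDistPx = distPx;
      }
    }
  }
  return Math.max(0, best); // never snap before the start of the timeline
}
```

Measuring the snap threshold in screen pixels rather than seconds keeps snapping equally "sticky" at every zoom level.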
The export process uses the MediaRecorder API:
- A hidden `<canvas>` is created at 1920×1080 resolution.
- The engine "plays" the timeline at a fixed frame rate.
- Each frame draws the active scenes (videos/images) and text overlays with the correct transforms (scale, rotation, opacity).
- The canvas stream is captured in chunks and compiled into an MP4 or WebM blob for download, depending on which formats the browser's MediaRecorder supports.
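The per-frame part of the export can be sketched independently of the browser. The clip fields (`start`, `duration`, `track`, `transform`) and the helper names are assumptions. `drawFrame` expects a `CanvasRenderingContext2D` or any object with the same methods. `drawSource` is a hypothetical callback that draws a clip's video frame, image, or text centered at the origin.

```javascript
const EXPORT_WIDTH = 1920;
const EXPORT_HEIGHT = 1080;

// Timestamps (in seconds) of every frame when the timeline is played at a fixed rate.
function frameTimes(durationSec, fps) {
  const count = Math.ceil(durationSec * fps);
  return Array.from({ length: count }, (_, i) => i / fps);
}

// Draw order for visual tracks; audio clips are not drawn, so they are excluded.
const TRACK_ORDER = new Map([["video", 0], ["text", 1]]);

// Clips visible at time t, sorted so text overlays are drawn over video.
function activeClips(clips, t) {
  return clips
    .filter((c) => TRACK_ORDER.has(c.track) && t >= c.start && t < c.start + c.duration)
    .sort((a, b) => TRACK_ORDER.get(a.track) - TRACK_ORDER.get(b.track));
}

// Draws one export frame, applying each clip's transform.
function drawFrame(ctx, clips, t, drawSource) {
  // This sketch clears to black, so timeline gaps render as black frames.
  ctx.fillStyle = "#000";
  ctx.fillRect(0, 0, EXPORT_WIDTH, EXPORT_HEIGHT);
  for (const clip of activeClips(clips, t)) {
    // Assumed defaults: centered, unscaled, unrotated, fully opaque.
    const {
      x = EXPORT_WIDTH / 2,
      y = EXPORT_HEIGHT / 2,
      scale = 1,
      rotation = 0,
      opacity = 1,
    } = clip.transform ?? {};
    ctx.save(); // isolate this clip's transform and alpha
    ctx.globalAlpha = opacity;
    ctx.translate(x, y);
    ctx.rotate(rotation); // radians
    ctx.scale(scale, scale);
    drawSource(ctx, clip, t);
    ctx.restore();
  }
}
```

In the browser, the recording side could work as follows:
- Call `canvas.captureStream(fps)` to get a stream from the canvas.
- Pass the stream to `new MediaRecorder(stream, { mimeType })`.
- Collect `event.data` from each `dataavailable` event.
- Combine the chunks with `new Blob(chunks, { type: mimeType })`.

`MediaRecorder.isTypeSupported()` can decide between MP4 and WebM.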
| Endpoint | Method | Description |
|---|---|---|
| `/api/generate-storyboard` | POST | Text prompt to cinematic script. |
| `/api/parse-pdf` | POST | PDF file to structured storyboard. |
| `/api/breakdown-storyboard` | POST | Script to 4 structured scenes. |
| `/api/chat-assistant` | POST | Context-aware script/scene refinement. |
| `/api/generate-scene-visuals` | POST | Start background job for Image/Video generation. |
| `/api/generation-status/:id` | GET | Poll status of a specific generation job. |
| `/api/regenerate-scene-video` | POST | Regenerate a specific video clip for a scene. |
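A client uses the two generation endpoints by starting a job and then polling its status. This sketch covers the polling half. The `{ status, result, error }` response shape, the status strings, the interval, and the timeout are all assumptions, not the documented contract. `pollGenerationStatus` is a hypothetical name.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls GET /api/generation-status/:id until the job completes, fails, or times out.
async function pollGenerationStatus(
  baseUrl,
  jobId,
  { fetchImpl = globalThis.fetch, intervalMs = 3000, timeoutMs = 10 * 60 * 1000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  const url = `${baseUrl}/api/generation-status/${encodeURIComponent(jobId)}`;
  while (Date.now() < deadline) {
    const res = await fetchImpl(url);
    if (!res.ok) throw new Error(`Status request failed: HTTP ${res.status}`);
    const job = await res.json();
    if (job.status === "completed") return job.result;
    if (job.status === "failed") throw new Error(job.error || "Generation failed");
    await sleep(intervalMs); // still running: wait before asking again
  }
  throw new Error(`Job ${jobId} did not finish within ${timeoutMs} ms`);
}
```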
- Environment: Ensure the `public/generated-images` and `public/generated-videos` directories exist in the backend root and are writable by the server process.
- CORS: The backend is pre-configured to allow requests from `http://localhost:5173` (the default Vite dev server port).
- Timeouts: AI generation can take up to 2–3 minutes per scene; configure timeouts on any reverse proxy (such as Nginx) accordingly.
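The directory requirement under Environment can be checked at startup. This sketch assumes the backend root is the working directory, and `ensureOutputDirs` is a hypothetical helper. It relies only on standard Node.js APIs: `fs.mkdirSync` with `recursive: true` and `fs.accessSync` with `fs.constants.W_OK`.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Output directories the backend writes generated media into.
const OUTPUT_DIRS = ["public/generated-images", "public/generated-videos"];

// Creates each directory if missing and throws if the process cannot write to it.
function ensureOutputDirs(rootDir = process.cwd()) {
  return OUTPUT_DIRS.map((rel) => {
    const dir = path.join(rootDir, rel);
    fs.mkdirSync(dir, { recursive: true }); // no-op if it already exists
    fs.accessSync(dir, fs.constants.W_OK); // throws if not writable
    return dir;
  });
}
```

Calling this before the server starts listening surfaces a permissions problem at boot rather than partway through a generation job.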
Last updated: May 2026