
Commit e60bbd2

brannn and claude committed
Upgrade planner to Llama 3.1 70B and add anti-pattern examples
Critical fixes for planner reliability:

1. Model upgrade: Mixtral-8x7B → Llama 3.1 70B Instruct Turbo
   - Superior instruction following capability
   - Better at structured output generation
   - Proven reliability with similar spec generation tasks

2. Added explicit anti-pattern examples showing:
   - ❌ WRONG: Landmarks outside FUNCTION blocks
   - ✅ CORRECT: Landmarks inside FUNCTION blocks
   - ❌ WRONG: Undefined DATA types in signatures
   - ✅ CORRECT: DATA defined before use

3. Concrete examples for both correct and incorrect structures
   help Llama understand exactly what to avoid

These changes address the critical issue where even default planner
examples were producing invalid specs with W001 (landmarks outside
FUNCTION) and W006 (undefined DATA type) errors.

Cost impact: Minimal (~$0.88/M tokens vs $0.60/M for Mixtral)
Quality impact: Significant improvement expected in spec validity

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent d6ebb4b commit e60bbd2
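
To put the per-token prices quoted in the commit message in proportion, here is a small arithmetic sketch. Only the two prices come from the commit message; the helper `costUsd` and the 5,000-token request size are hypothetical values chosen for illustration.

```javascript
// Prices per million tokens, taken from the commit message.
const MIXTRAL_USD_PER_M = 0.60;
const LLAMA_USD_PER_M = 0.88;

// Hypothetical helper: cost in USD for a token count at a per-million rate.
function costUsd(tokens, usdPerMillion) {
  return (tokens / 1_000_000) * usdPerMillion;
}

// Relative price increase of the new model over the old one.
const relativeIncrease =
  (LLAMA_USD_PER_M - MIXTRAL_USD_PER_M) / MIXTRAL_USD_PER_M;

// Assumed workload, for illustration only: 5,000 tokens per planner request.
const tokensPerRequest = 5000;
const extraPerRequest =
  costUsd(tokensPerRequest, LLAMA_USD_PER_M) -
  costUsd(tokensPerRequest, MIXTRAL_USD_PER_M);

console.log(`Relative increase: ${(relativeIncrease * 100).toFixed(1)}%`);
console.log(`Extra cost per request: $${extraPerRequest.toFixed(4)}`);
```

Under these assumptions the per-token rate rises by roughly 47%, while the absolute difference for a request of that size is about $0.0014.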

2 files changed

Lines changed: 61 additions & 2 deletions


web/cmd/server/static/js/app.js

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 // Planner application JavaScript
 // Depends on: simplex-spec.js (system prompt constants)

-const MODEL_NAME = 'mistralai/Mixtral-8x7B-Instruct-v0.1';
+const MODEL_NAME = 'meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo';

 // State management
 const state = {

web/cmd/server/static/js/simplex-spec.js

Lines changed: 60 additions & 1 deletion
@@ -303,6 +303,65 @@ Here is a more complex specification with DATA types and multiple functions:
 ${COMPLEX_EXAMPLE}
 \`\`\`

+## CRITICAL: Common Mistakes to Avoid
+
+**❌ WRONG - Landmarks outside FUNCTION block:**
+\`\`\`
+RULES:
+- if user exists, authenticate them
+
+FUNCTION: login(email, password) → session
+\`\`\`
+This will FAIL linting. RULES must be INSIDE the FUNCTION block.
+
+**✅ CORRECT - Landmarks inside FUNCTION block:**
+\`\`\`
+FUNCTION: login(email, password) → session
+
+RULES:
+- if user exists, authenticate them
+- if credentials valid, create session
+
+DONE_WHEN:
+- session created with valid expiration
+
+EXAMPLES:
+("user@example.com", "pass123") → Session{id: "...", expires: ...}
+
+ERRORS:
+- invalid credentials → fail with "authentication failed"
+- any unhandled condition → fail with descriptive message
+\`\`\`
+
+**❌ WRONG - Undefined DATA type:**
+\`\`\`
+FUNCTION: add_item(cart, item) → updated cart
+\`\`\`
+"updated cart" is not a recognized type. You must define it.
+
+**✅ CORRECT - Define DATA types before use:**
+\`\`\`
+DATA: Cart
+items: list of Item
+total: number
+
+FUNCTION: add_item(cart, item) → Cart
+
+RULES:
+- add item to cart items list
+- recalculate total
+
+DONE_WHEN:
+- item appears in cart
+- total reflects new item price
+
+EXAMPLES:
+(Cart{items: [], total: 0}, Item{price: 10}) → Cart{items: [Item], total: 10}
+
+ERRORS:
+- any unhandled condition → fail with descriptive message
+\`\`\`
+
 ## Generation Instructions

 Based on the user's description and any refinement conversation, generate a complete Simplex specification that:
@@ -311,7 +370,7 @@ Based on the user's description and any refinement conversation, generate a comp

 2. **Top-level landmarks**: Only DATA and CONSTRAINT appear at the top level (outside FUNCTION blocks). Everything else goes inside FUNCTION blocks.

-3. **DATA types**: When using custom types in FUNCTION signatures (like "updated cart" or "PolicyRule"), you MUST define them with a DATA block BEFORE the FUNCTION that uses them. Never reference undefined types.
+3. **DATA types**: When using custom types in FUNCTION signatures (like "updated cart" or "PolicyRule"), you MUST define them with a DATA block BEFORE the FUNCTION that uses them. Never reference undefined types. Use simple types (list, number, string, boolean) when they suffice.

 4. **Example coverage**: Provide enough EXAMPLES to cover every conditional branch in RULES. The linter will reject specs where example count is less than conditional branch count.

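
The two rules the new prompt text targets, W001 (landmarks outside FUNCTION) and W006 (undefined DATA type), can be illustrated with a minimal sketch of what such checks might look like. This is not the project's linter, which does not appear in this commit. The function `lintSpec`, the landmark set, the built-in type list, and the assumption that a `DATA:` line closes any open FUNCTION block are all hypothetical, inferred only from the prompt wording above.

```javascript
// Hypothetical sketch of the W001 and W006 checks named in the commit message.
// Landmark names and built-in types are assumptions taken from the prompt text.
const FUNCTION_LANDMARKS = new Set(['RULES', 'DONE_WHEN', 'EXAMPLES', 'ERRORS']);
const BUILTIN_TYPES = new Set(['list', 'number', 'string', 'boolean']);

function lintSpec(spec) {
  const warnings = [];
  const definedTypes = new Set();
  let insideFunction = false;

  for (const rawLine of spec.split('\n')) {
    const line = rawLine.trim();

    // Assumption: a top-level DATA block ends any open FUNCTION block.
    const data = line.match(/^DATA:\s*(\S+)/);
    if (data) {
      definedTypes.add(data[1]);
      insideFunction = false;
      continue;
    }

    if (line.startsWith('FUNCTION:')) {
      insideFunction = true;
      // Capture everything after the arrow as the declared return type.
      const signature = line.match(/→\s*(.+)$/);
      if (signature) {
        const returnType = signature[1].trim();
        // W006: return type must be built-in or defined by an earlier DATA block.
        if (!BUILTIN_TYPES.has(returnType) && !definedTypes.has(returnType)) {
          warnings.push({ code: 'W006', message: `undefined DATA type "${returnType}"` });
        }
      }
      continue;
    }

    // W001: a function-level landmark seen while no FUNCTION block is open.
    const landmark = line.match(/^([A-Z_]+):/);
    if (landmark && FUNCTION_LANDMARKS.has(landmark[1]) && !insideFunction) {
      warnings.push({ code: 'W001', message: `${landmark[1]} outside FUNCTION block` });
    }
  }
  return warnings;
}

// Usage: the "WRONG" cart signature from the prompt triggers W006.
console.log(lintSpec('FUNCTION: add_item(cart, item) → updated cart'));
```

Under this sketch's strict rule, the lowercase `session` return type in the login example would also be reported as W006, since no `DATA: session` block precedes it. Whether the real linter behaves that way is not stated in the commit.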
