FileShot
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
Lines changed: 93 additions & 3 deletions
diff --git a/322_DEFECTS.md b/322_DEFECTS.md
Lines changed: 112 additions & 0 deletions
@@ -1,15 +1,55 @@
 # 🚫 STOP — READ THIS FIRST — NON-NEGOTIABLE — NO EXCEPTIONS
 
+---
+
+## WHAT IS THIS PROJECT
+
+**guIDE** is a local-first, offline-capable AI IDE — the quality target is Visual Studio Code + GitHub Copilot, but running entirely locally with no cloud dependency. It ships to real end users on all hardware configurations from 4GB GPU laptops to 128GB RAM workstations, running 0.5B to 200B parameter models.
+
+Every change to this codebase must be:
+- **Production-grade** — VSCode/Copilot level of polish. No band-aids, no test-specific workarounds, no partial implementations.
+- **General** — works for ALL users, ALL models, ALL hardware. Not just the dev machine.
+- **Complete** — both `main/` AND `pipeline-clone/main/` updated. 0% done until both are updated.
+- **Verified** — do NOT say a fix works without being able to cite specific code evidence.
+
+---
+
+## TOP-LEVEL RULES — EVERY AGENT MUST READ ALL OF THESE BEFORE RESPONDING
+
+> These are the most commonly violated rules. Read every bullet. No exceptions.
+
+1. **END WITH MULTI-CHOICE TOOL** — NEVER end a response with text. ALWAYS use the `vscode_askQuestions` tool at the end. "Let me know if you need anything" is BANNED. See RULE -1 below.
+2. **TRIPWIRE** — First line of EVERY response: `[Task: X | Last: Y]`
+3. **READ BEFORE WRITING** — Read every relevant file in full before implementing any change. No partial reads. No "I already know this code." Read it anyway.
+4. **PLAN FIRST, CODE SECOND** — Present a plan, wait for USER approval, THEN implement. Never write code in the same response as the plan.
+5. **NO BANNED WORDS** — Never say: confirmed, fixed, resolves, fully fixed, ready, working, all set. No emojis (especially no checkmarks). See BANNED WORDS section.
+6. **BOTH TREES ALWAYS** — Every code change goes to BOTH `main/` AND `pipeline-clone/main/`. No exceptions.
+7. **CHANGES_LOG.md ALWAYS** — Every code change logged in `pipeline-clone/CHANGES_LOG.md`. No exceptions.
+8. **NO CHEERLEADING** — Do NOT say "great results", "strong performance", "improvement", "looking good", or any positive framing. Report defects. That is the job.
+9. **NEVER DISMISS USER OBSERVATIONS** — What the user observes is FACT. If your analysis contradicts it, YOUR ANALYSIS IS WRONG. Go back and read more code.
+10. **NO BAND-AID FIXES** — Every fix addresses the root architectural cause. Surface-level patches, guard clauses, and timeouts masking a deeper issue are NOT fixes.
+11. **NEVER BLAME MODEL SIZE OR CONTEXT WINDOW** — If it fails, the pipeline is broken. Fix the pipeline. See dedicated sections below.
+12. **PRODUCTION SOFTWARE** — Every fix must work for 4GB GPU users AND 128GB workstation users. Hardware-specific fixes are bugs.
+13. **READ CHANGES_LOG.md FIRST** — Before proposing ANY fix, read `pipeline-clone/CHANGES_LOG.md`. Context resets every session. The log is the anchor.
+14. **READ 323_DEFECTS.md** — The definitive defect list is at `C:\Users\brend\IDE\323_DEFECTS.md`. Read it before proposing any fix to the guIDE pipeline.
+15. **ACKNOWLEDGE EVERY POINT** — If user makes 7 points, respond to all 7. Skipping one is the same as ignoring a direct instruction.
+16. **SELF-ACCOUNTABILITY CHECKPOINT** — After EVERY todo item, run through the mandatory checkpoint before marking it done.
+17. **PRE-CODE CHECKLIST** — Required before every code change. See mandatory section below.
+18. **POST-CODE VERIFICATION** — Required after every code change. See mandatory section below.
+
+---
+
 ## 🔴 RULE -1 — NEVER END INTERACTION WITHOUT MULTI-CHOICE QUESTION — ABSOLUTE
 
 **This is THE most critical rule. Every premium request costs money. Ending without continuation wastes it.**
 
-- **NEVER end an interaction by just stopping.** If you complete a task, hit a wall, need input, or have nothing more to do — you MUST use the multi-choice question tool to give the user options.
-- **Every interaction MUST end with either:** (1) a multi-choice question tool call, OR (2) active work still in progress (e.g., waiting for a build, running a command)
+- **NEVER end an interaction by just stopping.** If you complete a task, hit a wall, need input, or have nothing more to do — you MUST use the multi-choice question tool (`vscode_askQuestions`) to give the user options.
+- **Every interaction MUST end with either:** (1) a `vscode_askQuestions` tool call, OR (2) active work still in progress (e.g., waiting for a build, running a command)
 - **"I've completed X, let me know if you need anything else"** — BANNED. This is ending without continuation.
 - **"Ready to proceed when you are"** — BANNED. Use multi-choice question instead.
-- **If you are about to type your final message:** STOP. Call the multi-choice question tool with options like: "Build it", "Make additional changes", "Review the plan", "I have questions", etc.
+- **If you are about to type your final message:** STOP. Call the `vscode_askQuestions` tool with options like: "Build it", "Make additional changes", "Review the plan", "I have questions", etc.
 - **This rule applies to EVERY interaction, no exceptions.** The user should never have to type a new message just to continue — they should be able to click an option.
+- **Last step of EVERY todo list must be:** call `vscode_askQuestions`. This is non-negotiable.
 
 **NEVER BLAME MODEL SIZE. ALL MODELS EXCEL IN LM STUDIO AT THE SAME TASKS THEY'RE BEING TESTED ON HERE!** If something fails, the problem is in the pipeline. Exhaust all levers before concluding "the model isn't good enough." Do not say "the model can't do this" without first confirming that every other lever has been pulled. This is a production software project, not a research experiment. The user expects results, not excuses.
@@ -1045,3 +1085,53 @@ Every stress test session MUST cover these 5 dimensions with UNIQUE prompts each
 - Report every test using the mandatory reporting format
 - Score every test on all 3 dimensions (coherence, tool correctness, response quality)
 - No cheerleading, no positive framing, only defects
+
+---
+
+## WEB TESTING RULES — KEY RULES CONSOLIDATED FROM WEB_TEST_RULES.md
+
+The full rules are in `C:\Users\brend\IDE\WEB_TEST_RULES.md`. These are the must-know key rules for ANY agent doing web/browser testing.
+
+### Session Format (14A)
+- Each test: 5 turns per session, 3 sessions minimum
+- Screenshot EVERY 5 seconds of active testing, not just at start/end
+- A session that does not cover all 5 turns is incomplete. Resume from where it left off.
+
+### ALWAYS Read WEB_TEST_RULES.md In Full Before Testing
+- Do NOT skim. Read the ENTIRE file before running any web test.
+- The previous agent in this project stopped reading at line 500 — this is a violation. Read ALL lines.
+- Use `read_file` from line 1 to the LAST LINE of the file. Not just the first section.
+
+### Screenshots and Observations
+- Every screenshot must be described in full before acting on it
+- Do NOT assume what's on screen — describe it, state what you see, then hypothesize
+- Never skip an image description
+
+### Test Reporting Format (Mandatory)
+```
+SESSION: [number]
+TURN: [1-5]
+PROMPT SENT: [exact text]
+OBSERVED BEHAVIOR: [exact description, not paraphrase]
+SCREENSHOT: [taken yes/no]
+DEFECTS FOUND: [list, or "None found for this turn"]
+```
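The reporting format above could be produced by a small helper so every turn report has the same fields in the same order. This is a minimal sketch: the `TurnReport` type and `formatTurnReport` function are hypothetical names used for illustration and are not part of the project.

```typescript
// Hypothetical record for one test turn; field names are illustrative only.
type TurnReport = {
  session: number;
  turn: 1 | 2 | 3 | 4 | 5;
  promptSent: string;
  observedBehavior: string;
  screenshotTaken: boolean;
  defects: string[];
};

// Render one turn in the mandatory reporting format.
function formatTurnReport(report: TurnReport): string {
  const defects =
    report.defects.length > 0
      ? report.defects.join("; ")
      : "None found for this turn";
  return [
    `SESSION: ${report.session}`,
    `TURN: ${report.turn}`,
    `PROMPT SENT: ${report.promptSent}`,
    `OBSERVED BEHAVIOR: ${report.observedBehavior}`,
    `SCREENSHOT: ${report.screenshotTaken ? "yes" : "no"}`,
    `DEFECTS FOUND: ${defects}`,
  ].join("\n");
}
```

An empty `defects` array maps to the literal "None found for this turn" string that the format requires.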
+
+---
+
+## MANDATORY — READ 323_DEFECTS.md BEFORE PROPOSING ANY FIX
+
+File: `C:\Users\brend\IDE\323_DEFECTS.md`
+
+- This file is the DEFINITIVE defect list for the guIDE pipeline.
+- Before proposing any fix to any bug, read this file first.
+- Every defect in this file is REAL — user-observed or agent-observed during testing.
+- Do NOT gaslight. Do NOT say "the code looks fine" when 323_DEFECTS.md says it's broken.
+- When a defect is fixed (confirmed by user after testing), update the Status column to RESOLVED and log the fix in CHANGES_LOG.md.
+- When new defects are found during testing, ADD THEM to 323_DEFECTS.md immediately.
+
+### Defect Severity Reference
+- P0 — Blocks core functionality entirely
+- P1 — Breaks a major user-facing feature
+- P2 — Degrades UX significantly
+- P3 — Minor issue or cosmetic
@@ -337,3 +337,115 @@ Phase 3 compaction in `contextManager.js` runs "aggressive" compression on tool
 
 ### Regarding D08 (Code Blocks Default Collapsed)
 Confirmed from [`ChatWidgets.tsx:465`]: `useState(!!isStreaming)` means streaming=true → expanded=true. The fix is one line: change to `useState(false)`. This is the simplest fix in the entire list and should be done immediately.
+
+---
+
+## SESSION 6 WEB TESTING DEFECTS (2026-03-22)
+
+### D27 — Tool Call Badge Shows [FAIL] But Operation Succeeded [AGENT] [P2]
+**Observed (Session 6 Turn 3):** The `write_file` tool badge displayed `[FAIL]` in red, but inside the expanded details, it showed `write_file: hello.py [OK]` and the file was actually created correctly in the filesystem.
+
+**Root cause to investigate:** The badge rendering logic is checking a different status flag than the actual execution result. The tool executed successfully (file created, content correct) but the UI badge displays failure.
+
+**Files to investigate:**
+- `src/components/Chat/ChatWidgets.tsx` — tool badge status rendering
+- The `status` field vs `result.success` field in tool call state
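One way to rule out the mismatch described in D27 is to derive the badge from a single source of truth. The sketch below assumes a hypothetical `ToolCallState` shape with a `status` field and an optional `result.success` field; the real types in `ChatWidgets.tsx` may differ, and the `[RUNNING]` label is also an assumption.

```typescript
// Hypothetical shape; the real tool call state in ChatWidgets.tsx may differ.
type ToolCallState = {
  name: string;
  status: "pending" | "running" | "done" | "error";
  result?: { success: boolean };
};

type BadgeLabel = "[RUNNING]" | "[OK]" | "[FAIL]";

// Once a result exists, it is the only input to the badge, so the badge and
// the expanded details cannot disagree with each other.
function badgeLabel(call: ToolCallState): BadgeLabel {
  if (call.result !== undefined) {
    return call.result.success ? "[OK]" : "[FAIL]";
  }
  // No result yet: fall back to the streaming status.
  return call.status === "error" ? "[FAIL]" : "[RUNNING]";
}
```

With this rule, a stale `status: "error"` left over from streaming cannot override a successful execution result.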
+
+---
+
+### D28 — Same Tool Call Appears Twice in Response (write_file Duplication) [AGENT] [P1]
+**Observed (Session 6 Turn 3):** The `write_file` tool call for `hello.py` appeared TWICE in the assistant's response bubble, both showing the same content (`print('Hello World')`). The model called write_file once, but the UI rendered two identical tool cards.
+
+**Root cause to investigate:** Possible race condition in tool call state accumulation. The same tool call is being added to the render array twice, possibly once during streaming and once during completion.
+
+**Files to investigate:**
+- `src/components/Chat/ChatPanel.tsx` — tool card rendering in streaming bubble
+- State updates that may double-add the same tool result
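If the duplication in D28 comes from the same call being appended once during streaming and once on completion, keying cards by a stable identifier would prevent it. This is a sketch under that assumption: `ToolCard`, its `id` field, and `upsertToolCard` are hypothetical names, and the actual state in `ChatPanel.tsx` may not carry such an id.

```typescript
// Hypothetical shape; `id` is assumed to be stable across the streaming
// event and the completion event for the same tool call.
type ToolCard = { id: string; name: string; content: string; done: boolean };

// Insert a card, or replace the existing card with the same id in place.
function upsertToolCard(cards: readonly ToolCard[], incoming: ToolCard): ToolCard[] {
  const index = cards.findIndex((card) => card.id === incoming.id);
  if (index === -1) {
    return [...cards, incoming];
  }
  const next = [...cards];
  next[index] = incoming;
  return next;
}
```

Returning a new array instead of mutating the input keeps the helper usable inside a React state updater.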
+
+---
+
+### D29 — Markdown Table Rows Render as Raw Pipe Syntax [AGENT] [P2]
+**Observed (Session 6 Turn 2):** In the response comparing Python lists vs tuples, markdown table content rendered correctly for some rows but others appeared as raw syntax: `| Performance Slightly slower slightly faster`
+
+The markdown parser is parsing SOME table rows correctly (rendered as actual `<table>` elements) but missing others which display as literal pipe characters.
+
+**Root cause to investigate:** The markdown-to-HTML conversion in `chatContentParser.ts` parses tables inconsistently. Possible causes: malformed table input from the model, or table rows without proper header/delimiter rows.
+
+**Files to investigate:**
+- `src/utils/chatContentParser.ts` — markdown parsing functions
+- Markdown table syntax validation
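One of the possible causes listed for D29, rows without a proper header and delimiter row, can be checked before rendering. The sketch below assumes GitHub-style pipe tables. The helper names are hypothetical and are not taken from `chatContentParser.ts`, and escaped pipes inside cells are not handled.

```typescript
// Split a pipe-table row into trimmed cells, ignoring the optional outer pipes.
function splitCells(row: string): string[] {
  let trimmed = row.trim();
  if (trimmed.startsWith("|")) trimmed = trimmed.slice(1);
  if (trimmed.endsWith("|")) trimmed = trimmed.slice(0, -1);
  return trimmed.split("|").map((cell) => cell.trim());
}

// A delimiter row contains only cells such as ---, :---, ---: or :---:.
function isDelimiterRow(row: string): boolean {
  return splitCells(row).every((cell) => /^:?-+:?$/.test(cell));
}

// A block starts a valid table only if its first line is a header row and its
// second line is a delimiter row with the same number of cells.
function hasValidTableHeader(lines: readonly string[]): boolean {
  const [header, delimiter] = lines;
  if (header === undefined || delimiter === undefined) return false;
  if (!header.includes("|") || !isDelimiterRow(delimiter)) return false;
  return splitCells(header).length === splitCells(delimiter).length;
}
```

A block that fails this check could be logged with its raw text, which would show whether the model emitted a malformed table or the parser dropped a valid one.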
+
+---
+
+### D30 — read_file Tool Called With Empty Parameters ({}) [AGENT] [P0]
+**Observed (Session 6 Session 2 Turn 1):** Model called `read_file({})` with an empty object — no `filePath` parameter provided. The log showed `Tool: read_file({})` multiple times. The model then hallucinated that files don't exist, claiming "File not found" when the actual error was "Missing required parameter: filePath".
+
+**Root cause:** The model is not inferring the file path from context. When given a relative filename like "README.md", the model should either:
+1. Construct the full path from the project_path in context
+2. Call the tool with a relative path that the tool resolves
+
+Neither is happening — the model generates an empty params object.
+
+**System prompt gap:** The system prompt does not explicitly tell the model how to derive file paths from the project context. The model has project_path available but doesn't combine it with the filename to build a valid filePath argument.
+
+**Files to fix:**
+- `main/constants.js` — both preambles need clearer instruction on constructing tool call arguments from context
+- `main/mcpToolServer.js` — read_file tool description could include example JSON like `{"filePath": "<project_path>/filename"}`
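The two options listed for D30 (build the full path from the project path, or let the tool resolve a relative path) can be combined with an explicit check for the missing parameter, so the error text the model sees is exact. This is a minimal sketch: `checkReadFileArgs` and `projectPath` are hypothetical names, path joining is done with plain string handling rather than the project's own utilities, and `..` segments are not normalized or rejected.

```typescript
type ReadFileCheck =
  | { ok: true; filePath: string }
  | { ok: false; error: string };

// `projectPath` stands in for the project_path value the source mentions.
function checkReadFileArgs(
  args: Record<string, unknown>,
  projectPath: string,
): ReadFileCheck {
  const raw = args["filePath"];
  if (typeof raw !== "string" || raw.trim() === "") {
    // Report the real problem instead of a generic "File not found".
    return { ok: false, error: "Missing required parameter: filePath" };
  }
  // Treat POSIX-style and Windows drive-letter paths as absolute.
  const isAbsolute = raw.startsWith("/") || /^[A-Za-z]:[\\/]/.test(raw);
  if (isAbsolute) {
    return { ok: true, filePath: raw };
  }
  // Resolve a relative name such as "README.md" against the project root.
  const root = projectPath.replace(/[\\/]+$/, "");
  return { ok: true, filePath: `${root}/${raw}` };
}
```

Resolving relative names on the tool side means a call such as `{"filePath": "README.md"}` succeeds even when the model does not build the full path itself.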
+
+---
+
+### D31 — Model Hallucinates Bug on Nonexistent Line Number [AGENT] [P1]
+**Observed (Session 6 Session 2 Turn 2):** When asked to check server.js for syntax errors, the model claimed: "On line 104, the code references `deletedTodos` which was never defined." However, server.js only has 101 lines — line 104 does not exist.
+
+**Root cause:** The model may be confabulating from general JavaScript patterns it's seen, or it may have read partial content and extrapolated. The model should only report issues it can directly cite from the actual file content.
+
+**System prompt consideration:** Add instruction: "Only report bugs or issues you can cite by exact line number and quote from the actual file content."
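Alongside the prompt instruction considered for D31, a cited line number and quote can be checked mechanically against the file that was actually read. The sketch below uses a hypothetical `citationExists` helper; how the line number and quote would be extracted from the model's response is left open.

```typescript
// Returns true only if `line` (1-indexed) exists in the file and the text on
// that line contains `quote`.
function citationExists(fileContent: string, line: number, quote: string): boolean {
  const lines = fileContent.split(/\r?\n/);
  if (!Number.isInteger(line) || line < 1 || line > lines.length) {
    return false;
  }
  const text = lines[line - 1];
  return text !== undefined && quote.length > 0 && text.includes(quote);
}
```

For the case in D31, a claim about line 104 of a 101-line file fails the range check before the quote is compared.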
+
+---
+
+### D32 — Tool Call Deduplication Returns Cached Placeholder Instead of Result [AGENT] [P2]
+**Observed (Session 6 Session 2 Turn 1):** Deduped read_file calls returned `"(Previously executed in iteration 1. Result: OK)"` as the result content instead of the actual file content. If the model needs to re-use the file content, it gets a placeholder message instead.
+
+**Root cause:** `agenticLoop.js` dedup guard returns a summary string rather than the cached actual result. For read_file, the result should be the file content, but the dedup returns a description of the cached state.
+
+**Files to fix:**
+- `main/pipeline/agenticLoop.js` — dedup cache should store and return the actual result, not a placeholder message
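The fix direction named for D32, storing and returning the actual result, can be sketched as a small cache keyed by tool name and arguments. `ToolResultCache` is a hypothetical class, not the structure used in `agenticLoop.js`; only top-level argument keys are sorted, and a real implementation would likely also need to drop entries when a write changes the underlying file.

```typescript
// Hypothetical dedup cache that keeps the real tool output, not a summary.
class ToolResultCache {
  private readonly results = new Map<string, string>();

  // Sort top-level keys so argument order does not change the cache key.
  private static key(tool: string, args: Record<string, unknown>): string {
    const sorted = Object.keys(args)
      .sort()
      .map((name) => [name, args[name]]);
    return `${tool}:${JSON.stringify(sorted)}`;
  }

  set(tool: string, args: Record<string, unknown>, result: string): void {
    this.results.set(ToolResultCache.key(tool, args), result);
  }

  // Returns the stored output for a repeated call, or undefined on a miss.
  get(tool: string, args: Record<string, unknown>): string | undefined {
    return this.results.get(ToolResultCache.key(tool, args));
  }
}
```

On a dedup hit for `read_file`, the loop would hand back the cached file content, so the model can reuse it without a second read.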
+
+---
+
+## UPDATED SUMMARY TABLE (Including Session 6 Defects)
+
+| ID | Severity | Status | Source | Title |
+|----|----------|--------|--------|-------|
+| D01 | P0 | OPEN | USER | Code blocks disappear after response finalizes |
+| D02 | P1 | OPEN | USER | Tool chips render out of chronological order |
+| D03 | P0 | OPEN | USER | Tokens generated but not live-streamed |
+| D04 | P1 | OPEN | USER | Single web search per response (multi-search ignored) |
+| D05 | P1 | OPEN | USER | Model forgets context after ~3 messages |
+| D06 | P0 | OPEN | USER | Model responds with emoji only |
+| D07 | P0 | OPEN | USER | Code generation stops at line 7, mid-generation cutoff |
+| D08 | P2 | OPEN | USER | Code blocks expanded by default (should be collapsed) |
+| D09 | P1 | OPEN | USER | Cancellation replaces entire response with error |
+| D10 | P1 | OPEN | AGENT | write_todos creates duplicates across rotations |
+| D11 | P1 | OPEN | AGENT | write_file called without content param |
+| D12 | P1 | OPEN | AGENT | read_file called with empty params post-rotation |
+| D13 | P2 | OPEN | AGENT | Dedup guard fires after 3rd call, not 2nd |
+| D14 | P0 | OPEN | AGENT | Context rotation spiral — infinite loop on simple task |
+| D15 | P1 | OPEN | AGENT | Model re-reads and re-writes files it already created |
+| D16 | P3 | OPEN | USER | "Model loaded" message — confusing to user |
+| D17 | P1 | OPEN | USER | Context usage increases rapidly on short messages |
+| D18 | P1 | OPEN | AGENT | Todo widget accumulates duplicate items |
+| D19 | P2 | OPEN | AGENT | Stale "Creating X..." spinner after write succeeds |
+| D20 | P2 | OPEN | AGENT | Non-write tool chips show raw JSON result |
+| D21 | P2 | OPEN | AGENT | Dedup circumvented by slight arg variation |
+| D22 | P1 | OPEN | AGENT | Post-rotation model re-reads all files |
+| D23 | P1 | OPEN | AGENT | Seamless continuation not observed in testing |
+| D24 | P0 | OPEN | AGENT | Model doesn't call finish_task when done |
+| D25 | P1 | OPEN | USER | Multi-step web searches — only first executed |
+| D26 | P1 | OPEN | USER | Context exhausted despite low percentage shown |
+| D27 | P2 | OPEN | AGENT | Tool badge shows [FAIL] but operation succeeded |
+| D28 | P1 | OPEN | AGENT | Same tool call appears twice in response |
+| D29 | P2 | OPEN | AGENT | Markdown table rows render as raw pipe syntax |
+| D30 | P0 | OPEN | AGENT | read_file called with empty params {} |
+| D31 | P1 | OPEN | AGENT | Model hallucinates bug on nonexistent line |
+| D32 | P2 | OPEN | AGENT | Dedup returns placeholder instead of actual result |