A multi-agent LLM orchestration system that reads structured requirements from project management platforms, generates validated structured outputs through a hybrid rule-engine + LLM pipeline, enforces coverage via automated feedback loops, and self-corrects using RAG-powered semantic matching.
Built with Clean Architecture, multi-provider LLM support (OpenAI, Gemini, Anthropic, Ollama), ChromaDB vector search, and a Model Context Protocol (MCP) server for IDE integration.
- AI Engineering Overview
- System Architecture
- Technical Highlights
- Quick Start (5 minutes)
- Docker Setup (Recommended)
- CLI Commands
- Adding Your Project
- Configuration Reference
- Output Files
- Bug Creation
- Board Story Report
- AC Coverage Validation
- ChromaDB Semantic Matching
- MCP Integration (GitHub Copilot)
- Architecture
- Troubleshooting
This system solves a real-world problem — generating comprehensive, validated test cases from natural-language requirements — using a multi-stage AI pipeline rather than a single LLM call.
Manually writing test cases from user stories is slow, inconsistent, and prone to coverage gaps. A single LLM prompt yields hallucinated steps and inconsistent wording, and misses edge cases.
Instead of relying on a single LLM call, the system orchestrates multiple specialized stages — each with its own responsibility — combining deterministic rule engines with LLM intelligence:
Structured Input (ADO/Jira)
│
▼
┌─────────────────────────────┐
│ 1. INGESTION & PARSING │ Platform adapters (ADO, Jira) → domain models
│ NLP analysis (spaCy) │ Story type classification, feature detection
└─────────────┬───────────────┘
▼
┌─────────────────────────────┐
│ 2. DETERMINISTIC GENERATION│ Rule engine: 70+ QA rules, scenario expansion,
│ (No LLM — pure logic) │ edge case generation, platform-specific tests
└─────────────┬───────────────┘
▼
┌─────────────────────────────┐
│ 3. RAG: SEMANTIC MATCHING │ ChromaDB vector search (all-MiniLM-L6-v2)
│ Reference step retrieval│ Retrieve similar steps → enforce consistency
└─────────────┬───────────────┘
▼
┌─────────────────────────────┐
│ 4. LLM CORRECTION │ Multi-provider (OpenAI/Gemini/Anthropic/Ollama)
│ Structured JSON output │ Dynamic prompt construction, JSON schema enforcement
└─────────────┬───────────────┘
▼
┌─────────────────────────────┐
│ 5. VALIDATION & FEEDBACK │ AC coverage gap detection → targeted re-generation
│ Self-correction loop │ Quality gates, forbidden language, structural fixes
└─────────────┬───────────────┘
▼
┌─────────────────────────────┐
│ 6. MULTI-FORMAT EXPORT │ CSV (ADO), Playwright scripts, JSON, QA summaries
│ Platform upload │ ADO test suites, TestRail, MCP server
└─────────────────────────────┘
| Decision | Rationale |
|---|---|
| Hybrid (rules + LLM) instead of pure LLM | Rule engine handles 70% deterministically — LLM refines the remaining 30%. Reduces hallucination, cuts token cost, ensures structural correctness |
| RAG for consistency instead of stateless prompts | ChromaDB stores previously generated steps. New generations retrieve similar steps as few-shot context, producing consistent wording across runs |
| Coverage validation loop instead of single-pass | After generation, the system extracts keywords from each acceptance criterion and checks coverage. Uncovered ACs trigger a targeted LLM call to fill gaps |
| Multi-provider factory instead of hardcoded provider | Factory pattern + YAML config = swap between OpenAI, Gemini, Anthropic, or local Ollama without code changes |
| Structured output enforcement instead of free-text | JSON schema in prompts, response_mime_type for Gemini, truncated JSON repair for robustness |
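The truncated-JSON repair mentioned in the last row can be sketched as follows. This is an illustrative helper, not the project's actual code: it trims back to the last parseable boundary and closes any unbalanced strings, brackets, and braces so a cut-off LLM response still loads.

```python
import json

def repair_truncated_json(text: str) -> dict:
    """Best-effort repair of a JSON object cut off mid-generation (sketch)."""
    text = text.strip()
    # Try progressively shorter prefixes until one can be closed into valid JSON.
    for trim in range(len(text), 0, -1):
        candidate = text[:trim].rstrip().rstrip(",")
        stack = []          # closing delimiters still owed, in order
        in_string = False
        escaped = False
        for ch in candidate:
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch in "{[":
                stack.append("}" if ch == "{" else "]")
            elif ch in "}]" and stack:
                stack.pop()
        if in_string:
            candidate += '"'          # close a dangling string
        candidate += "".join(reversed(stack))  # close dangling containers
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("unrepairable JSON")
```

For example, a response cut off mid-title such as `{"tests": [{"id": 1, "title": "Login` is closed into a valid object instead of failing the whole generation.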
┌──────────────────────────┐
│ CLI / MCP Server │ Entry points
│ (workflows.py) │ (Typer CLI + MCP)
└────────────┬─────────────┘
│
┌────────────────┼────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Generate │ │ Upload │ │ Bug Report │ Workflow
│ Workflow │ │ Workflow │ │ Workflow │ Layer
└──────┬───────┘ └──────┬───────┘ └──────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────┐
│ CORE SERVICES │
│ │
│ ┌─────────────┐ ┌──────────────────────┐ │
│ │ Test │ │ LLM Orchestration │ │
│ │ Generator │ │ │ │
│ │ (rules + │ │ PromptBuilder │ │
│ │ NLP + │──│ LLMCorrector │ │
│ │ scenarios) │ │ Provider Factory │ │
│ └─────────────┘ │ ┌────┬────┬────┐ │ │
│ │ │GPT │Gem │Anth│ │ │
│ ┌─────────────┐ │ │ │ini │ropi│ │ │
│ │ Embeddings │ │ │ │ │c │ │ │
│ │ (ChromaDB │──│ └────┴────┴────┘ │ │
│ │ RAG) │ └──────────────────────┘ │
│ └─────────────┘ │
│ ┌──────────────────────┐ │
│ ┌─────────────┐ │ Quality Gates │ │
│ │ AC Coverage │──│ Validator │ │
│ │ Validator │ │ Linters │ │
│ └─────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────┘
│
┌──────────────┼──────────────┐
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ ADO │ │ Jira │ │ TestRail │ Infrastructure
│ Adapter │ │ Adapter │ │ Adapter │ Layer
└────────────┘ └────────────┘ └────────────┘
LLM Orchestration
- Multi-provider factory pattern: OpenAI, Google Gemini, Anthropic, Ollama — swappable via YAML config
- Dynamic prompt construction: context-aware prompts built from project config, feature type, and RAG results
- Structured JSON output with schema enforcement and truncated JSON repair
- Response caching (`MemoryCache`, `FileCache`) to minimize redundant API calls
- Cost tracking via `MetricsCollector` and `CostCalculator`
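The multi-provider factory from the first bullet can be sketched like this. The names (`EchoProvider`, `create_provider`, the registry keys' constructors) are illustrative stand-ins; the real providers wrap the vendor SDK clients.

```python
from dataclasses import dataclass
from typing import Protocol

class LLMProvider(Protocol):
    """Contract every provider implements (cf. ILLMProvider in core/interfaces/)."""
    def complete(self, prompt: str) -> str: ...

@dataclass
class EchoProvider:
    """Stand-in provider for illustration; real ones call an LLM API."""
    name: str
    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"

# Registry maps the YAML `llm_provider` value to a constructor.
_REGISTRY = {
    "openai": lambda cfg: EchoProvider("openai"),
    "gemini": lambda cfg: EchoProvider("gemini"),
    "anthropic": lambda cfg: EchoProvider("anthropic"),
    "ollama": lambda cfg: EchoProvider("ollama"),
}

def create_provider(config: dict) -> LLMProvider:
    """YAML config selects the provider; swapping requires no code changes."""
    name = config.get("llm_provider", "openai")
    try:
        return _REGISTRY[name](config)
    except KeyError:
        raise ValueError(f"unknown provider: {name}") from None
```

Because callers depend only on the `LLMProvider` protocol, switching from OpenAI to a local Ollama model is a one-line YAML change.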
RAG Pipeline (ChromaDB)
- Sentence embeddings via `all-MiniLM-L6-v2` (384-dim) for semantic step matching
- Persistent vector store with distance-based similarity threshold (< 1.5)
- Retrieved reference steps injected as few-shot context into LLM correction prompts
- Feedback loop: each generation embeds new steps → future queries return richer context
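The few-shot injection in the third bullet has roughly this shape. The prompt template below is a made-up sketch of the technique, not the project's actual `PromptBuilder` output.

```python
def build_correction_prompt(test_steps: list[str], reference_steps: list[str]) -> str:
    """Assemble an LLM correction prompt with retrieved steps as few-shot context."""
    refs = "\n".join(f"- {s}" for s in reference_steps)
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(test_steps, 1))
    return (
        "You are an expert QA engineer. Rewrite the steps below so their wording\n"
        "matches the reference steps retrieved from earlier generations.\n\n"
        f"Reference steps (canonical wording):\n{refs}\n\n"
        f"Steps to correct:\n{steps}\n\n"
        'Return JSON only: {"steps": ["..."]}'
    )
```

The retrieved references act as exemplars, so "Click the Save button" and "Press Save" converge on one canonical phrasing across runs.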
Self-Correction & Validation
- AC coverage validation: keyword extraction from acceptance criteria → coverage check → targeted gap-filling LLM call for uncovered ACs
- Quality gates: 70+ rules (forbidden language, structural integrity, ID sequencing, accessibility compliance)
- Iterative correction: rule-based pre-pass → LLM refinement → post-validation
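Two of the quality gates above (forbidden language, ID sequencing) can be sketched as a validator pass. The rule set shown is a tiny illustrative subset of the real 70+ rules.

```python
def validate_tests(tests: list[dict]) -> list[str]:
    """Return violations for two illustrative gates: forbidden language, ID sequencing."""
    forbidden = ("if available", "if supported")  # small subset of the real rule set
    issues = []
    for expected_id, test in enumerate(tests, 1):
        # Gate 1: test IDs must be a contiguous 1..N sequence.
        if test["id"] != expected_id:
            issues.append(f"ID sequencing: expected {expected_id}, got {test['id']}")
        # Gate 2: steps must not hedge with conditional language.
        for step in test["steps"]:
            for phrase in forbidden:
                if phrase in step.lower():
                    issues.append(f"Test {test['id']}: forbidden phrase '{phrase}'")
    return issues
```

A rule-based pre-pass like this catches deterministic defects cheaply, so the LLM refinement step only has to handle the harder semantic fixes.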
NLP & Feature Intelligence
- spaCy-based semantic parsing for acceptance criteria analysis
- Multi-label feature type classification (input, navigation, display, object manipulation, calculation)
- Story type classification (Tool, Dialog, Menu, File Operations) for context-aware generation
- Entry point auto-detection: maps features to correct UI locations
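The multi-label classification above can be sketched with keyword heuristics. This is a deliberately simplified stand-in; the real system classifies from spaCy parses, and the keyword sets here are invented for illustration.

```python
# Keyword heuristics per feature type (illustrative stand-in for spaCy-based parsing).
FEATURE_KEYWORDS = {
    "input": {"enter", "type", "field", "form", "validate"},
    "navigation": {"menu", "navigate", "tab", "open"},
    "display": {"show", "display", "render", "preview"},
    "calculation": {"calculate", "sum", "total", "compute"},
}

def classify_feature(text: str) -> list[str]:
    """Multi-label classification: return every type whose keywords appear."""
    words = set(text.lower().replace(".", " ").replace(",", " ").split())
    return sorted(t for t, kws in FEATURE_KEYWORDS.items() if words & kws)
```

A story can legitimately carry several labels (a search form is both `input` and `navigation`), which is why the downstream generator treats the result as a set rather than a single category.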
Software Engineering
- Clean Architecture: interfaces (`core/interfaces/`), domain models, use cases, infrastructure adapters
- Repository pattern with platform-agnostic factories (ADO, Jira, TestRail)
- Dependency injection via project configuration (YAML → dataclasses)
- MCP server exposing all workflows to GitHub Copilot / Claude Code
- Docker support for reproducible environments
- Playwright test script generation (LLM-based with deterministic fallback)
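The YAML → dataclasses injection mentioned above can be sketched like this. The class and field names are illustrative, not the project's actual models, and `raw` stands in for the dict a YAML parser returns.

```python
from dataclasses import dataclass, field

@dataclass
class AdoConfig:
    organization: str = ""
    project: str = ""
    area_path: str = ""

@dataclass
class ProjectConfig:
    project_id: str
    llm_provider: str = "openai"
    ado: AdoConfig = field(default_factory=AdoConfig)
    unavailable_features: list = field(default_factory=list)

def load_project(raw: dict) -> ProjectConfig:
    """Build a typed config from a parsed-YAML dict; services receive this
    object instead of reading YAML themselves (dependency injection)."""
    return ProjectConfig(
        project_id=raw["project_id"],
        llm_provider=raw.get("llm_provider", "openai"),
        ado=AdoConfig(**raw.get("ado", {})),
        unavailable_features=list(raw.get("unavailable_features", [])),
    )
```

Typed configs surface missing or misspelled keys at load time rather than deep inside a workflow.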
The sections below describe the QA domain this system operates in — how it's used, configured, and integrated with Azure DevOps.
This framework automatically generates comprehensive test cases by:
- Reading user stories and acceptance criteria from Azure DevOps (or Jira)
- Understanding your application context from project configuration (YAML)
- Generating test cases using a hybrid rule-engine + LLM pipeline
- Matching against previously generated steps via ChromaDB for consistent wording
- Correcting test quality with LLM enhancement (structural fixes, forbidden language, accessibility)
- Validating AC coverage — auto-detects gaps and generates missing tests
- Exporting to ADO-compatible CSV format or uploading directly
- Project-agnostic: Works with any application (desktop, web, mobile, hybrid)
- Multi-provider AI: Supports OpenAI, Gemini, Anthropic, and Ollama for test generation
- Context-aware: Generates relevant tests based on feature type (no input tests for menus!)
- ChromaDB semantic matching: Reference steps from previous generations ensure consistent wording
- AC coverage validation: Automatically detects missing acceptance criteria coverage and generates gap-filling tests
- Multi-platform: Generates accessibility tests for all supported platforms (Windows 11, iPad, Android Tablet)
- ADO Integration: Direct upload to Azure DevOps test suites + bug creation + board reporting
# Clone the repository
git clone <repository-url>
cd test_gen
# Create virtual environment (Python 3.10 required)
python3.10 -m venv venv310
source venv310/bin/activate # On Windows: venv310\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Copy example environment file
cp .env.example .env
# Edit .env with your credentials
Required environment variables:
# Azure DevOps (Required)
ADO_PAT=your_personal_access_token_here
# LLM Provider (at least one required for LLM correction)
OPENAI_API_KEY=sk-your-api-key-here # OpenAI
GEMINI_API_KEY=your-gemini-key-here # Google Gemini (alternative)
ANTHROPIC_API_KEY=your-anthropic-key-here # Anthropic (alternative)
# List available projects
python workflows.py list-projects
# Generate tests for a story
python workflows.py generate --story-id 272889
Output: Test cases saved to the output/ folder as CSV, JSON, and objectives files.
Docker provides a consistent environment across all team members - no Python version conflicts or dependency issues.
- Install Docker Desktop
- Get the `.env` file (contains ADO/OpenAI credentials)
# 1. Build the image (first time only, ~3 min)
docker build -t test-gen:v1 .
# 2. Verify it works
docker run test-gen:v1 --help
# 3. Run with credentials and output volume
docker run --env-file .env -v $(pwd)/output:/app/output \
test-gen:v1 generate --story-id 272889
# Generate test cases
docker run --env-file .env -v $(pwd)/output:/app/output \
test-gen:v1 generate --story-id 272889
# Generate + Upload to ADO (dry run)
docker run --env-file .env -v $(pwd)/output:/app/output \
test-gen:v1 upload --story-id 272889 --dry-run
# Generate + Upload to ADO (live)
docker run --env-file .env -v $(pwd)/output:/app/output \
test-gen:v1 upload --story-id 272889
# List projects
docker run test-gen:v1 list-projects
# Update objectives
docker run --env-file .env -v $(pwd)/output:/app/output \
test-gen:v1 update-objectives --story-id 272889
Add to your ~/.bashrc or ~/.zshrc:
alias testgen='docker run --env-file .env -v $(pwd)/output:/app/output test-gen:v1'
# Then simply run:
# testgen generate --story-id 272889
# testgen upload --story-id 272889 --dry-run
| Aspect | Docker | Local Python |
|---|---|---|
| Setup time | ~3 min (one command) | ~10 min (venv, pip, spacy model) |
| Works on | Any machine with Docker | Requires Python 3.10 |
| Dependencies | Isolated in container | May conflict with other projects |
| Team consistency | Identical for everyone | "Works on my machine" issues |
# If you modify the code, rebuild the image
docker build -t test-gen:v1 .
# Docker caches layers, so rebuilds are fast if only code changed
All commands use: python workflows.py <command> [options]
For detailed CLI documentation, see docs/CLI_REFERENCE.md.
| Command | Description | Example |
|---|---|---|
| `list-projects` | Show all configured projects | `python workflows.py list-projects` |
| `generate` | Generate test cases locally | `python workflows.py generate --story-id 272889` |
| `upload` | Generate AND upload to ADO | `python workflows.py upload --story-id 272889` |
| `upload-existing` | Upload existing tests to ADO | `python workflows.py upload-existing --story-id 272889` |
| `init-project` | Create new project config | `python workflows.py init-project --name "MyApp"` |
| `discover` | Auto-discover project settings | `python workflows.py discover --story-ids 123 456` |
| `create-bug` | Create formatted ADO bug report | `python workflows.py create-bug --file bugs/my_bug.txt` |
# 1. List available projects
python workflows.py list-projects
# 2. Generate tests (saves to output/ folder)
python workflows.py generate --story-id 272889
# 3. Generate tests for specific project
python workflows.py generate --story-id 272889 --project mediapedia-us
# 4. Generate tests WITHOUT LLM correction (faster)
python workflows.py generate --story-id 272889 --skip-correction
# 5. Generate AND upload to Azure DevOps
python workflows.py upload --story-id 272889
# 6. Preview upload without actually uploading (dry run)
python workflows.py upload --story-id 272889 --dry-run
# 7. Upload EXISTING tests (skip generation, use files in output/)
python workflows.py upload-existing --story-id 272889
# 8. Initialize a new project
python workflows.py init-project --name "MyApp" --org myorg --ado-project MyProject
# 9. Create a bug report (local preview)
python workflows.py create-bug --file bugs/my_bug.txt
# 10. Create a bug report with dry-run (preview ADO fields without uploading)
python workflows.py create-bug --file bugs/my_bug.txt --upload --dry-run
# 11. Create a bug report and upload to ADO
python workflows.py create-bug --file bugs/my_bug.txt --upload
# 12. Create a bug and link to parent story
python workflows.py create-bug --file bugs/my_bug.txt --upload --story-id 272261
cp projects/configs/example-web-app.yaml projects/configs/my-project.yaml
Open projects/configs/my-project.yaml and customize:
project_id: my-project # Unique identifier
application:
name: My Application
description: Description of your app
type: web # Options: desktop, web, mobile, hybrid
# Test step templates (use {app_name} as placeholder)
prereq_template: "Pre-req: User is logged into {app_name}"
launch_step: "Navigate to {app_name} homepage."
launch_expected: "Homepage loads with navigation menu visible."
close_step: "Log out from {app_name}"
# UI areas in your application (used for test titles)
ui_surfaces:
- Dashboard
- Navigation Menu
- Settings Page
- User Profile
- Modal Dialog
# How users access features (keyword -> UI location)
entry_point_mappings:
search: Navigation Menu
settings: User Profile
export: Dashboard
import: Dashboard
# Platforms your app supports (generates accessibility tests)
platforms:
- Windows 11
- Chrome (macOS)
- Safari (iOS)
# IMPORTANT: Features your app does NOT support
# Prevents generating impossible test scenarios
unavailable_features:
- offline mode
- multi-select
- bulk delete
# Azure DevOps settings
ado:
organization: your-org
project: YourProject
area_path: "YourProject\\QA Team"
assigned_to: qa-engineer@company.com
default_state: Design
# Test generation rules
rules:
forbidden_words:
- "or / OR"
- "if available"
- "if supported"
allowed_areas:
- Dashboard
- Navigation Menu
- Settings Page
# LLM settings (provider options: openai, gemini, anthropic, ollama)
llm_enabled: true
llm_provider: gemini # or openai, anthropic, ollama
llm_model: gemini-2.0-flash # model name for chosen provider
In .env, set your project as default:
DEFAULT_PROJECT=my-project
# Verify project loads correctly
python workflows.py list-projects
# Test with a story
python workflows.py generate --story-id 123456 --project my-project
| Type | Description | Example |
|---|---|---|
| `desktop` | Native desktop apps (Windows, macOS) | CAD software, IDEs |
| `web` | Browser-based applications | SaaS platforms, dashboards |
| `mobile` | iOS/Android apps | Mobile banking, social apps |
| `hybrid` | Cross-platform/enterprise apps | CRM systems, enterprise tools |
The framework generates platform-specific accessibility tests:
| Platform | Accessibility Tool | Test Type |
|---|---|---|
| Windows 11 | Accessibility Insights | Keyboard navigation |
| macOS | VoiceOver | Keyboard + screen reader |
| iPad/iOS | VoiceOver | Swipe gestures |
| Android | Accessibility Scanner | Touch + TalkBack |
| Chrome/Web | Screen reader (NVDA/JAWS) | ARIA, keyboard |
The framework supports multiple LLM providers via a factory pattern. Set `llm_provider` in your YAML config or `.env`:
| Provider | Config Value | Env Variable | Models |
|---|---|---|---|
| OpenAI | `openai` | `OPENAI_API_KEY` | `gpt-4o-mini`, `gpt-4o` |
| Google Gemini | `gemini` | `GEMINI_API_KEY` | `gemini-2.0-flash`, `gemini-1.5-pro` |
| Anthropic | `anthropic` | `ANTHROPIC_API_KEY` | `claude-sonnet-4-5-20250929` |
| Ollama (local) | `ollama` | N/A | Any local model |
YAML config `llm_provider` takes precedence over `.env` defaults. API keys are always resolved from environment variables.
The framework automatically detects feature types and generates appropriate tests:
| Feature Type | Generates | Does NOT Generate |
|---|---|---|
| Navigation (menus) | Visibility, keyboard access | Input validation, boundaries |
| Input (forms) | Validation, boundaries, errors | N/A |
| Display (viewers) | Content display, formatting | Input tests |
| Object manipulation | Undo/redo, state changes | Multi-select (if unavailable) |
After running `generate`, files are saved to `output/`:
| File | Description | Use Case |
|---|---|---|
| `*_HYBRID_TESTS.csv` | ADO-compatible test cases | Import to Azure DevOps |
| `*_HYBRID_OBJECTIVES.txt` | Test objectives with HTML | Copy to test case objectives |
| `*_HYBRID_DEBUG.json` | Full generation data | Debugging, review |
The CSV follows Azure DevOps import format:
| Column | Description |
|---|---|
| ID | Leave empty (ADO assigns) |
| Work Item Type | Always "Test Case" |
| Title | {StoryID}-{TestID}: Feature / Area / Scenario |
| TestStep | Step number (1, 2, 3...) |
| Step Action | What to do |
| Step Expected | Expected result |
| Area Path | ADO area path |
| AssignedTo | QA engineer email |
| State | Default: "Design" |
Create formatted ADO Bug work items from structured .txt files following the ENV Drawing Bug Template.
Create a .txt file (see bugs/sample_bug.txt for a complete example):
TITLE: DRAW: Feature Name / Brief Description
SEVERITY: 2 - High
STORY_ID: 272261
ISSUE: One sentence describing what is wrong.
ADDITIONAL_INFO:
- Regression from build 3.2.1
- WCAG 2.1 AA 1.3.1 (for accessibility bugs)
ATTACHMENTS:
- screenshot.png
- video.mp4
STEPS:
1. Launch the ENV QuickDraw application.
2. Navigate to the affected area.
3. Perform the action.
a. Observation text << NOT EXPECTED (see attached screenshot.png)
i. Expected: What should happen instead.
ii. Expected: Additional expected behavior.
SYSTEM_INFO:
- OS: Windows 11 Pro 23H2
- App Version: ENV QuickDraw 3.2.4
| Type | Format |
|---|---|
| Normal bug | DRAW: Feature / Brief Description |
| WCAG/Accessibility | DRAW: WCAG Accessibility Errors / Feature / Error |
# Preview locally (saves HTML to output/)
python workflows.py create-bug --file bugs/my_bug.txt
# Dry run — preview what would be uploaded without creating in ADO
python workflows.py create-bug --file bugs/my_bug.txt --upload --dry-run
# Upload to ADO (creates Bug work item, returns URL)
python workflows.py create-bug --file bugs/my_bug.txt --upload
# Upload and link to parent story
python workflows.py create-bug --file bugs/my_bug.txt --upload --story-id 272261
The formatter produces ADO HTML matching the ENV Drawing Bug Template:
- ISSUE: — One sentence, same line as heading
- ADDITIONAL INFORMATION: — WCAG refs, regression notes
- SUPPORTING DOCUMENTATION PROVIDED: — Bulleted attachment filenames
- RECREATE STEPS: — Numbered steps with `<< NOT EXPECTED` marker (yellow highlight)
- TRIAGE/CAUSE INFORMATION: — Empty (for development)
- FIX SUMMARY: — Empty (for development)
Generate a CSV summary of all user stories from specific ADO board columns, with test case counts.
python scripts/fetch_board_stories.py
Output: output/board_stories_summary.csv with columns:
| Column | Description |
|---|---|
| User Story Title | Story ID and title |
| # Test Cases | Count of linked test cases (via TestedBy relations + test suites) |
| Tablet Testing Needed | Left empty for dev team to fill in |
The script queries stories from Most Wanted, Development, and Quality Assurance board columns, filtered by area path. Excludes [Out of Scope] stories.
The LLM correction pipeline automatically validates that every acceptance criterion (AC) has at least one test case covering it. If gaps are detected, it generates targeted gap-filling tests.
- Keyword extraction from each AC (strips stop words, punctuation)
- Keyword matching against test case text (title + objective + steps) at 40% threshold with minimum 2 keyword hits
- Gap detection — ACs with zero matching test cases are flagged
- Targeted LLM call — generates 1-2 tests per uncovered AC
- Structural fixes — ensures generated tests have PRE-REQ, launch, close steps
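The matching rule from the first three steps (40% keyword threshold, minimum 2 hits) can be sketched as follows. The stop-word list is an illustrative stub; the real extractor is richer.

```python
import re

STOP_WORDS = {"the", "a", "an", "to", "of", "is", "and", "for", "in", "on", "with"}

def keywords(text: str) -> set[str]:
    """Lowercase tokens with stop words stripped (illustrative extractor)."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}

def uncovered_acs(acs: list[str], tests: list[str]) -> list[str]:
    """Flag ACs where no test matches >= 40% of the AC's keywords (min 2 hits)."""
    gaps = []
    for ac in acs:
        kws = keywords(ac)
        covered = any(
            len(kws & keywords(t)) >= max(2, 0.4 * len(kws)) for t in tests
        )
        if kws and not covered:
            gaps.append(ac)
    return gaps
```

Each flagged AC then seeds a targeted LLM call, so gap-filling tokens are spent only where coverage is actually missing.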
AC coverage: All 12 ACs covered by existing tests
Or when gaps are found:
Warning: 1 AC(s) have no test coverage:
AC 11: Undo/Redo applies to rename, visibility, lock, order, delete (limit 50)
→ Generating tests for 1 uncovered AC(s)...
→ Added 3 gap-filling test(s)
This runs automatically during generate and upload workflows. No extra flags needed.
Previously generated test steps are stored in ChromaDB (vector database) and used as reference during LLM correction. This ensures consistent wording across test generations for the same story.
- Auto-embeds using the `all-MiniLM-L6-v2` sentence transformer
- Distance metric: lower = more similar (0.2 very similar, 1.5+ less similar)
- Persistent storage in the `./db/` folder
- To regenerate cleanly, delete the story's steps from ChromaDB before re-running
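Applying the distance threshold to a retrieval result looks roughly like this. The helper name is made up; the input dict mirrors the nested-list shape ChromaDB returns from `collection.query` for a single query text.

```python
def filter_references(query_result: dict, max_distance: float = 1.5) -> list[str]:
    """Keep only retrieved steps whose distance falls under the threshold.

    For one query, ChromaDB returns parallel lists wrapped in an outer list,
    e.g. {"documents": [[...]], "distances": [[...]]}.
    """
    docs = query_result["documents"][0]
    dists = query_result["distances"][0]
    return [doc for doc, dist in zip(docs, dists) if dist < max_distance]
```

Steps past the threshold are dropped rather than injected as references, so weak matches never steer the LLM correction prompt.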
Use the framework directly from GitHub Copilot Chat.
- Open `.vscode/mcp.json` in VS Code
- Click the "Start" button to launch the MCP server
- In Copilot Chat, use Agent mode (`@workspace`)
"generate tests for story 272889"
"upload tests for story 272889"
"check story 272889"
"create bug from bugs/my_bug.txt"
"list projects"
Edit .vscode/mcp.json:
{
"servers": {
"test-gen": {
"type": "stdio",
"command": "/path/to/venv310/bin/python",
"args": ["/path/to/integrations/mcp_server.py"]
}
}
}
The project follows Clean Architecture principles with all business logic centralized in `core/`.
test_gen/
├── workflows.py # Main CLI entry point
├── requirements.txt # Python dependencies
├── .env # Environment configuration
│
├── projects/ # Multi-project support
│ ├── configs/ # YAML project configurations
│ │ ├── env-quickdraw.yaml
│ │ └── example-web-app.yaml
│ ├── project_config.py # Configuration loader
│ └── project_manager.py # Project management
│
├── bugs/ # Bug report input files
│ └── sample_bug.txt # Example bug template
│
├── core/ # ALL business logic (Clean Architecture)
│ ├── application/use_cases/ # Use case implementations
│ │ ├── bug_parser.py # Parse .txt bug files
│ │ └── bug_formatter.py # Format bugs to ADO HTML
│ ├── config/ # App configuration
│ │ └── environment.py # Environment variables
│ ├── domain/ # Domain models
│ │ ├── models.py # UserStory, TestCase, etc.
│ │ └── bug_report.py # BugReport, RecreateStep
│ ├── interfaces/ # Contracts (protocols)
│ │ ├── llm_provider.py # ILLMProvider interface
│ │ ├── repository.py # IStoryRepository, ITestSuiteRepository, etc.
│ │ └── vector_store.py # IVectorStore interface
│ └── services/ # ALL services centralized here
│ ├── test_generator.py # Main test generation
│ ├── objective_service.py # Objective generation
│ ├── summary_service.py # QA summary generation
│ ├── test_validator.py # QualityGate validation
│ ├── embeddings/ # Vector embeddings
│ │ └── test_step_embedder.py # ChromaDB step embedding
│ ├── nlp/ # NLP parsing (spaCy)
│ │ ├── spacy_parser.py
│ │ └── hybrid_parser.py
│ ├── quality/ # Quality analysis
│ │ ├── quality_analyzer.py
│ │ └── test_corrector.py
│ ├── linting/ # Evidence-based linting
│ │ ├── summary_linter.py
│ │ └── objective_linter.py
│ └── llm/ # LLM providers & prompts
│ ├── corrector.py # LLM correction + AC coverage validation
│ ├── prompt_builder.py # Dynamic prompt generation
│ ├── factory.py # LLM provider factory
│ ├── openai_provider.py
│ ├── gemini_provider.py
│ └── anthropic_provider.py
│
├── infrastructure/ # External services (adapters)
│ ├── ado/ # Azure DevOps client
│ │ ├── http_client.py # Low-level ADO HTTP client
│ │ ├── ado_repository.py # ADO API wrapper (stories, test cases, suites)
│ │ └── ado_bug_repository.py # ADO Bug creation
│ ├── vector_db/ # Vector database
│ │ └── chroma_repository.py # ChromaDB implementation
│ └── export/ # Export generators
│ ├── csv_generator.py
│ └── objective_generator.py
│
├── integrations/ # External tool integrations
│ └── mcp_server.py # GitHub Copilot MCP server
│
├── scripts/ # Utility scripts
│ └── fetch_board_stories.py # ADO board story report
│
├── tests/ # Unit & integration tests
│ ├── unit/
│ └── integration/
│
└── output/ # Generated files
└── *.csv, *.json, *.txt
- Verify story ID exists in Azure DevOps
- Check the `ADO_PAT` token has read permissions
- Ensure ADO organization/project in config matches story location
- Add `OPENAI_API_KEY=sk-...` to the `.env` file
- Ensure key is valid and has credits
- Run `python workflows.py list-projects` to see available projects
- Check the YAML file exists in `projects/configs/`
- Verify `project_id` in the YAML matches what you're using
- Use the `--skip-correction` flag for faster generation (lower quality)
- Consider using `gpt-4o-mini` instead of `gpt-4o` in config
- Gemini `gemini-2.0-flash` is a fast, cost-effective alternative
- Gemini free tier has strict daily rate limits
- Upgrade to a paid API key or switch to `openai` in your YAML config
- Bad reference steps from previous generations can affect new outputs
- Delete the `./db/` folder to clear all stored embeddings, or
- Re-generate the story to overwrite stale references
- Ensure VS Code version is 1.102+
- Open `.vscode/mcp.json` and click "Start"
- Use Agent mode in Copilot Chat
- Try: `@workspace generate tests for story 272889`
# Check Python version (3.10 required for spaCy)
python --version
# If wrong version, create venv with specific Python
python3.10 -m venv venv310
# Ensure venv is activated
source venv310/bin/activate
# Reinstall dependencies
pip install -r requirements.txt
"Cannot connect to Docker daemon"
- Ensure Docker Desktop is running (check system tray/menu bar)
"No such file: .env"
- Copy the `.env` file to the project root: `cp /path/to/.env .`
- Never commit `.env` to git
.envto git
"Permission denied" on output folder
sudo chown -R $(whoami) output/
Need to debug inside container?
docker run -it --entrypoint /bin/bash test-gen:v1
# Now you're inside the container
Container runs but can't connect to ADO
- Verify `.env` has the correct `ADO_PAT` value
- Check the `--env-file .env` flag is included in the command
For issues or questions:
- Check the Troubleshooting section
- Review example configurations in `projects/configs/`
- Check existing test output in `output/` for reference
v6.0 — Phase 2: Intelligent Coverage & Semantic Matching
- AC coverage validation — automatically detects missing acceptance criteria coverage and generates gap-filling tests via targeted LLM call
- ChromaDB semantic matching — stores previously generated test steps as reference embeddings for consistent wording across generations
- Gemini provider — Google Gemini (Flash + Pro) support via the `google-genai` SDK with JSON response mode
- LLM factory pattern — provider-agnostic architecture (OpenAI, Gemini, Anthropic, Ollama) with YAML config override
- Board story report (`scripts/fetch_board_stories.py`) — fetches user stories from ADO board columns with test case counts
- Enhanced LLM corrector — structural fixes (PRE-REQ, launch, close steps), forbidden language cleanup, accessibility test auto-generation
- Bug creation command (`create-bug`) — create ADO Bug work items from structured `.txt` files
- Multi-provider LLM — initial OpenAI and Anthropic support
- Anti-hallucination guardrails — LLM-generated tests are grounded in acceptance criteria
- `--dry-run` flag for bug creation — preview without uploading to ADO
- Docker support for consistent team environments
- MCP integration — use from GitHub Copilot Chat
- Project-agnostic framework with YAML configuration
- Enhanced LLM prompts (expert QA engineer persona)
- `update-objectives` workflow now fetches directly from ADO (no CSV required)