Skip to content

feat: Add CHAT-ASSIST — VendorChatAssistant & CoPilotAssistant Unit Tests for for Chat Streaming Layer Summary#456

Open
steadhac wants to merge 1 commit intoGenAI-Security-Project:mainfrom
steadhac:steadhac/feat/chat-assistant-coverage-tests
Open

feat: Add CHAT-ASSIST — VendorChatAssistant & CoPilotAssistant Unit Tests for for Chat Streaming Layer Summary#456
steadhac wants to merge 1 commit intoGenAI-Security-Project:mainfrom
steadhac:steadhac/feat/chat-assistant-coverage-tests

Conversation

@steadhac
Copy link
Copy Markdown
Contributor

@steadhac steadhac commented Apr 3, 2026

Add Unit Test Suite for Chat Streaming Layer Summary

Add a comprehensive unit test suite for the Chat Streaming Layer, covering both:

VendorChatAssistant (vendor-facing) CoPilotAssistant (admin-facing)

The suite validates core functionality, security, and edge cases, and includes bug-exposing tests for known production defects.

Scope

Tests cover:

Agent initialization MCP server configuration System prompt structure, correctness, and security rules Tool definitions and callable integrity Workflow dispatch behavior Sensitive field masking (PII protection) Injection/adversarial input handling Internationalization support Boundary and type validation Test Structure

All tests follow the standard format:

Title Basically (question being validated) Steps Expected Results

Bug-exposing tests are included and explicitly marked (⚠️).

Test File

tests/unit/agents/test_chat_assistant.py

Test Suites Overview

Initialization — TestChatAssistantInit
Validates correct setup of agent properties:

Agent naming (vendor vs copilot) Session context persistence Default values (history limit, MCP state, workflow ID, etc.) Tool callable structure

MCP Configuration — TestChatMCPServerTypes
Ensures correct MCP server types per agent:

Vendor vs CoPilot differences Base class defaults (findrive, finmail)

System Prompts
Includes:

TestVendorSystemPrompt TestCoPilotSystemPrompt Extended + isolation + negative tests

Validates:

Required sections (CAPABILITIES, RULES, workflow guidance) Tool usage instructions Security constraints Prompt isolation and encoding Date and identity correctness

Tool Definitions
Vendor Tools — TestVendorToolDefinitions

Correct tool count (6) Name matching Callable validation Strict mode enforcement

CoPilot Tools — TestCoPilotToolDefinitions

Expanded tool set (12) Enum validation (e.g., save_report) Parameter requirements

Tool Execution — TestExecuteTool
Covers:

Unknown tool handling (error JSON) Successful execution paths Exception handling Return-type normalization

Tool Labels & Definitions
Includes:

TestToolDisplayLabel TestGetToolDefinitions TestToolLabelAudit

Validates:

Label mapping correctness Fallback behavior MCP tool inclusion/exclusion Detection of stale or missing label mappings

⚠️ Highlights known issues:

Missing labels for active tools Stale label dictionary entries

Workflow Dispatch
TestCallStartWorkflow + TestWorkflowEdgeCases

Validates:

Background task handling Workflow ID propagation Parent/child relationships Attachment handling Event summary truncation

Sensitive Field Masking
TestSensitiveFieldMasking + TestMaskingEdgeCases

Ensures:

Proper masking of TIN, bank account, routing numbers Last-4-digit preservation Robust handling of edge cases (nulls, formats, types)

QA Findings (Bug-Exposing Tests) — TestQAFindings
Documents confirmed defects:

⚠️ invoice_id=0 dropped due to falsy check ⚠️ Empty-string and integer TIN bypass masking ⚠️ Stale tool label mappings ⚠️ Missing labels for CoPilot tools

Internationalization — TestInternationalInputs
Validates handling of:

Chinese, Arabic (RTL), Japanese text Emojis and mixed Unicode Currency symbols Whitespace formatting preservation

Injection & Adversarial Inputs — TestInjectionAndAdversarialInputs
Covers resilience against:

Prompt injection SQL injection XSS payloads JSON injection Null bytes and shell characters Extremely long malicious inputs

Boundary & Type Handling — TestBoundaryAndTypeValues
Validates:

Extreme numeric values Large payloads (50k chars) Serialization edge cases Optional field handling

⚠️ Known defects:

Missing validation for vendor_id=None Crashes on description=None Unsafe type forwarding to DB Lack of coercion/validation Related Bug Tickets Bug_186 (#407) Bug_187 (#408) Bug_188 (#409) Bug_189 (#410) Bug_190 (#411) Bug_194 (#415) Bug_195 (#416) Bug_196 (#417) Bug_197 (#418) Bug_198 (#419) Bug_199 (#442) Bug_200 (#443) Bug_201 (#452)

…ests (CD001,GenAI-Security-Project#27, Bug_186-Bug_201)

- 177 unit tests covering init, prompts, tool definitions, masking,
  workflow dispatch, injection resistance, and boundary values
- 5 intentionally failing tests document open bugs:
  Bug_186 GenAI-Security-Project#407, Bug_187 GenAI-Security-Project#408, Bug_188 GenAI-Security-Project#409, Bug_189 GenAI-Security-Project#410, Bug_190 GenAI-Security-Project#411
- Boundary tests document open bugs:
  Bug_194 GenAI-Security-Project#415, Bug_195 GenAI-Security-Project#416, Bug_196 GenAI-Security-Project#417, Bug_197 GenAI-Security-Project#418, Bug_198 GenAI-Security-Project#419
- Prompt consistency findings: Bug_199 GenAI-Security-Project#442, Bug_200 GenAI-Security-Project#443, Bug_201 GenAI-Security-Project#452
@steadhac steadhac force-pushed the steadhac/feat/chat-assistant-coverage-tests branch from 35b23c0 to d58b4da Compare April 3, 2026 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant