feat: Add CHAT-ASSIST — VendorChatAssistant & CoPilotAssistant Unit Tests for for Chat Streaming Layer Summary by steadhac · Pull Request #456 · GenAI-Security-Project/finbot-ctf

steadhac · 2026-04-03T15:21:12Z

Add Unit Test Suite for Chat Streaming Layer Summary

Add a comprehensive unit test suite for the Chat Streaming Layer, covering both:

VendorChatAssistant (vendor-facing) CoPilotAssistant (admin-facing)

The suite validates core functionality, security, and edge cases, and includes bug-exposing tests for known production defects.

Scope

Tests cover:

Agent initialization MCP server configuration System prompt structure, correctness, and security rules Tool definitions and callable integrity Workflow dispatch behavior Sensitive field masking (PII protection) Injection/adversarial input handling Internationalization support Boundary and type validation Test Structure

All tests follow the standard format:

Title Basically (question being validated) Steps Expected Results

Bug-exposing tests are included and explicitly marked (⚠️).

Test File

tests/unit/agents/test_chat_assistant.py

Test Suites Overview

Initialization — TestChatAssistantInit
Validates correct setup of agent properties:

Agent naming (vendor vs copilot) Session context persistence Default values (history limit, MCP state, workflow ID, etc.) Tool callable structure

MCP Configuration — TestChatMCPServerTypes
Ensures correct MCP server types per agent:

Vendor vs CoPilot differences Base class defaults (findrive, finmail)

System Prompts
Includes:

TestVendorSystemPrompt TestCoPilotSystemPrompt Extended + isolation + negative tests

Validates:

Required sections (CAPABILITIES, RULES, workflow guidance) Tool usage instructions Security constraints Prompt isolation and encoding Date and identity correctness

Tool Definitions
Vendor Tools — TestVendorToolDefinitions

Correct tool count (6) Name matching Callable validation Strict mode enforcement

CoPilot Tools — TestCoPilotToolDefinitions

Expanded tool set (12) Enum validation (e.g., save_report) Parameter requirements

Tool Execution — TestExecuteTool
Covers:

Unknown tool handling (error JSON) Successful execution paths Exception handling Return-type normalization

Tool Labels & Definitions
Includes:

TestToolDisplayLabel TestGetToolDefinitions TestToolLabelAudit

Validates:

Label mapping correctness Fallback behavior MCP tool inclusion/exclusion Detection of stale or missing label mappings

⚠️ Highlights known issues:

Missing labels for active tools Stale label dictionary entries

Workflow Dispatch
TestCallStartWorkflow + TestWorkflowEdgeCases

Validates:

Background task handling Workflow ID propagation Parent/child relationships Attachment handling Event summary truncation

Sensitive Field Masking
TestSensitiveFieldMasking + TestMaskingEdgeCases

Ensures:

Proper masking of TIN, bank account, routing numbers Last-4-digit preservation Robust handling of edge cases (nulls, formats, types)

QA Findings (Bug-Exposing Tests) — TestQAFindings
Documents confirmed defects:

⚠️ invoice_id=0 dropped due to falsy check ⚠️ Empty-string and integer TIN bypass masking ⚠️ Stale tool label mappings ⚠️ Missing labels for CoPilot tools

Internationalization — TestInternationalInputs
Validates handling of:

Chinese, Arabic (RTL), Japanese text Emojis and mixed Unicode Currency symbols Whitespace formatting preservation

Injection & Adversarial Inputs — TestInjectionAndAdversarialInputs
Covers resilience against:

Prompt injection SQL injection XSS payloads JSON injection Null bytes and shell characters Extremely long malicious inputs

Boundary & Type Handling — TestBoundaryAndTypeValues
Validates:

Extreme numeric values Large payloads (50k chars) Serialization edge cases Optional field handling

⚠️ Known defects:

Missing validation for vendor_id=None Crashes on description=None Unsafe type forwarding to DB Lack of coercion/validation Related Bug Tickets Bug_186 (#407) Bug_187 (#408) Bug_188 (#409) Bug_189 (#410) Bug_190 (#411) Bug_194 (#415) Bug_195 (#416) Bug_196 (#417) Bug_197 (#418) Bug_198 (#419) Bug_199 (#442) Bug_200 (#443) Bug_201 (#452)

…ests (CD001,GenAI-Security-Project#27, Bug_186-Bug_201) - 177 unit tests covering init, prompts, tool definitions, masking, workflow dispatch, injection resistance, and boundary values - 5 intentionally failing tests document open bugs: Bug_186 GenAI-Security-Project#407, Bug_187 GenAI-Security-Project#408, Bug_188 GenAI-Security-Project#409, Bug_189 GenAI-Security-Project#410, Bug_190 GenAI-Security-Project#411 - Boundary tests document open bugs: Bug_194 GenAI-Security-Project#415, Bug_195 GenAI-Security-Project#416, Bug_196 GenAI-Security-Project#417, Bug_197 GenAI-Security-Project#418, Bug_198 GenAI-Security-Project#419 - Prompt consistency findings: Bug_199 GenAI-Security-Project#442, Bug_200 GenAI-Security-Project#443, Bug_201 GenAI-Security-Project#452

steadhac force-pushed the steadhac/feat/chat-assistant-coverage-tests branch from 35b23c0 to d58b4da Compare April 3, 2026 15:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add CHAT-ASSIST — VendorChatAssistant & CoPilotAssistant Unit Tests for for Chat Streaming Layer Summary#456

feat: Add CHAT-ASSIST — VendorChatAssistant & CoPilotAssistant Unit Tests for for Chat Streaming Layer Summary#456
steadhac wants to merge 1 commit intoGenAI-Security-Project:mainfrom
steadhac:steadhac/feat/chat-assistant-coverage-tests

steadhac commented Apr 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

steadhac commented Apr 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant