Write tests as if you have zero knowledge of the implementation.
Tests that validate code output against code output catch nothing.
Example that missed production bugs:

    # ❌ WRONG - Test written by looking at code
    def test_list_formats():
        result = list_creative_formats()
        assert "formats" in result  # Passes even with bugs!

Read the ADCP spec first. Use generated Pydantic schemas. Validate every response.

    # ✅ CORRECT - Test written by reading spec
    import json

    def test_list_formats():
        from schemas_generated import ListCreativeFormatsResponse
        result_json = list_creative_formats()
        result = json.loads(result_json)  # Catches double-encoding
        ListCreativeFormatsResponse.model_validate(result)  # Validates ALL fields

- Read spec/schema FIRST - Never look at implementation
- Import generated Pydantic model - From `schemas_generated/`
- Call tool as client would - Test public API, get JSON string
- Parse JSON once - `json.loads(result)` catches encoding bugs
- Validate with Pydantic - `.model_validate()` catches all schema violations
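
The steps above can be sketched end to end as follows. This is a minimal, self-contained illustration, not the project's actual test: `Format`, `FormatsResponse`, and `list_formats_tool` are hypothetical stand-ins for a generated schema and a real tool, and the field names are assumptions rather than anything taken from the ADCP spec.

```python
import json

from pydantic import BaseModel, ConfigDict, ValidationError


# Hypothetical stand-ins for models that would normally come from schemas_generated/.
class Format(BaseModel):
    model_config = ConfigDict(extra="forbid")
    format_id: str
    name: str


class FormatsResponse(BaseModel):
    model_config = ConfigDict(extra="forbid")
    formats: list[Format]


def list_formats_tool() -> str:
    """Hypothetical tool: returns a JSON string, as a client would receive it."""
    return json.dumps({"formats": [{"format_id": "banner_300x250", "name": "Banner"}]})


def check_response(raw: str) -> FormatsResponse:
    """Parse the wire payload exactly once, then validate it against the schema."""
    parsed = json.loads(raw)
    return FormatsResponse.model_validate(parsed)


def test_list_formats_matches_schema():
    response = check_response(list_formats_tool())
    assert response.formats[0].format_id == "banner_300x250"
```

Because `check_response` parses only once, a double-encoded payload reaches `model_validate` as a plain string and is rejected with a `ValidationError` instead of slipping through.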

This approach catches:
- Double-encoding: `'{"result": "{...}"}'` → `json.loads()` fails or yields the wrong structure
- Missing required fields: Pydantic raises ValidationError
- Wrong field types: Pydantic raises ValidationError
- Extra fields not in spec: Pydantic raises ValidationError (when `extra="forbid"`)
- Invalid values: Constraints like `ge=0`, `pattern=...` are enforced
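
Each failure class in the list above can be reproduced against a small model. The sketch below uses a hypothetical `PreviewResponse` model; its fields and constraints are illustrative assumptions, not the real ADCP schema, and `violations` is a helper invented for this example.

```python
import json

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class PreviewResponse(BaseModel):
    # Hypothetical schema: field names and constraints are assumptions for illustration.
    model_config = ConfigDict(extra="forbid")
    preview_id: str = Field(pattern=r"^prev_[0-9]+$")
    width: int = Field(ge=0)


def violations(raw: str) -> list[str]:
    """Return the Pydantic error types raised for one wire payload (empty if valid)."""
    try:
        PreviewResponse.model_validate(json.loads(raw))
    except ValidationError as exc:
        return sorted(err["type"] for err in exc.errors())
    return []


valid = {"preview_id": "prev_1", "width": 300}

print(violations(json.dumps(valid)))                             # valid payload
print(violations(json.dumps(json.dumps(valid))))                 # double-encoded
print(violations(json.dumps({"width": 300})))                    # missing required field
print(violations(json.dumps({**valid, "width": "wide"})))        # wrong type
print(violations(json.dumps({**valid, "adcp_version": "1.0"})))  # extra field
print(violations(json.dumps({**valid, "width": -1})))            # constraint violated
```

One caveat: in its default lax mode Pydantic coerces some values, for example the string `"300"` to an integer, so a wrong wire type can still pass. If the spec requires exact wire types, strict mode is worth considering.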
Real bugs caught by spec-first testing that code-first tests missed:
- Missing `preview_id` - Required per spec, not returned
- Missing `renders` array - Required per spec, not returned
- Extra `adcp_version` - Not in spec, added by mistake
Old test EXPECTED the bug:

    import json

    def test_preview():
        result = json.loads(preview_creative(...))
        assert "adcp_version" in result  # Test validates the bug!

New test CAUGHT the bug:

    import json
    from schemas_generated import PreviewCreativeResponse

    def test_preview():
        result = json.loads(preview_creative(...))
        PreviewCreativeResponse.model_validate(result)  # ValidationError: extra field!

- ❌ Look at code before writing test
- ❌ Compare output to output: `assert result == expected_from_code`
- ❌ Trust variable names or comments
- ❌ Test internal types instead of wire format
- ❌ Mock everything (hides serialization bugs)
- ✅ Read spec first
- ✅ Use generated Pydantic schemas
- ✅ Call public API (tools/endpoints)
- ✅ Parse JSON explicitly
- ✅ Validate with `.model_validate()`
- ✅ Test error cases per spec
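
To see why testing internal types instead of the wire format is risky, consider this stdlib-only sketch. `build_preview` and `serialize` are hypothetical functions, and the `price` field is an assumption for illustration. An assertion on the in-memory object passes while the serialized payload carries a different type.

```python
import json
from decimal import Decimal


def build_preview() -> dict:
    """Hypothetical internal result: price is held as a Decimal in memory."""
    return {"preview_id": "prev_1", "price": Decimal("9.99")}


def serialize(payload: dict) -> str:
    """Hypothetical serializer: default=str silently turns Decimal into a JSON string."""
    return json.dumps(payload, default=str)


internal = build_preview()
wire = json.loads(serialize(internal))

# Internal-type check passes and says nothing about what a client receives.
assert internal["price"] == Decimal("9.99")

# Wire-format check exposes the problem: the client gets a string, not a number.
assert isinstance(wire["price"], str)
```

A test that mocks the serializer, or that inspects only `internal`, would never see the string. Parsing the real JSON output and validating it against the schema would.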
Every response must:
- Be valid JSON (single parse, no double-encoding)
- Match published schema exactly (Pydantic validates)
- Have no extra fields (unless spec allows)
- Have all required fields (Pydantic enforces)
- Use correct types (Pydantic enforces)
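
The first item on this checklist can also be checked mechanically. The helper below is a stdlib-only sketch and not part of any library: `find_double_encoded` is a hypothetical name, and treating every string that decodes to an object or array as suspicious is an assumption that may need exceptions for fields the spec defines as opaque strings.

```python
import json
from typing import Any


def find_double_encoded(value: Any, path: str = "$") -> list[str]:
    """Return paths whose string value is itself JSON for an object or array."""
    if isinstance(value, str):
        try:
            inner = json.loads(value)
        except ValueError:
            return []
        return [path] if isinstance(inner, (dict, list)) else []
    if isinstance(value, dict):
        return [p for k, v in value.items() for p in find_double_encoded(v, f"{path}.{k}")]
    if isinstance(value, list):
        return [p for i, v in enumerate(value) for p in find_double_encoded(v, f"{path}[{i}]")]
    return []


good = json.loads('{"result": {"formats": []}}')
bad = json.loads('{"result": "{\\"formats\\": []}"}')

assert find_double_encoded(good) == []
assert find_double_encoded(bad) == ["$.result"]
```

This complements schema validation rather than replacing it: Pydantic rejects a string only where the schema expects an object, whereas this walk flags encoded payloads hiding in any string field.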
If your test would pass with broken code, it's not a good test.