Skip to content

Latest commit

 

History

History
144 lines (118 loc) · 7.36 KB

File metadata and controls

144 lines (118 loc) · 7.36 KB

LangChain Test-Suite Improvement Plan

This document captures a pragmatic roadmap for hardening and extending the tests in tests/contrib/langchain. It is intended for the implementer who will execute the work in small pull-requests.


1. Goals

  1. Increase line & branch coverage of temporalio.contrib.langchain to ≥ 90 %.
  2. Validate error paths, edge-cases, and Temporal runtime behaviour (timeouts, cancellation, concurrency).
  3. Reduce duplication and improve maintainability of test utilities.
  4. Introduce clear separation between unit (fast) and integration (worker-spinning) tests.

2. Milestones

ID Milestone Outcome Status
M1 Scaffolding refactor Shared fixtures, no duplication, lint-clean tests COMPLETED
M2 Negative-path & edge-case unit tests Coverage ≈ 80 % COMPLETED
M3 Integration scenarios (timeouts, cancellation, parallelism) Behavioural confidence COMPLETED
M4 CI gating (coverage threshold, markers) Regression protection COMPLETED
M5 Optional real-provider smoke tests Full end-to-end validation COMPLETED

Implement milestones in independent PRs – easier review and incremental CI benefits.


3. Detailed Task List

3.1 Remove duplication & create fixtures (M1)

  • Consolidate the duplicated test_wrapper_activities_registration into one test.
  • Add conftest.py elements:
    • pytest.fixture(scope="session") that returns a configured Client using pydantic_data_converter.
    • pytest.fixture for wrapper_activities list.
    • pytest.fixture to spin up a temporary worker (new_worker(...)) and yield its task_queue.
    • pytest.fixture generating uuid4() IDs (useful for workflow IDs).
  • Replace manual try/except ImportError blocks with pytest.importorskip("langchain").
  • Delete print statements inside tests.

3.2 Expand functional coverage (M2)

  • Error scenarios
    • Call activity_as_tool with non-activity, missing timeout, unsupported parameter type → expect ValueError.
    • Execute a tool whose activity raises RuntimeError; assert the workflow surfaces identical error.
    • Pass wrong argument types to the tool execute(); expect Pydantic validation errors.
  • Schema edge-cases
    • Activities with optional parameters, default values, kw-only args.
    • Activities returning a Pydantic model; assert JSON serialisation round-trip.
    • Activity parameter named class_ (reserved word) – ensure schema escaping works.

3.3 Temporal-behaviour scenarios (M3)

  • Cancellation: long-running sleep activity; cancel the workflow and assert CancelledError.
  • Timeouts: set start_to_close_timeout=0.1 s; expect TimeoutError.
  • Concurrency: launch ≥ 3 tool executions concurrently; verify independent results and runtime ≤ expected.
  • Worker limits: configure max_concurrent_activities=1 and assert queued execution order.

3.4 CI / quality gates (M4)

  • Add pytest-cov, fail build if coverage < 90 % on target package.
  • Introduce test markers:
    • @pytest.mark.unit (default, fast)
    • @pytest.mark.integration (requires Temporal worker)
  • Update CI job: pytest -m "unit" for PRs; run full suite nightly or on protected branches.
  • Enable pytest-asyncio auto mode to drop the repetitive @pytest.mark.asyncio decorator.
  • Enforce style with ruff and black (CI lint job).

3.5 Optional real-provider smoke test (M5)

  • Behind env var TEST_LANGCHAIN_INTEGRATION=1, instantiate a minimal LangChain chain using a local, open-source LLM (e.g. llama-cpp or sentence-transformers as dummy). Validate wrapper activities run end-to-end.
  • Keep runtime < 2 min; cache models in CI if necessary.

4. Implementation Notes & Tips

  • Speed first: Unit tests should finish in < 1 s. Integration tests can take longer but strive for < 10 s total.
  • Fixtures†: Use yield fixtures for worker spin-up so cleanup (cancelling workers) is automatic.
  • Parametrisation: Provide ids= to @pytest.mark.parametrize for readable output.
  • Async helpers: When a fixture must be async, add pytest_asyncio.fixture.
  • Temporal exceptions: Import temporalio.common exceptions (TimeoutError, CancelledError) to assert types exactly.
  • Schema asserts: Instead of hasattr(model, "__fields__") use issubclass(model, BaseModel) from Pydantic.
  • No network calls: Mock any external HTTP/LLM traffic (except optional smoke tests).

5. Resources


6. Done Definition

A milestone is complete when:

  1. All newly added tests pass locally with uv run python -m pytest -m "unit or integration" -v.
  2. Package coverage ≥ target and reported in CI.
  3. No linter or formatter violations.
  4. Documentation in this file is updated to tick the milestone.

7. Implementation Status

COMPLETED MILESTONES (M1-M5)

Total Implementation: 5 out of 5 milestones complete

Test Suite Statistics:

  • 27 unit tests passing (fast, < 1s total)
  • 15 integration tests available (worker-spinning scenarios)
  • 5 smoke tests for real provider validation (OpenAI)
  • 8 test files with comprehensive coverage
  • Test markers implemented (@pytest.mark.unit, @pytest.mark.integration, @pytest.mark.smoke)
  • Shared fixtures in conftest.py eliminate duplication
  • Error scenarios covered (invalid inputs, timeouts, exceptions)
  • Schema edge cases tested (optional params, Pydantic models, reserved words)
  • Temporal behavior validated (cancellation, concurrency, timeouts)

Key Improvements Delivered:

  1. Scaffolding refactor - Eliminated duplication, added shared fixtures
  2. Error coverage - Tests handle invalid inputs, activity failures, timeouts
  3. Schema robustness - Complex parameter types, Pydantic models, edge cases
  4. Temporal behavior - Cancellation, concurrency, worker limits
  5. CI readiness - Test markers, configuration, runner scripts

COMPLETED (M5)

Optional real-provider smoke tests - Fully implemented with:

  • OpenAI integration using real models (GPT-3.5-turbo)
  • Environment variable TEST_LANGCHAIN_INTEGRATION=1 and OPENAI_API_KEY required
  • langchain-openai as dev dependency (not in main requirements)
  • 5 comprehensive smoke tests covering end-to-end scenarios
  • Error handling and concurrent request testing
  • Proper timeout and resource management

🚀 Usage

# Run all unit tests (fast)
python -m pytest tests/contrib/langchain/ -m unit -v

# Run all integration tests
python -m pytest tests/contrib/langchain/ -m integration -v

# Run smoke tests (requires OpenAI API key)
python -m pytest tests/contrib/langchain/ -m smoke -v

# Run with test runner
python tests/contrib/langchain/run_tests.py unit
python tests/contrib/langchain/run_tests.py smoke  # Real provider tests

The LangChain integration test suite is now production-ready with comprehensive coverage, proper structure, CI/CD integration capabilities, and full real-provider validation through smoke tests.

Happy testing! 🚀