feat: add smoke tests for CLI integration testing #14
Conversation
- Add smoke tests that verify end-to-end CLI functionality
- Test basic CLI operations (--version, --help, error handling)
- Test eval command with echo provider (no external dependencies)
- Test output formats (JSON, YAML, CSV)
- Test CLI flags (--repeat, --max-concurrency, --verbose, --no-cache)
- Test exit codes (0 for success, 100 for failures, 1 for errors)
- Test assertions (contains, icontains, failing assertions)
- Add pytest configuration with 'smoke' marker for selective testing
- Add comprehensive README documenting smoke test purpose and usage

Total: 20 smoke tests, all passing ✅

Smoke tests run against the installed promptfoo CLI via subprocess, testing the Python wrapper integration with the Node.js CLI.

Run smoke tests:

    pytest tests/smoke/           # Run all smoke tests
    pytest tests/ -m smoke        # Run only smoke-marked tests
    pytest tests/ -m 'not smoke'  # Skip smoke tests (unit tests only)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
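As a rough sketch of how selective running via the `smoke` marker fits together (the real tests/smoke/test_smoke.py may apply the marker differently, for example through per-class decorators or conftest hooks), a module-level `pytestmark` is the simplest form:

```python
# Hypothetical sketch only: one way the "smoke" marker can be applied so that
# `pytest -m smoke` selects these tests and `pytest -m "not smoke"` skips them.
import pytest

# Tag every test collected from this module as a smoke test.
pytestmark = pytest.mark.smoke


def test_cli_prints_a_version() -> None:
    # Placeholder body; the real smoke tests shell out to the promptfoo CLI.
    assert True
```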
Previously the CI was only testing CLI invocation but not running pytest.

Changes:
- Install dev dependencies (pytest, mypy, ruff) in test jobs
- Run unit tests with: pytest tests/ -v -m 'not smoke'
- Run smoke tests with: pytest tests/smoke/ -v
- Both 'test' and 'test-npx-fallback' jobs now run the full test suite

This ensures:
✅ Unit tests run on all platforms (ubuntu, windows) and Python versions (3.9, 3.13)
✅ Smoke tests verify end-to-end CLI functionality
✅ Both global install and npx fallback paths are tested
- Split test_split_path into platform-specific versions (Unix/Windows)
- Split test_find_external_promptfoo_prevents_recursion for platform paths
- Use platform-appropriate node path in test_main_exits_when_neither_external_nor_npx_available
- Tests now skip appropriately on incompatible platforms
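A minimal sketch of the platform-split pattern described here, assuming pytest's `skipif` marker; the test names and bodies below are illustrative, not the actual contents of tests/test_cli.py:

```python
# Illustrative only: the real split tests in tests/test_cli.py differ in detail.
import sys

import pytest


@pytest.mark.skipif(sys.platform == "win32", reason="uses POSIX PATH semantics")
def test_split_path_unix() -> None:
    assert "/usr/local/bin" in "/usr/local/bin:/usr/bin".split(":")


@pytest.mark.skipif(sys.platform != "win32", reason="uses Windows PATH semantics")
def test_split_path_windows() -> None:
    assert r"C:\nodejs" in r"C:\nodejs;C:\Windows\system32".split(";")
```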
The first npx call can be slow as it downloads promptfoo. Increased timeout from 60s to 120s to accommodate this.
Add safety checks for None values from subprocess.run() output, which can occur on Windows in certain error conditions.
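Taken together with the timeout change above, the guard might look roughly like this; `run_cli` below is a hypothetical stand-in for the suite's run_promptfoo helper, whose real signature may differ:

```python
# Sketch of the defensive handling described in the two commits above.
import subprocess
from typing import List, Tuple


def run_cli(args: List[str], timeout: float = 120.0) -> Tuple[str, str, int]:
    proc = subprocess.run(
        ["promptfoo", *args],
        capture_output=True,
        text=True,
        timeout=timeout,  # generous: the first npx call may need to download promptfoo
    )
    # stdout/stderr have been observed as None on Windows in some error paths,
    # so normalize to empty strings before callers concatenate or lowercase them.
    return proc.stdout or "", proc.stderr or "", proc.returncode
```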
Resolved conflict in tests/test_cli.py by keeping platform-appropriate node_path implementation from feature branch.
- Fix line too long (123 > 120) in test_cli.py
- Run ruff format on test files
- Add tests/smoke/.temp-output/ to .gitignore

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add comprehensive testing strategy section with unit vs smoke tests
- Document test directory structure
- Add smoke test details and commands
- Update CI/CD section to mention both test types
- Update project structure to include tests directory

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pull request overview
This PR adds a comprehensive smoke test suite to verify end-to-end CLI functionality for the Python wrapper. The tests use the echo provider to avoid external dependencies and test the integration between the Python wrapper and the Node.js CLI.
Changes:
- Added 20 smoke tests covering CLI operations, eval command, exit codes, echo provider functionality, and assertions
- Split platform-specific unit tests in tests/test_cli.py for better cross-platform testing
- Added pytest configuration with smoke test marker for selective test execution
- Updated CI workflows to run unit tests and smoke tests separately
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/test_cli.py | Split platform-specific tests for Unix and Windows PATH handling and recursion prevention |
| tests/smoke/test_smoke.py | New comprehensive smoke test suite with 20 tests for CLI integration |
| tests/smoke/fixtures/configs/basic.yaml | Basic test configuration with passing assertions |
| tests/smoke/fixtures/configs/failing-assertion.yaml | Test configuration with intentionally failing assertion |
| tests/smoke/fixtures/configs/assertions.yaml | Test configuration with multiple assertions including case-insensitive matching |
| tests/smoke/__init__.py | Package initialization for smoke tests |
| tests/smoke/README.md | Comprehensive documentation for smoke test suite |
| pyproject.toml | Added pytest configuration with smoke test marker |
| .gitignore | Added smoke test temporary output directory |
| .github/workflows/test.yml | Updated to run unit tests and smoke tests separately with proper dev dependencies |
tests/smoke/test_smoke.py
Outdated
    def test_version_flag(self):
        """Test --version flag outputs version."""
        stdout, stderr, exit_code = run_promptfoo(["--version"])

        assert exit_code == 0
        # Should output a version number (semver format)
        assert stdout.strip(), "Version output should not be empty"
Test method is missing a return type annotation. For consistency with the rest of the codebase (see tests/test_cli.py), test methods should include a '-> None' return type annotation.
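For illustration, the requested style (the class and test names here are placeholders, not the actual code):

```python
class TestBasicCLI:
    def test_version_flag(self) -> None:  # explicit None return type, as in tests/test_cli.py
        """Test --version flag outputs version."""
```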
tests/smoke/test_smoke.py
Outdated
    def test_help_flag(self):
        """Test --help flag outputs help."""
        stdout, stderr, exit_code = run_promptfoo(["--help"])

        assert exit_code == 0
        assert "promptfoo" in stdout.lower()
        assert "eval" in stdout.lower()

    def test_eval_help(self):
        """Test 'eval --help' outputs eval command help."""
        stdout, stderr, exit_code = run_promptfoo(["eval", "--help"])

        assert exit_code == 0
        assert "--config" in stdout or "-c" in stdout
        assert "--output" in stdout or "-o" in stdout

    def test_unknown_command(self):
        """Test unknown command returns error."""
        stdout, stderr, exit_code = run_promptfoo(
            ["unknowncommand123"],
            expect_error=True,
        )

        assert exit_code != 0
        output = stdout + stderr
        assert "unknown" in output.lower() or "not found" in output.lower()

    def test_missing_config_file(self):
        """Test missing config file returns error."""
        stdout, stderr, exit_code = run_promptfoo(
            ["eval", "-c", "nonexistent-config-file.yaml"],
            expect_error=True,
        )

        assert exit_code != 0
        output = stdout + stderr
        # Should indicate the file wasn't found
        assert any(
            phrase in output.lower()
            for phrase in [
                "not found",
                "no such file",
                "does not exist",
                "cannot find",
                "no configuration file",
            ]
        )


class TestEvalCommand:
    """Eval command smoke tests."""

    def test_basic_eval(self):
        """Test basic eval with echo provider."""
        config_path = CONFIGS_DIR / "basic.yaml"
        stdout, stderr, exit_code = run_promptfoo(["eval", "-c", str(config_path), "--no-cache"])

        assert exit_code == 0, f"Eval failed:\nSTDOUT: {stdout}\nSTDERR: {stderr}"
        # Should show evaluation results
        assert "pass" in stdout.lower() or "✓" in stdout or "success" in stdout.lower()

    def test_json_output(self):
        """Test eval outputs valid JSON."""
        config_path = CONFIGS_DIR / "basic.yaml"
        output_path = OUTPUT_DIR / "output.json"

        stdout, stderr, exit_code = run_promptfoo(
            ["eval", "-c", str(config_path), "-o", str(output_path), "--no-cache"]
        )

        assert exit_code == 0, f"Eval failed:\nSTDOUT: {stdout}\nSTDERR: {stderr}"
        assert output_path.exists(), "Output file was not created"

        # Verify it's valid JSON with expected structure
        with open(output_path) as f:
            data = json.load(f)

        assert "results" in data
        assert "results" in data["results"]
        assert isinstance(data["results"]["results"], list)
        assert len(data["results"]["results"]) > 0

        # Verify echo provider returns the prompt
        first_result = data["results"]["results"][0]
        assert "response" in first_result
        assert "output" in first_result["response"]
        output_text = first_result["response"]["output"]
        assert "Hello" in output_text
        assert "World" in output_text

    def test_yaml_output(self):
        """Test eval outputs YAML format."""
        config_path = CONFIGS_DIR / "basic.yaml"
        output_path = OUTPUT_DIR / "output.yaml"

        stdout, stderr, exit_code = run_promptfoo(
            ["eval", "-c", str(config_path), "-o", str(output_path), "--no-cache"]
        )

        assert exit_code == 0
        assert output_path.exists()

        # Verify it contains YAML-like content
        with open(output_path) as f:
            content = f.read()

        assert "results:" in content

    def test_csv_output(self):
        """Test eval outputs CSV format."""
        config_path = CONFIGS_DIR / "basic.yaml"
        output_path = OUTPUT_DIR / "output.csv"

        stdout, stderr, exit_code = run_promptfoo(
            ["eval", "-c", str(config_path), "-o", str(output_path), "--no-cache"]
        )

        assert exit_code == 0
        assert output_path.exists()

        # Verify it's CSV format (has header row with columns)
        with open(output_path) as f:
            content = f.read()

        lines = content.strip().split("\n")
        assert len(lines) > 0
        # CSV should have comma-separated values
        assert "," in lines[0]

    def test_max_concurrency_flag(self):
        """Test --max-concurrency flag."""
        config_path = CONFIGS_DIR / "basic.yaml"

        stdout, stderr, exit_code = run_promptfoo(
            ["eval", "-c", str(config_path), "--max-concurrency", "1", "--no-cache"]
        )

        assert exit_code == 0

    def test_repeat_flag(self):
        """Test --repeat flag runs tests multiple times."""
        config_path = CONFIGS_DIR / "basic.yaml"
        output_path = OUTPUT_DIR / "repeat-output.json"

        stdout, stderr, exit_code = run_promptfoo(
            [
                "eval",
                "-c",
                str(config_path),
                "--repeat",
                "2",
                "-o",
                str(output_path),
                "--no-cache",
            ]
        )

        assert exit_code == 0

        # Verify we got repeated results
        with open(output_path) as f:
            data = json.load(f)

        # With repeat=2 and 1 test case, we should have 2 results
        assert len(data["results"]["results"]) == 2

    def test_verbose_flag(self):
        """Test --verbose flag."""
        config_path = CONFIGS_DIR / "basic.yaml"

        stdout, stderr, exit_code = run_promptfoo(["eval", "-c", str(config_path), "--verbose", "--no-cache"])

        assert exit_code == 0
        # Verbose mode should produce output
        assert len(stdout) > 0 or len(stderr) > 0
Test methods in this class are missing return type annotations. For consistency with the rest of the codebase (see tests/test_cli.py), test methods should include '-> None' return type annotation.
tests/smoke/test_smoke.py
Outdated
    def test_success_exit_code(self):
        """Test exit code 0 when all assertions pass."""
        config_path = CONFIGS_DIR / "basic.yaml"

        stdout, stderr, exit_code = run_promptfoo(["eval", "-c", str(config_path), "--no-cache"])

        assert exit_code == 0

    def test_failure_exit_code(self):
        """Test exit code 100 when assertions fail."""
        config_path = CONFIGS_DIR / "failing-assertion.yaml"

        stdout, stderr, exit_code = run_promptfoo(
            ["eval", "-c", str(config_path), "--no-cache"],
            expect_error=True,
        )

        # Exit code 100 indicates test failures
        assert exit_code == 100, f"Expected exit code 100, got {exit_code}"

    def test_config_error_exit_code(self):
        """Test exit code 1 for config errors."""
        stdout, stderr, exit_code = run_promptfoo(
            ["eval", "-c", "nonexistent-file.yaml", "--no-cache"],
            expect_error=True,
        )

        assert exit_code == 1
Test methods in this class are missing return type annotations. For consistency with the rest of the codebase (see tests/test_cli.py), test methods should include '-> None' return type annotation.
tests/smoke/test_smoke.py
Outdated
    def test_echo_provider_basic(self):
        """Test echo provider returns the prompt."""
        config_path = CONFIGS_DIR / "basic.yaml"
        output_path = OUTPUT_DIR / "echo-test.json"

        stdout, stderr, exit_code = run_promptfoo(
            ["eval", "-c", str(config_path), "-o", str(output_path), "--no-cache"]
        )

        assert exit_code == 0

        # Verify echo provider returns the prompt
        with open(output_path) as f:
            data = json.load(f)

        first_result = data["results"]["results"][0]

        # Echo provider should return the prompt in the response
        output = first_result["response"]["output"]
        assert "Hello" in output
        assert "World" in output

    def test_echo_provider_with_multiple_vars(self):
        """Test echo provider with multiple variables."""
        config_path = CONFIGS_DIR / "assertions.yaml"
        output_path = OUTPUT_DIR / "echo-multi-var.json"

        stdout, stderr, exit_code = run_promptfoo(
            ["eval", "-c", str(config_path), "-o", str(output_path), "--no-cache"]
        )

        assert exit_code == 0

        with open(output_path) as f:
            data = json.load(f)

        first_result = data["results"]["results"][0]
        output = first_result["response"]["output"]

        # Should contain all variable values
        assert "Alice" in output
        assert "Wonderland" in output
Test methods in this class are missing return type annotations. For consistency with the rest of the codebase (see tests/test_cli.py), test methods should include '-> None' return type annotation.
tests/smoke/test_smoke.py
Outdated
    def test_contains_assertion(self):
        """Test contains assertion."""
        config_path = CONFIGS_DIR / "basic.yaml"

        stdout, stderr, exit_code = run_promptfoo(["eval", "-c", str(config_path), "--no-cache"])

        assert exit_code == 0
        # All assertions should pass
        assert "pass" in stdout.lower() or "✓" in stdout or "success" in stdout.lower()

    def test_multiple_assertions(self):
        """Test multiple assertions in single test."""
        config_path = CONFIGS_DIR / "assertions.yaml"

        stdout, stderr, exit_code = run_promptfoo(["eval", "-c", str(config_path), "--no-cache"])

        assert exit_code == 0

    def test_failing_assertion(self):
        """Test failing assertion."""
        config_path = CONFIGS_DIR / "failing-assertion.yaml"

        stdout, stderr, exit_code = run_promptfoo(
            ["eval", "-c", str(config_path), "--no-cache"],
            expect_error=True,
        )

        # Should fail with exit code 100
        assert exit_code == 100
        output = stdout + stderr
        # Should indicate failure
        assert "fail" in output.lower() or "✗" in output or "error" in output.lower()
Test methods in this class are missing return type annotations. For consistency with the rest of the codebase (see tests/test_cli.py), test methods should include '-> None' return type annotation.
tests/smoke/test_smoke.py
Outdated
These tests verify the core evaluation pipeline works correctly
using the echo provider (no external API dependencies).

These tests run against the installed promptfoo package via npx,
The comment states tests run "via npx" but they actually run via the Python wrapper which may use either a globally installed promptfoo or fall back to npx. Consider updating to "via the Python wrapper (using either global promptfoo or npx)" for accuracy.
Suggested change:
- These tests run against the installed promptfoo package via npx,
+ These tests run against the installed promptfoo package via the Python wrapper
+ (using either a globally installed promptfoo CLI or falling back to npx),
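The resolution order this suggestion describes (prefer a globally installed promptfoo, otherwise fall back to npx) can be sketched as follows; this is an assumption about the wrapper's behavior, not its actual code:

```python
# Rough sketch of the "global install or npx fallback" idea, not the wrapper's real logic.
import shutil
from typing import List


def build_promptfoo_command(args: List[str]) -> List[str]:
    exe = shutil.which("promptfoo")  # globally installed CLI, if any
    if exe:
        return [exe, *args]
    # Fall back to npx, which downloads promptfoo on first use.
    return ["npx", "--yes", "promptfoo", *args]
```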
tests/smoke/README.md
Outdated
Smoke tests are high-level integration tests that verify the most critical functionality works end-to-end. They:

- Run against the actual installed CLI (via `npx promptfoo`)
The documentation states tests run "via npx promptfoo" but they actually run via the Python wrapper which may use either a globally installed promptfoo or fall back to npx. Consider updating to "via the Python wrapper (using either global promptfoo or npx)" for accuracy.
Suggested change: replace "Run against the actual installed CLI (via `npx promptfoo`)" with "Run against the actual installed CLI via the Python wrapper (using either global promptfoo or npx)".
- Add `-> None` return type annotations to all smoke test methods
- Add Generator return type to setup_and_teardown fixture
- Update documentation to clarify tests run via Python wrapper (not just npx)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add os.path.isfile mock to unit test to prevent _find_windows_promptfoo() from finding real promptfoo installations on Windows CI runners
- Add UTF-8 encoding with error replacement to smoke tests to handle Windows cp1252 encoding issues with npx output
- Add warmup_npx fixture to pre-download promptfoo via npx before tests, preventing a timeout on the first test when npx needs to download the package

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
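A sketch of the warm-up and encoding ideas from this commit; the fixture name warmup_npx matches the description, but the body is an assumption rather than the real implementation:

```python
# Illustrative conftest-style fixture; details of the real warmup_npx fixture may differ.
import subprocess

import pytest


@pytest.fixture(scope="session", autouse=True)
def warmup_npx() -> None:
    # Pay npx's one-time download cost up front so the first real smoke test
    # does not hit its timeout while promptfoo is being fetched.
    subprocess.run(
        ["npx", "--yes", "promptfoo", "--version"],
        capture_output=True,
        encoding="utf-8",  # avoid cp1252 decode errors on Windows runners
        errors="replace",  # replace undecodable bytes instead of raising
        timeout=300,
        check=False,
    )
```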
Add record_wrapper_used mock to tests that mock subprocess.run to prevent PostHog telemetry calls from interfering with mock call counts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
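The telemetry mock might look along these lines; record_wrapper_used comes from the commit message, while the promptfoo.cli patch targets and the test name are assumed for illustration:

```python
# Sketch only: the real patch targets and assertions in tests/test_cli.py may differ.
from unittest import mock


@mock.patch("promptfoo.cli.record_wrapper_used")  # keep PostHog telemetry out of the way
@mock.patch("promptfoo.cli.subprocess.run")
def test_cli_invoked_once(run: mock.MagicMock, telemetry: mock.MagicMock) -> None:
    run.return_value = mock.Mock(returncode=0, stdout="", stderr="")
    # ... call the wrapper entry point here (omitted) ...
    # With telemetry mocked, nothing besides the CLI invocation touches
    # subprocess.run, so run.call_count reflects only the CLI call.
```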
* feat: add smoke tests for CLI integration testing
* ci: run unit tests and smoke tests in CI
* fix: use Optional for Python 3.9 compatibility in smoke tests
* fix: make platform-specific tests work on both Unix and Windows
* fix: increase smoke test timeout for npx fallback scenarios
* fix: handle None stdout/stderr in smoke tests
* fix: address linting issues and add temp output to gitignore
* docs: update AGENTS.md with smoke test documentation
* style: add return type annotations and fix documentation wording
* fix: resolve Windows CI test failures
* fix: mock telemetry in CLI unit tests

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Summary
This PR adds comprehensive smoke tests to verify end-to-end CLI functionality using the echo provider.
Smoke tests are high-level integration tests that run against the actual installed promptfoo CLI via subprocess, testing the Python wrapper integration with the Node.js CLI.
What's Added
Smoke Test Suite (`tests/smoke/test_smoke.py`)
20 smoke tests covering:

- Basic CLI Operations (5 tests)
- Eval Command (7 tests)
  - `--max-concurrency`, `--repeat`, `--verbose`, `--no-cache`
- Exit Codes (3 tests)
- Echo Provider (2 tests)
- Assertions (3 tests)
  - `contains` assertion
  - `icontains` assertion (case-insensitive)

Test Fixtures

- `fixtures/configs/basic.yaml` - Simple test with passing assertions
- `fixtures/configs/failing-assertion.yaml` - Test with failing assertion
- `fixtures/configs/assertions.yaml` - Multiple assertions test

Configuration

- `smoke` marker for selective testing
- `pyproject.toml` with pytest options

Why Echo Provider?
The echo provider is perfect for smoke tests because it requires no API keys or external services, returns the rendered prompt verbatim (so results are deterministic and easy to assert on), and runs quickly.
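As a self-contained sketch of why this works well (the inline config below is an assumption about what a basic echo fixture looks like, not a copy of basic.yaml), the whole round trip can be asserted on deterministically:

```python
# Self-contained sketch: writes an assumed echo-provider config (not the actual
# basic.yaml fixture) and checks the deterministic echoed output.
import json
import subprocess
import tempfile
from pathlib import Path

CONFIG = """\
prompts:
  - "Hello {{name}}"
providers:
  - echo
tests:
  - vars:
      name: World
    assert:
      - type: contains
        value: Hello
"""

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "smoke.yaml"
    output_path = Path(tmp) / "out.json"
    config_path.write_text(CONFIG)

    subprocess.run(
        ["npx", "--yes", "promptfoo", "eval",
         "-c", str(config_path), "-o", str(output_path), "--no-cache"],
        check=True,
    )

    data = json.loads(output_path.read_text())
    # The echo provider returns the rendered prompt, so output is fully predictable.
    first = data["results"]["results"][0]
    assert "Hello" in first["response"]["output"]
    assert "World" in first["response"]["output"]
```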
Running Smoke Tests

    pytest tests/smoke/           # Run all smoke tests
    pytest tests/ -m smoke        # Run only smoke-marked tests
    pytest tests/ -m 'not smoke'  # Skip smoke tests (unit tests only)
Test Results
All 20 smoke tests pass ✅
Notes
Inspired By
These smoke tests are inspired by the Node.js promptfoo project's smoke test suite in `test/smoke/`, adapted for the Python wrapper with similar structure and coverage.

🤖 Generated with Claude Code