All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
- `labeille tsan-run` command for running extension test suites under ThreadSanitizer-enabled free-threaded Python. Captures data race reports for use with ft-review-toolkit's `tsan-report-analyzer` agent. Includes `--test-script` for running custom concurrent stress tests instead of the package's test suite, `--quick` mode for CI, `--stress N` for repeated runs, and bundled CPython TSan suppressions.
- `labeille cext-build` command for generating `compile_commands.json` compilation databases from C extension packages. Uses Bear to intercept compiler invocations during the build, with automatic fallback to build-system-specific mechanisms for Meson and CMake projects.
- `CextBuildConfig`, `CextBuildResult`, `CextBuildMeta` dataclasses in `cext_build.py`.
- `detect_bear()` for auto-detecting the Bear installation and version.
- `detect_build_system()` for identifying the build system from project files (meson, cmake, setuptools, flit, hatch, pdm).
- `extract_build_requires()` for reading `[build-system].requires` from `pyproject.toml` to support `--no-build-isolation` builds.
- `find_compile_db()` for searching repo and build directories for generated compilation databases.
- `postprocess_compile_db()` for fixing source file paths in generated compilation databases.
- Output includes per-package `compile_commands.json`, repo symlinks, build logs, and JSONL result summaries.
- Designed for integration with cext-review-toolkit Tier 2 analysis (clang-tidy).
- Extract `_build_run_config` from `run_cmd` in `cli.py` to separate config validation/construction from CLI execution.
- Extract `_md_table` helper from `export_registry_report_md` in `analyze_cli.py` to deduplicate 8 markdown table constructions.
- Extract `_prepare_source`, `_classify_install_result`, `_check_import_result` from `_survey_package` in `compat.py`, reducing complexity from 23 branches to ~8 per function.
- Split `bench_cli.py:compare` into `_compare_intra_run`, `_compare_cross_run`, and `_report_anomalies` (CC=62 before the split).
- Extract `_collect_package_data`, `_compute_trends`, `_classify_trends`, `_build_median_dicts` from `analyze_series_trends` in `bench/trends.py` (CC=48 → ~10 per helper).
- Extract `_infer_field_type` from `batch_set_field` in `registry_ops.py` to deduplicate type inference.
- Type `FTRunMeta.system_profile` and `python_profile` as `SystemProfile` and `PythonProfile` instead of `dict[str, Any]`.
- Extract `_validate_package_file` from `validate_registry` in `registry_ops.py` (CC=59 → ~15 per function).
- Extract `SEVERITY_LABELS` constant from 4 inline duplications in `bench/display.py` and `bench/export.py`.
- Extract `_make_anomaly` helper in `bench/anomaly.py` to deduplicate 9 identical `PackageAnomaly` constructions in `detect_condition_anomalies`.
- `bench run` and `ft run` now use shared `setup_logging()` for consistent log formatting.
- Simplify `_run_package_inner` in `runner.py` by extracting 4 helper functions: `_align_sdist_version`, `_setup_venv`, `_install_in_venv`, `_check_import`.
- Split `_check_import_and_extras` into `_check_import` and `_install_extra_deps` for single responsibility.
- Rename `compat diff` to `compat compare` for CLI vocabulary consistency.
- Extract `parse_env_pairs()` and `parse_csv_list()` CLI helpers to eliminate duplicated parsing across `cli.py`, `registry_cli.py`, `compat_cli.py`, `bench_cli.py`, and `ft_cli.py`.
- Decompose `generate_registry_report` (190 lines) into focused helpers: `_accumulate_package_stats`, `_analyze_compat_blockers`, `_analyze_per_version`, `_analyze_download_tiers`.
- Decompose `_print_run_summary` (180 lines) into 7 section formatters for readability.
- Unify run ID generation with `generate_run_id()` in `io_utils.py`: all subsystems now use UTC with a consistent format.
- Extract `run_in_process_group()` into `io_utils.py` as a shared subprocess lifecycle utility, used by both `runner.py` and `bench/timing.py`.
- `bench/results.py` `append_package_result` now uses shared `append_jsonl` instead of raw `open()`.
- Replace `RepoHostStats`, `InstallComplexity`, `CompatBlockers` dataclasses with `dict[str, int]` counters on `RegistryReport`, eliminating `getattr`/`setattr` usage.
- Replace `Any` type annotations with concrete types in `ft/export.py`, `ft/compare.py`, `ft/display.py`.
- Simplify `_build_bench_config` by splitting it into `_build_base_config` and `_apply_config_overrides`, replacing `locals()` dict unpacking with `@click.pass_context`.
- Use `clean_env()` in `bench/system.py` instead of inline env dict construction for pattern consistency.
- Use `append_jsonl()` in `migrations.py` instead of raw `open()`/`json.dumps` for pattern consistency.
- Fix `bench compare` Click decorator chain pointing at the wrong function after the complexity split, restoring multi-directory comparison and the `--metric` option.
- Log PyPI metadata extraction failures in `ft/runner.py` instead of silently swallowing `(KeyError, TypeError)`.
- Wrap `load_package` in `filter_packages` with try/except to skip bad YAML instead of crashing the entire batch run.
- Add logging to `bisect.py` git helpers (`_resolve_rev`, `_get_commit_info`) on failure.
- Add `OSError` handler to `_check_import_result` in `compat.py` for a missing venv python.
- Use `dataclass_from_dict` for `ErrorMatch` deserialization in `compat.py` for forward compatibility.
- Remove stale `RunOutput as RunOutput` re-export from `runner.py`.
- Narrow `FieldFilter.op` to a `Literal` type in `registry_ops.py`.
- Wrap `checkout_matching_tag` `git fetch --tags` in try/except to prevent unhandled `TimeoutExpired`/`OSError` crashes.
- Make `load_ft_run` lenient for a missing `ft_results.jsonl` (returns an empty list like `load_bench_run`).
- Add missing `installer_backend` field to `analyze.PackageResult` to match `runner_models.PackageResult`.
- Narrow `IndexEntry.extension_type` from `str` to `Literal["pure", "extensions", "unknown"]`.
- Narrow `PackageResult.install_from` and `installer_backend` from `str` to `Literal` types.
- Fix `--no-shallow`/`--clone-depth=0` to produce actual full clones (no `--depth` flag) instead of silently defaulting to depth=1.
- Add warning logs to 4 silent return-None paths in `repo_ops.py` (`clone_repo`, `pull_repo`, `checkout_revision`, `fetch_latest_pypi_version`).
- Promote `pull_repo` git reset/clean failure logs from DEBUG to WARNING for visibility.
- Log a warning for corrupted `run_meta.json` in `analyze.py` instead of silently returning empty metadata.
- Add `ignore_errors=True` to `shutil.rmtree` in `bench/runner.py` venv refresh to prevent crashes on permission errors.
- Standardize exception chaining from `from None` to `from exc` in 3 `bench_cli.py` `load_series` handlers.
- Type `FTPackageResult.extension_compat` as `ExtensionCompat | None` instead of `dict[str, Any] | None`, eliminating untyped dict access across the ft subsystem.
- Replace `_setup_package` return type `dict[str, Any]` with a typed `_PackageSetup` dataclass in `bench/runner.py`.
- Log `ValueError`/`OSError` in `ft/runner.py` stderr and stdout reader threads instead of silently ignoring them.
- Remove thin `extract_minor_version` wrapper from `analyze.py`; callers now use `io_utils.extract_minor_version` directly.
- Replace `ProgressCallback = Any` with `Callable[[BenchProgress], None] | None` in `bench/runner.py`.
- Replace `list[Any]` with `list[PackageEntry]` in `bench/runner.py` `_run_sequential` and `_run_interleaved`.
- Replace `cond: Any` with `BenchConditionResult` in `bench/compare.py` `_collect_call_durations`.
- Replace `Any` return/param types with `BenchConfig` in `bench_cli.py` `_build_base_config` and `_apply_config_overrides`.
- Add `Literal` types for `BisectStep.status`, `BisectConfig.installer`, `PackageAnomaly.anomaly_type`/`severity`, `ValidationError.severity`, `BenchConfig.installer`, and `analyze.PackageResult.status`.
- Log a warning in `runner.py:list_installed_packages` when pip exits with a non-zero code instead of silently returning empty.
- Raise `RegistrySchemaError` in `check_registry_schema` when `schema.yaml` exists but is corrupt instead of silently ignoring it.
- Set `import_ok = False` in `ft/runner.py` when the extension compat check raises instead of leaving the default `True`.
- Remove dead `hasattr(result, "returncode")` guard in `bench/runner.py`: `install_with_fallback` always returns `CompletedProcess`.
- Use `parse_env_pairs` in `ft_cli.py` and `bench_cli.py` instead of inline env parsing that silently drops malformed pairs.
- Use `load_jsonl` in `migrations.py:read_migration_log` instead of hand-rolled JSONL parsing.
- Add missing docstrings to 8 `to_dict` methods in `ft/analysis.py`, `ft/compare.py`, `ft/compat.py`.
- Replace 21 `assertTrue(len(...))` with `assertGreater`/`assertGreaterEqual` in tests for better failure messages.
- Convert `mkdtemp()` to `TemporaryDirectory()` in `test_ft_compat.py` for consistency.
- Type `pkg: Any` parameters as `PackageEntry` in `bench/runner.py` and `ft/runner.py` for type safety.
- Fix `IndexEntry.package` attribute access (should be `.name`) in `ft/runner.py:_select_packages`.
- Add missing `encoding="utf-8"` to `bench_cli.py` `export` `write_text()` call.
- Import check `OSError` now sets `install_error` status instead of silently continuing.
- Standardize exception variable naming from `as e:` to `as exc:` in `ft/compat.py`, `registry_ops.py`, and `registry_cli.py` (5 instances).
- Raise compat survey clone/build failure log levels from `debug` to `warning` for visibility.
- Rename `from_dict` parameter `d` → `data` in `CompatResult` and `CompatMeta` for consistency.
- Bisect extra deps install failure now returns a `skip` step instead of silently continuing.
- Narrow `except Exception` to specific types in `analyze_cli.py` and `ft/compat.py`.
- Add warning logs for silent failures in `runner.py` (JIT check), `scan_deps.py` (dir scan), and `bisect.py` (deps install).
- Extension probe script now reports `walk_error` and `skipped_modules` instead of silently passing.
- Guard all `yaml.safe_load` call sites against `YAMLError`: add `safe_load_yaml` utility to `io_utils.py`, protect `registry.py`, `registry_ops.py`, `analyze_cli.py`, `migrations.py`, and `bench/config.py`.
- Add 120-second timeout to git subprocess calls in `registry sync` to prevent indefinite hangs.
- Add `ignore_errors=True` to `shutil.rmtree` in venv refresh to prevent cleanup failures masking results.
- Log `PermissionError` at WARNING in `kill_process_group` instead of silently ignoring it.
- Add `from None` to exception chains in `bench_cli.py` for cleaner tracebacks.
- Include exception details in the connection error log in `resolve.py`.
- Guard rename in `shield_source_dir` `finally` block against a missing source file.
- Add `load_json_file` utility to `io_utils.py` and guard 5 unprotected `json.loads` call sites against `JSONDecodeError`.
- Add `JSONDecodeError` to exception handling in `bench/system.py` Python profile probe.
- Add `KeyError` handling for missing probe output fields in `ft/compat.py`.
- Include exception details in the `schema.yaml` parse warning in `registry.py`.
- Catch `ValueError` from malformed `tracking.json` in `bench_cli.py` track commands.
- Surface skipped-package counts in `ft/runner.py`, `bench/runner.py`, and `bench/tracking.py` so users know when results are incomplete.
- Extract `dataclass_from_dict` utility to `io_utils.py` and deduplicate 13 identical `from_dict` implementations across `bench/` and `ft/` modules.
- Use `atomic_write_text` for `ft/results.py` JSONL output and the `runner.py` summary file to prevent corruption on interruption.
- Remove 3 unnecessary private re-exports (`_EXTRAS_RE`, `_SELF_INSTALL_RE`, `_TAG_PATTERNS`) from `runner.py`.
- Move `dataclass_from_dict` imports from method bodies to module level across 9 files.
- Deduplicate `extract_minor_version`: move the canonical implementation to `io_utils.py`, delegate from `runner.py` and `analyze.py`.
- Remove deprecated `extract_python_minor_version` wrapper from `runner.py`; callers in `cli.py` and `compat_cli.py` now use `io_utils.extract_minor_version` directly.
- Delegate `format_signal_name` in `formatting.py` to `crash.signal_name` instead of duplicating signal conversion logic.
- Replace inline JSON write in `bench/tracking.py` with `write_meta_json`.
- Guard `_read_lines` call sites in `registry_ops.py` against `OSError`/`UnicodeDecodeError` to prevent batch operations from crashing on corrupt files.
- Promote `list_installed_packages` error log from `info` to `warning` in `runner.py`.
- Promote `iter_jsonl` per-line error log from `debug` to `warning` in `io_utils.py`.
- Guard build log `write_text` in `compat.py` against `OSError` to prevent filesystem errors from aborting runs.
- Catch `CalledProcessError` and `TimeoutExpired` from git clone/fetch in `bisect.py` to provide clear error messages.
- Include exception details in the malformed series warning in `bench/tracking.py`.
- Replace `getattr` with direct attribute access on typed dataclass objects in `analyze.py`, `summary.py`, `bench/runner.py`, and `ft/runner.py`.
- Migrate `registry_cli.py` from `sys.exit(1)` to `raise click.ClickException(...)` for validation errors and sync failures, matching the rest of the CLI surface.
- Extract `_HOST_LABELS` and `_INSTALL_LABELS` to module-level constants in `analyze_cli.py`, removing duplicates between terminal and markdown formatters.
- Migrate `RunMeta.from_dict` and `PackageResult.from_dict` in `analyze.py` to use the `dataclass_from_dict` utility.
- Use `safe_load_yaml` in `migrations.py` instead of duplicating the YAML load-and-validate pattern.
- Wire up `setup_logging` for `--verbose` flags in the `analyze registry`, `analyze run`, and `registry sync` commands, which previously accepted but ignored the flag.
- Add `load_yaml_strict` utility to `io_utils.py` and migrate 4 inline YAML load-and-validate patterns in `registry.py`, `compat.py`, and `bench/config.py`.
- Use `atomic_write_text` for `bench/results.py` JSONL batch write and remove unused `to_jsonl_line`/`from_jsonl_line` methods.
- Use `dataclass_from_dict` in `BenchIteration.from_dict` instead of an inline field-filtering reimplementation.
- Use `load_json_file` in `bench/tracking.py` `load_series` instead of inline JSON parsing.
- Add `exc_info=True` to 14 `log.error()` calls inside `except` blocks across `runner.py`, `resolve.py`, `bench/runner.py`, and `ft/runner.py` to preserve tracebacks for debugging.
- Explicitly set `import_ok = False` in `ft/compat.py` `JSONDecodeError`, `KeyError`, and `OSError` handlers for robustness.
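Several fixes above route deserialization through `dataclass_from_dict` so that records written by newer versions (with extra keys) still load. A minimal sketch of that idea follows, assuming the utility simply drops keys that are not declared dataclass fields; `from_dict_lenient` and `ErrorMatchSketch` are hypothetical names, and the real helper in `io_utils.py` may handle nested types or defaults differently.

```python
from dataclasses import dataclass, fields
from typing import Any, TypeVar

T = TypeVar("T")


def from_dict_lenient(cls: type[T], data: dict[str, Any]) -> T:
    """Construct dataclass ``cls`` from ``data``, dropping unknown keys.

    Unknown keys (e.g. added by a newer writer) are ignored; a missing
    required field still raises TypeError from the constructor.
    """
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in known})


@dataclass
class ErrorMatchSketch:
    # Illustrative stand-in, not labeille's ErrorMatch.
    category: str
    pattern: str
    line: int = 0


record = {"category": "removed_c_api", "pattern": "_PyLong_New", "added_in_v2": True}
print(from_dict_lenient(ErrorMatchSketch, record))
# ErrorMatchSketch(category='removed_c_api', pattern='_PyLong_New', line=0)
```

Dropping unknown keys trades strictness for forward compatibility: an older reader keeps working on newer files, but a typo in a key name is silently ignored rather than reported.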
- Add missing `help=` text to 21 CLI options across `cli.py`, `analyze_cli.py`, `ft_cli.py`, `compat_cli.py`, and `bench_cli.py`.
- Add 11 tests for the compat survey execution pipeline: `_prepare_source`, `_classify_install_result`, `_check_import_result`.
- Add 5 tests for the `run_ft` orchestrator and `_select_packages` (previously 0% coverage).
- Add 3 tests for `_survey_package` integration (build_ok, build_fail, timeout paths).
- Add 3 tests for `_align_sdist_version` (source mode, sdist with tag, sdist without tag).
- Add 26 tests for `bench track` subcommands: init, add, show, pin, unpin, list, trend, alert.
- Add CLI tests for `registry` subcommands: rename-field, set-field (`--all`, `--where`, `--packages`), validate (`--packages` filter), migrate (list, unknown, missing name), sync (clone, pull, failure, non-git), add-index-field, remove-index-field, and group help.
- Add `test_cli_utils.py` with 14 tests for `parse_env_pairs` and `parse_csv_list`.
- Add 12 behavioral tests for the `ft compare`, `ft report`, and `ft export` CLI commands.
- Add `test_repo_ops.py` with 31 tests for `clone_repo`, `pull_repo`, `checkout_revision`, `parse_package_specs`, and `parse_repo_overrides`.
- Add 35 tests to `test_io_utils.py` for `load_yaml_strict`, `iter_jsonl`, `load_jsonl`, `append_jsonl`, `dataclass_from_dict`, `extract_minor_version`, `generate_run_id`, and `write_meta_json`.
- Add 11 tests for `kill_process_group` and `run_in_process_group` covering process group kills, timeout handling, and fallback to `proc.kill()`.
- Add `test_cli.py` with 21 tests for the `resolve`, `run`, `bisect`, and `scan-deps` CLI commands covering parameter validation, error paths, and output formatting.
- Update `CLAUDE.md`: fix stale test count (546 → 2068) and add 13 missing modules to the architecture section.
- `labeille analyze registry` now generates a comprehensive three-tier report: summary (default), detailed (`--detail`), and verbose (`--detail --verbose`).
- `--export-markdown` flag for `analyze registry` generates a Markdown document suitable for inclusion in a repository.
- `-o`/`--output` flag to write report output to a file.
- `--python-version` is now repeatable for multi-version analysis.
- New report sections: enrichment progress, per-version readiness, compatibility blockers (PyO3, Cython, Meson, CMake, Fortran, removed C API), repository hosting distribution, install command complexity, and download tier coverage.
- `RegistryReport` dataclass with sub-reports: `EnrichmentProgress`, `VersionAnalysis`, `RepoHostStats`, `InstallComplexity`, `CompatBlockers`, `DownloadTierCoverage`.
- `generate_registry_report()` function for comprehensive registry analysis in a single pass.
- Helper classifiers: `_classify_repo_host()`, `_classify_install_complexity()`, `_classify_compat_blocker()`.
- `labeille registry sync` command to clone or update the laruche registry into the default location.
- Schema version checking via `schema.yaml` at the registry root. labeille checks this at load time and gives an actionable error if the registry schema is incompatible.
- `default_registry_dir()` and `LABEILLE_REGISTRY_DIR` environment variable for configuring the registry location.
- `RegistrySchemaError` exception for incompatible registry schemas.
- `--adaptive` flag for `labeille bench run`: stop iterating early when wall-time measurements converge (relative standard error, RSE, below threshold).
- `--adaptive-threshold` option (default 0.005 = 0.5% RSE) and `--adaptive-min-iterations` option (default 5) for fine-tuning convergence behavior.
- `adaptive`, `adaptive_threshold`, and `adaptive_min_iterations` fields in YAML benchmark profiles.
- `converged_early` field on `BenchConditionResult`, recorded in `bench_results.jsonl`.
- `relative_standard_error()` function in `bench/stats.py`.
- Adaptive convergence support in all three execution strategies (block, alternating, interleaved).
- Quick mode (`--quick`) now enables adaptive convergence by default.
- Convergence indicators in benchmark display: checkmark in table, count in quality summary, config line.
- `--trust-ft-wheels` flag for `labeille ft run`: packages with free-threaded wheels (`cpXYt` ABI tag) for the target Python version are classified as `compatible_by_wheel` without running tests.
- `--trust-ft-wheels-any-version` flag for `labeille ft run`: like `--trust-ft-wheels` but trusts free-threaded wheels built for any Python version. Implies `--trust-ft-wheels`.
- `COMPATIBLE_BY_WHEEL` category in `FailureCategory` with `⊕` symbol and severity 0.
- `has_ft_wheel()` function in `classifier.py` to detect free-threaded wheels in PyPI release metadata.
- `ft_wheel_found` and `ft_wheel_version` fields on `FTPackageResult` for provenance tracking.
- `trust_ft_wheels` and `trust_ft_wheels_any_version` fields on `FTRunConfig`, recorded in `ft_meta.json`.
- FT wheel check reuses PyPI metadata for sdist version lookup when both `--trust-ft-wheels` and `--install-from sdist` are active.
- `--install-from {source|sdist}` option for `labeille run` and `labeille ft run`: install packages from PyPI source distributions while running tests from cloned git repos.
- Sdist version alignment: `fetch_latest_pypi_version()` queries PyPI, `checkout_matching_tag()` aligns the repo to the matching release tag.
- Source directory shielding: `shield_source_dir()` temporarily renames flat-layout source dirs to prevent local imports from shadowing the sdist-installed package.
- Install command splitting: `split_install_command()` and `build_sdist_install_commands()` separate self-install segments from test dependency segments.
- `install_from`, `sdist_version`, and `sdist_tag_matched` fields on `PackageResult`, `FTPackageResult`, and analysis `PackageResult`.
- `labeille compat` command group for C extension compatibility surveys: `survey`, `show`, `diff`, and `patterns` subcommands.
- ~30 built-in error classification patterns across 10+ categories (`removed_c_api`, `cython_incompatible`, `pyo3_incompatible`, `numpy_c_api`, `missing_system_lib`, etc.).
- YAML-based custom error pattern support with override semantics.
- Survey diff for tracking regressions and fixes between Python versions.
- Markdown export for sharing compatibility survey results.
- Optional uv integration for faster venv creation and package installation via the `--installer` flag (auto/uv/pip).
- `InstallerBackend` enum, `detect_uv()`, `resolve_installer()`, and `_rewrite_install_command()` in `runner.py`.
- `install_with_fallback()` for automatic pip fallback when uv install fails.
- `installer_backend` field on `PackageResult` and in run metadata.
- `--installer` CLI option on the `run`, `bench run`, and `bisect` commands.
- Scaled registry from ~350 to 1500 packages: 720 active, 362 with `skip_versions` (3.15 blockers), 418 fully skipped, with 86.4% working test harness coverage.
- 5-tier test directory detection in `_auto_detect_test_dirs()`: standard dirs (`t/`, `spec/`), package-named/internal dirs, monorepo subdirs, root-level test files, and scattered test files in package source.
- Multi-forge URL normalization via `_normalize_forge_url()` supporting GitHub, GitLab, Bitbucket, and Codeberg.
- Expanded `extract_repo_url()` with an all-values `project_urls` scan and `description` field scanning as a last resort.
- `recover-no-tests-found` and `recover-no-repo-url` registry migrations for recovering falsely skipped packages.
- `trends.py` module in `bench` subpackage with `PackageTrend`, `RegressionAlert`, and `SeriesTrend` dataclasses for longitudinal benchmark analysis.
- `compute_package_trend()` with configurable regression/trend thresholds and sustained-change detection.
- `analyze_series_trends()` for full series analysis: loads all runs, computes per-package trends, generates regression alerts.
- Five alert types: `new_regression`, `sustained_regression`, `recovery`, `new_instability`, `new_improvement`.
- `labeille bench track trend` command for viewing trend analysis with table, CSV, and Markdown output.
- `labeille bench track alert` command for viewing regression alerts since the last run.
- `export_trend_markdown()` and `export_trend_csv()` in `bench/export.py` for trend report generation.
- `format_series_trend()` and `format_regression_alerts()` in `bench/display.py` for terminal output.
- `tracking.py` module in `bench` subpackage with `TrackingSeries` and `TrackingRunEntry` dataclasses for longitudinal benchmark tracking.
- `compute_config_fingerprint()` for comparing benchmark configurations across runs (ignores package list and system profile).
- Series management: `init_series()`, `add_run_to_series()`, `pin_baseline()`, `unpin_baseline()`, `load_series()`, `save_series()`, `list_series()`.
- `labeille bench track` CLI subgroup with `init`, `add`, `show`, `list`, `pin`, and `unpin` commands.
- Symlink-based run storage within tracking directories to avoid data duplication.
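`compute_config_fingerprint()` lets a tracking series tell whether two runs used comparable benchmark settings while ignoring the package list and system profile. One way to build such a fingerprint, assuming the config can be expressed as a JSON-compatible dict, is to drop the excluded keys, serialize canonically, and hash; the function name `fingerprint_config` and the key names `packages` and `system_profile` are assumptions for illustration.

```python
import hashlib
import json
from typing import Any

# Top-level keys assumed to be excluded from the fingerprint (hypothetical names).
IGNORED_KEYS = frozenset({"packages", "system_profile"})


def fingerprint_config(config: dict[str, Any]) -> str:
    """Return a short, key-order-independent hash of the comparable config."""
    comparable = {k: v for k, v in config.items() if k not in IGNORED_KEYS}
    # sort_keys applies recursively, so nested dict ordering does not matter.
    canonical = json.dumps(comparable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


run_a = {"iterations": 10, "packages": ["attrs"], "conditions": {"jit": {"PYTHON_JIT": "1"}}}
run_b = {"conditions": {"jit": {"PYTHON_JIT": "1"}}, "packages": ["six", "idna"], "iterations": 10}
print(fingerprint_config(run_a) == fingerprint_config(run_b))                        # True
print(fingerprint_config(run_a) == fingerprint_config({**run_a, "iterations": 20}))  # False
```

Canonical serialization matters here: without `sort_keys=True`, two logically identical configs loaded in a different key order would produce different hashes.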
- `constraints.py` module in `bench` subpackage with `ResourceConstraints` dataclass and ulimit/taskset command wrapping.
- Resource constraints as part of the condition abstraction: `--memory-limit`, `--cpu-affinity`, and `--cpu-time-limit` CLI flags for `labeille bench run`.
- Per-condition constraint specification in YAML benchmark profiles.
- Inline condition constraint parsing (e.g. `--condition "constrained:memory_limit=1024,cpu_affinity=0+1"`).
- OOM detection via `detect_oom_from_result()` with a new `"oom"` iteration status.
- `constraints_applied` and `oom_detected` fields on `BenchIteration`.
- `constraints` field on `ConditionDef` for per-condition resource limits.
- `cache.py` module in `bench` subpackage for filesystem cache management during benchmarks.
- `--drop-caches` flag for `labeille bench run` to drop filesystem caches between iterations for cold-cache benchmarking.
- `--warm-vs-cold` flag to automatically compare warm-cache and cold-cache performance.
- `--run-dangerously-as-root` safety flag: labeille refuses to run as root without it.
- `labeille bench setup-cache-drop` command showing setup instructions for the sudoers-based cache-dropping helper.
- `generate_drop_caches_script()` and `format_setup_instructions()` helpers for cache-drop setup.
- `caches_dropped` field on `BenchIteration` to record cache state per iteration.
- Per-test timing capture via pytest `--durations=0` output, enabled with the `--per-test-timing` flag on `labeille bench run`.
- `TestTiming` and `PerTestTimings` dataclasses in `bench/timing.py` with a pytest output parser.
- `compare_per_test()` in `bench/compare.py` for per-test overhead analysis between conditions.
- `--per-test <package>` option for `labeille bench show` and `labeille bench compare` to display a per-test timing breakdown.
- `anomaly.py` module in `bench` subpackage with `PackageAnomaly` and `AnomalyReport` dataclasses for proactive measurement-quality assessment.
- `detect_anomalies()` with five anomaly types: `high_cv`, `bimodal`, `outlier_heavy`, `status_mixed`, and `trend`.
- `is_bimodal()` gap-analysis heuristic and `has_monotonic_trend()` Spearman rank correlation for pure-Python distribution analysis.
- `--anomalies` flag for `labeille bench show` to display the measurement anomaly report.
- Anomaly summary in `labeille bench compare` output when anomalies are detected.
- `## Anomalies` section in Markdown export when anomalies are present.
- `ft/compare.py` module with `PackageTransition`, `FTComparisonResult` dataclasses and `compare_ft_runs()` for cross-run comparison: category transitions, pass rate changes, new/resolved crashes, aggregate deltas.
- `ft/export.py` module with `export_csv()`, `export_json()`, and `generate_report()` for CSV, JSON, and markdown report export of free-threading results.
- `ft/display.py` module with terminal formatting for free-threading results: compatibility summaries, package tables, flakiness profiles, triage lists, GIL comparison reports, and progress output.
- `ft_cli.py` module with `labeille ft` CLI subgroup: `run`, `show`, `compare`, `compat`, `flaky`, `report`, `export` commands for free-threading compatibility testing.
- `ft/analysis.py` module with `FlakyTest`, `FlakinessProfile`, `GILComparisonResult`, `TriageEntry`, `DurationAnomaly`, and `FTAnalysisReport` dataclasses for free-threading result analysis.
- `analyze_flakiness()` for detailed flakiness profiling with failure pattern classification and consecutive streak detection.
- `compare_gil_modes()` for GIL-enabled vs free-threaded result comparison.
- `prioritize_triage()` severity-scored triage with extension and TSAN bonuses.
- `detect_duration_anomalies()` using statistical outlier detection from `bench/stats.py`.
- `analyze_ft_run()` full analysis pipeline producing `FTAnalysisReport`.
- `ft/runner.py` module with `FTRunConfig`, `OutputMonitor`, `run_single_iteration()`, `run_package_ft()`, and `run_ft()` for free-threading test execution with crash/deadlock/TSAN detection and pytest output parsing.
- `ft/results.py` module with `FailureCategory` enum, `IterationOutcome`, `FTPackageResult`, `FTRunMeta`, and `FTRunSummary` dataclasses for free-threading result storage, categorization, and JSONL/JSON serialization.
- `categorize_package()` priority-ordered classification (install failure > import failure > deadlock > crash > TSAN > GIL fallback > compatible > incompatible > intermittent).
- `ft` subpackage with `compat.py` module for extension GIL compatibility detection: runtime probe via `sys._is_gil_enabled()` and source scan for `Py_mod_gil` declarations.
- `ExtensionInfo`, `SourceScanResult`, `ModGilDeclaration`, and `ExtensionCompat` dataclasses with JSON serialization.
- `probe_gil_fallback()` runtime probe, `scan_source_for_mod_gil()` source scanner, `assess_extension_compat()` combined assessment, and `format_extension_compat()` display helper.
- `guess_import_name()` with `_IMPORT_NAME_OVERRIDES` table for PyPI-to-import name resolution.
- `bench` subpackage with `system.py` module for capturing system profiles (CPU, RAM, OS, disk) and Python interpreter profiles (version, JIT/GIL state, build flags) for benchmark reproducibility.
- `SystemProfile`, `PythonProfile`, `StabilityCheck`, and `SystemSnapshot` dataclasses with JSON serialization and terminal display formatting.
- `check_stability()` pre-benchmark validation (load average, available RAM).
- `stats.py` module in `bench` subpackage with pure-Python statistical functions: `describe()`, `welch_ttest()`, `cohens_d()`, `bootstrap_ci()`, `detect_outliers()`, and `compute_overhead()` for benchmark comparison.
- `DescriptiveStats`, `TTestResult`, `EffectSize`, `BootstrapCI`, and `OverheadResult` dataclasses with scipy fallback for t-test p-values.
- `timing.py` module in `bench` subpackage with `run_timed()` and `run_timed_in_venv()` for capturing wall time, CPU time (via `resource.getrusage` delta), and peak RSS (via GNU `/usr/bin/time` with `ru_maxrss` fallback).
- `results.py` module in `bench` subpackage with `BenchIteration`, `BenchConditionResult`, `BenchPackageResult`, `ConditionDef`, and `BenchMeta` dataclasses for the full benchmark result hierarchy, plus JSONL/JSON serialization via `save_bench_run()`, `load_bench_run()`, and `append_package_result()`.
- `config.py` module in `bench` subpackage with `BenchConfig` dataclass, YAML profile loading, inline condition parsing, test command resolution, environment/deps merging, and configuration validation.
- `runner.py` module in `bench` subpackage with `BenchRunner` class orchestrating the full benchmark lifecycle: system profiling, stability checks, package setup (clone/venv/install per condition), timed iteration execution, and incremental JSONL result writing. Supports block, alternating, and interleaved execution strategies with progress callbacks.
- `BenchProgress` dataclass and `quick_config()` helper for rapid iteration during development.
- `display.py` module in `bench` subpackage with terminal formatting for benchmark results: per-package timing tables, multi-condition comparison tables with overhead/CI/significance, measurement quality summaries, and aggregate comparison summaries.
- `compare.py` module in `bench` subpackage with structured comparison analysis: `PackageOverhead` and `ComparisonReport` dataclasses with anomaly flags (high CV, status mismatch, outliers), `compare_conditions()` for within-run comparison, and `compare_runs()` for cross-run comparison.
- `export.py` module in `bench` subpackage with CSV (long-format per-iteration and summary) and Markdown export for external analysis tools and reports.
- `bench_cli.py` module with `labeille bench` CLI subgroup: `run` (execute benchmarks from profiles or inline conditions), `show` (display saved results), `compare` (compare conditions within or across runs), `system` (print system characterization), and `export` (CSV/Markdown/CSV-summary export).
- `labeille bisect` command to binary-search a package's git history and find the first commit that introduced a crash.
- `bisect.py` module with `BisectConfig`, `BisectStep`, `BisectResult` dataclasses and the `run_bisect` algorithm with skip-neighbor handling for unbuildable commits.
- Commit-aware run comparison: `analyze compare` and `analyze run` show git commit changes alongside status changes with heuristic annotations (e.g. "unchanged — likely a CPython/JIT regression").
- `PackageComparison` dataclass with `commit_changed`/`commit_unchanged` properties for per-package comparison data.
- New crash summary statistics in compare output showing repo unchanged/changed/unknown counts.
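Adaptive mode (`--adaptive`) stops iterating once wall-time measurements converge, with a default RSE threshold of 0.005 and a minimum of 5 iterations. Assuming RSE here means the standard error of the mean divided by the mean, i.e. RSE = s / (x̄ · √n) with sample standard deviation s, a stopping check could be sketched as below; whether `relative_standard_error()` in `bench/stats.py` uses exactly this definition is an assumption, and `has_converged` is a hypothetical name.

```python
import math
import statistics


def relative_standard_error(samples: list[float]) -> float:
    """Standard error of the mean divided by the mean (assumed definition)."""
    if len(samples) < 2:
        return math.inf  # not enough data to estimate spread
    mean = statistics.fmean(samples)
    if mean == 0:
        return math.inf
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return sem / abs(mean)


def has_converged(samples: list[float], threshold: float = 0.005,
                  min_iterations: int = 5) -> bool:
    """Stop once enough iterations have run and the RSE is below the threshold."""
    return len(samples) >= min_iterations and relative_standard_error(samples) < threshold


wall_times = [10.0, 10.1, 9.9, 10.0, 10.0]
print(round(relative_standard_error(wall_times), 5))  # 0.00316
print(has_converged(wall_times))                       # True
print(has_converged(wall_times[:4]))                   # False (below min_iterations)
```

Because the standard error shrinks with √n, noisier packages simply take more iterations to fall below the threshold, while the minimum-iterations floor guards against stopping on a lucky early streak.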
- The package registry has been moved to its own repository: laruche. Use `labeille registry sync` to fetch it.
- Default `--registry-dir` changed from `registry/` (local) to `~/.local/share/labeille/registry/` (user-level). Override with the `LABEILLE_REGISTRY_DIR` env var.
- All documentation updated to reflect the split. Enrichment docs now live in laruche.
- Added `Literal` types to 6 string-constrained fields: `IterationOutcome.status`, `RunnerConfig.installer`/`install_from`, `PackageEntry.extension_type`/`install_method`/`test_framework`. Mypy now catches invalid values at type-check time.
- Extracted shared `kill_process_group()` into `io_utils.py`, replacing 3 independent implementations in `runner.py`, `bench/timing.py`, and `ft/runner.py`. The FT runner now correctly uses `os.getpgid()` and `signal.SIGKILL` instead of raw `os.killpg(pid, 9)`.
- Extracted `_build_bench_config()` helper from `bench_cli.run`, reducing the command body from ~130 lines to ~15 lines. Organized 35 Click options into labeled sections (profile, execution, package selection, paths, stability, adaptive, advanced).
- Added `utc_now_iso()` helper to `io_utils.py`, unifying 15+ timestamp generation sites across 8 modules to a single canonical UTC format with a `Z` suffix.
- Registry `save_index()` and `save_package()` now use `atomic_write_text()` for crash-safe writes, preventing corruption of the most sensitive files.
- Added `Literal` types to `PackageResult.status`, `ResolveResult.action`, `BenchIteration.status`, and `EffectSize.classification` for typo detection at type-check time.
- Made `DescriptiveStats`, `TTestResult`, `EffectSize`, `BootstrapCI`, `OverheadResult`, and `CrashInfo` dataclasses frozen (immutable).
- Narrowed 23 `except Exception` handlers in `bench/system.py` to a `_PROBE_ERRORS` tuple, preventing accidental swallowing of programming errors while preserving best-effort system probing.
- Added logging to 8 previously silent exception handlers in `ft/runner.py`, `bisect.py`, `cli.py`, `runner.py`, and `bench/timing.py`.
- Added error handling to `save_crash_stderr()` with `mkdir` for the crashes directory.
- Fixed `ScanResult` forward reference in `cli.py`: it now uses a `TYPE_CHECKING` import, removing 4 suppression markers and 2 runtime asserts.
- Standardized `--output` options to `click.Path(path_type=Path)` in `ft_cli.py` and `analyze_cli.py`.
- Added `type=int` to `ft run --timeout` for consistency with other commands.
- Added docstrings to 7 undocumented public APIs in `compat.py` (properties, `to_dict`, `from_dict`).
- Fixed `cli.py` module docstring to list all 9 subcommand groups.
- Added `test_logging.py` (8 tests) and `test_io_utils.py` (10 tests) for previously untested foundation modules.
- Derived `_KNOWN_FIELDS` and `_FIELD_TYPES` in `registry_ops.py` from `PackageEntry` dataclass metadata, preventing drift.
- Added `PackageResult.to_dict()` method, simplifying `append_result()` from a 22-field manual dict to `asdict()`.
- Eliminated redundant `_atomic_write` wrappers in `registry_ops.py` and `migrations.py`.
- Added `encoding="utf-8"` to the `bench/runner.py` mid-run metadata write.
- Extracted shared `atomic_write_text()` utility in `io_utils.py`, replacing duplicate implementations in `registry_ops.py`, `migrations.py`, and `bench/tracking.py`.
- Promoted `_clean_env()` to public API as `clean_env()`, replacing inline env sanitization in `ft/compat.py` and `bench/runner.py`.
- Unified logger acquisition: all 20 `bench/` and `ft/` modules now use `get_logger()` with per-module names (e.g., `bench.runner`, `ft.runner`) for filterable log output.
- Added `encoding="utf-8"` to all `bench/` file I/O calls for cross-platform consistency.
- Added `show_default=True` to `--timeout` options in `bench_cli.py` and `ft_cli.py`.
- Standardized `click.Path(path_type=Path)` across `bench_cli.py` and `ft_cli.py`, removing manual `Path()` wrapping.
- Added `write_meta_json()`, `append_jsonl()`, `load_jsonl()`, and `iter_jsonl()` utilities to `io_utils.py`, unifying persistence patterns across all four subsystems (runner, bench, ft, compat).
- All `meta.json` writes now use `atomic_write_text()` for crash safety (previously only registry files were atomic).
- All JSONL loads now use streaming iteration with error tolerance for malformed trailing lines.
- Extracted `_ensure_repo()`, `_run_install()`, and `_analyze_test_result()` from `_run_package_inner()` in `runner.py`, reducing the 420-line monolith to ~180 lines and eliminating duplicated install error handling.
- Narrowed remaining `except Exception` blocks: `ft/runner.py:397` to `OSError`, `ft/runner.py:700` to `(TimeoutExpired, SubprocessError, OSError)`, `bisect.py` to `(FileNotFoundError, OSError)`, and `cli.py` to `(OSError, ValueError, KeyError, TypeError)`.
- Merged duplicate `load_package()` calls in `bisect.py` into a single call with warning-level logging.
- Raised `get_installed_packages` logging from debug to info for better diagnostics.
- Added `from __future__ import annotations` to all test files for consistency with source modules.
- `labeille analyze registry` shows percentages alongside all counts.
- Download tier coverage (top 100, 500, 1000, 2000) shows what fraction of the most-downloaded packages are testable.
- Version readiness section shows per-Python-version active/skipped counts with top skip reasons.
- `--format counts` is preserved as a backward-compatible alias for the summary format.
- Updated 63 registry packages with accurate skip reasons from compat analysis: cleared vague 3.15 `skip_versions`, added specific failure categories (Meson, CMake, removed APIs), reclassified non-3.15 issues as `skip` with precise reasons, and unskipped 5 packages that now build on 3.15.
- `BenchRunner._run_iteration()` applies resource constraints via command wrapping before execution.
- `BenchRunner.run()` now checks for root execution and refuses to run unless `--run-dangerously-as-root` is passed.
- macOS support for system profiling: CPU info from `sysctl`, memory from `vm_stat`, OS from `sw_vers`, disk type from `diskutil`. All existing Linux code paths are preserved unchanged.
- `check_stability()` and `SystemSnapshot.capture()` now use the cross-platform `_get_available_ram_gb()` helper instead of Linux-only `/proc/meminfo`.
- `format_system_profile()` no longer hardcodes "Linux" in the OS line; it shows platform-appropriate output.
- Switched build backend from setuptools to hatchling for better src layout support and lighter build dependencies.
- Added minimum version pins to runtime dependencies (click>=8.0, pyyaml>=6.0, requests>=2.28).
- Added `py.typed` marker for PEP 561 type checker support.
- Added sdist/wheel exclusion rules to keep distributions lean (no tests, registry, results, or docs).
- Added Installation section to README with pipx, pip, and from-source instructions.
- Added `Environment :: Console` and `Topic :: Software Development :: Quality Assurance` classifiers.
- Renamed `Issues` URL key to `Bug Tracker` in project metadata for PyPI display consistency.
- Replaced `raise SystemExit(130) # noqa: B904` with `raise SystemExit(130) from None` in `bench_cli.py`, removing the suppression.
- Narrowed `except Exception` in `bench/system.py` JIT detection to `except (AttributeError, TypeError)`.
- Added explanatory comment for `except BaseException` in the `io_utils.py` atomic write.
- Narrowed 5 `except Exception` catches in `bench/runner.py` and 1 in `ft/runner.py` to specific exception tuples (`OSError`, `subprocess.SubprocessError`, `ValueError`, `KeyError`), removing all `noqa: BLE001` suppressions.
- Restructured `cli.py` subgroup registration into a `_register_subcommands()` function, eliminating 5 `noqa: E402` suppressions.
- Removed `type: ignore[operator]` markers in `ft/display.py` (by adding an explicit `None` guard) and in `bench_cli.py` (by restructuring conditionals), and `type: ignore[no-any-return]` in `resolve.py` (by using an intermediate variable).
- Extracted 7 helper functions from `run_package_ft` in `ft/runner.py` (360 → ~80 lines): `_check_ft_wheel_trust`, `_clone_and_align_ft`, `_create_venv_and_install_ft`, `_install_sdist_mode`, `_install_source_mode`, `_run_ft_iterations`, `_run_gil_comparison`.
- Split `runner.py` (1972 → 1445 lines): extracted data models to `runner_models.py` and git/repo/sdist operations to `repo_ops.py`, with re-exports preserving all existing imports.
- Reduced `type: ignore` markers in test files from 59 to 2 by typing builder kwargs as `Any`, using `assert x is not None` for type narrowing, adding explicit generic parameters, and typing mock parameters as `MagicMock`.
- Dead code: `_log2()` from `bisect.py`, `RegistryStats`/`analyze_registry()` from `analyze.py` (superseded by `RegistryReport`/`generate_registry_report()`), `load_ft_summary()` from `ft/results.py`, `format_progress()`/`format_gil_comparison()` from `ft/display.py`, and the unused `_MOD_GIL_MENTION_PATTERN` regex from `ft/compat.py`.
- Removed 12 unused `log = get_logger(...)` variables and their imports from modules that had logger scaffolding but no log calls.
- Narrowed 3 broad `except Exception` handlers: `ft/compat.py` GIL probe (to `ImportError, OSError, AttributeError`), submodule import loop (to `ImportError, OSError`), and `registry.py` schema parsing (to `yaml.YAMLError, OSError`). Programming errors now propagate instead of being silently swallowed.
- `bench/timing.py` now sanitizes the subprocess environment via `clean_env()`, preventing `PYTHONHOME`/`PYTHONPATH` pollution in benchmark runs.
- `ft/runner.py` now uses `start_new_session=True` instead of `preexec_fn=os.setpgrp` (which the `subprocess` docs flag as unsafe in the presence of threads), fixing a hazard when launching subprocesses from a `ThreadPoolExecutor`.
- Top-level error handlers in `runner.py`, `ft/runner.py`, and `resolve.py` now preserve tracebacks via `exc_info=True`.
- Replaced `SystemExit(1)` with `click.ClickException` in `bench_cli.py` for consistent error formatting.
- Manual review of all 1,798 enriched registry packages: corrected invalid `extension_type` values, added missing `-p no:xdist` flags, fixed inconsistent skip states, corrected repo URLs, collapsed multiline YAML, and fixed test commands.
- `update_index_from_packages()` no longer crashes when `skip_versions` is `None`.
- Bench runner `install_package` now receives a complete environment (starting from `os.environ`) instead of bare condition env vars, fixing install failures when build backends need `PATH` to find tools like `git`.
- `run_meta.json` now stores actual CLI argument strings (`sys.argv[1:]`) instead of parameter names, making runs reproducible from metadata.
- `build_reproduce_command` uses `export PATH` for venv activation instead of fragile `.venv/bin/` prefix string replacement.
- Deduplicated `_signal_name` (3 copies → `format_signal_name` in `formatting.py`).
- Deduplicated `_result_detail` (3 copies → public `result_detail` in `analyze.py`).
- Made `_extract_minor_version` public as `extract_minor_version` in `analyze.py`.
- Removed redundant zero-check in `compare_runs` duration percentage calculation.
- Fixed timeout documentation (300s → 600s) in `doc/enrichment.md`.
- `_quote_yaml_scalar` now quotes all numeric strings (integers, scientific notation, octal-like), the tilde (`~`), and additional YAML special characters.
- `find_field_extent` no longer consumes trailing blank lines after scalar fields, fixing `insert_field_after` placement near blank lines.
- Rewrote `batch_set_field` to use line-level manipulation instead of a PyYAML round-trip, preserving YAML formatting.
- Added `set_field_value` to `yaml_lines.py` for in-place field value replacement.
- `format_yaml_value` and `parse_default_value` now handle `None`/`null` values.
- `_is_version_specific_skip` now uses word-boundary regex patterns to prevent false positives (e.g. "trust" no longer matches the "rust" pattern).
- `scan-deps` now warns about namespace packages (`google`, `azure`, `zope`, etc.) where pip resolution is uncertain, and tries full import paths before falling back to top-level modules.
- `IndexEntry` now tracks `skip_versions_keys` for fast version-skip filtering without loading full YAML files.
- `filter_packages` uses index-level `skip_versions_keys` to skip packages before loading YAML.
- `_dict_to_package` coerces `notes: null` to an empty string for type safety.
- `_dict_to_package` logs unknown YAML keys at debug level to surface typos.
- `validate_registry` checks `uses_xdist`/`-p no:xdist` consistency in both directions.
- `check_jit_enabled` now uses an explicit `sys.flags.jit` check instead of the nonexistent `sys._jit`, with exact stdout comparison.
- `_parse_install_packages` now handles `python -m pip install`, `python3 -m pip install`, and path-qualified pip invocations.
- `_package_to_dict` accepts an `omit_defaults` parameter to exclude default-valued fields from output.
- `run_test_command` and `install_package` now kill the entire process tree on timeout via `os.killpg`, preventing orphaned grandchild processes from accumulating during batch runs.
- `RunData.result_for()` now uses an O(1) dict lookup instead of an O(N) linear scan, with a lazily built `_results_by_pkg` cache.
- `compare_runs` and `_compute_status_changes` use `result_for()` instead of building ad-hoc dicts.
- Subprocess helpers (`build_env`, `check_jit_enabled`, `create_venv`, `validate_target_python`) now strip `PYTHONHOME`/`PYTHONPATH` via `_clean_env()` to prevent environment pollution.
- CLI warns when only one of `--repos-dir`/`--venvs-dir` is set, since the other will use a temporary directory.
- `update_index_from_packages` accepts an optional `modified_packages` set to avoid O(N) disk reads when only a few packages changed.
- `_PLATFORM_INDICATORS` now detects bare `linux_x86_64`/`linux_aarch64` wheels in addition to `manylinux`/`musllinux`.
- `fetch_pypi_metadata` and `resolve_package` accept an optional `requests.Session` for connection reuse; `resolve_all` uses shared/thread-local sessions.
- `_is_import_error_handler` no longer treats `except Exception` as an import error handler, reducing false conditional import flags.
- `_parse_install_packages` uses a regex instead of chained `.split()` calls to handle all PEP 440 version specifiers (`~=`, `!=`) and `;` environment markers.
- `pull_repo` uses `git fetch` + `reset --hard FETCH_HEAD` + `clean -fdx` instead of `git pull --ff-only`, handling dirty working trees left by test suites.
- `--extra-deps` option to inject additional packages into every venv after the package's own dependencies.
- `--test-command-override` option to replace the test command for all packages in a run.
- `--test-command-suffix` option to append flags to each package's existing test command.
- `--repo-override PKG=URL` option (repeatable) to test forks or PR branches without modifying the registry.
- `--clone-depth` and `--no-shallow` CLI options to override per-package clone depth; `--clone-depth=0` or `--no-shallow` gives a full clone.
- Per-package git revision support via `--packages=pkg@revision` syntax; accepts commit hashes, branches, tags, or relative refs like `HEAD~10`.
- `checkout_revision` helper for checking out specific git refs after cloning.
- `parse_package_specs` function for parsing `name@revision` package spec syntax.
- `requested_revision` field in `PackageResult` to distinguish explicitly requested revisions from HEAD.
- 350 enriched package configurations with full test commands, install commands, and metadata.
- Applied `skip-to-skip-versions` migration on 36 packages (PyO3, maturin, Cython, JIT crashes).
- Config fixes for python-dateutil, pyyaml, msgpack, hatchling, openai, numpy, pytz, sqlalchemy, and 3 archived google packages.
- `registry/migrations.log` tracking applied migration history.
- `labeille registry migrate` command with a migration framework for registry schema transformations.
- `skip-to-skip-versions` migration to convert 3.15-specific `skip: true` entries to `skip_versions["3.15"]`.
- Migration log (`migrations.log`) to track applied migrations and prevent re-application.
- Dry-run support for migrations with preview of affected packages.
- `labeille scan-deps` command for static test dependency discovery via AST-based import analysis.
- `import_map.py` module with 100+ import-name-to-pip-package mappings for common mismatches (`PIL` → `Pillow`, `yaml` → `PyYAML`, etc.).
- Three output formats for scan-deps: human-readable (default), JSON, and pip (for direct shell use).
- Automatic test directory detection and local module filtering in scan-deps.
- Comparison against existing `install_command` to identify missing deps.
- Registry cross-referencing for `import_name` resolution in scan-deps.
- `labeille analyze` CLI subgroup with five subcommands: `registry`, `run`, `compare`, `history`, `package`.
- `formatting.py` shared formatting module (tables, histograms, sparklines, duration, status icons).
- `analyze.py` data loading and analysis module (run data, registry stats, comparison, flaky detection).
- `labeille registry` CLI subgroup for batch registry management (`add-field`, `remove-field`, `rename-field`, `set-field`, `validate`, `add-index-field`, `remove-index-field`).
- Line-level YAML manipulation (`yaml_lines.py`) preserving exact formatting.
- Batch operations module (`registry_ops.py`) with filtering, atomic writes, and dry-run previews.
- Registry validation against the `PackageEntry` schema with `labeille registry validate`.
- `skip_versions` registry field for per-Python-version skip reasons (e.g. `3.15: "PyO3 not supported"`).
- `--force-run` flag to override `skip` and `skip_versions` for debugging.
- `--workers N` option for parallel package testing in `labeille run`.
- `--workers N` option for parallel PyPI resolution in `labeille resolve`.
- Cancellation support for `--stop-after-crash` in parallel mode.
- `clone_depth` registry field for packages needing git tags (e.g. setuptools-scm).
- `import_name` registry field for packages whose import name differs from the PyPI name.
- `summary.py` module for formatting run results.
- Enrichment best practices documented in CONTRIBUTING.md.
- `--refresh-venvs` flag to delete and recreate existing venvs, ensuring updated install commands take effect.
- Initial project scaffolding.
- CLI skeleton with `resolve` and `run` subcommands.
- Registry schema and data structures.
- Standalone guides for resolve-run workflow, benchmarking, free-threaded testing, and compatibility analysis (doc/workflow.md, doc/benchmarking.md, doc/free-threaded.md, doc/compat.md).
- README sections for benchmarking, free-threaded testing, and compatibility analysis features with command examples and links to standalone guides.
- Updated README Status and Project Structure sections to reflect bench, ft, and compat features.
- Added Anthropic support acknowledgment section to README.md.
- Added security warnings to README.md, runner.py module docstring, and CLAUDE.md.
- Added Gemini acknowledgment to CREDITS.md.
- Comprehensive enrichment guide with manual workflow, troubleshooting, and Claude Code prompts (doc/enrichment.md).
- Updated README with enrichment overview and link to guide.
- Parallel execution guidance, resource considerations, and ASAN vs non-ASAN trade-offs.
- Refactored `summary.py` to use shared formatters from `formatting.py`.
- Improved repo URL resolution with secondary keys (bug tracker, issues, changelog) and legacy field fallbacks (`home_page`, `download_url`).
- Run summary shows version-skipped count separately when skip_versions is active.
- Progress reporting adapted for parallel execution with per-completion status lines.
- Rich end-of-run summary with per-package table, timing stats, and crash details.
- Quiet mode shows only crash information; default mode hides passing packages.
- Post-install import validation catches broken installs before running tests.
- Add `--work-dir`, `--repos-dir`, and `--venvs-dir` options to `run` for persistent clone/venv directories that survive across runs.
- Reuse existing repo clones (pull instead of re-clone) and venvs (skip create+install).
- Log repo and venv paths in default output for each package.
- Verbose mode (`-v`) now shows test subprocess stdout/stderr, resolved commands, install output, installed dependency list, git operations, and per-phase timing.
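
The `_is_version_specific_skip` entry above switches from substring matching to word-boundary regex patterns. The sketch below shows why this removes the "trust"/"rust" false positive. It is illustrative only: the function name `is_version_specific_skip` and the keyword list are hypothetical, and labeille's actual patterns may differ.

```python
import re

# Hypothetical keyword list for illustration; labeille's real patterns may differ.
VERSION_SPECIFIC_KEYWORDS = ("rust", "pyo3", "maturin", "jit")

# \b anchors each keyword at word boundaries, so "rust" matches
# "needs a Rust toolchain" but not the "rust" inside "trust" or "distrust".
VERSION_SPECIFIC_PATTERNS = [
    re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    for word in VERSION_SPECIFIC_KEYWORDS
]


def is_version_specific_skip(reason: str) -> bool:
    """Return True if a skip reason mentions a known version-specific blocker."""
    return any(pattern.search(reason) for pattern in VERSION_SPECIFIC_PATTERNS)


print(is_version_specific_skip("PyO3 does not support 3.15 yet"))  # True
print(is_version_specific_skip("SSL trust store missing"))  # False
```

A plain `"rust" in reason` check would return `True` for the second string. The `\b` anchors reject that match because "trust" has a word character immediately before "rust".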