Releases: ProfineAI/profine-cli
v0.5.1 — telemetry priors-flywheel fix
Telemetry bugfix release. Two NULL columns the priors flywheel needed were never being emitted; the materialized view's success_rate column has been NULL for every user since telemetry shipped, silently breaking the suggester's failure-avoidance filter.
pip install -U profine
Fixed
- `_loss_ok_from_bench` was reading a non-existent key. The function read `correctness.verdict` from `benchmark_comparison.json`, but the `correctness` sub-dict has a `passed` (bool) key, not a `verdict`. (`verdict` is a top-level `BenchmarkComparison` field.) The bug emitted `loss_ok=None` for every run sent to the telemetry backend. The test fixture was buggy in the same way as the production code (it also used the `verdict` key), which explains the silent regression. Adds a focused regression test (`test_loss_ok_reads_correctness_passed_not_verdict`) that pins both the real and old-buggy shapes.
- `_gather_outcomes` never set `runtime_seconds`. The field is in `ALLOWED_OUTCOME_FIELDS` but was missing from every emit-row dict literal, so `run_optimizations.runtime_seconds` was NULL across the entire dataset. The primary outcome now carries the profile's wall-clock runtime; stacked/skipped rows still omit it (no per-entry attribution available, the same convention as `speedup_factor`).
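The first fix can be illustrated with a minimal sketch. The JSON shape is inferred from the description above (a `correctness` sub-dict with a boolean `passed`, plus a top-level `verdict`); the function name `loss_ok_from_bench` and its body are hypothetical stand-ins, not the project's actual code.

```python
from typing import Any, Optional


def loss_ok_from_bench(comparison: dict[str, Any]) -> Optional[bool]:
    """Return the correctness flag, or None when it is absent.

    Hypothetical stand-in for the fixed reader: it looks at
    correctness["passed"], not the non-existent correctness["verdict"].
    """
    correctness = comparison.get("correctness")
    if not isinstance(correctness, dict):
        return None
    passed = correctness.get("passed")
    # Only a real bool counts; anything else is reported as unknown.
    return passed if isinstance(passed, bool) else None


# Shape described in the notes: `verdict` is top-level, `passed` is nested.
real_shape = {"verdict": "PASS", "correctness": {"passed": True}}
# Shape the old fixture used: a `verdict` key inside `correctness`.
old_buggy_shape = {"correctness": {"verdict": "PASS"}}

print(loss_ok_from_bench(real_shape))       # True
print(loss_ok_from_bench(old_buggy_shape))  # None
```

Checking both shapes in one test is what keeps a fixture from drifting along with the bug it is meant to catch.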
v0.5.0 — multi-rep mingpt fixes + telemetry resilience
Multi-rep mingpt benchmark surfaced four product bugs + a breaking CLI change + a telemetry-resilience overhaul. All bugs fixed, 9 regression tests added, telemetry no longer silently drops rows when the backend is cold.
pip install -U profine
⚠️ Breaking change
`--hardware` is now required on `profile`, `benchmark`, and `run-all`. The previous `auto` default silently chose a "smallest preset that fits" using a heuristic that mis-sized GPUs for unknown architectures; making it explicit prevents that footgun. Pick one of: `1x_t4`, `1x_l4`, `1x_a10g`, `1x_a100`, `1x_h100`. The `auto_select_hardware()` helper and the param-bucket preset table have been removed.
If you were running profine run-all train.py, change it to profine run-all train.py --hardware 1x_a100 (or your preferred preset).
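For readers wiring a similar rule into their own tooling, a required flag with a fixed choice list might look like the sketch below. It uses the standard-library `argparse` purely for illustration; how profine actually parses its CLI is not stated in these notes, and `build_parser` and the `profine-sketch` program name are hypothetical.

```python
import argparse

# Preset names copied from the release notes above.
HARDWARE_PRESETS = ["1x_t4", "1x_l4", "1x_a10g", "1x_a100", "1x_h100"]


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with a mandatory --hardware flag on run-all."""
    parser = argparse.ArgumentParser(prog="profine-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    run_all = sub.add_parser("run-all")
    run_all.add_argument("script")
    # No default: omitting the flag is an error rather than a silent guess.
    run_all.add_argument("--hardware", required=True, choices=HARDWARE_PRESETS)
    return parser


args = build_parser().parse_args(["run-all", "train.py", "--hardware", "1x_a100"])
print(args.hardware)  # 1x_a100
```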
Added
- `profine telemetry doctor`. Synchronous probe of the telemetry endpoint that reports consent state, endpoint URL, HTTP status code, and per-attempt latency. Use this to verify the round-trip works (or to warm a sleeping Render dyno before a real run).
- Update-check nudge on CLI startup. Profine now checks PyPI for the latest release once every 24 hours (cached in `~/.profine/`) and prints a one-line nudge if your installed version is behind. Silenced via `PROFINE_NO_UPDATE_CHECK=1`.
- Low-sample warning. Benchmark reports surface a warning when fewer than 10 step samples survive warmup stripping, so users notice when the median is built on thin data.
- `PROFINE_TELEMETRY_RETRY_BACKOFF` env var. Test-and-CI knob for the telemetry retry backoff. Defaults to 2.0s in production.
Changed
- Telemetry HTTP transport: timeout 5s → 15s, one retry with 2s backoff. The anon endpoint is hosted on Render's free/starter tier, where the first request after idle takes ~9s to wake the dyno. Under the old 5s timeout that first POST was always silently dropped. Final-attempt failures now log at WARNING (was DEBUG) so silent data loss is no longer invisible.
- Verdict string for fast-but-wrong runs now reads `FAIL (correctness; speedup measured but loss diverged)` instead of leading with `PASS`. A run that ships incorrect numerics is not a pass, regardless of its step time.
- README results section replaced with a median-of-3 multi-GPU table (A10G + A100). Honest framing of variance + range rather than a single fast-run headline.
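The transport change above (15s timeout, one retry, 2s backoff, WARNING on the final failure) can be sketched as follows. This is an illustration under stated assumptions, not the shipped code: `post_with_retry`, the injectable `send` callable, and the `sleep` hook are all hypothetical names chosen so the example runs without a network.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("telemetry-sketch")


def post_with_retry(
    send: Callable[[float], int],
    timeout: float = 15.0,
    retries: int = 1,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send(timeout) up to 1 + retries times; True on a 2xx status."""
    attempts = 1 + retries
    for attempt in range(1, attempts + 1):
        try:
            status = send(timeout)
            if 200 <= status < 300:
                return True
            error = f"HTTP {status}"
        except OSError as exc:  # TimeoutError is a subclass of OSError
            error = repr(exc)
        final = attempt == attempts
        # Only the last failure is loud, so data loss is visible but not noisy.
        log.log(logging.WARNING if final else logging.DEBUG,
                "telemetry attempt %d/%d failed: %s", attempt, attempts, error)
        if not final:
            sleep(backoff)
    return False


calls: list[float] = []


def cold_then_warm(timeout: float) -> int:
    """Simulate a sleeping backend: time out once, then answer 200."""
    calls.append(timeout)
    if len(calls) == 1:
        raise TimeoutError("backend still waking")
    return 200


delivered = post_with_retry(cold_then_warm, sleep=lambda seconds: None)
print(delivered, len(calls))  # True 2
```

Injecting `sleep` is the same idea as the backoff env var described above: tests and CI can drop the wait to zero without changing production behavior.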
Fixed
- `_projected_savings` divide-by-zero when speedup approached 100% (zero-sample candidate). Clamped `fraction_saved` to 0.99.
- `_maybe_adapt` step-time estimate poisoned by torch.compile cold-start. The adaptive step controller previously used `elapsed / steps_completed`, which is dominated by a 2.8s first-step compile when the steady state is ~17ms. Now uses median of recorded step times when available.
- `_strip_warmup` could strip more samples than existed, producing a zero-sample comparison with a bogus "100% faster / ∞× speedup" result. Capped to keep at least 3 samples on both `benchmarker.benchmarker` and `profiler.orchestrator`.
- `--edit-dir` outside `--output` now correctly resolves the suggest report via `edit_dir.parent / "suggest"`. Without this, the BF16-aware tolerance widening never fired on standalone `benchmark` invocations, and every BF16-stack benchmark spuriously failed correctness.
- `_resolve_hardware` in `telemetry/emit.py` now prefers the explicit `hardware_name` argument over `profile_record.hardware_name`. Batch / replay callers re-emitting from on-disk artifacts for a different GPU than the one that produced the profile record were having their rows mis-tagged.
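The three numeric guards in the list above share one idea: never let a degenerate sample set reach a division or a headline number. A rough sketch follows, with hypothetical function names and signatures; the notes do not show the real ones, and the exact formula inside `_projected_savings` is not given, so the clamp is shown only as a cap on the input.

```python
from statistics import median
from typing import Sequence

MIN_SAMPLES = 3  # floor named in the notes above


def strip_warmup(samples: Sequence[float], warmup: int,
                 min_keep: int = MIN_SAMPLES) -> list[float]:
    """Drop leading warmup samples, but never leave fewer than min_keep."""
    max_strip = max(0, len(samples) - min_keep)
    return list(samples[min(max(warmup, 0), max_strip):])


def estimate_step_time(step_times: Sequence[float], elapsed: float,
                       steps_completed: int) -> float:
    """Prefer the median of recorded steps; fall back to the running mean."""
    if step_times:
        return median(step_times)
    return elapsed / max(steps_completed, 1)


def clamp_fraction_saved(fraction_saved: float, ceiling: float = 0.99) -> float:
    """Cap the saved fraction so it can never reach 100%."""
    return min(fraction_saved, ceiling)


# One slow compile step followed by steady-state steps, as in the notes.
steps = [2.8, 0.017, 0.017, 0.018, 0.017]

print(estimate_step_time(steps, sum(steps), len(steps)))  # 0.017
print(len(strip_warmup(steps, warmup=10)))                # 3
print(clamp_fraction_saved(1.0))                          # 0.99
```

The running mean over the same five steps is above half a second, which is why a single cold-start step is enough to mislead a controller that divides elapsed time by step count.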
Internal
- 9 new regression tests pinning each surface bug above; 584 tests total.
- Six empty package directories deleted (`heuristics/`, `modifiers/`, `output/`, `preflight/`, `search/`, `resources/`), vestigial scaffolding from a past refactor.
- LLM backends (`profine/llm/backend.py`) gained exponential-backoff retry for transient API errors (timeouts, 5xx, rate limits), bounded at 3 attempts and env-tunable.
- Modal executor (`profine/modal/executor.py`) filters benign Inductor autotune log spam (`No valid triton configs`, `OutOfMemoryError: out of resource: triton_mm`) so successful autotune sweeps don't read as crashes; also wires `PROFINE_WALL_CLOCK_LIMIT` so the script's `StepController` stays below Modal's container timeout.
- Stacked edits in `profine/editor/editor.py` are wrapped in try/except so one bad LLM candidate surfaces as a non-applied `EditResult` instead of blowing away previously-successful edits.
- Reader feeds sibling modules to the analyzer LLM, so defaults defined in imported files (e.g. `mingpt/model.py`) no longer come back as "guessed" zeros.
- File-not-found errors now hint that a sibling `prepare.py` needs to run when the missing path looks like a tokenized dataset (nanoGPT/minGPT layout).
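The stacked-edit isolation described above can be sketched as a loop that records a failure and carries on. Only the name `EditResult` comes from the notes; its fields, the `apply_stacked` helper, and the representation of an edit as a plain string-to-string callable are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EditResult:  # hypothetical shape; only the class name is from the notes
    name: str
    applied: bool
    error: Optional[str] = None


def apply_stacked(
    source: str,
    edits: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, list[EditResult]]:
    """Apply edits in order; a failing edit is recorded and skipped."""
    results: list[EditResult] = []
    current = source
    for name, edit in edits:
        try:
            candidate = edit(current)
        except Exception as exc:  # one bad candidate must not sink the stack
            results.append(EditResult(name, applied=False, error=str(exc)))
            continue
        current = candidate
        results.append(EditResult(name, applied=True))
    return current, results


def bad_edit(src: str) -> str:
    """Stand-in for a malformed LLM candidate."""
    raise ValueError("anchor not found")


final, results = apply_stacked(
    "x = 1\n",
    [
        ("append_y", lambda s: s + "y = 2\n"),
        ("bad", bad_edit),
        ("append_z", lambda s: s + "z = 3\n"),
    ],
)
print([r.applied for r in results])  # [True, False, True]
```

Because `current` is only reassigned after an edit returns, a failure midway leaves every earlier successful edit in place.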