CHANGELOG

v1.28.0 (2026-04-03)

Bug Fixes

Eliminate ~76k test warnings (0 remaining) (df637e6)
Flatten ideas_tracker aliases (list[dict]) to MemoryCard (list[str]) (f6e620d)
Lint + format pre-existing errors in experiment files (9241296)
Lint errors in ablation_v3_no_deep.py, update prereg_commit (a118bbf)
MemoryCard.aliases type list[str] → list[Any] (a31443f)

Code Style

Ruff format (d05507d)

Refactoring

Remove all 27 type: ignore comments from codebase (91a175f)
Rename memory test files to describe what they test (68828e2)

Testing

Integration test for ideas_tracker dict aliases (Bug #2, PR #161) (ff54673)

v1.27.0 (2026-04-02)

Bug Fixes

Format card_conversion.py (228b8f3)
Lint import sorting in A_mem + GAM_root (pre-existing) (5e3baa3)
memory: Address chaos-hacker findings on public API (91aec06)
memory: Correct concept_to_card return type annotation (78327d0)

Features

Add gigaevo.memory public API exports (7790b82)

Refactoring

Replace 50 print() with loguru in A_mem + GAM_root (59853df)
memory: Add future annotations, reduce hasattr/getattr usage (ae4e403)
memory: Consolidate 20 test files into tests/memory/ (e6f8480)

v1.26.0 (2026-04-02)

Features

Dict → Pydantic migration complete — normalize_memory_card returns AnyCard (f2ea951)

Refactoring

normalize_memory_card returns AnyCard (Pydantic models) (5926631)
Replace print() with loguru, remove sys.path hacks in ideas_tracker (8663644)

v1.25.0 (2026-04-02)

Bug Fixes

Add break condition for processing when no new ideas are present (a6a3a18)
Add break condition for processing when no new ideas are present (0e20d87)
Changed cooccurrence threshold aggressive scaling to fixed minimum (5c5c29b)
Changed cooccurrence threshold aggressive scaling to fixed minimum (c8a7e5b)
Circular import in logger (55e3b1f)
Circular import in logger (522d28e)
Clean up memory PR merge — lint, format, junk dirs, broken imports (21035ab)
Correct serialization of dict and lists in pd columns (f7a6bbe)
Correct serialization of dict and lists in pd columns (2dd2eba)
Dead retry in _decide_card_action — parse_llm_card_decision returns None for garbage (80649dd)
Eliminate RuntimeWarning in generate_mutations tests (ba4195d)
Handle parent_ids as string in ideas_tracker (7601ed9)
Handle parent_ids as string in ideas_tracker (3148878)
IncomingIdeas update logic fix (b9a1781)
IncomingIdeas update logic fix (2947dda)
Lint and format errors for CI (ruff check + ruff format) (b884547)
Phase 1 — 3 confirmed bugs fixed in memory system (5e9addd)
Remove debug print (bf4a981)
Remove debug print (cda32cb)
Remove short id separate storage and generation (d0fd1a6)
Remove short id separate storage and generation (d1fdde8)
Restore RedisRunConfig + fetch_evolution_dataframe re-export in tools/utils.py (30bc8ea)
Wrong key name fix (66cab68)
Wrong key name fix (9cc9912)

Chores

Removed unused prompts (5cff8f1)
Removed unused prompts (52852cc)
Update docstrings (3123cba)
Update docstrings (6a44189)

Features

Add best idea extraction based on top_k selection by fitness and delta fitness (04ab7c6)
Add best idea extraction based on top_k selection by fitness and delta fitness (995929e)
Add changes extraction to mutation agent (cd924c7)
Add changes extraction to mutation agent (ad342ca)
Add extended record card dataclass (8aeab4e)
Add extended record card dataclass (2d80776)
Add idea description rewriting logic (0201eb3)
Add idea description rewriting logic (ea8ddd3)
Add idea origin analysis script and minor refactor ideas_tracker.py (ab184b8)
Add idea origin analysis script and minor refactor ideas_tracker.py (5a38c1d)
Add idea tracker (e9e911d)
Add idea tracker (eb9701b)
Add logging for idea tracker (21b6f58)
Add logging for idea tracker (0563796)
Add ProgramCard, ConnectedIdea, AnyCard Pydantic models (edbdd1e)
Add update logic for extended record card (0bb5038)
Add update logic for extended record card (8529a30)
Csv loading to IdeaTracker (3cc0605)
Csv loading to IdeaTracker (5b610ca)
Experimental ml pipeline for impact estimation based on linear regression feature weights (ffaac97)
Experimental ml pipeline for impact estimation based on linear regression feature weights (bda2e79)
Implement idea enrichment with LLM-generated keywords and summaries (1d4f350)
Implement idea enrichment with LLM-generated keywords and summaries (102cb74)
Support for extended record card (a7cb492)
Support for extended record card (0549fe7)
Task description loading (bbfc57a)
Task description loading (2c3ff68)
Update main logic to work with extended record card (b1ed1ac)
Update main logic to work with extended record card (a98d6b1)

Refactoring

Add Protocol types, fix mypy errors, CardDict alias (2f09116)
Extract _note_fields_changed, remove stale comments and blank lines (c3fbede)
Extract card_conversion.py + utils.py, add MemoryNoteProtocol typing (406f2cd)
Extract DEFAULT_MODEL_NAME constant, remove ad-hoc string (0f67bc1)
Extract more pure functions + memory_write_config.py (7e1bdbe)
Phase 2 — import cleanup, context manager, remove del (dd31c4f)
Phase 3 — extract _ConceptApiClient + utilities (8dacf30)
Record card extended minor refactor (bbda084)
Record card extended minor refactor (b5eac06)
Remove debug code (5a9f8ec)
Remove debug code (117a325)
Rename test files and classes to professional naming (6ae0134)
Replace all print() with loguru logger across memory module (5091f37)
Replace ML impact pipeline with origin analysis computation and improve docstring clarity (bcb2ff7)
Replace ML impact pipeline with origin analysis computation and improve docstring clarity (3f57d5f)

Testing

Chaos-hacker bug exposure tests (16 tests) (1e62411)
Cycle 10 (final) — API search, LLM synthesis, close() (21 tests) (2d63100)
Cycle 11 — fake agentic memory infrastructure + 24 tests (72c49e5)
Cycle 12 — fake Chroma/GAM + full dedup pipeline (15 tests) (d94a674)
Cycle 13 — 7 realistic E2E scenarios + 2 unpatched real-memory tests (282f7f6)
Cycle 2 — API client, dedup decision, truncate (28 tests) (77aa25d)
Cycle 3 — deeper AmemGamMemory internals (21 tests) (543b2e2)
Cycle 4 — integration tests + chaos-hacker regression fixes (25 tests) (4b92198)
Cycle 5 — mutation operator memory flow, sync_from_api, API body checks (17 tests) (fcbc352)
Cycle 6 — 8 e2e scenarios + data_components (64 new tests) (1040f02)
Cycle 7 — contract tests + engine interaction (34 new tests) (616f88c)
Cycle 8 — full-loop evolution with memory (11 E2E tests) (d7ef94d)
Cycle 9 — LLMMutationOperator real constructor + memory (14 tests) (9035068)
P0 exhaustive tests for memory module core (211 tests) (572c28a)
P1 dedup edge cases + OpenAI inference tests (100 tests) (9945c0e)
P2 memory_write_example edge cases (22 tests) (70375bb)

v1.24.1 (2026-04-01)

Bug Fixes

Remove last 4 dead .claude/rules/ references from CLAUDE.md (9caf0c3)

Chores

Remove GitNexus from CLAUDE.md, skills, and gitignore (b9f90ed)

Documentation

Add Quick Start sections with runnable commands to all feature docs (4d9a809)

Refactoring

Rename scheduling/lpt_ridge → lpt_chain, clarify scope (10cf394)

v1.24.0 (2026-04-01)

Highlights

This release focuses on performance infrastructure, experiment tooling maturity, and repository hygiene. Two experiments were completed (hover/steady-state-v2: POSITIVE, hover/map-elites-topology: NULL), and the framework gained production-grade load balancing, scheduling, and monitoring.

New Features

Steady-state evolution engine — continuous mutation/evaluation interleaving that eliminates the generational barrier. Two async loops (producer + consumer) with backpressure via asyncio.Semaphore(max_in_flight). Opt-in: evolution=steady_state. Expected throughput: ~8-9x improvement over step-wise generations.
LPT scheduling for DAG evaluation (#136) — longest-processing-time-first scheduling assigns expensive programs to evaluation slots first, reducing tail latency. Discrete-event simulation benchmarks in tools/benchmarks/.
LLM load balancer (llm=balanced) — Redis-coordinated endpoint pool with least-connections routing. Mutation servers shared across all runs via Redis DB 15. Replaces manual llm_base_url per-run configuration.
LiteLLM proxy integration — bash tools/litellm.sh auto-generates config from experiments/infrastructure.yaml and starts a LiteLLM proxy for chain server load balancing. All chain requests route through INTERNAL_IP:4000.
Chain feature extraction — ChainFeatureExtractor computes structural behavior coordinates (DAG depth, retrieval count, step count) from real chain programs for MAP-Elites behavioral characterization.
Experiment diagnostics — /experiment-diagnose skill: automated failure analysis for running experiments. Checks Redis health, PID liveness, log errors, and Hydra config overrides.
Experiment restart — /experiment-restart skill: kill all processes, flush Redis, and re-launch cleanly.
Throughput monitoring — tools/throughput_plot.py and 6-panel dashboard in watchdog: mutation rate, eval throughput, fitness distributions, validity panels. Posted hourly to experiment PRs.
Fitness vs wall-clock time — tools/fitness_vs_time.py plots fitness trajectories against real time instead of generation number.
Prompt co-evolution — user prompt co-evolution alongside system prompts (prompt_fetcher=coevolved).
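The LPT scheduling entry above can be illustrated with a minimal sketch. It assumes per-program cost estimates are known in advance and that all evaluation slots are identical. The names `lpt_assign` and `makespan` and the cost values are hypothetical and are not taken from the project's code.

```python
import heapq


def lpt_assign(costs: dict[str, float], n_slots: int) -> list[list[str]]:
    """Greedy longest-processing-time-first assignment (illustrative only)."""
    if n_slots < 1:
        raise ValueError("n_slots must be at least 1")
    # Min-heap of (accumulated load, slot index): the least-loaded slot is on top.
    heap = [(0.0, slot) for slot in range(n_slots)]
    assignment: list[list[str]] = [[] for _ in range(n_slots)]
    # Visit the most expensive programs first; ties are broken by name.
    for name, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, slot = heapq.heappop(heap)
        assignment[slot].append(name)
        heapq.heappush(heap, (load + cost, slot))
    return assignment


def makespan(costs: dict[str, float], assignment: list[list[str]]) -> float:
    """Total cost of the busiest slot, i.e. when the last evaluation finishes."""
    return max(sum(costs[name] for name in slot) for slot in assignment)
```

With hypothetical costs `{"a": 8, "b": 7, "c": 6}` and two slots, the sketch puts `a` alone on one slot and `b` and `c` together on the other. Placing the expensive programs first is what keeps a long evaluation from starting late and dominating the tail.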

Bug Fixes

120s read timeout killed 96% of chain evaluations — removed read timeout (timeout=None, keep connect=30s) to allow long-running chains under load.
CancelledError orphans: except Exception does not catch asyncio.CancelledError, which derives from BaseException (Python 3.8+), so cancellation in the steady-state engine left programs persisted but their IDs lost. Fixed with persisted_id sentinel + except BaseException.
Mutation LLM double-escaping — LLMs using with_structured_output() sometimes double-escape quotes in code fields. Fixed by _fix_double_escaped_quotes() in mutation agent.
Frontier metric recomputation — when NO_CACHE stages re-evaluate programs, frontier is now recomputed correctly using clear_series() + full rewrite instead of appending stale values.
TOCTOU races in SteadyStateEngine — scoped drain + TOCTOU-safe ingest_batch, add_elite with optimistic locking and WatchError retry.
Ghost program detection — mirrors parent engine's _await_idle() logic to clean up orphaned program IDs.
Proxy bypass — added mutation server IPs to NO_PROXY to prevent Squid proxy from blocking LLM calls.
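The CancelledError fix above follows a general asyncio pattern, sketched here under stated assumptions. The in-memory `store` and `registry` sets, the function names, and the choice to undo the write on interruption are all hypothetical stand-ins. The changelog only states that a persisted_id sentinel and `except BaseException` were used.

```python
from __future__ import annotations

import asyncio


async def persist_and_register(store: set[str], registry: set[str], program_id: str) -> None:
    """Persist a program, then register its ID; undo the persist if interrupted."""
    persisted_id: str | None = None  # sentinel: set only after the persist step succeeds
    try:
        store.add(program_id)      # stand-in for a storage write
        persisted_id = program_id
        await asyncio.sleep(3600)  # stand-in for slow follow-up work; cancellation lands here
        registry.add(program_id)
    except BaseException:
        # asyncio.CancelledError subclasses BaseException (Python 3.8+),
        # so `except Exception` would skip this cleanup entirely.
        if persisted_id is not None and persisted_id not in registry:
            store.discard(persisted_id)  # hypothetical cleanup of the orphan
        raise  # always re-raise so the cancellation still propagates


async def demo() -> tuple[set[str], bool]:
    store: set[str] = set()
    registry: set[str] = set()
    task = asyncio.create_task(persist_and_register(store, registry, "prog-1"))
    await asyncio.sleep(0)  # let the task run up to its first await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return store, task.cancelled()
```

Running `asyncio.run(demo())` cancels the task after the write but before registration. The sentinel tells the handler that something was persisted, so the orphan can be cleaned up before the cancellation is re-raised.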

Experiments

Experiment	Result	PR
hover/steady-state-v2	POSITIVE — continuous interleaving improves throughput	#138
hover/map-elites-topology	NULL — 3D structural BC (dag_depth, n_deep_retrieval, n_steps) did not improve fitness	#142

Repository Cleanup

Removed leaked vartodd/circuit_evolve code — problems, configs, custom/, gf2lib/, npy/, launch scripts (12,800+ lines deleted)
Removed experiment runtime artifacts — PNGs, pids.txt, cfg_run_*.txt from all completed experiments
Consolidated tools hierarchy — experiment-specific scripts (archive, preflight, protocol gates) now live in tools/experiment/; general tools in tools/
Removed all hardcoded paths — skills, agents, tools, and docs now use $PROJ (git root) and $GIGAEVO_PYTHON (env var) instead of /workspace-SR008.fs2/... or /home/jovyan/...
Fixed .gitignore contradictions — .claude/ and CLAUDE.md were tracked but gitignored
Cleaned root directory — moved benchmarks/ → tools/benchmarks/, demos/ → docs/demos/
Rewrote Redis data model docs — complete key namespace reference with all metric tags, archive persistence, iteration vs generation glossary

Documentation

CLAUDE.md — added tools index, skills table (12 skills), agents table (9 agents), @tools/README.md include for Redis data model
tools/README.md — structured tool index with categories (general, experiment lifecycle, infrastructure, benchmarking, scaffolding), accurate Redis appendix
Removed dead references — .claude/rules/*.md files that never existed on main, redirect stub docs/redis_schema.md

Testing

56+ new tests: race conditions, streaming, failure modes, mutation-killing, TOCTOU guards, NaN handling
Removed deprecated test classes (TestSafetyMechanismBreakage, TestEngineGenerationTimeout)
Full suite: ~3500 tests, all passing

v1.23.0 (2026-03-15)

Bug Fixes

bugs: Round-2 — migration KeyError on None island + DAG empty-nodes crash (54810b0)
bugs: Round-4 — 5 junior-researcher attack surface bugs (073cc33)
bugs: Round-5 — H1 sentinel bypass + TOCTOU dag_runner + H2-H4 guard tests (d039cf1)
tests: Update test_evolution_engine.py for get_all_by_status migration (a33604b)

Chores

generalization: Add launch script and run_status.sh (61e76af)
generalization: Add launch script and run_status.sh (96bbb42)
generalization: Add test eval script, PR description, gitignore indexes (6405a22)
generalization: Backfill pre-registration commit hash in 03_plan.md (3bd7fea)
generalization: Gen-1 smoke check — all 4 runs alive, split bias OK (17b25db)
generalization: Record binding prompt review sign-off in 03_plan.md (6a0d2e6)

Documentation

memory: Chaos-hacker round-5 findings summary (4c742fc)
memory: Restructure Claude memory + propagate gen-count fix + add closeout step (aca5f0d)

Features

generalization: Implement static_holdout_f1 problem + generalization prompts (1dbb05c)

Refactoring

tests: Move round2/round3 tests to semantic locations (3b3117e)

Testing

integration: 21 new integration tests — DAG ordering + engine edge cases (1bb5235)
round3: Regression tests for Bug A and B fixes (f778ad7)
security: Fix safe_mode bypass + add regression tests from audit (ca2d4cf)

v1.22.1 (2026-03-14)

Bug Fixes

results_report: Remove stray ESC character (U+001B) (e2b01fa)
status: Use run_state Redis key for generation count (e7648ab)

Chores

gemini_mutation: Pre-merge cleanup — environment freeze + PR description (a36f29d)

Documentation

hotpotqa: Add LaTeX results report for paper (02adea3)
hotpotqa: Make results_report.tex self-contained compilable document (bf0ff3f)

v1.22.0 (2026-03-13)

Bug Fixes

resume: Make redis.resume produce a contiguous run (07091fb)

Features

gemini_mutation: Pre-register experiment — Gemini-3-Flash as mutation LLM (0f42851)

v1.21.0 (2026-03-12)

Bug Fixes

build_colbert_index: Cap num_partitions=32768, kmeans_niters=4 for tractable CPU k-means (1b605b3)
colbert: Replace faiss GPU k-means with PyTorch batched k-means (765e8aa)
colbert: Simplify build script — patch applied directly to colbert source (f86d3a4)
colbert_feedback: Export HOTPOTQA_COLBERT_SERVER_URL in run_test_eval.sh (ac1e423)

Chores

Fill pre-registration commit hash and PR number in 03_plan.md (454f817)

Documentation

colbert_feedback: Amendment 5 — gap investigation results (efdf087)
colbert_feedback: Record index build completion in 03_plan.md (a8f4e9a)

Features

chains/hotpotqa: Add ColBERT+rich-feedback experiment (colbert_feedback, Phase 3) (20c8314)
colbert_feedback: Add ColBERT search server + update launch/validate/plan (d7683d4)
colbert_feedback: Watchdog + benchmark server-url support (1272c1e)

v1.20.0 (2026-03-09)

Features

chains: Hotpotqa: add Retriever class and colbertv2 retriever (b681195)

v1.19.0 (2026-03-09)

Chores

Add cold_start entry to INDEX.md + create experiment branch (b0b47af)
Fill PIDs into run_status.sh — T1=3812756 T2=3812757 T3=3812758 T4=3812759 watchdog=3813084 (cef1b24)
Launch.sh, run_watchdog.py, run_status.sh for cold_start experiment (53bbb1f)

Features

Add baseline initial_programs to static_f1_600 for cold-start support (f5adb9f)

v1.18.3 (2026-03-08)

v1.18.2 (2026-03-08)

Bug Fixes

Watchdog gen count — use log file instead of Redis s field (1d229e4)

Chores

Backfill pre-reg commit hash + add crossover entry to INDEX.md (2733e86)
Fill PIDs into run_status.sh — P=3660148 Q=3660149 R=3660150 S=3660151 watchdog=3660461 (d52fe31)
Launch.sh, run_status.sh, run_watchdog.py for crossover experiment (e2c3629)

v1.18.1 (2026-03-07)

Bug Fixes

12 infra correctness fixes from codebase audit (ef89ba9)
Check_experiment_complete.sh SIGPIPE bug + environment_freeze.txt (815b47e)
Extend prompts_dir to all pipeline YAMLs + docstring accuracy pass (939ec7d)
Gen10_test_eval.py val_em gap correct for F1 runs (9c9e65f)
Move analyze_test_results.py to push experiment tools dir (e6367be)
Pin push run_test_eval.sh sha256 in 03_plan.md (was val_gap hash) (7763287)
Propagate known bugs to templates and docs to prevent recurrence (b545282)
Raise chain LLM HTTP timeout 120s→600s + hard reset all runs (Amendment 4) (c0186a8)
Remove stale failures[:10] cap from docstrings and pipeline comments (ac3c07d)
Tighten APPROVED grep + correct agent memories for Phase 5 readiness (3ee5854)
Update PR_DESCRIPTION.md template — val EM → val fitness (e99f891)
Watchdog PROJ path (3→4 parents) + stale Run D config (a3dae8f)

Chores

Add run_status.sh template for push experiment monitoring (6a1e86c)
Infra improvements while runs are live (466db87)
Launch.sh for push experiment + CONTEXT.md updates (c2da4a7)
Pre-fill 05_results.md skeleton + analysis script + INDEX.md entry (a8e288b)
Replace Run D EM+NLP+600 → F1+NLP+600 (Amendment 3) (58ec1fa)
Update INDEX.md and CONTEXT.md naming consistency (05bce0a)
Watchdog + run_status.sh for push experiment (81807ea)

Documentation

Hotpotqa_asi.yaml is required for ALL hotpotqa variants, not just static_a/ra (856125c)
Update all experiments// → experiments/// (1b67951)

v1.18.0 (2026-03-06)

Bug Fixes

Wire stage_timeout through DefaultPipelineBuilder + validation speedup (55772ba)

Chores

Update agent memories (push experiment + path fixes) (6f50092)

Documentation

Fix INDEX.md — hotpotqa_thinking test EM ~60% not 62.3% (e8407d5)
Set pre-registration commit hash in 03_plan.md (push experiment) (f47847f)
Update INDEX.md — drop pre-protocol exps, close out nlp_prompts + val_gap (7ed9ba9)

Features

Pre-registration 03_plan.md + static_f1_600 problem directory (push experiment) (9173c10)

v1.17.0 (2026-03-06)

Bug Fixes

Amendment 1 review fixes — F1 objective, EM=0 criterion, rationale, Gate E (866f106)
Distinguish timeouts from generic failures in stage logs and status monitoring (2228dd8)
Launch.sh preflight loops use CHAIN_URL_F (not removed CHAIN_URL_P) (bb79e6f)
Replace dry_run=true with --cfg job in launch.sh; update CLAUDE.md (13e968a)
Status.py gen count bug + add Redis schema doc + run_status.sh (b85e388)

Chores

Record Amendment 1 commit hash in 03_plan.md (866f106) (2f64837)
Update PIDs in run_watchdog.py — launch 2026-03-05 12:21 UTC (acec7c1)

Documentation

Fill pre-registration commit hash in 03_plan.md (77f3ef6)
Move task_hotpotqa.md → experiments/hotpotqa/CONTEXT.md + CLAUDE.md lookup table (e920581)
Split task-specific content out of CLAUDE.md into task_hotpotqa.md (91410b3)
Update 04_launch.md for dry_run removal and crontab unavailability (7757d6d)

Features

Add static_600 and static_r600 problem directories for val_gap experiment (6dfa6d9)
Amendment 1 — replace Run P with Run F (fixed-300, F1 fitness) (5c4370b)
Gap_analysis.py + lineage.py + eval_checkpoint.py + README onboarding fixes (45faa45)
Launch script and watchdog for hotpotqa_val_gap experiment (fa6a14d)

Refactoring

Nest hotpotqa experiments under experiments/hotpotqa/ project dir (25652c0)

v1.16.2 (2026-03-05)

Bug Fixes

Shutdown worker pool before event loop closes on Ctrl+C (6bbbd1e)

v1.16.1 (2026-03-05)

Bug Fixes

Remove hardcoded /home/jovyan paths from shared scripts (407f8d3)
Replace hardcoded gh path with command -v gh in tools (d30bf8a)

v1.16.0 (2026-03-05)

v1.15.1 (2026-03-04)

Bug Fixes

Correct @package directive in prompts/default.yaml (40954d0)

v1.15.0 (2026-03-02)

Bug Fixes

chains: Address reviewer fixes (#68, 4c84ba5)

Chores

Update coverage badge to 86% [skip ci] (e2c0813)
Update coverage badge to 87% [skip ci] (cbe7155)
Update coverage badge to 87% [skip ci] (1dd894d)

Documentation

Fix changelog link — point README to root CHANGELOG.md (43964dc)
Update README test structure and coverage badge to 85% (ba5ad09)

Features

chains: Speed-up chain_runner, add aime, hotpotqa_full, hotpotqa_qa, hover, ifbench, papillon chain problems (#68, 4c84ba5)
chains: Speed-up chain_runner, add new chains problems (#68, 4c84ba5)

Refactoring

Rename test files from _adversarial/_extended to _edge_cases (60dc53b)

Testing

Comprehensive test coverage expansion with audit hardening (5fb12ac)
Deep audit hardening with 207 new mutation-analysis tests (c101646)

v1.14.2 (2026-02-25)

Bug Fixes

prompts: Download bug (9e6da70)
prompts: Fix broken import (0c7e63f)

Chores

Update coverage badge to 78% [skip ci] (82b4f00)

Testing

Add extended test suites for coverage-gap modules (c2bf999)

v1.14.1 (2026-02-25)

Bug Fixes

prompts: Remove single-step exp.; add full chains evolution (#63, 8a9ec44)
prompts: Removed wrong directories (#63, 8a9ec44)

v1.14.0 (2026-02-25)

Bug Fixes

Timeout polish for optuna stage (b9d914a)

Features

Add time-budget deadline to Optuna trial loop (6c98665)

v1.13.0 (2026-02-25)

Bug Fixes

ci: Sync release job with latest origin/main before semantic-release (34efe54)

Chores

Update coverage badge to 77% [skip ci] (8824566)

Features

Filter optimization stage errors from mutation/LLM prompts (6107559)
ci: Add self-updating coverage badge to README (61a9ef1)

v1.12.0 (2026-02-24)

Bug Fixes

Add cwd to exec runner (003eddb)
Add metrics storage in redis (29d4bb7)
Add missing file (5b0e661)
Bug fix for zero fitnesses (e30d4b7)
Close subprocess transports to prevent "Event loop is closed" warnings (075cd4c)
Cloudpickle 'register_pickle_by_value' for correct root imports handling (9b70499)
Cma deps (e3cc526)
Comprehensive wizard (51eb90a)
Fix bug in caching behavior (362dcb6)
Fix bug in dag cache handling logic (61e98ad)
Fix faulty caching for programs with optional input (ca55919)
Fix missing traceback from lineage (28eeff4)
Fixed exec runner to handle project directory (568b51d)
Grammar errors (a0b779b)
Logging (7597117)
Minor boundary fixes (a0dca3f)
Minor optuna polish and done (87e02f0)
Move exec runner; speed up python execution via worker pool (6bce277)
Optuna stage patching (8bfba7f)
Optuna stage polish (7c628e7)
Pickle to cloudpickle (7c47ac5)
Remove indices from constants, fix fitness descriptions (c5e2649)
Remove unnecessary wizard configs (997f094)
Replace deprecated class Config with model_config = ConfigDict() (10a8fa1)
Restore Optuna prompt constraints and remove reasoning max_length (98771fa)
Undo default endpoint (4b4adc1)
Update three alphaevolve problems (#51, aaaea70)
Windows compatibility (b96caa2)
ci: Fix semantic-release not updating CHANGELOG (a7da06a)
ci: Remove orphaned v1.12.0 tag to unblock semantic-release (f60e8b6)
ci: Use startsWith instead of contains for release skip filter (bfb0e2a)
prompt: Remove .nltk artifacts, add dependencies, upd. .gitignore (53cc178)
prompt: Remove debug lines (203ba94)

Chores

Add santa challenge problem directory (477a205)
Modify santa challenge problem directory (6ed6881)
Polish comparison scripts (b6e9e89)
Refactor optuna stage (fd1ca3a)
Santa2025 problem for n=100 (ef864fc)
Slightly polish code (86fd5a8)

Code Style

Clean up stages — loguru placeholders, builtin generics, type annotations, constants (72a447a)

Documentation

Add Testing section to README (245464c)
Update README test section with current structure and run instructions (da5cd54)

Features

Add new caching system (based on change in the inputs); structured output for mutation operator; slightly polish insights (dfb7a73)
Add artifact from validation support (91a4402)
Add cma-es parameters tuning stage (76a32a1)
Add first half of missing problems (4fca319)
Add global stats to context (57cee54)
Add missing alphaevolve problems (#46, f8a15e1)
Add optuna optimization stage (8da53df)
Add Optuna payload routing and bypass for direct optimization output (8e37b0c)
Add second half of alphaevolve problems (574d555)
Add token counters to metrics (63323c3)
Boltzmann/weighted elite selectors, Optuna int preservation, profiler (1011e38)
Dynamic space, more ram stability (bb3bd4b)
Normalize fitness to [0,1] in FitnessProportionalEliteSelector, fix greedy collapse (90589b4)
Polish storage code with claude (6826922)
Removed bad problems, fixed first half of valid ones (063858b)
Small efficiency improvements (f71f9d8)
Unconstrained insights categories (a4ba5fe)
comparison: Improve style and polish (4375952)
prompt: Add gsm8k, aime, ifbench, pupa, and hotpotqa problems (44ecb07)
prompts: Add shared functionality; add aime & jigsaw problems (7dfdce8)
prompts: Refactor single-prompt evolution, added chains (utils+hotpotqa) (#62, d63343d)

Performance Improvements

Pre-compute DAG inputs and improve stage resilience (e6e3566)
Reduce Redis round-trips and eliminate deep copies in hot paths (2c556e8)

Refactoring

Refactor wizard specs to follow pydantic (151f120)
Rewrite uncertainty_inequality (3f88ed0)
Simplify program state machine — remove EVOLVING, rename states (70b9b34)

Testing

Add comprehensive coverage tests for 12 modules (3086 lines) (4546525)
Add comprehensive test suite (1132 tests) and reorganize into subdirectories (8d47344)
Add comprehensive test suite for core modules (7628e91)
Fix flaky ScalarTournamentEliteSelector tests (5155932)

v1.11.1 (2025-11-18)

Bug Fixes

Set flush_at and flush_interval via client instead of constructor (887232d)

Refactoring

Optimize Langfuse integration (6876a9d)
Pass flush_at and flush_interval to CallbackHandler constructor (fc6fbd2)
Remove redundant try-except for CallbackHandler initialization (72884b5)
Remove unused flush_traces method (3b5ccb3)

v1.11.0 (2025-11-18)

Chores

Add terminal gif (4286fd2)
Add terminal gif (398056b)
Add terminal gif (ed4aba1)
Fix license (5592e05)
Fix license (1394967)
Remove emoji (7c4e0f5)

Features

Better stage scheduling (d2e35b0)

v1.10.0 (2025-11-17)

v1.9.1 (2025-11-17)

v1.9.0 (2025-11-15)

Chores

Removed legacy fields, upd. FunctionSignature note (f7c832e)
Removed wizard example problem (41f2c77)

Documentation

Updated wizard documentation (a6d115e)

Features

Add problem scaffolding wizard (e12e566)

Refactoring

Moved wizard configs, made wizard a module (e2af84b)
Updated wizard code functionality (43d0ab6)

v1.8.1 (2025-11-14)

v1.8.0 (2025-11-14)

Features

Remove memory leaks and small fixes (2746ea6)

v1.7.1 (2025-11-12)

Bug Fixes

Small polish (e9abd09)

v1.7.0 (2025-11-12)

Features

Handling of langfuse errors (50adbc3)
Langfuse_tracing_less_comments (5910f96)
Simplifying_langfuse_tracing (c746de7)
Update README.md to work with langfuse (765e0e7)

v1.6.1 (2025-11-12)

Bug Fixes

Follow-up on removing task-dependent text (a72968b)
Remove task-dependent text from evolution prompts (1c6acf3)

v1.6.0 (2025-11-11)

v1.5.2 (2025-11-11)

Bug Fixes

Small fix problem name (6663ea4)

Features

More docs and stability for redis (dbcb856)

v1.5.1 (2025-11-11)

Bug Fixes

Small fix problem name (44ea85f)

v1.5.0 (2025-11-11)

Features

Better config structure, examples, and polish (643331e)

v1.4.0 (2025-11-07)

Features

Wandb support; improve pythonpath passthrough (e408cec)

v1.3.0 (2025-11-06)

Bug Fixes

Remove now unused runner (a18749b)
Remove now unused runner (0311c27)
Unify log dir (3097d4d)

Features

Add metrics logging with TensorBoard; fix execution ordering in evolution engine; fix island API; add proper cancellation handling for async method (a0e7d8e)

v1.2.0 (2025-11-06)

Chores

prompts: Removed unused prompt constants, moved task hints to description (b6db207)

Features

prompts: Centralize mutation prompts and remove task hints (6bd971f)

Refactoring

prompts: Add task-independent mutation prompts (e56dd49)

v1.1.0 (2025-10-31)

Features

Changed pickle serialization to cloudpickle for classes and lambdas over network (6524340)

v1.0.3 (2025-10-31)

v1.0.2 (2025-10-31)

Bug Fixes

Better error handling and logging in dag (5973d7b)

v1.0.1 (2025-10-31)

Bug Fixes

Add missing dep (acecda2)
Minor fixes to simplify hydra and fix prompt for insights (60aa776)

Chores

deps: Move hydra dependencies to main requirements (6e574d9)

Refactoring

Migrate MetaEvolve to GigaEvo (fc8c2ca)

v1.0.0 (2025-10-31)

v0.9.0 (2025-09-26)

v0.8.0 (2025-09-22)

v0.7.0 (2025-09-21)

Chores

release: V0.7.0 (5f102b4)

v0.6.0 (2025-09-20)

Initial Release
