CHANGELOG

v1.28.0 (2026-04-03)

Bug Fixes

  • Eliminate ~76k test warnings (0 remaining) (df637e6)

  • Flatten ideas_tracker aliases (list[dict]) to MemoryCard (list[str]) (f6e620d)

  • Lint + format pre-existing errors in experiment files (9241296)

  • Lint errors in ablation_v3_no_deep.py, update prereg_commit (a118bbf)

  • MemoryCard.aliases type list[str] → list[Any] (a31443f)

Code Style

Refactoring

  • Remove all 27 type: ignore comments from codebase (91a175f)

  • Rename memory test files to describe what they test (68828e2)

Testing

  • Integration test for ideas_tracker dict aliases (Bug #2, PR #161) (ff54673)

v1.27.0 (2026-04-02)

Bug Fixes

  • Format card_conversion.py (228b8f3)

  • Lint import sorting in A_mem + GAM_root (pre-existing) (5e3baa3)

  • memory: Address chaos-hacker findings on public API (91aec06)

  • memory: Correct concept_to_card return type annotation (78327d0)

Features

  • Add gigaevo.memory public API exports (7790b82)

Refactoring

  • Replace 50 print() with loguru in A_mem + GAM_root (59853df)

  • memory: Add future annotations, reduce hasattr/getattr usage (ae4e403)

  • memory: Consolidate 20 test files into tests/memory/ (e6f8480)

v1.26.0 (2026-04-02)

Features

  • Dict → Pydantic migration complete — normalize_memory_card returns AnyCard (f2ea951)

Refactoring

  • normalize_memory_card returns AnyCard (Pydantic models) (5926631)

  • Replace print() with loguru, remove sys.path hacks in ideas_tracker (8663644)

v1.25.0 (2026-04-02)

Bug Fixes

  • Add break condition for processing when no new ideas are present (a6a3a18)

  • Add break condition for processing when no new ideas are present (0e20d87)

  • Changed co-occurrence threshold from aggressive scaling to a fixed minimum (5c5c29b)

  • Changed co-occurrence threshold from aggressive scaling to a fixed minimum (c8a7e5b)

  • Circular import in logger (55e3b1f)

  • Circular import in logger (522d28e)

  • Clean up memory PR merge — lint, format, junk dirs, broken imports (21035ab)

  • Correct serialization of dict and lists in pd columns (f7a6bbe)

  • Correct serialization of dict and lists in pd columns (2dd2eba)

  • Dead retry in _decide_card_action — parse_llm_card_decision returns None for garbage (80649dd)

  • Eliminate RuntimeWarning in generate_mutations tests (ba4195d)

  • Handle parent_ids as string in ideas_tracker (7601ed9)

  • Handle parent_ids as string in ideas_tracker (3148878)

  • IncomingIdeas update logic fix (b9a1781)

  • IncomingIdeas update logic fix (2947dda)

  • Lint and format errors for CI (ruff check + ruff format) (b884547)

  • Phase 1 — 3 confirmed bugs fixed in memory system (5e9addd)

  • Remove debug print (bf4a981)

  • Remove debug print (cda32cb)

  • Remove short id separate storage and generation (d0fd1a6)

  • Remove short id separate storage and generation (d1fdde8)

  • Restore RedisRunConfig + fetch_evolution_dataframe re-export in tools/utils.py (30bc8ea)

  • Wrong key name fix (66cab68)

  • Wrong key name fix (9cc9912)

Chores

Features

  • Add best idea extraction based on top_k selection by fitness and delta fitness (04ab7c6)

  • Add best idea extraction based on top_k selection by fitness and delta fitness (995929e)

  • Add changes extraction to mutation agent (cd924c7)

  • Add changes extraction to mutation agent (ad342ca)

  • Add extended record card dataclass (8aeab4e)

  • Add extended record card dataclass (2d80776)

  • Add idea description rewriting logic (0201eb3)

  • Add idea description rewriting logic (ea8ddd3)

  • Add idea origin analysis script and minor refactor ideas_tracker.py (ab184b8)

  • Add idea origin analysis script and minor refactor ideas_tracker.py (5a38c1d)

  • Add idea tracker (e9e911d)

  • Add idea tracker (eb9701b)

  • Add logging for idea tracker (21b6f58)

  • Add logging for idea tracker (0563796)

  • Add ProgramCard, ConnectedIdea, AnyCard Pydantic models (edbdd1e)

  • Add update logic for extended record card (0bb5038)

  • Add update logic for extended record card (8529a30)

  • CSV loading for IdeaTracker (3cc0605)

  • CSV loading for IdeaTracker (5b610ca)

  • Experimental ml pipeline for impact estimation based on linear regression feature weights (ffaac97)

  • Experimental ml pipeline for impact estimation based on linear regression feature weights (bda2e79)

  • Implement idea enrichment with LLM-generated keywords and summaries (1d4f350)

  • Implement idea enrichment with LLM-generated keywords and summaries (102cb74)

  • Support for extended record card (a7cb492)

  • Support for extended record card (0549fe7)

  • Task description loading (bbfc57a)

  • Task description loading (2c3ff68)

  • Update main logic to work with extended record card (b1ed1ac)

  • Update main logic to work with extended record card (a98d6b1)

Refactoring

  • Add Protocol types, fix mypy errors, CardDict alias (2f09116)

  • Extract _note_fields_changed, remove stale comments and blank lines (c3fbede)

  • Extract card_conversion.py + utils.py, add MemoryNoteProtocol typing (406f2cd)

  • Extract DEFAULT_MODEL_NAME constant, remove ad-hoc string (0f67bc1)

  • Extract more pure functions + memory_write_config.py (7e1bdbe)

  • Phase 2 — import cleanup, context manager, remove del (dd31c4f)

  • Phase 3 — extract _ConceptApiClient + utilities (8dacf30)

  • Record card extended minor refactor (bbda084)

  • Record card extended minor refactor (b5eac06)

  • Remove debug code (5a9f8ec)

  • Remove debug code (117a325)

  • Rename test files and classes to professional naming (6ae0134)

  • Replace all print() with loguru logger across memory module (5091f37)

  • Replace ML impact pipeline with origin analysis computation and improve docstring clarity (bcb2ff7)

  • Replace ML impact pipeline with origin analysis computation and improve docstring clarity (3f57d5f)

Testing

  • Chaos-hacker bug exposure tests (16 tests) (1e62411)

  • Cycle 10 (final) — API search, LLM synthesis, close() (21 tests) (2d63100)

  • Cycle 11 — fake agentic memory infrastructure + 24 tests (72c49e5)

  • Cycle 12 — fake Chroma/GAM + full dedup pipeline (15 tests) (d94a674)

  • Cycle 13 — 7 realistic E2E scenarios + 2 unpatched real-memory tests (282f7f6)

  • Cycle 2 — API client, dedup decision, truncate (28 tests) (77aa25d)

  • Cycle 3 — deeper AmemGamMemory internals (21 tests) (543b2e2)

  • Cycle 4 — integration tests + chaos-hacker regression fixes (25 tests) (4b92198)

  • Cycle 5 — mutation operator memory flow, sync_from_api, API body checks (17 tests) (fcbc352)

  • Cycle 6 — 8 e2e scenarios + data_components (64 new tests) (1040f02)

  • Cycle 7 — contract tests + engine interaction (34 new tests) (616f88c)

  • Cycle 8 — full-loop evolution with memory (11 E2E tests) (d7ef94d)

  • Cycle 9 — LLMMutationOperator real constructor + memory (14 tests) (9035068)

  • P0 exhaustive tests for memory module core (211 tests) (572c28a)

  • P1 dedup edge cases + OpenAI inference tests (100 tests) (9945c0e)

  • P2 memory_write_example edge cases (22 tests) (70375bb)

v1.24.1 (2026-04-01)

Bug Fixes

  • Remove last 4 dead .claude/rules/ references from CLAUDE.md (9caf0c3)

Chores

  • Remove GitNexus from CLAUDE.md, skills, and gitignore (b9f90ed)

Documentation

  • Add Quick Start sections with runnable commands to all feature docs (4d9a809)

Refactoring

  • Rename scheduling/lpt_ridge → lpt_chain, clarify scope (10cf394)

v1.24.0 (2026-04-01)

Highlights

This release focuses on performance infrastructure, experiment tooling maturity, and repository hygiene. Two experiments were completed (hover/steady-state-v2: POSITIVE, hover/map-elites-topology: NULL), and the framework gained production-grade load balancing, scheduling, and monitoring.

New Features

  • Steady-state evolution engine — continuous mutation/evaluation interleaving that eliminates the generational barrier. Two async loops (producer + consumer) with backpressure via asyncio.Semaphore(max_in_flight). Opt-in: evolution=steady_state. Expected throughput: ~8-9x improvement over step-wise generations.

  • LPT scheduling for DAG evaluation (#136) — longest-processing-time-first scheduling assigns expensive programs to evaluation slots first, reducing tail latency. Discrete-event simulation benchmarks in tools/benchmarks/.

  • LLM load balancer (llm=balanced) — Redis-coordinated endpoint pool with least-connections routing. Mutation servers shared across all runs via Redis DB 15. Replaces manual llm_base_url per-run configuration.

  • LiteLLM proxy integration: bash tools/litellm.sh auto-generates config from experiments/infrastructure.yaml and starts a LiteLLM proxy for chain server load balancing. All chain requests route through INTERNAL_IP:4000.

  • Chain feature extraction: ChainFeatureExtractor computes structural behavior coordinates (DAG depth, retrieval count, step count) from real chain programs for MAP-Elites behavioral characterization.

  • Experiment diagnostics: /experiment-diagnose skill for automated failure analysis of running experiments. Checks Redis health, PID liveness, log errors, and Hydra config overrides.

  • Experiment restart: /experiment-restart skill that kills all processes, flushes Redis, and re-launches cleanly.

  • Throughput monitoring: tools/throughput_plot.py and 6-panel dashboard in watchdog: mutation rate, eval throughput, fitness distributions, validity panels. Posted hourly to experiment PRs.

  • Fitness vs wall-clock time: tools/fitness_vs_time.py plots fitness trajectories against real time instead of generation number.

  • Prompt co-evolution — user prompt co-evolution alongside system prompts (prompt_fetcher=coevolved).
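
The steady-state engine entry above describes two async loops (producer + consumer) with backpressure via asyncio.Semaphore(max_in_flight). The following is a minimal sketch of that pattern only, not the project's engine: the function name steady_state, the single consumer, and the asyncio.sleep(0) stand-ins for mutation and evaluation are assumptions made for illustration.

```python
import asyncio


async def steady_state(num_programs: int, max_in_flight: int) -> tuple[list[int], int]:
    """Toy producer/consumer pair; returns (evaluated ids, peak in-flight count)."""
    slots = asyncio.Semaphore(max_in_flight)  # backpressure: caps unevaluated programs
    queue: asyncio.Queue = asyncio.Queue()
    evaluated: list[int] = []
    in_flight = 0
    peak = 0

    async def producer() -> None:
        nonlocal in_flight, peak
        for program_id in range(num_programs):
            await slots.acquire()      # waits while max_in_flight programs are pending
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)     # hypothetical stand-in for an LLM mutation call
            await queue.put(program_id)
        await queue.put(None)          # sentinel: nothing more to evaluate

    async def consumer() -> None:
        nonlocal in_flight
        while (program_id := await queue.get()) is not None:
            await asyncio.sleep(0)     # hypothetical stand-in for program evaluation
            evaluated.append(program_id)
            in_flight -= 1
            slots.release()            # free a slot so the producer can continue

    # Both loops run concurrently; there is no per-generation barrier.
    await asyncio.gather(producer(), consumer())
    return evaluated, peak
```

Calling asyncio.run(steady_state(20, 4)) evaluates all 20 toy programs while never holding more than 4 in flight. The semaphore is acquired before a mutation starts and released only after its evaluation finishes, so a slow consumer throttles the producer instead of letting the queue grow without bound.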

Bug Fixes

  • 120s read timeout killed 96% of chain evaluations — removed read timeout (timeout=None, keep connect=30s) to allow long-running chains under load.

  • CancelledError orphans: except Exception does not catch asyncio.CancelledError, which derives from BaseException, so cancellation in the steady-state engine left programs persisted but their IDs lost. Fixed with a persisted_id sentinel + except BaseException.

  • Mutation LLM double-escaping — LLMs using with_structured_output() sometimes double-escape quotes in code fields. Fixed by _fix_double_escaped_quotes() in mutation agent.

  • Frontier metric recomputation — when NO_CACHE stages re-evaluate programs, frontier is now recomputed correctly using clear_series() + full rewrite instead of appending stale values.

  • TOCTOU races in SteadyStateEngine — scoped drain + TOCTOU-safe ingest_batch, add_elite with optimistic locking and WatchError retry.

  • Ghost program detection — mirrors parent engine's _await_idle() logic to clean up orphaned program IDs.

  • Proxy bypass — added mutation server IPs to NO_PROXY to prevent Squid proxy from blocking LLM calls.
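
The CancelledError entry above can be reproduced in a few lines. This is a sketch under assumptions, not the engine's code: persist_and_register, the "prog-1" ID, and the log list are hypothetical names; only the persisted_id sentinel idea and the except BaseException fix come from the entry itself.

```python
import asyncio


async def persist_and_register(log: list[str], broad: bool) -> None:
    # Choose which handler to demonstrate: the buggy one or the fixed one.
    handler = BaseException if broad else Exception
    persisted_id = None                  # sentinel: set once the program is written
    try:
        persisted_id = "prog-1"          # hypothetical stand-in for persisting a program
        await asyncio.sleep(10)          # cancellation arrives while awaiting the next step
    except handler:
        if persisted_id is not None:
            log.append(f"cleaned up {persisted_id}")
        raise                            # re-raise so cancellation still propagates


async def run(broad: bool) -> list[str]:
    log: list[str] = []
    task = asyncio.create_task(persist_and_register(log, broad))
    await asyncio.sleep(0)               # let the task start and reach its await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append("cancelled")
    return log
```

With broad=False the handler is skipped entirely and the persisted program is never cleaned up; with broad=True the sentinel tells the handler that there is something to clean up before the cancellation is re-raised.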

Experiments

Experiment | Result | PR
hover/steady-state-v2 | POSITIVE: continuous interleaving improves throughput | #138
hover/map-elites-topology | NULL: 3D structural BC (dag_depth, n_deep_retrieval, n_steps) did not improve fitness | #142

Repository Cleanup

  • Removed leaked vartodd/circuit_evolve code — problems, configs, custom/, gf2lib/, npy/, launch scripts (12,800+ lines deleted)
  • Removed experiment runtime artifacts — PNGs, pids.txt, cfg_run_*.txt from all completed experiments
  • Consolidated tools hierarchy — experiment-specific scripts (archive, preflight, protocol gates) now live in tools/experiment/; general tools in tools/
  • Removed all hardcoded paths — skills, agents, tools, and docs now use $PROJ (git root) and $GIGAEVO_PYTHON (env var) instead of /workspace-SR008.fs2/... or /home/jovyan/...
  • Fixed .gitignore contradictions: .claude/ and CLAUDE.md were tracked but gitignored
  • Cleaned root directory: moved benchmarks/ → tools/benchmarks/, demos/ → docs/demos/
  • Rewrote Redis data model docs — complete key namespace reference with all metric tags, archive persistence, iteration vs generation glossary

Documentation

  • CLAUDE.md — added tools index, skills table (12 skills), agents table (9 agents), @tools/README.md include for Redis data model
  • tools/README.md — structured tool index with categories (general, experiment lifecycle, infrastructure, benchmarking, scaffolding), accurate Redis appendix
  • Removed dead references: .claude/rules/*.md files that never existed on main, redirect stub docs/redis_schema.md

Testing

  • 56+ new tests: race conditions, streaming, failure modes, mutation-killing, TOCTOU guards, NaN handling
  • Removed deprecated test classes (TestSafetyMechanismBreakage, TestEngineGenerationTimeout)
  • Full suite: ~3500 tests, all passing

v1.23.0 (2026-03-15)

Bug Fixes

  • bugs: Round-2 — migration KeyError on None island + DAG empty-nodes crash (54810b0)

  • bugs: Round-4 — 5 junior-researcher attack surface bugs (073cc33)

  • bugs: Round-5 — H1 sentinel bypass + TOCTOU dag_runner + H2-H4 guard tests (d039cf1)

  • tests: Update test_evolution_engine.py for get_all_by_status migration (a33604b)

Chores

  • generalization: Add launch script and run_status.sh (61e76af)

  • generalization: Add launch script and run_status.sh (96bbb42)

  • generalization: Add test eval script, PR description, gitignore indexes (6405a22)

  • generalization: Backfill pre-registration commit hash in 03_plan.md (3bd7fea)

  • generalization: Gen-1 smoke check — all 4 runs alive, split bias OK (17b25db)

  • generalization: Record binding prompt review sign-off in 03_plan.md (6a0d2e6)

Documentation

  • memory: Chaos-hacker round-5 findings summary (4c742fc)

  • memory: Restructure Claude memory + propagate gen-count fix + add closeout step (aca5f0d)

Features

  • generalization: Implement static_holdout_f1 problem + generalization prompts (1dbb05c)

Refactoring

  • tests: Move round2/round3 tests to semantic locations (3b3117e)

Testing

  • integration: 21 new integration tests — DAG ordering + engine edge cases (1bb5235)

  • round3: Regression tests for Bug A and B fixes (f778ad7)

  • security: Fix safe_mode bypass + add regression tests from audit (ca2d4cf)

v1.22.1 (2026-03-14)

Bug Fixes

  • results_report: Remove stray ESC character (U+001B) (e2b01fa)

  • status: Use run_state Redis key for generation count (e7648ab)

Chores

  • gemini_mutation: Pre-merge cleanup — environment freeze + PR description (a36f29d)

Documentation

  • hotpotqa: Add LaTeX results report for paper (02adea3)

  • hotpotqa: Make results_report.tex self-contained compilable document (bf0ff3f)

v1.22.0 (2026-03-13)

Bug Fixes

  • resume: Make redis.resume produce a contiguous run (07091fb)

Features

  • gemini_mutation: Pre-register experiment — Gemini-3-Flash as mutation LLM (0f42851)

v1.21.0 (2026-03-12)

Bug Fixes

  • build_colbert_index: Cap num_partitions=32768, kmeans_niters=4 for tractable CPU k-means (1b605b3)

  • colbert: Replace faiss GPU k-means with PyTorch batched k-means (765e8aa)

  • colbert: Simplify build script — patch applied directly to colbert source (f86d3a4)

  • colbert_feedback: Export HOTPOTQA_COLBERT_SERVER_URL in run_test_eval.sh (ac1e423)

Chores

  • Fill pre-registration commit hash and PR number in 03_plan.md (454f817)

Documentation

  • colbert_feedback: Amendment 5 — gap investigation results (efdf087)

  • colbert_feedback: Record index build completion in 03_plan.md (a8f4e9a)

Features

  • chains/hotpotqa: Add ColBERT+rich-feedback experiment (colbert_feedback, Phase 3) (20c8314)

  • colbert_feedback: Add ColBERT search server + update launch/validate/plan (d7683d4)

  • colbert_feedback: Watchdog + benchmark server-url support (1272c1e)

v1.20.0 (2026-03-09)

Features

  • chains: Hotpotqa: add Retriever class and colbertv2 retriever (b681195)

v1.19.0 (2026-03-09)

Chores

  • Add cold_start entry to INDEX.md + create experiment branch (b0b47af)

  • Fill PIDs into run_status.sh — T1=3812756 T2=3812757 T3=3812758 T4=3812759 watchdog=3813084 (cef1b24)

  • launch.sh, run_watchdog.py, run_status.sh for cold_start experiment (53bbb1f)

Features

  • Add baseline initial_programs to static_f1_600 for cold-start support (f5adb9f)

v1.18.3 (2026-03-08)

v1.18.2 (2026-03-08)

Bug Fixes

  • Watchdog gen count — use log file instead of Redis s field (1d229e4)

Chores

  • Backfill pre-reg commit hash + add crossover entry to INDEX.md (2733e86)

  • Fill PIDs into run_status.sh — P=3660148 Q=3660149 R=3660150 S=3660151 watchdog=3660461 (d52fe31)

  • launch.sh, run_status.sh, run_watchdog.py for crossover experiment (e2c3629)

v1.18.1 (2026-03-07)

Bug Fixes

  • 12 infra correctness fixes from codebase audit (ef89ba9)

  • check_experiment_complete.sh SIGPIPE bug + environment_freeze.txt (815b47e)

  • Extend prompts_dir to all pipeline YAMLs + docstring accuracy pass (939ec7d)

  • gen10_test_eval.py val_em gap correct for F1 runs (9c9e65f)

  • Move analyze_test_results.py to push experiment tools dir (e6367be)

  • Pin push run_test_eval.sh sha256 in 03_plan.md (was val_gap hash) (7763287)

  • Propagate known bugs to templates and docs to prevent recurrence (b545282)

  • Raise chain LLM HTTP timeout 120s→600s + hard reset all runs (Amendment 4) (c0186a8)

  • Remove stale failures[:10] cap from docstrings and pipeline comments (ac3c07d)

  • Tighten APPROVED grep + correct agent memories for Phase 5 readiness (3ee5854)

  • Update PR_DESCRIPTION.md template — val EM → val fitness (e99f891)

  • Watchdog PROJ path (3→4 parents) + stale Run D config (a3dae8f)

Chores

  • Add run_status.sh template for push experiment monitoring (6a1e86c)

  • Infra improvements while runs are live (466db87)

  • launch.sh for push experiment + CONTEXT.md updates (c2da4a7)

  • Pre-fill 05_results.md skeleton + analysis script + INDEX.md entry (a8e288b)

  • Replace Run D EM+NLP+600 → F1+NLP+600 (Amendment 3) (58ec1fa)

  • Update INDEX.md and CONTEXT.md naming consistency (05bce0a)

  • Watchdog + run_status.sh for push experiment (81807ea)

Documentation

  • hotpotqa_asi.yaml is required for ALL hotpotqa variants, not just static_a/ra (856125c)

  • Update all experiments// → experiments/// (1b67951)

v1.18.0 (2026-03-06)

Bug Fixes

  • Wire stage_timeout through DefaultPipelineBuilder + validation speedup (55772ba)

Chores

  • Update agent memories (push experiment + path fixes) (6f50092)

Documentation

  • Fix INDEX.md — hotpotqa_thinking test EM ~60% not 62.3% (e8407d5)

  • Set pre-registration commit hash in 03_plan.md (push experiment) (f47847f)

  • Update INDEX.md — drop pre-protocol exps, close out nlp_prompts + val_gap (7ed9ba9)

Features

  • Pre-registration 03_plan.md + static_f1_600 problem directory (push experiment) (9173c10)

v1.17.0 (2026-03-06)

Bug Fixes

  • Amendment 1 review fixes — F1 objective, EM=0 criterion, rationale, Gate E (866f106)

  • Distinguish timeouts from generic failures in stage logs and status monitoring (2228dd8)

  • launch.sh preflight loops use CHAIN_URL_F (not the removed CHAIN_URL_P) (bb79e6f)

  • Replace dry_run=true with --cfg job in launch.sh; update CLAUDE.md (13e968a)

  • status.py gen count bug + add Redis schema doc + run_status.sh (b85e388)

Chores

  • Record Amendment 1 commit hash in 03_plan.md (866f106) (2f64837)

  • Update PIDs in run_watchdog.py — launch 2026-03-05 12:21 UTC (acec7c1)

Documentation

  • Fill pre-registration commit hash in 03_plan.md (77f3ef6)

  • Move task_hotpotqa.md → experiments/hotpotqa/CONTEXT.md + CLAUDE.md lookup table (e920581)

  • Split task-specific content out of CLAUDE.md into task_hotpotqa.md (91410b3)

  • Update 04_launch.md for dry_run removal and crontab unavailability (7757d6d)

Features

  • Add static_600 and static_r600 problem directories for val_gap experiment (6dfa6d9)

  • Amendment 1 — replace Run P with Run F (fixed-300, F1 fitness) (5c4370b)

  • gap_analysis.py + lineage.py + eval_checkpoint.py + README onboarding fixes (45faa45)

  • Launch script and watchdog for hotpotqa_val_gap experiment (fa6a14d)

Refactoring

  • Nest hotpotqa experiments under experiments/hotpotqa/ project dir (25652c0)

v1.16.2 (2026-03-05)

Bug Fixes

  • Shutdown worker pool before event loop closes on Ctrl+C (6bbbd1e)

v1.16.1 (2026-03-05)

Bug Fixes

  • Remove hardcoded /home/jovyan paths from shared scripts (407f8d3)

  • Replace hardcoded gh path with command -v gh in tools (d30bf8a)

v1.16.0 (2026-03-05)

v1.15.1 (2026-03-04)

Bug Fixes

  • Correct @package directive in prompts/default.yaml (40954d0)

v1.15.0 (2026-03-02)

Bug Fixes

Chores

  • Update coverage badge to 86% [skip ci] (e2c0813)

  • Update coverage badge to 87% [skip ci] (cbe7155)

  • Update coverage badge to 87% [skip ci] (1dd894d)

Documentation

  • Fix changelog link — point README to root CHANGELOG.md (43964dc)

  • Update README test structure and coverage badge to 85% (ba5ad09)

Features

  • chains: Speed-up chain_runner, add aime,hotpotqa_full,hotpotqa_qa,hover,ifbench,papillon chain problems. (#68, 4c84ba5)

  • chains: Speed-up chain_runner, add new chains problems (#68, 4c84ba5)

Refactoring

  • Rename test files from _adversarial/_extended to _edge_cases (60dc53b)

Testing

  • Comprehensive test coverage expansion with audit hardening (5fb12ac)

  • Deep audit hardening with 207 new mutation-analysis tests (c101646)

v1.14.2 (2026-02-25)

Bug Fixes

  • prompts: Download bug (9e6da70)

  • prompts: Fix broken import (0c7e63f)

Chores

  • Update coverage badge to 78% [skip ci] (82b4f00)

Testing

  • Add extended test suites for coverage-gap modules (c2bf999)

v1.14.1 (2026-02-25)

Bug Fixes

  • prompts: Remove single-step exp.; add full chains evolution (#63, 8a9ec44)

  • prompts: Removed wrong directories (#63, 8a9ec44)

v1.14.0 (2026-02-25)

Bug Fixes

  • Timeout polish for optuna stage (b9d914a)

Features

  • Add time-budget deadline to Optuna trial loop (6c98665)

v1.13.0 (2026-02-25)

Bug Fixes

  • ci: Sync release job with latest origin/main before semantic-release (34efe54)

Chores

  • Update coverage badge to 77% [skip ci] (8824566)

Features

  • Filter optimization stage errors from mutation/LLM prompts (6107559)

  • ci: Add self-updating coverage badge to README (61a9ef1)

v1.12.0 (2026-02-24)

Bug Fixes

  • Add cwd to exec runner (003eddb)

  • Add metrics storage in redis (29d4bb7)

  • Add missing file (5b0e661)

  • Bug fix for zero fitnesses (e30d4b7)

  • Close subprocess transports to prevent "Event loop is closed" warnings (075cd4c)

  • Cloudpickle 'register_pickle_by_value' for correct root imports handling (9b70499)

  • Cma deps (e3cc526)

  • Comprehensive wizard (51eb90a)

  • Fix bug in caching behavior (362dcb6)

  • Fix bug in dag cache handling logic (61e98ad)

  • Fix faulty caching for programs with optional input (ca55919)

  • Fix missing traceback from lineage (28eeff4)

  • Fixed exec runner to handle project directory (568b51d)

  • Grammar errors (a0b779b)

  • Logging (7597117)

  • Minor boundary fixes (a0dca3f)

  • Minor optuna polish and done (87e02f0)

  • Move exec runner; speed up python execution via worker pool (6bce277)

  • Optuna stage patching (8bfba7f)

  • Optuna stage polish (7c628e7)

  • Pickle to cloudpickle (7c47ac5)

  • Remove indices from constants, fix fitness descriptions (c5e2649)

  • Remove unnecessary wizard configs (997f094)

  • Replace deprecated class Config with model_config = ConfigDict() (10a8fa1)

  • Restore Optuna prompt constraints and remove reasoning max_length (98771fa)

  • Undo default endpoint (4b4adc1)

  • Update three alphaevolve problems (#51, aaaea70)

  • Windows compatibility (b96caa2)

  • ci: Fix semantic-release not updating CHANGELOG (a7da06a)

  • ci: Remove orphaned v1.12.0 tag to unblock semantic-release (f60e8b6)

  • ci: Use startsWith instead of contains for release skip filter (bfb0e2a)

  • prompt: Remove .nltk artifacts, add dependencies, upd. .gitignore (53cc178)

  • prompt: Remove debug lines (203ba94)

Chores

  • Add santa challenge problem directory (477a205)

  • Modify santa challenge problem directory (6ed6881)

  • Polish comparison scripts (b6e9e89)

  • Refactor optuna stage (fd1ca3a)

  • Santa2025 problem for n=100 (ef864fc)

  • Slightly polish code (86fd5a8)

Code Style

  • Clean up stages — loguru placeholders, builtin generics, type annotations, constants (72a447a)

Documentation

  • Add Testing section to README (245464c)

  • Update README test section with current structure and run instructions (da5cd54)

Features

  • 1) Add new caching system (based on change in the inputs) 2) structured output for mutation operator 3) slightly polish insights (dfb7a73)

  • Add artifact from validation support (91a4402)

  • Add cma-es parameters tuning stage (76a32a1)

  • Add first half of missing problems (4fca319)

  • Add global stats to context (57cee54)

  • Add missing alphaevolve problems (#46, f8a15e1)

  • Add optuna optimization stage (8da53df)

  • Add Optuna payload routing and bypass for direct optimization output (8e37b0c)

  • Add second half of alphaevolve problems (574d555)

  • Add token counters to metrics (63323c3)

  • Boltzmann/weighted elite selectors, Optuna int preservation, profiler (1011e38)

  • Dynamic space, more ram stability (bb3bd4b)

  • Normalize fitness to [0,1] in FitnessProportionalEliteSelector, fix greedy collapse (90589b4)

  • Polish storage code with claude (6826922)

  • Removed bad problems, fixed first half of valid ones (063858b)

  • Small efficiency improvements (f71f9d8)

  • Unconstrained insights categories (a4ba5fe)

  • comparison: Improve style and polish (4375952)

  • prompt: Add gsm8k, aime, ifbench, pupa, and hotpotqa problems (44ecb07)

  • prompts: Add shared functionality; add aime & jigsaw problems (7dfdce8)

  • prompts: Refactor single-prompt evolution, added chains (utils+hotpotqa) (#62, d63343d)
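
The FitnessProportionalEliteSelector entry in the list above mentions normalizing fitness to [0,1] to fix greedy collapse. One way this could look is sketched below, assuming min-max scaling followed by roulette-wheel sampling. The function names normalize and select and the floor parameter are hypothetical and are not taken from the project; the selector's real implementation may differ.

```python
import random


def normalize(fitnesses: list[float]) -> list[float]:
    """Min-max scale fitnesses to [0, 1]; identical fitnesses all map to 1.0."""
    lo, hi = min(fitnesses), max(fitnesses)
    if hi == lo:
        return [1.0] * len(fitnesses)    # avoid division by zero, fall back to uniform
    return [(f - lo) / (hi - lo) for f in fitnesses]


def select(fitnesses: list[float], k: int, rng: random.Random, floor: float = 0.05) -> list[int]:
    """Sample k indices with probability proportional to normalized fitness.

    The floor (an assumption of this sketch) keeps the weakest program selectable,
    so sampling does not collapse onto the single best individual.
    """
    weights = [max(w, floor) for w in normalize(fitnesses)]
    return rng.choices(range(len(fitnesses)), weights=weights, k=k)
```

For example, normalize([2.0, 4.0, 6.0]) returns [0.0, 0.5, 1.0]. Scaling first makes the selection pressure independent of the raw fitness range, which also keeps negative or all-zero fitness values from producing invalid sampling weights.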

Performance Improvements

  • Pre-compute DAG inputs and improve stage resilience (e6e3566)

  • Reduce Redis round-trips and eliminate deep copies in hot paths (2c556e8)

Refactoring

  • Refactor wizard specs to follow pydantic (151f120)

  • Rewrite uncertainty_inequality (3f88ed0)

  • Simplify program state machine — remove EVOLVING, rename states (70b9b34)

Testing

  • Add comprehensive coverage tests for 12 modules (3086 lines) (4546525)

  • Add comprehensive test suite (1132 tests) and reorganize into subdirectories (8d47344)

  • Add comprehensive test suite for core modules (7628e91)

  • Fix flaky ScalarTournamentEliteSelector tests (5155932)

v1.11.1 (2025-11-18)

Bug Fixes

  • Set flush_at and flush_interval via client instead of constructor (887232d)

Refactoring

  • Optimize Langfuse integration (6876a9d)

  • Pass flush_at and flush_interval to CallbackHandler constructor (fc6fbd2)

  • Remove redundant try-except for CallbackHandler initialization (72884b5)

  • Remove unused flush_traces method (3b5ccb3)

v1.11.0 (2025-11-18)

Chores

Features

  • Better stage scheduling (d2e35b0)

v1.10.0 (2025-11-17)

v1.9.1 (2025-11-17)

v1.9.0 (2025-11-15)

Chores

  • Removed legacy fields, upd. FunctionSignature note (f7c832e)

  • Removed wizard example problem (41f2c77)

Documentation

  • Updated wizard documentation (a6d115e)

Features

  • Add problem scaffolding wizard (e12e566)

Refactoring

  • Moved wizard configs, made wizard a module (e2af84b)

  • Updated wizard code functionality (43d0ab6)

v1.8.1 (2025-11-14)

v1.8.0 (2025-11-14)

Features

  • Remove memory leaks and small fixes (2746ea6)

v1.7.1 (2025-11-12)

Bug Fixes

v1.7.0 (2025-11-12)

Features

  • Handling of langfuse errors (50adbc3)

  • Langfuse_tracing_less_comments (5910f96)

  • Simplifying_langfuse_tracing (c746de7)

  • Update README.md to work with langfuse (765e0e7)

v1.6.1 (2025-11-12)

Bug Fixes

  • Follow-up on removing task-dependent text (a72968b)

  • Remove task-dependent text from evolution prompts (1c6acf3)

v1.6.0 (2025-11-11)

v1.5.2 (2025-11-11)

Bug Fixes

Features

  • More docs and stability for redis (dbcb856)

v1.5.1 (2025-11-11)

Bug Fixes

v1.5.0 (2025-11-11)

Features

  • Better config structure, examples, and polish (643331e)

v1.4.0 (2025-11-07)

Features

  • Wandb support; improve pythonpath passthrough (e408cec)

v1.3.0 (2025-11-06)

Bug Fixes

Features

  • 1) Add metrics logging with TensorBoard 2) fix execution ordering in evolution engine 3) fix island API 4) add proper cancelation handling for async method (a0e7d8e)

v1.2.0 (2025-11-06)

Chores

  • prompts: Removed unused prompt constants, moved task hints to description (b6db207)

Features

  • prompts: Centralize mutation prompts and remove task hints (6bd971f)

Refactoring

  • prompts: Add task-independent mutation prompts (e56dd49)

v1.1.0 (2025-10-31)

Features

  • Changed pickle serialization to cloudpickle for classes and lambdas over network (6524340)

v1.0.3 (2025-10-31)

v1.0.2 (2025-10-31)

Bug Fixes

  • Better error handling and logging in dag (5973d7b)

v1.0.1 (2025-10-31)

Bug Fixes

  • Add missing dep (acecda2)

  • Minor fixes to simplify hydra and fix prompt for insights (60aa776)

Chores

  • deps: Move hydra dependencies to main requirements (6e574d9)

Refactoring

  • Migrate MetaEvolve to GigaEvo (fc8c2ca)

v1.0.0 (2025-10-31)

v0.9.0 (2025-09-26)

v0.8.0 (2025-09-22)

v0.7.0 (2025-09-21)

Chores

v0.6.0 (2025-09-20)

  • Initial Release