- Eliminate ~76k test warnings (0 remaining) (df637e6)
- Flatten ideas_tracker aliases (list[dict]) to MemoryCard (list[str]) (f6e620d)
- Lint + format pre-existing errors in experiment files (9241296)
- Lint errors in ablation_v3_no_deep.py, update prereg_commit (a118bbf)
- MemoryCard.aliases type list[str] → list[Any] (a31443f)

- Ruff format (d05507d)

- Remove all 27 type: ignore comments from codebase (91a175f)
- Rename memory test files to describe what they test (68828e2)

- Integration test for ideas_tracker dict aliases (Bug #2, PR #161) (ff54673)

- Format card_conversion.py (228b8f3)
- Lint import sorting in A_mem + GAM_root (pre-existing) (5e3baa3)
- memory: Address chaos-hacker findings on public API (91aec06)
- memory: Correct concept_to_card return type annotation (78327d0)

- Add gigaevo.memory public API exports (7790b82)

- Replace 50 print() with loguru in A_mem + GAM_root (59853df)
- memory: Add future annotations, reduce hasattr/getattr usage (ae4e403)
- memory: Consolidate 20 test files into tests/memory/ (e6f8480)

- Dict → Pydantic migration complete — normalize_memory_card returns AnyCard (f2ea951)

- Normalize_memory_card returns AnyCard (Pydantic models) (5926631)
- Replace print() with loguru, remove sys.path hacks in ideas_tracker (8663644)

- Add break condition for processing when no new ideas are present (a6a3a18)
- Add break condition for processing when no new ideas are present (0e20d87)
- Changed cooccurrence threshold aggressive scaling to fixed minimum (5c5c29b)
- Changed cooccurrence threshold aggressive scaling to fixed minimum (c8a7e5b)
- Circular import in logger (55e3b1f)
- Circular import in logger (522d28e)
- Clean up memory PR merge — lint, format, junk dirs, broken imports (21035ab)
- Correct serialization of dict and lists in pd columns (f7a6bbe)
- Correct serialization of dict and lists in pd columns (2dd2eba)
- Dead retry in _decide_card_action — parse_llm_card_decision returns None for garbage (80649dd)
- Eliminate RuntimeWarning in generate_mutations tests (ba4195d)
- Handle parent_ids as string in ideas_tracker (7601ed9)
- Handle parent_ids as string in ideas_tracker (3148878)
- IncomingIdeas update logic fix (b9a1781)
- IncomingIdeas update logic fix (2947dda)
- Lint and format errors for CI (ruff check + ruff format) (b884547)
- Phase 1 — 3 confirmed bugs fixed in memory system (5e9addd)
- Remove debug print (bf4a981)
- Remove debug print (cda32cb)
- Remove short id separate storage and generation (d0fd1a6)
- Remove short id separate storage and generation (d1fdde8)
- Restore RedisRunConfig + fetch_evolution_dataframe re-export in tools/utils.py (30bc8ea)
- Wrong key name fix (66cab68)
- Wrong key name fix (9cc9912)

- Removed unused prompts (5cff8f1)
- Removed unused prompts (52852cc)
- Update docstrings (3123cba)
- Update docstrings (6a44189)

- Add best idea extraction based on top_k selection by fitness and delta fitness (04ab7c6)
- Add best idea extraction based on top_k selection by fitness and delta fitness (995929e)
- Add changes extraction to mutation agent (cd924c7)
- Add changes extraction to mutation agent (ad342ca)
- Add extended record card dataclass (8aeab4e)
- Add extended record card dataclass (2d80776)
- Add idea description rewriting logic (0201eb3)
- Add idea description rewriting logic (ea8ddd3)
- Add idea origin analysis script and minor refactor ideas_tracker.py (ab184b8)
- Add idea origin analysis script and minor refactor ideas_tracker.py (5a38c1d)
- Add idea tracker (e9e911d)
- Add idea tracker (eb9701b)
- Add logging for idea tracker (21b6f58)
- Add logging for idea tracker (0563796)
- Add ProgramCard, ConnectedIdea, AnyCard Pydantic models (edbdd1e)
- Add update logic for extended record card (0bb5038)
- Add update logic for extended record card (8529a30)
- Csv loading to IdeaTracker (3cc0605)
- Csv loading to IdeaTracker (5b610ca)
- Experimental ml pipeline for impact estimation based on linear regression feature weights (ffaac97)
- Experimental ml pipeline for impact estimation based on linear regression feature weights (bda2e79)
- Implement idea enrichment with LLM-generated keywords and summaries (1d4f350)
- Implement idea enrichment with LLM-generated keywords and summaries (102cb74)
- Support for extended record card (a7cb492)
- Support for extended record card (0549fe7)
- Task description loading (bbfc57a)
- Task description loading (2c3ff68)
- Update main logic to work with extended record card (b1ed1ac)
- Update main logic to work with extended record card (a98d6b1)

- Add Protocol types, fix mypy errors, CardDict alias (2f09116)
- Extract _note_fields_changed, remove stale comments and blank lines (c3fbede)
- Extract card_conversion.py + utils.py, add MemoryNoteProtocol typing (406f2cd)
- Extract DEFAULT_MODEL_NAME constant, remove ad-hoc string (0f67bc1)
- Extract more pure functions + memory_write_config.py (7e1bdbe)
- Phase 2 — import cleanup, context manager, remove del (dd31c4f)
- Phase 3 — extract _ConceptApiClient + utilities (8dacf30)
- Record card extended minor refactor (bbda084)
- Record card extended minor refactor (b5eac06)
- Remove debug code (5a9f8ec)
- Remove debug code (117a325)
- Rename test files and classes to professional naming (6ae0134)
- Replace all print() with loguru logger across memory module (5091f37)
- Replace ML impact pipeline with origin analysis computation and improve docstring clarity (bcb2ff7)
- Replace ML impact pipeline with origin analysis computation and improve docstring clarity (3f57d5f)

- Chaos-hacker bug exposure tests (16 tests) (1e62411)
- Cycle 10 (final) — API search, LLM synthesis, close() (21 tests) (2d63100)
- Cycle 11 — fake agentic memory infrastructure + 24 tests (72c49e5)
- Cycle 12 — fake Chroma/GAM + full dedup pipeline (15 tests) (d94a674)
- Cycle 13 — 7 realistic E2E scenarios + 2 unpatched real-memory tests (282f7f6)
- Cycle 2 — API client, dedup decision, truncate (28 tests) (77aa25d)
- Cycle 3 — deeper AmemGamMemory internals (21 tests) (543b2e2)
- Cycle 4 — integration tests + chaos-hacker regression fixes (25 tests) (4b92198)
- Cycle 5 — mutation operator memory flow, sync_from_api, API body checks (17 tests) (fcbc352)
- Cycle 6 — 8 e2e scenarios + data_components (64 new tests) (1040f02)
- Cycle 7 — contract tests + engine interaction (34 new tests) (616f88c)
- Cycle 8 — full-loop evolution with memory (11 E2E tests) (d7ef94d)
- Cycle 9 — LLMMutationOperator real constructor + memory (14 tests) (9035068)
- P0 exhaustive tests for memory module core (211 tests) (572c28a)
- P1 dedup edge cases + OpenAI inference tests (100 tests) (9945c0e)
- P2 memory_write_example edge cases (22 tests) (70375bb)

- Remove last 4 dead .claude/rules/ references from CLAUDE.md (9caf0c3)

- Remove GitNexus from CLAUDE.md, skills, and gitignore (b9f90ed)

- Add Quick Start sections with runnable commands to all feature docs (4d9a809)

- Rename scheduling/lpt_ridge → lpt_chain, clarify scope (10cf394)

This release focuses on performance infrastructure, experiment tooling maturity, and repository hygiene. Two experiments were completed (hover/steady-state-v2: POSITIVE, hover/map-elites-topology: NULL), and the framework gained production-grade load balancing, scheduling, and monitoring.
- Steady-state evolution engine: continuous mutation/evaluation interleaving that eliminates the generational barrier. Two async loops (producer + consumer) with backpressure via `asyncio.Semaphore(max_in_flight)`. Opt-in: `evolution=steady_state`. Expected throughput: ~8-9x improvement over step-wise generations.
- LPT scheduling for DAG evaluation (#136): longest-processing-time-first scheduling assigns expensive programs to evaluation slots first, reducing tail latency. Discrete-event simulation benchmarks in `tools/benchmarks/`.
- LLM load balancer (`llm=balanced`): Redis-coordinated endpoint pool with least-connections routing. Mutation servers shared across all runs via Redis DB 15. Replaces manual `llm_base_url` per-run configuration.
- LiteLLM proxy integration: `bash tools/litellm.sh` auto-generates config from `experiments/infrastructure.yaml` and starts a LiteLLM proxy for chain server load balancing. All chain requests route through `INTERNAL_IP:4000`.
- Chain feature extraction: `ChainFeatureExtractor` computes structural behavior coordinates (DAG depth, retrieval count, step count) from real chain programs for MAP-Elites behavioral characterization.
- Experiment diagnostics: `/experiment-diagnose` skill for automated failure analysis of running experiments. Checks Redis health, PID liveness, log errors, and Hydra config overrides.
- Experiment restart: `/experiment-restart` skill that kills all processes, flushes Redis, and re-launches cleanly.
- Throughput monitoring: `tools/throughput_plot.py` and a 6-panel dashboard in the watchdog (mutation rate, eval throughput, fitness distributions, validity panels). Posted hourly to experiment PRs.
- Fitness vs wall-clock time: `tools/fitness_vs_time.py` plots fitness trajectories against real time instead of generation number.
- Prompt co-evolution: user prompt co-evolution alongside system prompts (`prompt_fetcher=coevolved`).

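The producer/consumer pattern with semaphore backpressure described for the steady-state engine can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: `run_steady_state`, the queue, and the sleep-based stand-ins for mutation and evaluation are all hypothetical; only `asyncio.Semaphore(max_in_flight)` comes from the notes above.

```python
import asyncio
import random


async def run_steady_state(total: int, max_in_flight: int) -> tuple[list[int], int]:
    """Interleave mutation and evaluation with at most `max_in_flight` pending programs."""
    slots = asyncio.Semaphore(max_in_flight)  # backpressure: one slot per in-flight program
    pending: asyncio.Queue = asyncio.Queue()  # hypothetical hand-off between the two loops
    evaluated: list[int] = []
    in_flight = 0
    peak_in_flight = 0

    async def producer() -> None:
        nonlocal in_flight, peak_in_flight
        for program_id in range(total):
            await slots.acquire()  # waits while max_in_flight programs are unevaluated
            in_flight += 1
            peak_in_flight = max(peak_in_flight, in_flight)
            await asyncio.sleep(0)  # stand-in for an LLM mutation call
            await pending.put(program_id)
        await pending.put(None)  # sentinel: nothing more to evaluate

    async def consumer() -> None:
        nonlocal in_flight
        while (program_id := await pending.get()) is not None:
            await asyncio.sleep(random.uniform(0, 0.002))  # stand-in for evaluation
            evaluated.append(program_id)
            in_flight -= 1
            slots.release()  # frees a slot so the producer can mutate again

    await asyncio.gather(producer(), consumer())
    return evaluated, peak_in_flight


if __name__ == "__main__":
    done, peak = asyncio.run(run_steady_state(total=20, max_in_flight=4))
    print(f"evaluated {len(done)} programs, peak in-flight = {peak}")
```

There is no generation barrier in this sketch: the producer only stalls when `max_in_flight` programs are waiting, so mutation resumes as soon as any single evaluation finishes.
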
- 120s read timeout killed 96% of chain evaluations: removed read timeout (`timeout=None`, keep `connect=30s`) to allow long-running chains under load.
- CancelledError orphans: `except Exception` does not catch `asyncio.CancelledError`, which is a `BaseException`, so cancelled tasks in the steady-state engine left programs persisted but their IDs lost. Fixed with `persisted_id` sentinel + `except BaseException`.
- Mutation LLM double-escaping: LLMs using `with_structured_output()` sometimes double-escape quotes in code fields. Fixed by `_fix_double_escaped_quotes()` in mutation agent.
- Frontier metric recomputation: when NO_CACHE stages re-evaluate programs, frontier is now recomputed correctly using `clear_series()` + full rewrite instead of appending stale values.
- TOCTOU races in SteadyStateEngine: scoped drain + TOCTOU-safe `ingest_batch`, `add_elite` with optimistic locking and WatchError retry.
- Ghost program detection: mirrors parent engine's `_await_idle()` logic to clean up orphaned program IDs.
- Proxy bypass: added mutation server IPs to `NO_PROXY` to prevent Squid proxy from blocking LLM calls.

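The CancelledError fix relies on the fact that `asyncio.CancelledError` derives from `BaseException`, not `Exception`. A minimal sketch of the sentinel pattern, assuming a hypothetical `persist_then_evaluate` coroutine and an in-memory `store` in place of the real persistence layer (only the `persisted_id` name comes from the notes above):

```python
import asyncio


async def persist_then_evaluate(store: set[str], orphans: list[str], program_id: str) -> None:
    persisted_id = None  # sentinel: stays None until the write has succeeded
    try:
        store.add(program_id)  # stand-in for persisting the program
        persisted_id = program_id
        await asyncio.sleep(3600)  # stand-in for a long evaluation; cancelled in demo()
    except BaseException:  # `except Exception` would miss asyncio.CancelledError
        if persisted_id is not None:
            orphans.append(persisted_id)  # keep the ID so it can be cleaned up or retried
        raise  # re-raise so cancellation still propagates to the caller


async def demo() -> tuple[list[str], bool]:
    store: set[str] = set()
    orphans: list[str] = []
    task = asyncio.create_task(persist_then_evaluate(store, orphans, "prog-1"))
    await asyncio.sleep(0)  # let the task start and reach its long await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return orphans, issubclass(asyncio.CancelledError, Exception)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Re-raising inside the handler matters: swallowing `CancelledError` would prevent the task from ever being reported as cancelled.
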
| Experiment | Result | PR |
|---|---|---|
| hover/steady-state-v2 | POSITIVE — continuous interleaving improves throughput | #138 |
| hover/map-elites-topology | NULL — 3D structural BC (dag_depth, n_deep_retrieval, n_steps) did not improve fitness | #142 |
- Removed leaked vartodd/circuit_evolve code: problems, configs, custom/, gf2lib/, npy/, launch scripts (12,800+ lines deleted)
- Removed experiment runtime artifacts: PNGs, pids.txt, cfg_run_*.txt from all completed experiments
- Consolidated tools hierarchy: experiment-specific scripts (archive, preflight, protocol gates) now live in `tools/experiment/`; general tools in `tools/`
- Removed all hardcoded paths: skills, agents, tools, and docs now use `$PROJ` (git root) and `$GIGAEVO_PYTHON` (env var) instead of `/workspace-SR008.fs2/...` or `/home/jovyan/...`
- Fixed .gitignore contradictions: `.claude/` and `CLAUDE.md` were tracked but gitignored
- Cleaned root directory: moved `benchmarks/` → `tools/benchmarks/`, `demos/` → `docs/demos/`
- Rewrote Redis data model docs: complete key namespace reference with all metric tags, archive persistence, iteration vs generation glossary

- CLAUDE.md: added tools index, skills table (12 skills), agents table (9 agents), `@tools/README.md` include for Redis data model
- tools/README.md: structured tool index with categories (general, experiment lifecycle, infrastructure, benchmarking, scaffolding), accurate Redis appendix
- Removed dead references: `.claude/rules/*.md` files that never existed on main, redirect stub `docs/redis_schema.md`

- 56+ new tests: race conditions, streaming, failure modes, mutation-killing, TOCTOU guards, NaN handling
- Removed deprecated test classes (TestSafetyMechanismBreakage, TestEngineGenerationTimeout)
- Full suite: ~3500 tests, all passing
- bugs: Round-2 — migration KeyError on None island + DAG empty-nodes crash (54810b0)
- bugs: Round-4 — 5 junior-researcher attack surface bugs (073cc33)
- bugs: Round-5 — H1 sentinel bypass + TOCTOU dag_runner + H2-H4 guard tests (d039cf1)
- tests: Update test_evolution_engine.py for get_all_by_status migration (a33604b)

- generalization: Add launch script and run_status.sh (61e76af)
- generalization: Add launch script and run_status.sh (96bbb42)
- generalization: Add test eval script, PR description, gitignore indexes (6405a22)
- generalization: Backfill pre-registration commit hash in 03_plan.md (3bd7fea)
- generalization: Gen-1 smoke check — all 4 runs alive, split bias OK (17b25db)
- generalization: Record binding prompt review sign-off in 03_plan.md (6a0d2e6)

- memory: Chaos-hacker round-5 findings summary (4c742fc)
- memory: Restructure Claude memory + propagate gen-count fix + add closeout step (aca5f0d)

- generalization: Implement static_holdout_f1 problem + generalization prompts (1dbb05c)

- tests: Move round2/round3 tests to semantic locations (3b3117e)

- integration: 21 new integration tests — DAG ordering + engine edge cases (1bb5235)
- round3: Regression tests for Bug A and B fixes (f778ad7)
- security: Fix safe_mode bypass + add regression tests from audit (ca2d4cf)

- results_report: Remove stray ESC character (U+001B) (e2b01fa)
- status: Use run_state Redis key for generation count (e7648ab)

- gemini_mutation: Pre-merge cleanup — environment freeze + PR description (a36f29d)

- hotpotqa: Add LaTeX results report for paper (02adea3)
- hotpotqa: Make results_report.tex self-contained compilable document (bf0ff3f)

- resume: Make redis.resume produce a contiguous run (07091fb)

- gemini_mutation: Pre-register experiment — Gemini-3-Flash as mutation LLM (0f42851)

- build_colbert_index: Cap num_partitions=32768, kmeans_niters=4 for tractable CPU k-means (1b605b3)
- colbert: Replace faiss GPU k-means with PyTorch batched k-means (765e8aa)
- colbert: Simplify build script — patch applied directly to colbert source (f86d3a4)
- colbert_feedback: Export HOTPOTQA_COLBERT_SERVER_URL in run_test_eval.sh (ac1e423)

- Fill pre-registration commit hash and PR number in 03_plan.md (454f817)

- colbert_feedback: Amendment 5 — gap investigation results (efdf087)
- colbert_feedback: Record index build completion in 03_plan.md (a8f4e9a)

- chains/hotpotqa: Add ColBERT+rich-feedback experiment (colbert_feedback, Phase 3) (20c8314)
- colbert_feedback: Add ColBERT search server + update launch/validate/plan (d7683d4)
- colbert_feedback: Watchdog + benchmark server-url support (1272c1e)

- chains: Hotpotqa: add Retriever class and colbertv2 retriever (b681195)

- Add cold_start entry to INDEX.md + create experiment branch (b0b47af)
- Fill PIDs into run_status.sh — T1=3812756 T2=3812757 T3=3812758 T4=3812759 watchdog=3813084 (cef1b24)
- Launch.sh, run_watchdog.py, run_status.sh for cold_start experiment (53bbb1f)

- Add baseline initial_programs to static_f1_600 for cold-start support (f5adb9f)

- Watchdog gen count — use log file instead of Redis s field (1d229e4)

- Backfill pre-reg commit hash + add crossover entry to INDEX.md (2733e86)
- Fill PIDs into run_status.sh — P=3660148 Q=3660149 R=3660150 S=3660151 watchdog=3660461 (d52fe31)
- Launch.sh, run_status.sh, run_watchdog.py for crossover experiment (e2c3629)

- 12 infra correctness fixes from codebase audit (ef89ba9)
- Check_experiment_complete.sh SIGPIPE bug + environment_freeze.txt (815b47e)
- Extend prompts_dir to all pipeline YAMLs + docstring accuracy pass (939ec7d)
- Gen10_test_eval.py val_em gap correct for F1 runs (9c9e65f)
- Move analyze_test_results.py to push experiment tools dir (e6367be)
- Pin push run_test_eval.sh sha256 in 03_plan.md (was val_gap hash) (7763287)
- Propagate known bugs to templates and docs to prevent recurrence (b545282)
- Raise chain LLM HTTP timeout 120s→600s + hard reset all runs (Amendment 4) (c0186a8)
- Remove stale failures[:10] cap from docstrings and pipeline comments (ac3c07d)
- Tighten APPROVED grep + correct agent memories for Phase 5 readiness (3ee5854)
- Update PR_DESCRIPTION.md template — val EM → val fitness (e99f891)
- Watchdog PROJ path (3→4 parents) + stale Run D config (a3dae8f)

- Add run_status.sh template for push experiment monitoring (6a1e86c)
- Infra improvements while runs are live (466db87)
- Launch.sh for push experiment + CONTEXT.md updates (c2da4a7)
- Pre-fill 05_results.md skeleton + analysis script + INDEX.md entry (a8e288b)
- Replace Run D EM+NLP+600 → F1+NLP+600 (Amendment 3) (58ec1fa)
- Update INDEX.md and CONTEXT.md naming consistency (05bce0a)
- Watchdog + run_status.sh for push experiment (81807ea)

- Hotpotqa_asi.yaml is required for ALL hotpotqa variants, not just static_a/ra (856125c)
- Update all experiments// → experiments/// (1b67951)

- Wire stage_timeout through DefaultPipelineBuilder + validation speedup (55772ba)

- Update agent memories (push experiment + path fixes) (6f50092)

- Fix INDEX.md — hotpotqa_thinking test EM ~60% not 62.3% (e8407d5)
- Set pre-registration commit hash in 03_plan.md (push experiment) (f47847f)
- Update INDEX.md — drop pre-protocol exps, close out nlp_prompts + val_gap (7ed9ba9)

- Pre-registration 03_plan.md + static_f1_600 problem directory (push experiment) (9173c10)

- Amendment 1 review fixes — F1 objective, EM=0 criterion, rationale, Gate E (866f106)
- Distinguish timeouts from generic failures in stage logs and status monitoring (2228dd8)
- Launch.sh preflight loops use CHAIN_URL_F (not removed CHAIN_URL_P) (bb79e6f)
- Replace dry_run=true with --cfg job in launch.sh; update CLAUDE.md (13e968a)
- Status.py gen count bug + add Redis schema doc + run_status.sh (b85e388)

- Record Amendment 1 commit hash in 03_plan.md (866f106) (2f64837)
- Update PIDs in run_watchdog.py — launch 2026-03-05 12:21 UTC (acec7c1)

- Fill pre-registration commit hash in 03_plan.md (77f3ef6)
- Move task_hotpotqa.md → experiments/hotpotqa/CONTEXT.md + CLAUDE.md lookup table (e920581)
- Split task-specific content out of CLAUDE.md into task_hotpotqa.md (91410b3)
- Update 04_launch.md for dry_run removal and crontab unavailability (7757d6d)

- Add static_600 and static_r600 problem directories for val_gap experiment (6dfa6d9)
- Amendment 1 — replace Run P with Run F (fixed-300, F1 fitness) (5c4370b)
- Gap_analysis.py + lineage.py + eval_checkpoint.py + README onboarding fixes (45faa45)
- Launch script and watchdog for hotpotqa_val_gap experiment (fa6a14d)

- Nest hotpotqa experiments under experiments/hotpotqa/ project dir (25652c0)

- Shutdown worker pool before event loop closes on Ctrl+C (6bbbd1e)

- Remove hardcoded /home/jovyan paths from shared scripts (407f8d3)
- Replace hardcoded gh path with command -v gh in tools (d30bf8a)

- Correct @package directive in prompts/default.yaml (40954d0)

- Update coverage badge to 86% [skip ci] (e2c0813)
- Update coverage badge to 87% [skip ci] (cbe7155)
- Update coverage badge to 87% [skip ci] (1dd894d)

- Fix changelog link — point README to root CHANGELOG.md (43964dc)
- Update README test structure and coverage badge to 85% (ba5ad09)

- chains: Speed-up chain_runner, add aime,hotpotqa_full,hotpotqa_qa,hover,ifbench,papillon chain problems. (#68, 4c84ba5)
- chains: Speed-up chain_runner, add new chains problems (#68, 4c84ba5)

- Rename test files from _adversarial/_extended to _edge_cases (60dc53b)

- Comprehensive test coverage expansion with audit hardening (5fb12ac)
- Deep audit hardening with 207 new mutation-analysis tests (c101646)

- Update coverage badge to 78% [skip ci] (82b4f00)

- Add extended test suites for coverage-gap modules (c2bf999)

- Timeout polish for optuna stage (b9d914a)

- Add time-budget deadline to Optuna trial loop (6c98665)

- ci: Sync release job with latest origin/main before semantic-release (34efe54)

- Update coverage badge to 77% [skip ci] (8824566)

- Filter optimization stage errors from mutation/LLM prompts (6107559)
- ci: Add self-updating coverage badge to README (61a9ef1)

- Add cwd to exec runner (003eddb)
- Add metrics storage in redis (29d4bb7)
- Add missing file (5b0e661)
- Bug fix for zero fitnesses (e30d4b7)
- Close subprocess transports to prevent "Event loop is closed" warnings (075cd4c)
- Cloudpickle 'register_pickle_by_value' for correct root imports handling (9b70499)
- Cma deps (e3cc526)
- Comprehensive wizard (51eb90a)
- Fix bug in caching behavior (362dcb6)
- Fix bug in dag cache handling logic (61e98ad)
- Fix faulty caching for programs with optional input (ca55919)
- Fix missing traceback from lineage (28eeff4)
- Fixed exec runner to handle project directory (568b51d)
- Grammar errors (a0b779b)
- Logging (7597117)
- Minor boundary fixes (a0dca3f)
- Minor optuna polish and done (87e02f0)
- Move exec runner; speed up python execution via worker pool (6bce277)
- Optuna stage patching (8bfba7f)
- Optuna stage polish (7c628e7)
- Pickle to cloudpickle (7c47ac5)
- Remove indices from constants, fix fitness descriptions (c5e2649)
- Remove unnecessary wizard configs (997f094)
- Replace deprecated class Config with model_config = ConfigDict() (10a8fa1)
- Restore Optuna prompt constraints and remove reasoning max_length (98771fa)
- Undo default endpoint (4b4adc1)
- Windows compatibility (b96caa2)
- ci: Fix semantic-release not updating CHANGELOG (a7da06a)
- ci: Remove orphaned v1.12.0 tag to unblock semantic-release (f60e8b6)
- ci: Use startsWith instead of contains for release skip filter (bfb0e2a)
- prompt: Remove .nltk artifacts, add dependencies, upd. .gitignore (53cc178)
- prompt: Remove debug lines (203ba94)

- Add santa challenge problem directory (477a205)
- Modify santa challenge problem directory (6ed6881)
- Polish comparison scripts (b6e9e89)
- Refactor optuna stage (fd1ca3a)
- Santa2025 problem for n=100 (ef864fc)
- Slightly polish code (86fd5a8)

- Clean up stages — loguru placeholders, builtin generics, type annotations, constants (72a447a)

- Add Testing section to README (245464c)
- Update README test section with current structure and run instructions (da5cd54)

- Add new caching system (based on change in the inputs); structured output for mutation operator; slightly polish insights (dfb7a73)
- Add artifact from validation support (91a4402)
- Add cma-es parameters tuning stage (76a32a1)
- Add first half of missing problems (4fca319)
- Add global stats to context (57cee54)
- Add optuna optimization stage (8da53df)
- Add Optuna payload routing and bypass for direct optimization output (8e37b0c)
- Add second half of alphaevolve problems (574d555)
- Add token counters to metrics (63323c3)
- Boltzmann/weighted elite selectors, Optuna int preservation, profiler (1011e38)
- Dynamic space, more ram stability (bb3bd4b)
- Normalize fitness to [0,1] in FitnessProportionalEliteSelector, fix greedy collapse (90589b4)
- Polish storage code with claude (6826922)
- Removed bad problems, fixed first half of valid ones (063858b)
- Small efficiency improvements (f71f9d8)
- Unconstrained insights categories (a4ba5fe)
- comparison: Improve style and polish (4375952)
- prompt: Add gsm8k, aime, ifbench, pupa, and hotpotqa problems (44ecb07)
- prompts: Add shared functionality; add aime & jigsaw problems (7dfdce8)
- prompts: Refactor single-prompt evolution, added chains (utils+hotpotqa) (#62, d63343d)

- Pre-compute DAG inputs and improve stage resilience (e6e3566)
- Reduce Redis round-trips and eliminate deep copies in hot paths (2c556e8)

- Refactor wizard specs to follow pydantic (151f120)
- Rewrite uncertainty_inequality (3f88ed0)
- Simplify program state machine — remove EVOLVING, rename states (70b9b34)

- Add comprehensive coverage tests for 12 modules (3086 lines) (4546525)
- Add comprehensive test suite (1132 tests) and reorganize into subdirectories (8d47344)
- Add comprehensive test suite for core modules (7628e91)
- Fix flaky ScalarTournamentEliteSelector tests (5155932)

- Set flush_at and flush_interval via client instead of constructor (887232d)

- Optimize Langfuse integration (6876a9d)
- Pass flush_at and flush_interval to CallbackHandler constructor (fc6fbd2)
- Remove redundant try-except for CallbackHandler initialization (72884b5)
- Remove unused flush_traces method (3b5ccb3)

- Add terminal gif (4286fd2)
- Add terminal gif (398056b)
- Add terminal gif (ed4aba1)
- Fix license (5592e05)
- Fix license (1394967)
- Remove emoji (7c4e0f5)

- Better stage scheduling (d2e35b0)

- Removed legacy fields, upd. FunctionSignature note (f7c832e)
- Removed wizard example problem (41f2c77)

- Updated wizard documentation (a6d115e)

- Add problem scaffolding wizard (e12e566)

- Remove memory leaks and small fixes (2746ea6)

- Small polish (e9abd09)

- Handling of langfuse errors (50adbc3)
- Langfuse_tracing_less_comments (5910f96)
- Simplifying_langfuse_tracing (c746de7)
- Update README.md to work with langfuse (765e0e7)

- Follow-up on removing task-dependent text (a72968b)
- Remove task-dependent text from evolution prompts (1c6acf3)

- Small fix problem name (6663ea4)

- More docs and stability for redis (dbcb856)

- Small fix problem name (44ea85f)

- Better config structure, examples, and polish (643331e)

- Wandb support; improve pythonpath passthrough (e408cec)

- Add metrics logging with tensorboard; fix execution ordering in evolution engine; fix island api; add proper cancelation handling for async method (a0e7d8e)

- prompts: Removed unused prompt constants, moved task hints to description (b6db207)

- prompts: Centralize mutation prompts and remove task hints (6bd971f)

- prompts: Add task-independent mutation prompts (e56dd49)

- Changed pickle serialization to cloudpickle for classes and lambdas over network (6524340)

- Better error handling and logging in dag (5973d7b)

- deps: Move hydra dependencies to main requirements (6e574d9)

- Migrate MetaEvolve to GigaEvo (fc8c2ca)

- release: V0.7.0 (5f102b4)

- Initial Release