2026-03-03 11:15:11

danvoronov · danvoronov · commit c63f67677c4e · 2026-03-03T11:15:12.000+02:00
diff --git a/eng_2026/03/2026-03-03-07-41.md b/eng_2026/03/2026-03-03-07-41.md
@@ -0,0 +1,35 @@
+Two years ago, coding models behaved like a *genie*: you'd ask them for something, and they'd do it technically correctly but with a catch. To combat this, many "harnesses" (wrappers) were devised. Apps like Cursor were pioneers in exploring how to do this effectively.
+
+2026 models have become significantly more obedient, so, as I wrote earlier, the `AGENTS.md` file is no longer as critical. Another recent example is Vercel, which removed 80% of the specialized tools from its internal text-to-SQL agent, leaving only a single "execute bash" tool in a sandbox (https://vercel.com/blog/we-removed-80-percent-of-our-agents-tools).
+
+We are learning to **simplify** the architectures we over-engineered over the past two years, using minimal tools to avoid hindering powerful models.
+
+**NxCode Team on AI Agent Operations**
+https://www.nxcode.io/resources/news/harness-engineering-complete-guide-ai-agent-codex-2026
+Explains the harness as a "bridle + saddle + reins" for a powerful but uncontrolled "horse" (the model). An example is LangChain, which boosted a coding agent from 52.8% to 66.5% on Terminal Bench without changing the model, only through middleware (self-verification, loop detection, context mapping).
+
+Agents fail not because of model quality, but because of a poor harness.
+
+It’s important to add that even an ideal harness won't save a weak model.
+
+**OpenAI on Harness Engineering**
+https://openai.com/index/harness-engineering/
+They state that in the world of agents, the engineer's role is shifting from "writing code" to "managing the environment," where humans steer the direction and agents execute.
+
+The most important thing now is not just a high-quality model, but the environment:
+– A structured `docs/` folder as the single source of truth,
+– A short `AGENTS.md` (~100 lines) instead of a massive prompt,
+– Mechanical linters + CI that check invariants (architecture rules, naming, file size, etc.),
+– A "doc-gardening" agent that automatically fixes outdated documentation.
+
+A single Codex run can last up to 6 hours (often overnight). Therefore, it’s better to have all knowledge contained within the repository (versioned artifacts). No external chats or verbal discussions.
+
+**Discussion on HN about Harness Engineering**
+https://news.ycombinator.com/item?id=46988596
+Can Bölük (author of https://github.com/can1357/oh-my-pi) took 16 different LLMs and ran each one twice on the same benchmark of fixing real bugs in a React app. Between the two runs he changed **only one tool**: the file editing format. Instead of `apply_patch` / `str_replace`, he introduced **Hashline** (each line gets a short hash, and the model edits by hash rather than text). From this change alone, 14 out of 16 models **improved** their results.
+
+The primary skill for a developer now is designing the harness, not writing code manually. Many confirm that Hashline gives agents a significant boost.
+
+Conspiracy theory: "Companies intentionally keep the best harnesses secret to avoid decreasing token consumption." In recent weeks, Anthropic and Google have been banning custom harnesses; even the post's author was cut off from Gemini during his benchmark.
+
+#harness
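The Hashline idea from the HN item above (each line gets a short hash, and the model edits by hash rather than text) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the oh-my-pi implementation: the `N:hash|text` display format, the truncated SHA-1 tag, and the function names `line_hash`, `tag_lines`, and `apply_edit` are all hypothetical.

```python
import hashlib


def line_hash(text: str, length: int = 6) -> str:
    """Short content hash for one line (assumed scheme: truncated SHA-1 hex)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:length]


def tag_lines(source: str) -> list[str]:
    """Render each line as 'N:hash|text', the view a model would edit against."""
    return [
        f"{n}:{line_hash(line)}|{line}"
        for n, line in enumerate(source.splitlines(), start=1)
    ]


def apply_edit(source: str, ref: str, new_text: str) -> str:
    """Replace the line named by ref ('N:hash'); reject the edit if the hash is stale."""
    number_str, expected = ref.split(":")
    lines = source.splitlines()
    index = int(number_str) - 1
    if not 0 <= index < len(lines):
        raise ValueError(f"line {number_str} is out of range")
    if line_hash(lines[index]) != expected:
        raise ValueError(f"line {number_str} changed since it was tagged")
    lines[index] = new_text
    return "\n".join(lines)


# The "model" sees the tagged view and answers with a reference plus new text.
source = "def add(a, b):\n    return a - b"
view = tag_lines(source)
ref = view[1].split("|", 1)[0]  # the 'N:hash' part of the second line
fixed = apply_edit(source, ref, "    return a + b")
print(fixed)
```

In this sketch the model never has to reproduce the old text exactly, which is what `str_replace`-style tools require, and an edit against a line that has since changed is rejected instead of being applied in the wrong place. Whether this is why the benchmark results improved is not something the source states.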
diff --git a/ukr_2026/03/2026-03-03-09-41.md b/ukr_2026/03/2026-03-03-09-41.md
@@ -0,0 +1,36 @@
+About two years ago, coding models behaved like a *genie*: you asked them for something and they did it seemingly correctly, but with a catch. To fight this, lots of "crutch" wrappers (harnesses) were invented. Programs like Cursor were exploring exactly how to do this best.
+
+2026 models have become much more obedient, so, as I wrote earlier, even the `AGENTS.md` file no longer matters as much. Another fresh example is how Vercel removed 80% of the specialized tools from its internal text-to-SQL agent, leaving a single execute bash in a sandbox.
+
+We are learning to **simplify** the architecture (which we piled up over these two years) and to use minimal tools so as not to get in the way of powerful models.
+
+**NxCode Team on how AI agents work**
+https://www.nxcode.io/resources/news/harness-engineering-complete-guide-ai-agent-codex-2026
+Explains the harness as a "bridle + saddle + reins" for a powerful but uncontrolled "horse" (the model). The example is LangChain, who raised a coding agent from 52.8% to 66.5% on Terminal Bench without changing the model, only through middleware (self-verification, loop detection, context mapping).
+
+Agents fail not because of model quality but because of a bad harness.
+
+It is important to add that even an ideal harness will not save a weak model.
+
+**OpenAI on harness engineering**
+https://openai.com/index/harness-engineering/
+They say that in the world of agents the engineer's role is changing from "writing code" to "managing the environment", where humans steer the direction and agents execute.
+
+The most important thing now is not only a high-quality model but the environment:
+– a structured `docs/` folder as the single source of truth,
+– a short `AGENTS.md` (~100 lines) instead of a giant prompt,
+– mechanical linters + CI that check invariants (architecture rules, naming, file size, etc.),
+– a "doc-gardening" agent that fixes outdated documentation by itself.
+
+A single Codex run can work for up to 6 hours (often overnight). So it is better to keep all knowledge only inside the repository (versioned artifacts). No external chats or verbal discussions.
+
+
+**HN discussion on harness engineering**
+https://news.ycombinator.com/item?id=46988596
+Can Bölük (author of the tool https://github.com/can1357/oh-my-pi) took 16 different LLMs and ran each of them twice on the same benchmark of fixing real bugs in a React app, changing **only one tool**: the file editing format. Instead of apply_patch / str_replace he introduced **Hashline** (each line gets a short hash, and the model edits by hash rather than by text). From this alone, 14 of 16 models **improved** their results.
+
+The main skill of a developer now is designing the harness, not writing code by hand. Many confirm that Hashline gives the agent a boost.
+
+Conspiracy theory: "Companies deliberately keep the best harnesses secret so as not to reduce token consumption." In recent weeks Anthropic and Google have been banning custom harnesses; even the post's author was cut off from Gemini during the benchmark.
+
+#harness