Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
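The repositories below all build on the RLAIF idea: an AI judge, rather than a human annotator, scores candidate responses, and the scores are turned into preference pairs for later fine-tuning (e.g. with DPO). A minimal sketch of that loop, assuming a hypothetical `ai_judge` stand-in (in practice this would be an LLM call, not a keyword heuristic):

```python
def ai_judge(prompt: str, response: str) -> float:
    """Hypothetical reward model: on-topic, fuller answers score higher."""
    relevance = sum(word in response.lower() for word in prompt.lower().split())
    return relevance + 0.01 * len(response.split())

def build_preference_pair(prompt: str, candidates: list[str]) -> dict:
    """Rank candidates with the AI judge; pair the best against the worst."""
    ranked = sorted(candidates, key=lambda r: ai_judge(prompt, r), reverse=True)
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}

pair = build_preference_pair(
    "Explain reinforcement learning",
    [
        "Reinforcement learning trains an agent by rewarding good actions.",
        "I don't know.",
    ],
)
```

The resulting `{"prompt", "chosen", "rejected"}` records are the standard input format for preference-optimization trainers; frameworks such as distilabel automate this generate-judge-pair pipeline at scale.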
Official Implementation of VideoDPO
Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning"
ZYN: Zero-Shot Reward Models with Yes-No Questions
Synthetic data for fine-tuning LLMs
Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)
distilled Self-Critique refines the outputs of an LLM using only synthetic data
RewardAnything: Generalizable Principle-Following Reward Models
Code for the paper "Improving Socratic Question Generation using Data Augmentation and Preference Optimization"
Production-ready RLAIF trading system with multi-agent Claude AI that learns from market outcomes. Features 60+ indicators, foundation models, and serverless deployment.
AI in the Loop (AITL): A Systems Taxonomy for Closed-Loop Autonomous Evaluation -- AI in the loop, replacing humans entirely
🧠 Enhance AI conversations with Cognio, a persistent memory server that retains context and enables meaningful semantic search across sessions.
RankPO: Rank Preference Optimization
(Stepwise Controlled Understanding for Trajectories) -- "an agent that learns to hunt"
RLAF: Reinforcement Learning from Agentic Feedback - A unified framework for training AI agents with multi-perspective critic ensembles
Verification as a service: moving beyond LLM-as-a-judge
Provide detailed AI-driven feedback on academic economics papers with multi-agent review and consolidated reports for pre-submission quality checks