Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
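The repositories below all build on the RLAIF idea: an AI judge, rather than a human annotator, scores candidate responses, and the scores are turned into preference pairs for later fine-tuning (e.g. with DPO). A minimal sketch of that loop, assuming a hypothetical `ai_judge` stand-in (in practice this would be an LLM call, not a keyword heuristic):

```python
def ai_judge(prompt: str, response: str) -> float:
    """Hypothetical reward model: on-topic, fuller answers score higher."""
    relevance = sum(word in response.lower() for word in prompt.lower().split())
    return relevance + 0.01 * len(response.split())

def build_preference_pair(prompt: str, candidates: list[str]) -> dict:
    """Rank candidates with the AI judge; pair the best against the worst."""
    ranked = sorted(candidates, key=lambda r: ai_judge(prompt, r), reverse=True)
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}

pair = build_preference_pair(
    "Explain reinforcement learning",
    [
        "Reinforcement learning trains an agent by rewarding good actions.",
        "I don't know.",
    ],
)
```

The resulting `{"prompt", "chosen", "rejected"}` records are the standard input format for preference-optimization trainers; frameworks such as distilabel automate this generate-judge-pair pipeline at scale.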
Official Implementation of VideoDPO
Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning"
ZYN: Zero-Shot Reward Models with Yes-No Questions
Synthetic data for fine-tuning LLMs
Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)
distilled Self-Critique refines the outputs of an LLM using only synthetic data
RewardAnything: Generalizable Principle-Following Reward Models
Code for the paper "Improving Socratic Question Generation using Data Augmentation and Preference Optimization"
Production-ready RLAIF trading system with multi-agent Claude AI that learns from market outcomes. Features 60+ indicators, foundation models, and serverless deployment.
AI in the Loop (AITL): A Systems Taxonomy for Closed-Loop Autonomous Evaluation -- AI in the loop, replacing humans entirely
🧠 Enhance AI conversations with Cognio, a persistent memory server that retains context and enables meaningful semantic search across sessions.
RankPO: Rank Preference Optimization
(Stepwise Controlled Understanding for Trajectories) -- "an agent that learns to hunt"
RLAF: Reinforcement Learning from Agentic Feedback - A unified framework for training AI agents with multi-perspective critic ensembles
Verification as a service: moving beyond LLM-as-a-judge
Provide detailed AI-driven feedback on academic economics papers with multi-agent review and consolidated reports for pre-submission quality checks