Add frozen model inference engine support for hosting reward models without weight update#1055

Draft
ghShu wants to merge 1 commit into NovaSky-AI:main from ghShu:gshu/extend-inference-engine-for-frozen-reward-model

Conversation

@ghShu ghShu commented Feb 8, 2026

This patch adds support for dedicated reward model inference engines that use frozen_model=True (no weight synchronization; always active). This enables:

  • LLM-as-Judge patterns (RLAIF, Constitutional AI)
  • Process Reward Models (verifiers)
  • Frozen reward models for scoring/evaluation
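The core idea behind frozen_model=True is that a frozen engine is excluded from the weight-sync loop while remaining available for scoring. The sketch below illustrates that gating logic only; the `InferenceEngine` stand-in and `sync_weights` helper are hypothetical and are not the actual SkyRL implementation.

```python
from dataclasses import dataclass

@dataclass
class InferenceEngine:
    """Minimal stand-in for a Ray-wrapped inference engine (hypothetical)."""
    name: str
    frozen_model: bool = False
    weights_version: int = 0

def sync_weights(engines, new_version):
    """Push updated policy weights to every engine except frozen ones.

    A frozen reward engine keeps its original weights and stays active
    throughout training, so it is skipped during weight sync.
    """
    for engine in engines:
        if engine.frozen_model:
            continue  # frozen reward model: never receives policy updates
        engine.weights_version = new_version
    return engines

policy = InferenceEngine("policy")
reward = InferenceEngine("reward", frozen_model=True)
sync_weights([policy, reward], new_version=3)
print(policy.weights_version, reward.weights_version)  # 3 0
```

The same flag can then double as an "always active" marker: since a frozen engine never waits on a weight broadcast, it can serve scoring requests at any point in the training loop.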

Changes:

  • Add RewardInferenceConfig and PlacementGenerationEnvConfig to config
  • Add pretrained_lora_path option to SkyRLLoraConfig
  • Add reward_inference section to ppo_base_config.yaml
  • Add get_reward_inference_client() method to BasePPOExp
  • Add frozen_model parameter to create_ray_wrapped_inference_engines()
  • Pass reward_inference_client to generator
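To make the shape of the new `reward_inference` section concrete, here is a hedged sketch of what such a block in ppo_base_config.yaml might look like. The field names below are illustrative assumptions, not the actual schema introduced by this patch:

```yaml
# Hypothetical sketch of the reward_inference section; field names are
# illustrative and may differ from the schema added in this PR.
reward_inference:
  enabled: true
  model: "my-org/my-reward-model"  # frozen scorer, e.g. an LLM judge
  frozen_model: true               # excluded from policy weight sync
  num_engines: 1
```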
