feat(inference): support merging multiple LoRA adapters before vLLM inference #57
Open
Manuscrit wants to merge 1 commit into longtermrisk:v0.9
When `lora_adapters` (List[str]) is supplied, the job merges all adapters into a single combined adapter via a PEFT linear combination on CPU before vLLM is initialised. This keeps the merged rank identical to the input rank, so vLLM's `max_lora_rank` constraint is never violated.

Key changes:

- `InferenceConfig`: new `lora_adapters` field; validated to require ≥ 2 entries (a single adapter stays in `model` as before, preserving compatibility).
- `InferenceJobs.create()`: client-side rank-equality assertion across all adapters, with a clear error before any GPU time is spent.
- `cli.py`: new `download_adapter()` helper (handles org/repo/subfolder paths); new `merge_lora_adapters()` runs PEFT `add_weighted_adapter` (`combination_type="linear"`) on CPU, saves the combined adapter to `/tmp/merged_lora/`, then frees memory before vLLM loads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>