V0.9 - qwen3.5, gemma, olmo support #55

Open
nielsrolf wants to merge 25 commits into main from v0.9

Conversation

@nielsrolf
Collaborator

No description provided.

nielsrolf and others added 25 commits March 11, 2026 09:43
When `allowed_hardware` is specified, always try the first entry
(expected to be cheapest) instead of choosing randomly. This makes
GPU selection deterministic and cost-optimal — the caller orders the
list by preference, and on failure the scheduler removes the failed
entry so the next cycle naturally falls through to the next option.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
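The deterministic selection described above can be sketched as follows. This is a minimal illustration, not the repo's actual scheduler: the function name `determine_gpu_type` comes from the later test-commit message, but the signature, the GPU tier names in the fallback, and the VRAM threshold are all assumptions.

```python
# Sketch of first-entry hardware selection. The fallback tiers and the
# 24 GB threshold are illustrative placeholders, not the real values.
def determine_gpu_type(allowed_hardware, requires_vram_gb=None):
    """Pick a GPU config deterministically: first allowed entry wins."""
    if allowed_hardware:
        # The caller orders the list cheapest-first; on failure the
        # scheduler removes the failed entry, so the next cycle naturally
        # falls through to the next option.
        return allowed_hardware[0]
    # No explicit list: fall back to a VRAM-based heuristic.
    vram = requires_vram_gb or 0
    return "A100" if vram > 24 else "RTX4090"
```

With this shape, retrying after a failed provision is just "drop the head of the list and call again"; no random state is involved.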
Allow `requires_vram_gb` to be `None` (meaning "don't filter by VRAM,
rely on allowed_hardware instead"). This is useful when the caller
specifies explicit GPU tiers via `allowed_hardware` and doesn't want
the VRAM heuristic to interfere.

Changes:
- `Job.requires_vram_gb` and `Jobs.requires_vram_gb` typed as `int | None`
- Org manager sorts and computes max VRAM with `or 0` fallback
- Worker filters jobs with `or 0` so None doesn't crash the comparison

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
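The three `or 0` fallbacks listed above can be sketched like this. Job shapes are plain dicts here for illustration; the real code presumably operates on ORM/Pydantic models, so field access will differ.

```python
# Sketch of the `or 0` pattern so requires_vram_gb=None never crashes
# a comparison. Dict-based jobs are an assumption for illustration.
def sort_jobs(jobs):
    # None is treated as 0, so VRAM-agnostic jobs sort last (descending).
    return sorted(jobs, key=lambda j: j["requires_vram_gb"] or 0, reverse=True)

def max_vram(jobs):
    # None -> 0, so max() never compares None against an int.
    return max((j["requires_vram_gb"] or 0 for j in jobs), default=0)

def fits_worker(job, worker_vram_gb):
    # A job with requires_vram_gb=None fits any worker.
    return (job["requires_vram_gb"] or 0) <= worker_vram_gb
```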
Tests verify that determine_gpu_type always picks the first entry in
allowed_hardware (not random), parses multi-GPU configs correctly, and
falls through to VRAM-based logic when allowed_hardware is None/empty.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests verify that None VRAM values are handled correctly in:
- Job sorting (treated as 0, sorted last)
- Max VRAM computation (None → 0, no crash)
- Worker hardware filtering (None fits any worker)

All tests are pure-Python logic checks, no DB or RunPod needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `rl` module is imported in jobs/__init__.py but doesn't exist on
disk, causing an ImportError when the package is loaded. Remove it
from both the import statement and __all__.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The health-check thread can set self.current_process = None when
cancelling a job. If this happens while the main thread is in the
log-streaming loop or calling .wait(), it causes:
  AttributeError: 'NoneType' object has no attribute 'wait'

Fix: capture `proc = self.current_process` before the loop so the
local reference remains valid regardless of what the health-check
thread does to self.current_process.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
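The fix pattern is small enough to show in full. This is a stripped-down sketch, not the worker's real class: the only point is that the loop and `.wait()` go through a local `proc` reference, so another thread setting `self.current_process = None` cannot trigger the `AttributeError`.

```python
import subprocess

class Worker:
    def __init__(self):
        self.current_process = None

    def run(self, cmd):
        self.current_process = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, text=True
        )
        # Capture a local reference BEFORE the loop: the health-check
        # thread may null self.current_process at any time, but the
        # local stays valid.
        proc = self.current_process
        for line in proc.stdout:   # safe even if self.current_process
            print(line, end="")    # becomes None mid-loop
        return proc.wait()         # safe for the same reason
```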
Three fixes for compatibility with newer library versions:

1. Remove tokenizer= kwarg from WeightedSFTTrainer — newer Unsloth
   patches Trainer.__init__ and captures the tokenizer via the data
   collator instead.

2. Handle BatchEncoding dict return from apply_chat_template —
   transformers 5.x returns a BatchEncoding with .input_ids instead
   of a plain tensor; extract input_ids when present.

3. Compute block length via token-count difference instead of text
   reconstruction — the old find_end_of_block approach fails when
   the tokenizer splits multi-byte UTF-8 characters across token
   boundaries, producing U+FFFD replacement characters that don't
   match the original text.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
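Fixes 2 and 3 can be sketched together. These helpers are illustrative, not the repo's code: `tokenize` stands in for whatever tokenizer call the training path uses, and the attribute probe is a generic way to accept either return type from `apply_chat_template`.

```python
def extract_input_ids(result):
    # transformers 5.x may return a BatchEncoding-like object exposing
    # .input_ids; older versions return the ids directly. Accept both.
    return result.input_ids if hasattr(result, "input_ids") else result

def block_token_length(tokenize, prefix, prefix_plus_block):
    # Measure the block as a token-count difference instead of decoding
    # and re-matching text. Decoding breaks when multi-byte UTF-8 chars
    # are split across token boundaries (U+FFFD replacement characters
    # don't match the original text); counting tokens sidesteps that.
    return len(tokenize(prefix_plus_block)) - len(tokenize(prefix))
```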
Keep 3 core tests that verify the actual behavior change (first-entry
selection, multi-GPU parsing, None fallback). Remove 4 tests that were
either redundant (same logic with different data) or trivially covered.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep 5 core tests that verify None handling in sort, max, and worker
filtering. Remove 8 tests that either tested Python builtins (max on
ints, integer comparison) or were redundant edge cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep 2 AST checks (rl not in imports, rl not in __all__) that directly
verify the fix. Remove filesystem existence check and unrelated module
presence check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep 3 tests that exercise the actual race condition pattern (mid-loop
null, crash without fix, threaded scenario). Remove test_local_ref_survives_null
which only tests that Python local variables survive reassignment of the original.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep 5 tests that verify the actual code fixes (tokenizer kwarg removal,
BatchEncoding handling, len-difference approach). Remove 4 tests: basic
arithmetic checks (15-10=5, 5-5=0, 11-10=1) and a redundant AST check
for find_end_of_block that is already covered by the len-difference test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: weighted SFT compat with newer Unsloth and transformers 5.x
fix: capture process ref before health-check thread can null it
fix: remove broken rl module import from jobs/__init__.py
fix: support requires_vram_gb=None by treating it as 0
fix: pick first allowed_hardware instead of random choice
…structure

- Dockerfile: add llm_blender, --no-deps mergekit, upgrade TRL 0.24→1.0
  and vLLM 0.11.2→latest for transformers 5.x compatibility
- training.py: defer DPO/ORPO imports to avoid pydantic 2.12 + torch.Tensor
  schema generation error on SFT jobs
- orpo_ft.py: fallback import from trl.experimental.orpo for TRL 1.0
- decorators.py: add extra_exceptions param to openai_retry
- temporary_api.py: retry on NotFoundError/BadRequestError during vLLM warmup
- test_integration.py: sequential cookbook runner with fail-fast, resume
  support (--skip-until-cookbook), job detection via DB diff and subprocess
  log fallback, and run log fetching

All 13 cookbook examples verified passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Dockerfile: use unsloth/unsloth:2026.3.17-pt2.9.0-vllm-0.16.0 base
  image with vLLM pre-installed, upgrade transformers to 5.3.0 and
  TRL to 1.0.0 with --no-deps
- jobs.py: update base_image to new image tag
- utils.py: unwrap Qwen3VLProcessor to get underlying tokenizer
  (unsloth returns processor for some models, which lacks pad())
- chat_template_spans.py: use underlying tokenizer for apply_chat_template
  to avoid ProcessorMixin bug in transformers 5.2-5.3; add message_index
  to EOS blocks from apply_eos_token_rule
- sft.py: tokenizer→processing_class for Trainer() (transformers 5.x API)
- CLAUDE.md: add RunPod API safety rule

All 13 cookbook examples verified passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
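The `tokenizer` → `processing_class` rename mentioned above can be handled defensively when supporting multiple transformers versions. This is a generic version-probing sketch, not the repo's code; it inspects the trainer's signature rather than hard-coding a version check.

```python
import inspect

def trainer_kwargs(trainer_cls, tokenizer):
    # Newer transformers Trainer takes `processing_class`; older
    # versions took `tokenizer`. Pick whichever the class accepts.
    params = inspect.signature(trainer_cls.__init__).parameters
    key = "processing_class" if "processing_class" in params else "tokenizer"
    return {key: tokenizer}
```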
- Remove pinned base_image from both examples (v0.8/v0.7 no longer needed)
- sampling_callback.py: work around unsloth #3538 (device corruption) by
  using model.eval() instead of FastLanguageModel.for_inference(), fixing
  _per_layer_device_index on all layers, and using use_cache=False to
  bypass unsloth's buggy fast inference path
- chat_template_spans.py: use .get() for weight/role keys since blocks
  from logprob path don't have weights
- test_integration.py: add both examples back to COOKBOOK_EXAMPLES list

All 15 cookbook examples now passing (13 original + 2 previously skipped).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
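The `.get()` change in `chat_template_spans.py` amounts to a defensive lookup. A minimal sketch, assuming dict-shaped blocks; the key names follow the commit message, but the defaults here are placeholders, not the repo's actual values.

```python
def block_weight(block):
    # Blocks from the logprob path carry no "weight"; default is assumed.
    return block.get("weight", 1.0)

def block_role(block):
    # Same for "role": absent keys return None instead of raising KeyError.
    return block.get("role")
```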
@nielsrolf nielsrolf changed the title V0.9 V0.9 - qwen3.5, gemma, olmo support Apr 2, 2026