feat(launcher): add DFlash support for DeepSeek-V4-Flash target model #1379
Conversation
Add launcher examples and infrastructure to run DeepSeek-V4-Flash as the DFlash speculative decoding target model in vLLM (container: deepseekv4-cu130).

New files:
- examples/deepseek-ai/DeepSeek-V4-Flash/patch_vllm_dflash.py: Runtime patch script (P0–P11) applied to the vLLM container at job startup. Adds the EAGLE3/DFlash aux-hidden-state interface to DeepseekV4ForCausalLM, fixes KV cache group allocation for heterogeneous DFlash draft layers, handles the fp8_ds_mla dtype, and bypasses the fp8_fp4_paged_mqa_logits kernel that overflows H100 shared memory with block_size=256 × MLA 576 bytes/token. Includes a PR contribution strategy note in the module docstring.
- examples/deepseek-ai/DeepSeek-V4-Flash/vllm_dflash_smoke_test_cw_dfw.yaml: CW-DFW H100 smoke test: 8xH100, TP=8, fp8 KV cache, block_size=256, gpu_memory_utilization=0.85 (leaves ~4 GB headroom for Triton JIT), 15 speculative tokens, draft model z-lab/DeepSeek-V4-Flash-DFlash.
- examples/deepseek-ai/DeepSeek-V4-Flash-DFlash/ (vllm + sglang MTP smoke tests): MTP smoke test YAMLs for B200, B300, and CW-DFW H100.
- common/specdec/sglang_smoke_test.sh: SGLang speculative decoding smoke test.
- common/specdec/read_vllm_files.sh: Diagnostic: dump relevant vLLM source lines.

common/specdec/vllm_smoke_test.sh: Add VLLM_PATCH_SCRIPT hook — if set, run the specified Python script before starting vLLM. Allows per-model patches without modifying the common script.

common/vllm/query.sh: Port B200/data-synthesis fixes: ulimit -u unlimited, DEEPGEMM_TMPDIR redirect, COPY_MODEL_TO_TMPFS for NFS stale-handle avoidance, and FORCE_AF_V2 patch for torch inductor compatibility with DeepSeek V4 fp8 ops.

core.py / slurm_config.py: Expose retries, requeue, and additional_parameters in SlurmConfig and slurm_factory so long-running jobs can use Slurm's native requeue mechanism.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: chenhany <chenhany@nvidia.com>
📝 Walkthrough

Adds vLLM runtime and smoke-test enhancements plus DeepSeek‑V4‑Flash DFlash support: new shell utilities and runtime flags, an idempotent Python patcher that edits vLLM and related modules to enable DFlash, new example DFlash model and draft creation scripts/config, and SLURM config fields with executor wiring updates.

Changes

vLLM runtime & smoke-test
DeepSeek‑V4‑Flash DFlash example & patching
SLURM configuration & executor wiring
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Important: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 3
🧹 Nitpick comments (6)
tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/patch_vllm_dflash.py (2)
82-93: 💤 Low value
Consider adding error handling for write failures.
The `_patch_file` function doesn't handle potential exceptions from `path.write_text()` (e.g., permission denied, disk full). While the container environment is likely controlled, unhandled write failures could silently corrupt state.

💡 Proposed defensive error handling
 def _patch_file(path: pathlib.Path, old: str, new: str, sentinel: str, label: str) -> bool:
     src = path.read_text()
     if sentinel in src:
         print(f"{label}: already patched")
         return True
     if old not in src:
         print(f"WARNING: {label}: pattern not found — skipping")
         return False
-    path.write_text(src.replace(old, new, 1))
-    _delete_pyc(path.stem)
-    print(f"{label}: OK")
-    return True
+    try:
+        path.write_text(src.replace(old, new, 1))
+        _delete_pyc(path.stem)
+        print(f"{label}: OK")
+        return True
+    except OSError as e:
+        print(f"ERROR: {label}: write failed — {e}")
+        return False

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/patch_vllm_dflash.py` around lines 82 - 93, The _patch_file function can raise exceptions from path.write_text (e.g., permission errors or disk full) which are not currently handled; update _patch_file to catch exceptions around the write and _delete_pyc call (wrap the path.write_text and _delete_pyc(path.stem) in a try/except), log or print a clear error message including the exception and label, and return False on failure so callers know the patch failed; ensure sentinel/old checks remain unchanged and only the write/delete operations are guarded.
408-414: 💤 Low value
Consider using existing torch import instead of inline import.
The injected code uses `import torch as _t` inline. Since `flash_attn.py` already imports `torch`, this redundant import could be avoided by using the existing module-level `torch`.

💡 Simplified patch without redundant import
 _fa_new = re.sub(
     r"([ \t]*)raise ValueError\(f\"Unrecognized FP8 dtype: \{kv_cache_dtype\}\"\)",
     lambda m: (
         m.group(1) + "# _dflash_fp8_ds_mla_patch: fp8_ds_mla is e4m3fn stored by the compressor\n"
         + m.group(1) + "if kv_cache_dtype == \"fp8_ds_mla\":\n"
-        + m.group(1) + "    import torch as _t; return _t.float8_e4m3fn\n"
+        + m.group(1) + "    return torch.float8_e4m3fn\n"
         + m.group(1) + "raise ValueError(f\"Unrecognized FP8 dtype: {kv_cache_dtype}\")"
     ),

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/patch_vllm_dflash.py` around lines 408 - 414, The injected replacement in the re.sub lambda currently does an inline "import torch as _t" when handling kv_cache_dtype == "fp8_ds_mla"; change it to use the existing module-level torch symbol instead (e.g., replace the inline import/alias and return _t.float8_e4m3fn with a direct return of torch.float8_e4m3fn) so the patch uses the already-imported torch in flash_attn.py and avoids a redundant inline import; update the lambda string in the _fa_new assignment that matches the raise ValueError to reference torch directly.

tools/launcher/core.py (1)
275-276: 💤 Low value
Long line reduces readability; consider splitting.
Line 276 combines dict unpacking, conditional expression, and getattr in a single expression. While functional, this is hard to parse at a glance.
💡 Proposed refactor for readability
     retries=slurm_config.retries,
-    additional_parameters={**(slurm_config.additional_parameters or {}), **({"requeue": True} if getattr(slurm_config, "requeue", False) else {})},
+    additional_parameters={
+        **(slurm_config.additional_parameters or {}),
+        **({"requeue": True} if getattr(slurm_config, "requeue", False) else {}),
+    },
     packager=packager,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/launcher/core.py` around lines 275 - 276, The one-line dict merge for additional_parameters is hard to read; extract the merge into a small, clear pre-computation: create a local variable (e.g., merged_additional_parameters) by first copying slurm_config.additional_parameters or {} and then, if getattr(slurm_config, "requeue", False) is True, set merged_additional_parameters["requeue"] = True; finally pass merged_additional_parameters to the function call instead of the long one-liner. This keeps the logic around slurm_config, requeue, and additional_parameters explicit and easy to read.

tools/launcher/common/specdec/vllm_smoke_test.sh (1)
549-560: 💤 Low value
Consider using arrays for optional arguments to avoid word-splitting fragility.
The unquoted `${PARALLELISM_ARGS}` and `${OPTIONAL_ARGS}` rely on word splitting, which works for simple flags but can break if values contain spaces or special characters. While currently functional, using arrays throughout would be more robust.

♻️ Example: array-based approach
-# Build optional args
-OPTIONAL_ARGS=""
+# Build optional args as array
+OPTIONAL_ARGS=()
 if [ -n "${REASONING_PARSER:-}" ]; then
-  OPTIONAL_ARGS="${OPTIONAL_ARGS} --reasoning-parser ${REASONING_PARSER}"
+  OPTIONAL_ARGS+=(--reasoning-parser "${REASONING_PARSER}")
 fi
 # ... similar for other flags ...
-VLLM_CMD=(vllm serve "${MODEL}"
-  --max-num-batched-tokens "${MAX_BATCHED_TOKENS:-32768}"
-  ${PARALLELISM_ARGS}
-  --port "${PORT}"
-  ${OPTIONAL_ARGS})
+VLLM_CMD=(vllm serve "${MODEL}"
+  --max-num-batched-tokens "${MAX_BATCHED_TOKENS:-32768}"
+  "${PARALLELISM_ARGS[@]}"
+  --port "${PORT}"
+  "${OPTIONAL_ARGS[@]}")

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/launcher/common/specdec/vllm_smoke_test.sh` around lines 549 - 560, The VLLM command currently injects ${PARALLELISM_ARGS} and ${OPTIONAL_ARGS} via unquoted string expansion which is fragile; change those to arrays and append them into VLLM_CMD as array elements (e.g., ensure PARALLELISM_ARGS and OPTIONAL_ARGS are declared as arrays and replace unquoted expansions with VLLM_CMD+=( "${PARALLELISM_ARGS[@]}" ) and VLLM_CMD+=( "${OPTIONAL_ARGS[@]}" )), keep the conditional appends for --speculative-config and --compilation-config as array additions (VLLM_CMD+=(--speculative-config "$SPEC_CONFIG") etc.), and finally invoke "${VLLM_CMD[@]}" to preserve proper tokenization.

tools/launcher/common/vllm/query.sh (1)
103-316: 💤 Low value
Consider extracting shared FORCE_AF_V2 patching logic.
This FORCE_AF_V2 patching block (lines 139-316) is nearly identical to the implementation in `vllm_smoke_test.sh` (lines 85-316). While duplicating the logic in each script ensures independence and avoids cross-script dependencies, consider extracting to a shared Python script if maintenance burden increases.

The `exec(PATCH_CODE)` on line 235 is safe since `PATCH_CODE` is a hardcoded string constant, not external input.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/launcher/common/vllm/query.sh` around lines 103 - 316, The FORCE_AF_V2 patching block is duplicated; extract the shared logic into a single reusable script/module (e.g., a Python file that exposes and installs the patch) and have both query.sh and vllm_smoke_test.sh load and run it instead of embedding PATCH_CODE inline. Move the large PATCH_CODE string and the install/exec logic (symbols: PATCH_CODE, vllm_force_af_v2_runtime, _install, exec(PATCH_CODE), and the FORCE_AF_V2 conditional) into the shared script, update each shell script to check FORCE_AF_V2 and call that shared script (or import/execute it) so the behavior remains identical while centralizing maintenance. Ensure the new shared file is installed into a location accessible to the runtime and keep the existing runtime prints and exception handling unchanged.

tools/launcher/common/specdec/sglang_smoke_test.sh (1)
65-67: ⚡ Quick win
Avoid unconditional pre-release `transformers` upgrades in smoke runs.

Lines 65-67 mutate runtime dependencies on every execution (`--pre`), which makes results non-reproducible and network-sensitive. Gate this behind an opt-in env var and require a tested spec/range.
-echo "Upgrading transformers (--pre for deepseek_v4 support)..." -pip install --upgrade --pre transformers -q || echo "WARNING: transformers upgrade failed, continuing" +if [ "${UPGRADE_TRANSFORMERS:-0}" = "1" ]; then + : "${TRANSFORMERS_SPEC:?Set TRANSFORMERS_SPEC to a tested version/range}" + echo "Upgrading transformers (${TRANSFORMERS_SPEC})..." + pip install --upgrade "${TRANSFORMERS_SPEC}" -q || { + echo "ERROR: transformers upgrade failed" + exit 1 + } +fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/launcher/common/specdec/sglang_smoke_test.sh` around lines 65 - 67, The smoke test unconditionally runs pip install --upgrade --pre transformers which mutates runtime deps each run; change the logic around the echo/"pip install --upgrade --pre transformers -q" invocation (the upgrade step labeled "Upgrading transformers (--pre for deepseek_v4 support)...") to only run when an opt-in environment variable (e.g., UPGRADE_TRANSFORMERS or USE_PRETRANSFORMERS) is set, and require a constrained, tested spec (e.g., a VERSION or SPEC env var or documented version range) instead of always using --pre; update the message to reflect the gated behavior and skip the network upgrade by default to keep smoke runs reproducible.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/launcher/common/specdec/sglang_smoke_test.sh`:
- Around line 35-42: The cleanup() function should guard use of SERVER_PID and
only remove tmpfs copies this script created: check SERVER_PID exists/ non-empty
using parameter expansion (e.g., test -n "${SERVER_PID-}" or similar) before
running kill / kill -9 to avoid failures under set -u, and introduce a
TMPFS_MODEL_OWNED (or TMPFS_MODEL_CREATED) flag when you create or copy the
tmpfs model in the code that discovers TMPFS_MODEL (the logic around the TMPFS
discovery at the block that sets TMPFS_MODEL near where you check Lines 49-51);
then in cleanup() only rm -rf "$TMPFS_MODEL" when TMPFS_MODEL_OWNED is true so
shared tmpfs paths are not deleted by concurrent jobs.
- Around line 90-104: The loop that tries to write _deepseek_v4_patch.py and
deepseek_v4.pth may fail silently and only exec(STUB) patches the current
interpreter, so the spawned "python -m sglang.launch_server" process won't get
the patch; add a boolean flag (e.g., wrote_pth = False) before iterating
site.getsitepackages() + [site.getusersitepackages()], set it to True when
writing deepseek_v4.pth succeeds, and after the loop check the flag and call
sys.exit(1) (and print an error) if no writable site-packages were found so the
script fails fast instead of continuing to exec(STUB) and spawning the server
without the .pth patch.
In
`@tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash-DFlash/sglang_mtp_smoke_test_b200.yaml`:
- Line 24: Remove the unrecognized environment variable
SGLANG_APPLY_CONFIG_BACKUP from the YAML or, if it is intended to be supported,
add handling and documentation: update sglang_smoke_test.sh to read and act on
SGLANG_APPLY_CONFIG_BACKUP (e.g., apply a backup config path or "none"
semantics) and add a comment and README entry describing the variable and valid
values; ensure the YAML entry "- SGLANG_APPLY_CONFIG_BACKUP: \"none\"" is either
deleted or retained only after the script and docs are updated to reference
SGLANG_APPLY_CONFIG_BACKUP.
---
Nitpick comments:
In `@tools/launcher/common/specdec/sglang_smoke_test.sh`:
- Around line 65-67: The smoke test unconditionally runs pip install --upgrade
--pre transformers which mutates runtime deps each run; change the logic around
the echo/"pip install --upgrade --pre transformers -q" invocation (the upgrade
step labeled "Upgrading transformers (--pre for deepseek_v4 support)...") to
only run when an opt-in environment variable (e.g., UPGRADE_TRANSFORMERS or
USE_PRETRANSFORMERS) is set, and require a constrained, tested spec (e.g., a
VERSION or SPEC env var or documented version range) instead of always using
--pre; update the message to reflect the gated behavior and skip the network
upgrade by default to keep smoke runs reproducible.
In `@tools/launcher/common/specdec/vllm_smoke_test.sh`:
- Around line 549-560: The VLLM command currently injects ${PARALLELISM_ARGS}
and ${OPTIONAL_ARGS} via unquoted string expansion which is fragile; change
those to arrays and append them into VLLM_CMD as array elements (e.g., ensure
PARALLELISM_ARGS and OPTIONAL_ARGS are declared as arrays and replace unquoted
expansions with VLLM_CMD+=( "${PARALLELISM_ARGS[@]}" ) and VLLM_CMD+=(
"${OPTIONAL_ARGS[@]}" )), keep the conditional appends for --speculative-config
and --compilation-config as array additions (VLLM_CMD+=(--speculative-config
"$SPEC_CONFIG") etc.), and finally invoke "${VLLM_CMD[@]}" to preserve proper
tokenization.
In `@tools/launcher/common/vllm/query.sh`:
- Around line 103-316: The FORCE_AF_V2 patching block is duplicated; extract the
shared logic into a single reusable script/module (e.g., a Python file that
exposes and installs the patch) and have both query.sh and vllm_smoke_test.sh
load and run it instead of embedding PATCH_CODE inline. Move the large
PATCH_CODE string and the install/exec logic (symbols: PATCH_CODE,
vllm_force_af_v2_runtime, _install, exec(PATCH_CODE), and the FORCE_AF_V2
conditional) into the shared script, update each shell script to check
FORCE_AF_V2 and call that shared script (or import/execute it) so the behavior
remains identical while centralizing maintenance. Ensure the new shared file is
installed into a location accessible to the runtime and keep the existing
runtime prints and exception handling unchanged.
In `@tools/launcher/core.py`:
- Around line 275-276: The one-line dict merge for additional_parameters is hard
to read; extract the merge into a small, clear pre-computation: create a local
variable (e.g., merged_additional_parameters) by first copying
slurm_config.additional_parameters or {} and then, if getattr(slurm_config,
"requeue", False) is True, set merged_additional_parameters["requeue"] = True;
finally pass merged_additional_parameters to the function call instead of the
long one-liner. This keeps the logic around slurm_config, requeue, and
additional_parameters explicit and easy to read.
In `@tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/patch_vllm_dflash.py`:
- Around line 82-93: The _patch_file function can raise exceptions from
path.write_text (e.g., permission errors or disk full) which are not currently
handled; update _patch_file to catch exceptions around the write and _delete_pyc
call (wrap the path.write_text and _delete_pyc(path.stem) in a try/except), log
or print a clear error message including the exception and label, and return
False on failure so callers know the patch failed; ensure sentinel/old checks
remain unchanged and only the write/delete operations are guarded.
- Around line 408-414: The injected replacement in the re.sub lambda currently
does an inline "import torch as _t" when handling kv_cache_dtype ==
"fp8_ds_mla"; change it to use the existing module-level torch symbol instead
(e.g., replace the inline import/alias and return _t.float8_e4m3fn with a direct
return of torch.float8_e4m3fn) so the patch uses the already-imported torch in
flash_attn.py and avoids a redundant inline import; update the lambda string in
the _fa_new assignment that matches the raise ValueError to reference torch
directly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e768f4bf-2470-478f-a408-7f7a00c11644
📒 Files selected for processing (17)
- tools/launcher/common/specdec/read_vllm_files.sh
- tools/launcher/common/specdec/sglang_smoke_test.sh
- tools/launcher/common/specdec/vllm_smoke_test.sh
- tools/launcher/common/vllm/query.sh
- tools/launcher/core.py
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash-DFlash/hf_offline_eagle3.yaml
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash-DFlash/hf_offline_eagle3_data_cw_dfw.yaml
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash-DFlash/sglang_mtp_smoke_test_b200.yaml
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash-DFlash/sglang_mtp_smoke_test_b300.yaml
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash-DFlash/sglang_mtp_smoke_test_cw_dfw.yaml
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash-DFlash/vllm_mtp_smoke_test.yaml
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash-DFlash/vllm_mtp_smoke_test_b200.yaml
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash-DFlash/vllm_mtp_smoke_test_b300.yaml
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash-DFlash/vllm_mtp_smoke_test_cw_dfw.yaml
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/patch_vllm_dflash.py
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/vllm_dflash_smoke_test_cw_dfw.yaml
- tools/launcher/slurm_config.py
cleanup() {
  kill "$SERVER_PID" 2>/dev/null || true
  sleep 2
  kill -9 "$SERVER_PID" 2>/dev/null || true
  if [ -n "$TMPFS_MODEL" ] && [ -d "$TMPFS_MODEL" ]; then
    echo "Removing tmpfs model copy: $TMPFS_MODEL"
    rm -rf "$TMPFS_MODEL"
  fi
Guard cleanup for unset PID and avoid deleting shared tmpfs copies.
With set -u, line 36 can fail when the script exits before line 141 assigns SERVER_PID. Also, tmpfs copies discovered at lines 49-51 are reused but still removed at exit, which can break concurrent jobs using the same path.
Suggested fix
TMPFS_MODEL=""
+SERVER_PID=""
+TMPFS_MODEL_OWNED=0
cleanup() {
- kill "$SERVER_PID" 2>/dev/null || true
- sleep 2
- kill -9 "$SERVER_PID" 2>/dev/null || true
- if [ -n "$TMPFS_MODEL" ] && [ -d "$TMPFS_MODEL" ]; then
+ if [ -n "${SERVER_PID:-}" ] && kill -0 "${SERVER_PID}" 2>/dev/null; then
+ kill "${SERVER_PID}" 2>/dev/null || true
+ sleep 2
+ kill -9 "${SERVER_PID}" 2>/dev/null || true
+ fi
+ if [ "${TMPFS_MODEL_OWNED:-0}" = "1" ] && [ -n "${TMPFS_MODEL:-}" ] && [ -d "${TMPFS_MODEL}" ]; then
echo "Removing tmpfs model copy: $TMPFS_MODEL"
rm -rf "$TMPFS_MODEL"
fi
}
@@
if [ -d "$TMPFS_MODEL" ] && [ -f "$TMPFS_MODEL/config.json" ]; then
echo "Using existing tmpfs model copy: $TMPFS_MODEL"
+ TMPFS_MODEL_OWNED=0
else
@@
cp -r "$MODEL" "$TMPFS_MODEL"
echo "Model copy done: $TMPFS_MODEL"
+ TMPFS_MODEL_OWNED=1
 fi

Also applies to: 46-57, 141-141
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/launcher/common/specdec/sglang_smoke_test.sh` around lines 35 - 42, The
cleanup() function should guard use of SERVER_PID and only remove tmpfs copies
this script created: check SERVER_PID exists/ non-empty using parameter
expansion (e.g., test -n "${SERVER_PID-}" or similar) before running kill / kill -9
to avoid failures under set -u, and introduce a TMPFS_MODEL_OWNED (or
TMPFS_MODEL_CREATED) flag when you create or copy the tmpfs model in the code
that discovers TMPFS_MODEL (the logic around the TMPFS discovery at the block
that sets TMPFS_MODEL near where you check Lines 49-51); then in cleanup() only
rm -rf "$TMPFS_MODEL" when TMPFS_MODEL_OWNED is true so shared tmpfs paths are
not deleted by concurrent jobs.
for sp in site.getsitepackages() + [site.getusersitepackages()]:
    if not os.path.isdir(sp):
        continue
    try:
        with open(os.path.join(sp, '_deepseek_v4_patch.py'), 'w') as f:
            f.write(STUB)
        with open(os.path.join(sp, 'deepseek_v4.pth'), 'w') as f:
            f.write('import _deepseek_v4_patch\n')
        print(f"[patch] Wrote deepseek_v4.pth to {sp}")
        break
    except Exception as e:
        print(f"[patch] Could not write to {sp}: {e}")

exec(STUB)
PYEOF
🧩 Analysis chain
🏁 Script executed:
cat -n tools/launcher/common/specdec/sglang_smoke_test.sh | sed -n '85,110p'

Repository: NVIDIA/Model-Optimizer
Length of output: 1169
🏁 Script executed:
cat -n tools/launcher/common/specdec/sglang_smoke_test.sh | sed -n '1,95p'

Repository: NVIDIA/Model-Optimizer
Length of output: 4915
🏁 Script executed:
cat -n tools/launcher/common/specdec/sglang_smoke_test.sh | sed -n '105,150p'

Repository: NVIDIA/Model-Optimizer
Length of output: 2286
Fail fast when the .pth patch cannot be installed for spawned processes.
If all writable site-packages attempts fail (lines 90–102), the script continues silently. exec(STUB) at line 103 only patches the current interpreter; the spawned python -m sglang.launch_server process (line 131) depends on the .pth file for automatic patch application at startup. Without the file, the server starts without deepseek_v4 registered, causing opaque failures downstream.
Track whether .pth was successfully written and exit with an error if all site-packages locations are non-writable.
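A minimal sketch of the fail-fast flag described above, written as standalone Python rather than the exact heredoc text; `STUB` below is a placeholder for the patch source the real script defines earlier:

```python
import os
import site
import sys

STUB = "pass"  # placeholder for the deepseek_v4 patch source defined earlier in the heredoc

wrote_pth = False
for sp in site.getsitepackages() + [site.getusersitepackages()]:
    if not os.path.isdir(sp):
        continue
    try:
        with open(os.path.join(sp, "_deepseek_v4_patch.py"), "w") as f:
            f.write(STUB)
        with open(os.path.join(sp, "deepseek_v4.pth"), "w") as f:
            f.write("import _deepseek_v4_patch\n")
        print(f"[patch] Wrote deepseek_v4.pth to {sp}")
        wrote_pth = True
        break
    except Exception as e:
        print(f"[patch] Could not write to {sp}: {e}")

if not wrote_pth:
    # The spawned sglang.launch_server process depends on the .pth file; fail fast
    # instead of starting a server that never registers deepseek_v4.
    print("[patch] ERROR: no writable site-packages location found")
    sys.exit(1)

exec(STUB)  # still patch the current interpreter, as before
```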
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/launcher/common/specdec/sglang_smoke_test.sh` around lines 90 - 104,
The loop that tries to write _deepseek_v4_patch.py and deepseek_v4.pth may fail
silently and only exec(STUB) patches the current interpreter, so the spawned
"python -m sglang.launch_server" process won't get the patch; add a boolean flag
(e.g., wrote_pth = False) before iterating site.getsitepackages() +
[site.getusersitepackages()], set it to True when writing deepseek_v4.pth
succeeds, and after the loop check the flag and call sys.exit(1) (and print an
error) if no writable site-packages were found so the script fails fast instead
of continuing to exec(STUB) and spawning the server without the .pth patch.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@            Coverage Diff             @@
## main #1379 +/- ##
==========================================
- Coverage 74.46% 69.13% -5.34%
==========================================
Files 464 491 +27
Lines 50089 58234 +8145
==========================================
+ Hits 37300 40261 +2961
- Misses 12789 17973 +5184
Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
SGLang MTP smoke tests have not been run and verified. Remove them until they are tested end-to-end on a cluster. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: chenhany <chenhany@nvidia.com>
These files belong to the draft model folder and were not created or verified in this PR. Remove until they have dedicated test coverage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: chenhany <chenhany@nvidia.com>
The z-lab/DeepSeek-V4-Flash-DFlash checkpoint is a 4-layer Qwen3-based scaffold created from z-lab's DFlash architecture reference with random weights. The smoke test verifies the inference pipeline starts and generates tokens, not acceptance rate or generation quality. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: chenhany <chenhany@nvidia.com>
Adds a 2-task pipeline variant of vllm_dflash_smoke_test_cw_dfw.yaml:
- task_0 creates a randomly-initialized DFlash draft checkpoint from the bundled config.json + dflash.py (4-layer Qwen3 scaffold based on z-lab's DFlash architecture), saving it to /scratchspace/dflash_draft
- task_1 runs the existing vllm_smoke_test.sh using that checkpoint

Bundles dflash_draft/config.json and dflash_draft/dflash.py so the test is fully self-contained — no pre-trained draft checkpoint required. The random-weight draft validates the DFlash inference stack (patching, KV cache allocation, speculative decoding loop) without needing a trained checkpoint.

Signed-off-by: chenhany <chenhany@nvidia.com>
DeepSeek-V4-Flash has compress_ratios={0:3, 4:21, 128:20} so vLLM's
is_deepseek_v4 flag fires, but the DFlash speculative decoding path
does not populate c128a_prefill_topk_indices during prefill metadata
build. This caused an AttributeError on the first inference request.
P12 in patch_vllm_dflash.py patches _forward_prefill to detect the
None case and fall back to SWA-only mode (topk_indices_buffer, top_k=0,
N=0) rather than crashing. This is sufficient for the smoke test which
validates the inference stack, not acceptance rate or quality.
Also: add draft_model to GlobalVariables in core.py, and tune smoke
test env vars (GPU_MEM_UTIL=0.80, MAX_BATCHED_TOKENS=16384) to match
values verified on CW-DFW H100 8-GPU node.
Smoke test result: 8 passed, 0 failed (cicd_1777582482 on CW-DFW).
Signed-off-by: Han-Yun Chen <chenhany@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: chenhany <chenhany@nvidia.com>
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/create_dflash_draft.sh`:
- Around line 25-33: The script currently calls AutoConfig.from_pretrained and
AutoModel.from_config with trust_remote_code=True which triggers remote-code
execution via the auto_map in the config; instead, import and instantiate the
local DFlash model class directly (the class defined in dflash.py that was
copied earlier) rather than using AutoConfig/AutoModel, or add a
caller-controlled flag to disable trust_remote_code and default it to false;
specifically replace the AutoConfig/AutoModel usage with a direct import of the
local DFlash class (referencing dflash.py and the class name used there) and
call its constructor with the config loaded without invoking trust_remote_code,
or expose a CLI/ENV variable to gate the trust_remote_code boolean when using
AutoModel.
In
`@tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/dflash_draft/dflash.py`:
- Around line 4-15: The file-level header lacks explicit third-party provenance
for the z-lab-derived scaffold; add a clear provenance block at the top of
dflash.py that states the upstream repository URL and exact commit hash used,
the original project's copyright and license (including SPDX identifier), and a
short excerpt or link to the original license text, while keeping the existing
NVIDIA Apache-2.0 header; ensure the block references "z-lab DFlash reference
implementation" and the original source commit so the file is license-compliant
and traceable.
In `@tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/patch_vllm_dflash.py`:
- Around line 564-576: When topk_indices is None (the c128a_prefill_topk_indices
fallback) don't silently switch to SWA-only; instead require an explicit opt-in
via an environment flag (e.g., something like DEEPSEEK_ALLOW_SWA_FALLBACK)
before setting swa_only=True and using self.topk_indices_buffer; if the env flag
is absent raise or log a one-time error/warning and abort the run. Update the
branch that currently handles topk_indices is None (references: topk_indices,
topk_indices_buffer, swa_only, num_decode_tokens) to check the env flag first,
emit the one-time message (or raise), and only then assign topk_indices =
self.topk_indices_buffer[num_decode_tokens:], set swa_only and top_k/N
accordingly. Ensure the message clearly names c128a_prefill_topk_indices and the
env flag so users can opt into smoke-test behavior.
- Around line 71-75: The hardcoded VLLM path should be derived from the
installed vllm package instead of a fixed string; replace the literal VLLM =
pathlib.Path("/usr/local/.../vllm") with logic that imports the vllm module and
sets VLLM = pathlib.Path(vllm.__file__).parent (or the appropriate package
parent) and handle ImportError to surface a clear message if vllm is not
installed; update any references to VLLM accordingly (look for the VLLM symbol
in this file).
In
`@tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/vllm_dflash_smoke_test_cw_dfw.yaml`:
- Around line 18-29: The tuning notes for gpu_memory_utilization and
max_num_batched_tokens are out of sync with task_1 settings: update either the
explanatory comments (gpu_memory_utilization=0.85, max_num_batched_tokens=32768)
to reflect the actual task variables GPU_MEM_UTIL=0.80 and
MAX_BATCHED_TOKENS=16384, or change task_1's GPU_MEM_UTIL and MAX_BATCHED_TOKENS
to match the documented values; ensure consistency for the same settings
referenced elsewhere (also update the repeated notes around the block mentioned
at lines 69-71) so gpu_memory_utilization / GPU_MEM_UTIL and
max_num_batched_tokens / MAX_BATCHED_TOKENS are identical across comments and
task_1.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f3104469-0f5f-4db2-81fa-d14bec5bb486
📒 Files selected for processing (6)
- tools/launcher/core.py
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/create_dflash_draft.sh
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/dflash_draft/config.json
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/dflash_draft/dflash.py
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/patch_vllm_dflash.py
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/vllm_dflash_smoke_test_cw_dfw.yaml
✅ Files skipped from review due to trivial changes (1)
- tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/dflash_draft/config.json
python3 - <<'EOF'
import os, sys, torch
sys.path.insert(0, os.environ.get("OUT", "/scratchspace/dflash_draft"))
from transformers import AutoConfig, AutoModel

out = os.environ.get("OUT", "/scratchspace/dflash_draft")
print(f"Initializing random DFlash draft model from config: {out}/config.json")
config = AutoConfig.from_pretrained(out, trust_remote_code=True)
model = AutoModel.from_config(config, trust_remote_code=True).to(torch.bfloat16)
🧩 Analysis chain
🏁 Script executed:
find . -name "create_dflash_draft.sh" -type f

Repository: NVIDIA/Model-Optimizer
Length of output: 145
🏁 Script executed:
wc -l tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/create_dflash_draft.sh

Repository: NVIDIA/Model-Optimizer
Length of output: 146
🏁 Script executed:
cat -n tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/create_dflash_draft.sh

Repository: NVIDIA/Model-Optimizer
Length of output: 1878
🏁 Script executed:
cat -n tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/dflash_draft/dflash.py

Repository: NVIDIA/Model-Optimizer
Length of output: 11077
🏁 Script executed:
cat -n tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/dflash_draft/config.json

Repository: NVIDIA/Model-Optimizer
Length of output: 1470
Avoid executing checkpoint code just to initialize random weights.
This always enables remote-code execution via the auto_map field in the config. Since dflash.py is already copied locally in the same script, instantiate the class directly from that file instead of going through AutoConfig/AutoModel(..., trust_remote_code=True), or at minimum make the flag caller-controlled and default it off.
Possible fix
import os, sys, torch
+import importlib.util
+from pathlib import Path
sys.path.insert(0, os.environ.get("OUT", "/scratchspace/dflash_draft"))
-from transformers import AutoConfig, AutoModel
+from transformers import Qwen3Config
out = os.environ.get("OUT", "/scratchspace/dflash_draft")
print(f"Initializing random DFlash draft model from config: {out}/config.json")
-config = AutoConfig.from_pretrained(out, trust_remote_code=True)
-model = AutoModel.from_config(config, trust_remote_code=True).to(torch.bfloat16)
+spec = importlib.util.spec_from_file_location("dflash", Path(out) / "dflash.py")
+module = importlib.util.module_from_spec(spec)
+assert spec.loader is not None
+spec.loader.exec_module(module)
+config = Qwen3Config.from_pretrained(out)
+model = module.DFlashDraftModel(config).to(torch.bfloat16)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/create_dflash_draft.sh`
around lines 25 - 33, The script currently calls AutoConfig.from_pretrained and
AutoModel.from_config with trust_remote_code=True which triggers remote-code
execution via the auto_map in the config; instead, import and instantiate the
local DFlash model class directly (the class defined in dflash.py that was
copied earlier) rather than using AutoConfig/AutoModel, or add a
caller-controlled flag to disable trust_remote_code and default it to false;
specifically replace the AutoConfig/AutoModel usage with a direct import of the
local DFlash class (referencing dflash.py and the class name used there) and
call its constructor with the config loaded without invoking trust_remote_code,
or expose a CLI/ENV variable to gate the trust_remote_code boolean when using
AutoModel.
# DFlash draft model architecture for DeepSeek-V4-Flash.
#
# This is a 4-layer Qwen3-based scaffold used as the DFlash speculative decoding
# draft model. The architecture is adapted from z-lab's DFlash reference implementation.
#
# Each decoder layer performs cross-attention over the target model's hidden states
# (aux outputs from selected target layers) alongside the draft model's own hidden states.
# The `fc` + `hidden_norm` project the concatenated target hidden states down to hidden_size
# before cross-attention.
#
# This file is bundled with the smoke test config so that a randomly-initialized draft
# checkpoint can be created at test time without requiring a pre-trained checkpoint.
Add explicit third-party provenance for the z-lab-derived scaffold.
The header says this file is adapted from z-lab's DFlash reference implementation, but the file only carries the NVIDIA Apache header. If any implementation here was copied or closely adapted, please add the upstream repo + commit and the original license/SPDX details in this file so the example stays license-compliant.
As per coding guidelines, **/*.{py,cpp,hpp,h,cc,cxx,c,ts,tsx,js,jsx}: When copying code from third-party sources, include a reference link with commit hash to the source, the original repository's Copyright/License, and the NVIDIA Apache 2.0 Copyright/License header in the file.
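A hedged illustration of what such a provenance block could look like; the repository URL, commit hash, year, and upstream license below are placeholders and must be replaced with the real upstream details:

```python
# SPDX-FileCopyrightText: Copyright (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Portions adapted from the z-lab DFlash reference implementation:
#   https://github.com/<z-lab-org>/<dflash-repo>  @ commit <full-commit-sha>   (placeholders)
# Original work Copyright (c) <year> <upstream authors>
# Licensed under the <upstream license, e.g. Apache-2.0>; see the upstream repository
# for the full license text.
```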
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/dflash_draft/dflash.py`
around lines 4 - 15, The file-level header lacks explicit third-party provenance
for the z-lab-derived scaffold; add a clear provenance block at the top of
dflash.py that states the upstream repository URL and exact commit hash used,
the original project's copyright and license (including SPDX identifier), and a
short excerpt or link to the original license text, while keeping the existing
NVIDIA Apache-2.0 header; ensure the block references "z-lab DFlash reference
implementation" and the original source commit so the file is license-compliant
and traceable.
import pathlib
import re
import sys

VLLM = pathlib.Path("/usr/local/lib/python3.12/dist-packages/vllm")
🧩 Analysis chain
🏁 Script executed:
find . -name "patch_vllm_dflash.py" -type f

Repository: NVIDIA/Model-Optimizer
Length of output: 143
🏁 Script executed:
# Once we find the file, let's read it
cat -n tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/patch_vllm_dflash.py

Repository: NVIDIA/Model-Optimizer
Length of output: 33038
🏁 Script executed:
# Also check file size first to determine how to read it
wc -l tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/patch_vllm_dflash.py

Repository: NVIDIA/Model-Optimizer
Length of output: 145
Resolve the vLLM package root dynamically.
Hardcoding /usr/local/lib/python3.12/dist-packages/vllm ties the script to a single Python version and installation layout. On the next base-image update or non-standard vLLM installation, this path will fail silently or not be found. Derive the path from the imported module instead.
Suggested fix
import pathlib
import re
import sys
+import vllm
-VLLM = pathlib.Path("/usr/local/lib/python3.12/dist-packages/vllm")
+VLLM = pathlib.Path(vllm.__file__).resolve().parent

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/patch_vllm_dflash.py`
around lines 71 - 75, The hardcoded VLLM path should be derived from the
installed vllm package instead of a fixed string; replace the literal VLLM =
pathlib.Path("/usr/local/.../vllm") with logic that imports the vllm module and
sets VLLM = pathlib.Path(vllm.__file__).parent (or the appropriate package
parent) and handle ImportError to surface a clear message if vllm is not
installed; update any references to VLLM accordingly (look for the VLLM symbol
in this file).
| " if topk_indices is None: # _p12_c128a_prefill_fallback_patch\n" | ||
| " # c128a_prefill_topk_indices unavailable; degrade to SWA-only.\n" | ||
| " assert self.topk_indices_buffer is not None\n" | ||
| " topk_indices = self.topk_indices_buffer[num_decode_tokens:]\n" | ||
| " swa_only = True\n" | ||
| " top_k = 0\n" | ||
| " N = 0\n" | ||
| " else:\n" | ||
| " top_k = topk_indices.shape[-1]\n" | ||
| " # Compressed region must fit the full compressed pool (seq_len //\n" | ||
| " # compress_ratio), not just top_k. top_k bounds how many indices\n" | ||
| " # the indexer selects, not the pool size it indexes into.\n" | ||
| " N = (self.max_model_len + self.compress_ratio - 1) // self.compress_ratio" |
Gate the SWA-only fallback behind an explicit smoke-test opt-in.
When c128a_prefill_topk_indices is missing, this silently changes attention semantics and still lets the run continue. That hides real metadata bugs in any non-smoke-test usage. Please require an explicit env flag and emit a one-time warning or error otherwise.
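A standalone sketch of the requested gating, not the actual vLLM code path; the flag name follows the review prompt's suggestion (DEEPSEEK_ALLOW_SWA_FALLBACK) and the helper shape is illustrative:

```python
import os
import warnings

_warned_swa_fallback = False


def resolve_prefill_topk_indices(topk_indices, topk_indices_buffer, num_decode_tokens):
    """Return (topk_indices, swa_only); only degrade to SWA-only with an explicit opt-in."""
    global _warned_swa_fallback
    if topk_indices is not None:
        return topk_indices, False  # normal path: indexer-provided indices
    if os.environ.get("DEEPSEEK_ALLOW_SWA_FALLBACK", "0") != "1":
        raise RuntimeError(
            "c128a_prefill_topk_indices is missing; refusing to silently fall back to "
            "SWA-only attention. Set DEEPSEEK_ALLOW_SWA_FALLBACK=1 to allow this in smoke tests."
        )
    if not _warned_swa_fallback:
        warnings.warn("c128a_prefill_topk_indices missing; using SWA-only fallback (smoke-test mode)")
        _warned_swa_fallback = True
    assert topk_indices_buffer is not None
    # Caller then sets swa_only=True, top_k=0, N=0, matching the P12 patch above.
    return topk_indices_buffer[num_decode_tokens:], True
```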
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/patch_vllm_dflash.py`
around lines 564 - 576, When topk_indices is None (the
c128a_prefill_topk_indices fallback) don't silently switch to SWA-only; instead
require an explicit opt-in via an environment flag (e.g., something like
DEEPSEEK_ALLOW_SWA_FALLBACK) before setting swa_only=True and using
self.topk_indices_buffer; if the env flag is absent raise or log a one-time
error/warning and abort the run. Update the branch that currently handles
topk_indices is None (references: topk_indices, topk_indices_buffer, swa_only,
num_decode_tokens) to check the env flag first, emit the one-time message (or
raise), and only then assign topk_indices =
self.topk_indices_buffer[num_decode_tokens:], set swa_only and top_k/N
accordingly. Ensure the message clearly names c128a_prefill_topk_indices and the
env flag so users can opt into smoke-test behavior.
# Key config choices:
#   block_size=256 — required: SWA layers use window_size=256 so the page constraint
#     max(sm_page_sizes) ≤ max(all_page_sizes) forces block_size ≥ 256
#   gpu_memory_utilization=0.85 — leaves ~4 GB headroom per GPU for Triton JIT compilation
#     of the DeepSeek compressor kernel on first inference
#   enforce_eager — disables CUDA graph capture; avoids NoneType tensor errors
#     in deepseek_v4_attention CUDA graph execution with DFlash patches
#   max_num_batched_tokens=32768 — vLLM's default; required so spec-decoding token budget
#     (max_num_batched_tokens minus per-request draft overhead)
#     stays positive. Setting this too low (e.g. 4096) causes
#     "max_num_scheduled_tokens is set to negative value" error.
#
Keep the tuning notes in sync with the actual task settings.
The comments describe gpu_memory_utilization=0.85 and max_num_batched_tokens=32768, but task_1 sets GPU_MEM_UTIL=0.80 and MAX_BATCHED_TOKENS=16384. Please update either the notes or the values so operators debug the right configuration.
Also applies to: 69-71
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@tools/launcher/examples/deepseek-ai/DeepSeek-V4-Flash/vllm_dflash_smoke_test_cw_dfw.yaml`
around lines 18 - 29, The tuning notes for gpu_memory_utilization and
max_num_batched_tokens are out of sync with task_1 settings: update either the
explanatory comments (gpu_memory_utilization=0.85, max_num_batched_tokens=32768)
to reflect the actual task variables GPU_MEM_UTIL=0.80 and
MAX_BATCHED_TOKENS=16384, or change task_1's GPU_MEM_UTIL and MAX_BATCHED_TOKENS
to match the documented values; ensure consistency for the same settings
referenced elsewhere (also update the repeated notes around the block mentioned
at lines 69-71) so gpu_memory_utilization / GPU_MEM_UTIL and
max_num_batched_tokens / MAX_BATCHED_TOKENS are identical across comments and
task_1.
Add launcher examples and infrastructure to run DeepSeek-V4-Flash as the DFlash speculative decoding target model in vLLM (container: deepseekv4-cu130).
New files:
examples/deepseek-ai/DeepSeek-V4-Flash/patch_vllm_dflash.py Runtime patch script (P0–P11) applied to the vLLM container at job startup. Adds the EAGLE3/DFlash aux-hidden-state interface to DeepseekV4ForCausalLM, fixes KV cache group allocation for heterogeneous DFlash draft layers, handles the fp8_ds_mla dtype, and bypasses the fp8_fp4_paged_mqa_logits kernel that overflows H100 shared memory with block_size=256 × MLA 576 bytes/token. Includes a PR contribution strategy note in the module docstring.
examples/deepseek-ai/DeepSeek-V4-Flash/vllm_dflash_smoke_test_cw_dfw.yaml CW-DFW H100 smoke test: 8xH100, TP=8, fp8 KV cache, block_size=256, gpu_memory_utilization=0.85 (leaves ~4 GB headroom for Triton JIT), 15 speculative tokens, draft model z-lab/DeepSeek-V4-Flash-DFlash.
examples/deepseek-ai/DeepSeek-V4-Flash-DFlash/ (vllm + sglang MTP smoke tests) MTP smoke test YAMLs for B200, B300, and CW-DFW H100.
common/specdec/sglang_smoke_test.sh SGLang speculative decoding smoke test.
common/specdec/read_vllm_files.sh Diagnostic: dump relevant vLLM source lines.
common/specdec/vllm_smoke_test.sh:
Add VLLM_PATCH_SCRIPT hook — if set, run the specified Python script before starting vLLM. Allows per-model patches without modifying the common script.
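For illustration only (this is not the shipped patch_vllm_dflash.py), a script passed via VLLM_PATCH_SCRIPT can follow the same sentinel-guarded, idempotent pattern discussed in the review comments; the target module path and patch strings below are hypothetical:

```python
#!/usr/bin/env python3
"""Hypothetical per-model patch script invoked via the VLLM_PATCH_SCRIPT hook."""
import pathlib
import sys

try:
    import vllm
except ImportError:
    sys.exit("vllm is not installed in this environment; nothing to patch")

VLLM_ROOT = pathlib.Path(vllm.__file__).resolve().parent
SENTINEL = "# _example_model_patch"  # marks a file as already patched (name illustrative)


def patch_file(path: pathlib.Path, old: str, new: str) -> bool:
    """Sentinel-guarded, idempotent in-place text patch of one installed source file."""
    if not path.is_file():
        print(f"skip: {path} not found")
        return False
    src = path.read_text()
    if SENTINEL in src:
        print(f"already patched: {path}")
        return True  # safe to re-run the hook
    if old not in src:
        print(f"WARNING: pattern not found in {path}; container layout may have changed")
        return False
    path.write_text(src.replace(old, f"{new}  {SENTINEL}", 1))
    print(f"patched: {path}")
    return True


if __name__ == "__main__":
    # Illustrative target and snippets only; a real script edits specific vLLM modules.
    patch_file(VLLM_ROOT / "example_module.py", "old_call()", "new_call()")
```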
common/vllm/query.sh:
Port B200/data-synthesis fixes: ulimit -u unlimited, DEEPGEMM_TMPDIR redirect, COPY_MODEL_TO_TMPFS for NFS stale-handle avoidance, and FORCE_AF_V2 patch for torch inductor compatibility with DeepSeek V4 fp8 ops.
core.py / slurm_config.py:
Expose retries, requeue, and additional_parameters in SlurmConfig and slurm_factory so long-running jobs can use Slurm's native requeue mechanism.
What does this PR do?
Type of change: ?
Usage
# Add a code snippet demonstrating how to use this

Testing
Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: ✅ / ❌ / N/A

Additional Information
Summary by CodeRabbit
New Features
Chores