[NNX] NNX migration prep (8/N): NNX native LoRA GRPO #3824
Draft
ecnal-cienet wants to merge 6 commits into main from
Conversation
- Add TrainStateNNX (layers/train_state_nnx.py) with checkpoint and unit tests
- Refactor model_creation_utils with create_nnx_abstract_model(); add NNX support to muon_utils
- Add get_abstract_state_nnx() and get_nnx_named_sharding_with_scan_axis() to maxtext_utils.py
- Wire NNX train state into train.py and train_utils.py with pure_nnx dispatch
…raining fixes

Part 1 — sharding diagnostics and Linen<->NNX checkpoint utilities:
- modify print_shardings_params to support NNX (maxtext_utils.py)
- add --pure_nnx flag to run_sharding_dump.py
- add bidirectional Linen<->NNX checkpoint conversion utility (linen_nnx_converter.py)
- add checkpoint comparison utility for Linen vs NNX validation (compare_linen_nnx_checkpoint.py)

Part 2 — post-training bug fixes:
- models.py: unpack MultimodalInput before passing to NNXDecoder (was passing the whole object as multimodal_input= kwarg; NNXDecoder only accepts individual fields)
- optimizers.py: guard adam_pax against scalar LR from optax.inject_hyperparams (callable() check before invoking learning_rate_fn)
- train_distill.py: fix nested NNX transform issue (nnx.value_and_grad inside nnx.jit raises conflicting outer_index error); refactored to jax.value_and_grad + explicit nnx.split/merge pattern; teacher inference moved outside value_and_grad
Bug fixes (run as no-op while pure_nnx=False stays default):
- nnx_wrappers.py: add _refresh_variable_trace_state + is_linen_initializing;
call from ToLinen after nnx.update to fix "Cannot extract graph node from
different trace level" when grad tracers leak into Variable._trace_state.
- gpt_oss.py / olmo3.py: replace inline nn.Dropout(...) with self.dropout =
linears.Dropout(...) in __init__ to fix CallCompactUnboundModuleError.
- normalizations.py: Qwen3NextRMSNorm signature: eps -> epsilon, accept
shard_mode/kernel_axes/parameter_memory_host_offload for callsite parity.
- attentions.py / qwen3.py: callsites eps= -> epsilon=.
- moe.py: per_expert_scale block moved into the unfused-kernel else branch
(was scaling wo even when fused_kernel was active).
- models.py: build MTP block as MultiTokenPredictionBlock(...) directly
(drop the ToNNX(linen) + lazy_init wrap); pass multimodal_input whole
to NNXDecoder instead of unpacking 5 fields.
- gradient_accumulation.py: ZeRO-1+GA all-reduce annotation deferred until
after lax.scan (reduced/unreduced PartitionSpec is rejected inside scan
carry); use nnx.merge(..., copy=True) to avoid Variable reuse.
- diloco.py: NNX-aware state handling — state.params -> state.model.filter
(nnx.Param), step counter at state.optimizer.step, replace_nnx_model_params
helper for jax.lax.cond pytree-structure parity.
- train_compile.py: new _collect_nnx_activation_shardings helper (forward
pass populates _ACTIVATION_SHARDINGS_DUMP — get_abstract_state_nnx only
traces __init__); NNX path now passes 2-arg shaped_train_args (no rng);
diloco path patched to handle the 2-vs-3 length difference.
- muon_utils.py: get_model_mdn default pure_nnx=True; wrap NNX result as
{"params": nnx.to_pure_dict(...)} for parity with Linen tree shape.
- nnx_decoders.py: FP8+NNX scan fix — Linen FP8 ops (fp8_nanoo, fp8_gpu)
retain tracers in Linen scope across re-traces. Skip jax.checkpoint and
use a Python for-loop instead of jax.lax.scan when quantization is FP8
(see the sketch after this list). Makes FP8 quantization usable on the
NNX path.
- train.py (pre-train train_step): return
nnx.state(new_state, nnx.Not(nnx.Intermediate)) so sowed forward-pass
artifacts (e.g. max_logits for QK-Clip) don't break leaf-count parity
with state_mesh_shardings.
- llama2.py: pass parameter_memory_host_offload to the
pre_self_attention_layer_norm RMSNorm (was missing on this norm only).
- base.yml: add 4 pipeline-related logical_axis_rules —
layers_outside_pipeline, layers_per_stage, num_activations,
circular_repeats. Additive, no-op without use_nnx_pipeline=True.
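A minimal sketch of the loop-versus-scan workaround named in the nnx_decoders.py item above. The names layer_fn, params_stack and fp8_active are illustrative, not the MaxText API; the point is only that the FP8 path unrolls a Python loop at trace time instead of re-tracing the layer body inside jax.lax.scan.

```python
import jax

def apply_decoder_layers(layer_fn, params_stack, x, fp8_active):
  # params_stack: pytree whose leaves carry a leading num_layers axis
  # (scan-style stacking). With FP8 active, unroll a Python for-loop
  # (and skip jax.checkpoint) so the FP8 ops are not re-traced inside a
  # scan body; otherwise keep the memory-friendly lax.scan.
  num_layers = jax.tree_util.tree_leaves(params_stack)[0].shape[0]
  if fp8_active:
    for i in range(num_layers):
      layer_params = jax.tree_util.tree_map(lambda p: p[i], params_stack)
      x = layer_fn(layer_params, x)
    return x

  def body(carry, layer_params):
    return layer_fn(layer_params, carry), None

  x, _ = jax.lax.scan(body, x, params_stack)
  return x
```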
NNX feature enablements (clear all 17 "Pure NNX support has not been
implemented yet" NotImplementedError sites by routing Linen-coupled
utilities to the Linen path; their on-disk format is Linen):
- layerwise_quantization.py (2 sites): operates on Linen-format checkpoints
via DeepSeek*ToLinen layers.
- lora_utils.py (1 site): downstream get_lora_abstract_state expects Linen
tree shape; LoRA adapters on disk are Linen.
- standalone_checkpointer.py (2 sites): add_entropy_to_checkpoint accesses
state.opt_state[0]._replace(mu=..., nu=...) — Linen-only.
- generate_param_only_checkpoint.py (3 sites): _possibly_unroll_params and
_save_decode_checkpoint use state.params["params"]["decoder"] — Linen.
- convert_gpt3_ckpt_from_paxml.py (2 sites): keystr_map targets Linen tree
paths (.params['params'], .opt_state.mu['params']).
- maxengine.py (3 sites): inference engine uses state.params and serves
Linen-format inference checkpoints.
- grpo_trainer.py (4 sites): RL trainer is end-to-end Linen-shaped; route
to Linen with a clear log warning since NNX-format checkpoints will fail
at restore time.
Vocab tiling on NNX (real implementation, not just routing):
- models.py: add Transformer.logits_from_hidden_states on the NNX
Transformer class — wraps NNXDecoder.apply_output_head with the
token_embedder; mirrors TransformerLinenPure.logits_from_hidden_states.
- vocabulary_tiling.py: add vocab_tiling_nnx_loss — chunks the vocab axis
via jax.lax.scan and calls model.logits_from_hidden_states(chunk) per
chunk (see the sketch after this list). The NNX model carries its
parameters internally so no explicit FSDP gather is needed (unlike the
Linen gathered_params pattern). MVP uses default autograd; the custom_vjp
memory-savings optimization is a follow-up if backward memory becomes a
concern.
- train.py (NNX loss_fn): replace the NotImplementedError with the call
to vocab_tiling_nnx_loss using hidden_states from intermediates.
- pyconfig_deprecated.py / configs/types.py: drop the num_vocab_tiling > 1
and enable_nnx validation guards (no longer needed).
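A minimal sketch of the tiling idea behind vocab_tiling_nnx_loss: a streaming softmax cross-entropy that scans over vocab chunks so the full [batch, seq, vocab] logits tensor is never materialized. All names below are illustrative — the real helper calls model.logits_from_hidden_states per chunk rather than taking an embedding table directly.

```python
import jax
import jax.numpy as jnp

def tiled_cross_entropy(hidden, embedding, targets, num_tiles):
  """Hypothetical streaming cross-entropy over vocab tiles.

  hidden:    [batch, seq, emb] final hidden states
  embedding: [vocab, emb] output-head weights (tied-embedding layout assumed)
  targets:   [batch, seq] int32 token ids
  """
  vocab = embedding.shape[0]
  tile = vocab // num_tiles  # assume num_tiles divides vocab for the sketch

  def body(carry, k):
    run_max, run_sum, tgt_logit = carry
    w = jax.lax.dynamic_slice_in_dim(embedding, k * tile, tile, axis=0)
    logits = jnp.einsum("bse,ve->bsv", hidden, w)        # [b, s, tile]
    # Streaming logsumexp update over this tile.
    new_max = jnp.maximum(run_max, logits.max(axis=-1))
    run_sum = run_sum * jnp.exp(run_max - new_max) + jnp.exp(
        logits - new_max[..., None]).sum(axis=-1)
    # Pick out the target logit if the target id falls inside this tile.
    local = targets - k * tile
    in_tile = (local >= 0) & (local < tile)
    picked = jnp.take_along_axis(
        logits, jnp.clip(local, 0, tile - 1)[..., None], axis=-1)[..., 0]
    tgt_logit = jnp.where(in_tile, picked, tgt_logit)
    return (new_max, run_sum, tgt_logit), None

  init = (jnp.full(targets.shape, -jnp.inf),   # running max
          jnp.zeros(targets.shape),            # running sum of exp
          jnp.zeros(targets.shape))            # target logit
  (run_max, run_sum, tgt_logit), _ = jax.lax.scan(
      body, init, jnp.arange(num_tiles))
  # per-token loss = logsumexp(logits) - logit[target], averaged over tokens
  return (run_max + jnp.log(run_sum) - tgt_logit).mean()
```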
DPO + NNX retained as NotImplementedError but with a much more informative
message (points users at pure_nnx=False workaround). Full implementation
is deferred — needs a new TrainState shape carrying both policy and
reference NNX models plus an NNX dpo_loss_fn.
Stats: 26 source files modified, +406 / -171 lines. Linen invariant
verified: pure_nnx / enable_nnx / pure_nnx_decoder still default to False;
Linen-path UTs unaffected (3 pre-existing failures on the parent branch
remain unchanged — sharding_compare_test::deepseek2-16b,
optimizers_test::test_model_integration_kimi-k2-1t,
diloco_test::two_slices x2). All "Pure NNX support has not been
implemented yet"
NotImplementedError sites cleared (was 17, now 0).
Implements NNX-native DPO so that the pure_nnx=True training path no longer
raises NotImplementedError on use_dpo runs. The Linen DPO overlay pattern
(model.apply(params=..., reference_params=...)) does not translate to NNX
modules, which carry their parameters internally. Instead the policy and
reference models are held as separate nnx.Module instances on TrainStateNNX,
and a new dpo_loss_fn_nnx runs both forwards with stop_gradient on the
reference logits (see the sketch below).

TrainStateNNX:
- Add optional `reference_model: nnx.Module` field. apply_gradients continues
to update only `self.model`, leaving `self.reference_model` bit-identical
across steps.

dpo_utils.py:
- Add dpo_loss_fn_nnx(policy_model, config, data, dropout_rng, params,
reference_model, is_train=True). Signature mirrors the Linen dpo_loss_fn so
it slots into gradient_accumulation_loss_and_grad's dispatcher (dropout_rng /
params slots are unused for NNX; carried for parity, and reference_model is
passed as the single extra_dpo_args entry). With
nnx.value_and_grad(..., argnums=0) over the policy, no gradient flows to the
reference model's nnx.Param leaves; the explicit jax.lax.stop_gradient on
ref_logits is a belt-and-braces guard.
- Both dpo_loss_fn (Linen) and dpo_loss_fn_nnx (NNX) now include
indexer_loss=0.0 and mtp_loss=0.0 in aux so the gradient_accumulation aux
pytree shape matches the non-DPO loss_fn.

train.py:
- Drop the NotImplementedError in train_step's NNX branch. When use_dpo,
dispatch to dpo_loss_fn_nnx with state.reference_model as extra_dpo_args;
otherwise use the regular loss_fn. eval_step gains the same dispatch.
- diff_wrapper picks _loss_fn / extra_dpo_args from the per-path init block,
so both the GA and non-GA NNX paths route DPO identically.
- Checkpoint-save _split_dpo_state stripping is now Linen-only; TrainStateNNX
saves whole (reference_model included) — the step-0 reload later overwrites
reference_model from the step-0 checkpoint.

train_utils.py:
- NNX init_state_fn materializes a frozen reference_model alongside the
policy when config.use_dpo. Both are constructed by _create_model_partial()
with config.init_weights_seed, so they start identical (standard DPO
practice) until the step-0 reload.
- Step-0 checkpoint reload: copy step0_state["model"] into
state["reference_model"]. Linen path unchanged.

Tests:
- New tests/unit/dpo_nnx_test.py (7 tests): TrainStateNNX reference_model
init/hasattr semantics; apply_gradients leaves reference bit-identical; aux
key set; identical policy/reference yields loss=log(2) and
reward_accuracy=0.0 (strict > on equal logratios); dropout_rng/params slots
are signature-compat only; nnx.value_and_grad(argnums=0) over the policy
yields finite grads on policy params only.
- train_nnx_test.py: drop the two stale negative tests
(vocab_tiling_raises_not_implemented, train_step_dpo_raises_for_nnx) — both
features are now real.

Stats: 4 source files + 2 test files, +199/-22 source lines. Linen DPO path
behaviorally unchanged (only adds two harmless aux-dict keys); NNX non-DPO
path unchanged (all changes gated on config.use_dpo).
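A minimal sketch of the dpo_loss_fn_nnx shape described above, under assumed names: policy_model / reference_model are NNX modules called directly, and the data-field names, model call signature, and beta value are illustrative, not the MaxText config or API.

```python
import jax
import jax.numpy as jnp

def sequence_logprob(model, tokens, targets, mask):
  # Sum of per-token log-probs of `targets` under an NNX model called
  # directly (its parameters live inside the module).
  logits = model(tokens)                                   # [B, S, V], assumed call signature
  logps = jax.nn.log_softmax(logits, axis=-1)
  tok_logps = jnp.take_along_axis(logps, targets[..., None], axis=-1)[..., 0]
  return (tok_logps * mask).sum(axis=-1)                   # [B]

def dpo_loss_sketch(policy_model, reference_model, batch, beta=0.1):
  pol_c = sequence_logprob(policy_model, batch["chosen"], batch["chosen_tgt"], batch["chosen_mask"])
  pol_r = sequence_logprob(policy_model, batch["rejected"], batch["rejected_tgt"], batch["rejected_mask"])
  # No gradient may reach the reference model: differentiating w.r.t. the
  # policy only already guarantees this; stop_gradient is the extra guard.
  ref_c = jax.lax.stop_gradient(
      sequence_logprob(reference_model, batch["chosen"], batch["chosen_tgt"], batch["chosen_mask"]))
  ref_r = jax.lax.stop_gradient(
      sequence_logprob(reference_model, batch["rejected"], batch["rejected_tgt"], batch["rejected_mask"]))
  margin = beta * ((pol_c - pol_r) - (ref_c - ref_r))
  loss = -jax.nn.log_sigmoid(margin).mean()                # equals log(2) when policy == reference
  reward_accuracy = ((pol_c - ref_c) > (pol_r - ref_r)).mean()
  return loss, {"reward_accuracy": reward_accuracy}

# With flax.nnx, differentiate w.r.t. the policy module only, mirroring the
# nnx.value_and_grad(..., argnums=0) dispatch described above:
# grad_fn = nnx.value_and_grad(dpo_loss_sketch, argnums=0, has_aux=True)
```

Identical policy and reference give a zero margin, so the sketch reproduces the loss=log(2) and reward_accuracy=0.0 case the unit tests check.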
…e.py)
PR5 audited maxengine.py and routed the inference path to the Linen
implementation regardless of pure_nnx, with a comment block explaining
that "the flag affects training, not inference serving." That kept the
Linen serving path unchanged but meant pure_nnx=True users silently got
the Linen engine. This change replaces the route with a real NNX flow:
when config.pure_nnx=True, the engine builds an NNX Transformer, splits
out (params, cache, rest) with nnx.split, and at every JIT body merges
the model concretely with nnx.merge to run the forward pass. Linen is
preserved byte-for-byte; every NNX edit is gated `if config.pure_nnx:`
and pure_nnx=False is still the default.
maxengine.py (__init__):
- Build two abstract NNX Transformers on the NNX path: self.model with
model_mode=PREFILL (batch=1, single padded prompt) and self.model_ar
with model_mode=AUTOREGRESSIVE (batch=micro_batch_size_to_train_on,
decode_state shape). Both are needed because NNX cache vars inherit
CACHE_BATCH_PREFILL vs CACHE_BATCH from the construction model_mode,
and bulk_insert searches for the substring "cache_batch" in the
AR-mode logical-axes tuple. nnx.eval_shape is called directly inside
nn_partitioning.axis_rules rather than through create_nnx_abstract_model
to avoid the jax.set_mesh wrap that trips Flax 0.12.6 on logical-only
axes like "norm" (same reason get_abstract_state_nnx avoids set_mesh);
see the sketch after this list.
- Cache the graphdef from a 3-way nnx.split(Param, Cache, ...) so JIT
bodies can pass (params, cache, rest) separately to nnx.merge. The
rest slot (RNG vars etc.) is materialized concretely in load_params.
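A small sketch of the abstract-model construction pattern described above. ToyModel, the rule list, and the dimensions are placeholders, not MaxText code; on this toy the axis_rules context has no effect and is shown only to mirror the wrapping described above.

```python
from flax import nnx
from flax.linen import partitioning as nn_partitioning

class ToyModel(nnx.Module):
  def __init__(self, features, rngs):
    self.proj = nnx.Linear(4, features, rngs=rngs)

  def __call__(self, x):
    return self.proj(x)

# Logical-to-physical axis rules; logical-only names (e.g. "norm") map to None.
rules = (("embed", "fsdp"), ("vocab", "tensor"), ("norm", None))

# nnx.eval_shape traces __init__ only, yielding an abstract module whose
# parameters are shape/dtype structs — no device memory is allocated.
with nn_partitioning.axis_rules(rules):
  abstract_model = nnx.eval_shape(lambda: ToyModel(8, rngs=nnx.Rngs(0)))

graphdef, abstract_state = nnx.split(abstract_model)
```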
maxengine.py (cache adapter + _nnx_run_model):
- bulk_insert / _insert_jit / _maybe_*_prefill_result_cache walk the
cache via tree_map_with_path and switch on path[-1].key (the cache
variable name like "cached_prefill_key"). Linen mutable cache is a
plain nested dict. NNX Cache state would expose a ".value" accessor
at that position. Bridge via nnx.State.to_pure_dict() (after the
model run) and nnx.replace_by_pure_dict (before nnx.merge), so the
cache plumbing helpers see the same shape on both paths.
- Add _nnx_run_model: nnx.merge(graphdef, params, cache, rest, copy=True)
-> model(...) -> nnx.state(model, nnx.Cache).to_pure_dict(); see the
sketch after this list. copy=True avoids reusing Variable objects across
traces (TraceContextError), mirroring train.py's diff_wrapper workaround.
- Add _nnx_cache_state_template / _nnx_init_cache_dict helpers
parametrised by mode so prefill (batch 1) and decode_state (batch N)
pull from the right abstract model.
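A hypothetical sketch of the merge-run-extract shape of _nnx_run_model. The argument names, cache_template, and the model call signature are illustrative, not the MaxEngine API.

```python
from flax import nnx

def nnx_run_model_sketch(graphdef, params, cache_template, cache_pure_dict,
                         rest, tokens, positions):
  # 1. Put the pure-dict cache back into an nnx.State of nnx.Cache variables.
  nnx.replace_by_pure_dict(cache_template, cache_pure_dict)
  # 2. Merge params + cache + rest into a live module. (The PR additionally
  #    passes copy=True so Variable objects are not reused across traces.)
  model = nnx.merge(graphdef, params, cache_template, rest)
  # 3. Forward pass; the module mutates its nnx.Cache variables in place.
  logits = model(tokens, positions)  # assumed call signature
  # 4. Hand the updated cache back to the dict-based plumbing helpers.
  new_cache = nnx.state(model, nnx.Cache).to_pure_dict()
  return logits, new_cache
```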
maxengine.py (load_params):
- New _load_params_nnx: accepts user-provided NNX-shape params or loads
via from_pretrained. For user-provided params, materializes a concrete
model once via _create_model_fn() to capture a real rest state for
nnx.merge (wasteful but simple; the from_pretrained branch avoids
this). Refreshes self.graphdef from the concrete model so subsequent
merges line up exactly.
- Builds self.abstract_params, populates self.prefill_kv_cache_annotations
and self.kv_cache_annotations (using model_ar for the latter so
bulk_insert's substring lookup hits), wraps both into NamedSharding.
- pure_nnx + quantization, pure_nnx + LoRA, pure_nnx +
stack_prefill_result_cache=True, pure_nnx + prefill_multisampling,
and pure_nnx + prefill_concat raise NotImplementedError for now;
the Linen path is the workaround. AOT compilation
(aot_compile / _compile_generate_and_get_layouts) is not gated and
may work as-is; not exercised by tests yet.
maxengine.py (init_decode_state, _prefill_jit, _generate_jit):
- _init_decode_state_nnx zero-initializes a pure-dict cache from
model_ar (so the leading batch dim matches generate's input shape)
and builds kv_cache_annotations_named per leaf by reading
nnx.Cache.metadata. Tries "out_sharding", "sharding", and
"sharding_names" because Flax 0.12.6 renamed these (see the sketch
after this list).
- _prefill_jit / _generate_jit add an `if config.pure_nnx:` branch
that calls _nnx_run_model in place of self.model.apply with
mutable=["cache"]. existing_prefix.cache is threaded as a pure-dict
cache directly (no params|{"cache":...} dict-merge — params is an
nnx.State, not a dict).
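A small sketch of the metadata-key fallback described above; the helper name and the example metadata dict are illustrative.

```python
def sharding_names_from_metadata(metadata):
  # Flax has renamed the sharding-annotation metadata key across releases,
  # so probe the known spellings in order and take the first hit.
  for key in ("out_sharding", "sharding", "sharding_names"):
    value = metadata.get(key)
    if value is not None:
      return value
  return None

# e.g. sharding_names_from_metadata({"sharding": ("cache_batch", "cache_heads")})
# -> ("cache_batch", "cache_heads")
```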
maxtext_utils.py:
- New get_prefill_kv_cache_annotations_nnx / get_kv_cache_annotations_nnx
that mirror the Linen helpers' return shape (per-leaf PartitionSpec
tree). Both delegate to _nnx_cache_partition_specs which extracts
nnx.Cache state via nnx.split, calls
get_nnx_named_sharding_with_scan_axis inside
nn_partitioning.axis_rules so logical axes ("layers", "cache_batch",
"norm", ...) resolve to physical mesh axes, and converts the result
to a pure-dict tree.
tests/unit/maxengine_test.py:
- New tests: test_init_nnx, test_basic_prefill_nnx (with NaN/inf and
per-layer cache shape checks), test_basic_decode_nnx (4-step generate
with next_pos advancement check), test_quantize_raises_for_nnx,
test_lora_raises_for_nnx.
- New test_linen_nnx_parity_prefill: bridges Linen-init params into
the NNX engine via linen_nnx_converter (convert_linen_to_nnx ->
_strip_value_wrappers -> nnx.replace_by_pure_dict) and asserts the
NNX engine's prefill matches Linen on the same weights — logits
within bf16 tolerance (rtol=0.05, atol=0.1; the test config uses
bf16 compute) and exact greedy first-token argmax.
- Existing Linen tests untouched.
Test summary: 9 passed, 1 skipped (test_chunked_prefill is a
pre-existing CPU-only skip). bash lint.sh: codespell + pylint + pyink
all green.
NNX Migration Route Map
- `pure_nnx` flag, `init_state_fn`, `TrainStateNNX`, NNX utils. Linen workflow unchanged. (PR "NNX migration prep (1/N): pure_nnx flag and init_state_fn scaffolding" #3427)
- `get_abstract_state_nnx`, `get_named_sharding_nnx`, `set_named_sharding_nnx`, `get_partition_spec_nnx`, `get_mesh_from_config`. (PR "NNX migration prep (2/N): NNX utils and sharding utilities" #3470)
- `TrainStateNNX`, model creation, gradient accumulation, checkpointing, and training loop dispatch. (PR "NNX migration prep (3/N): TrainState, model creation, and end-to-end training loop" #3500)
- Correctness fixes and feature enablements; `pure_nnx=False` stays default. (PR "[NNX] NNX migration prep (5/N): correctness fixes and feature enablements" #3766)
- DPO no longer raises `NotImplementedError` on the NNX path; `pure_nnx=True` + `use_dpo=True` is now supported.
- NNX-native inference in `maxengine.py`; `pure_nnx=True` drives a real NNX inference flow end-to-end (prefill, generate, KV cache). (PR "[NNX] NNX migration prep (7/N): NNX-native MaxEngine inference" #3821)
- This PR: clears the `NotImplementedError` in `MaxEngine.load_single_adapter` and the GRPO `pure_nnx` warning log; `pure_nnx=True` now drives both the LoRA serving path and the GRPO trainer end-to-end.
- Remaining Linen-routed utilities (`standalone_checkpointer`, `generate_param_only_checkpoint`, `layerwise_quantization`, `convert_gpt3_ckpt_from_paxml`).
- `custom_vjp` for NNX (perf optimization, not correctness).
- Flip the default to `True`; regenerate sharding goldens; flip back integration-test `pure_nnx=False` annotations.

Description
This PR implements NNX-native LoRA serving and NNX-native GRPO by adding NNX-shape walkers and step helpers alongside the existing Linen ones, then dispatching on `config.pure_nnx`. Every NNX modification is gated by `if config.pure_nnx:`, preserving the Linen path byte-for-byte. The diff spans +551 / −84 across 5 source files, plus 2 new test files (515 lines).

Part 1: NNX-shape LoRA Walkers
New helpers in `src/maxtext/utils/lora_utils.py` operating on `nnx.State` pure trees (no `{"params": ...}` outer wrap):

- `apply_lora_on_base_params_nnx` mutates `base_params` in place: `W += B @ A * scale` at target attention paths
- `unapply_lora_from_base_params_nnx` is the symmetric inverse
- `get_lora_abstract_state_nnx` walks the abstract `state.model` substate and emits a parallel tree with `lora_a.kernel` / `lora_b.kernel` leaves at target attention paths and `None` elsewhere
- `_nnx_param_subtree` drops the outer `TrainStateNNX` wrapping

The base model stays pristine; "apply" merges the delta into the kernel, "unapply" reverses it. No `nnx.LoRA` wrapper, no model surgery. The on-disk format (HuggingFace PEFT-style `lora_a.kernel` / `lora_b.kernel`) round-trips between Linen and NNX consumers unchanged. A minimal sketch of the apply/unapply pattern follows.
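The sketch below is hypothetical: it walks a plain nested-dict param tree rather than an `nnx.State`, the path layout and kernel shapes are assumptions, and the contraction order / `scale` factor may differ from the real helpers. It only illustrates the in-place merge/unmerge symmetry.

```python
def apply_lora_delta(base_params, lora_params, scale, unapply=False):
  """Merge (or remove) the LoRA delta wherever a LoRA pair exists.

  base_params / lora_params: nested dicts. A LoRA'd kernel at path p in
  base_params has {"lora_a": {"kernel": A}, "lora_b": {"kernel": B}} at the
  same path in lora_params; non-target paths carry None and are skipped.
  """
  sign = -1.0 if unapply else 1.0
  for key, sub in lora_params.items():
    if sub is None:
      continue
    if isinstance(sub, dict) and "lora_a" in sub and "lora_b" in sub:
      a = sub["lora_a"]["kernel"]          # assumed [in_dim, rank]
      b = sub["lora_b"]["kernel"]          # assumed [rank, out_dim]
      delta = (a @ b) * scale              # [in_dim, out_dim], matches the kernel
      base_params[key]["kernel"] = base_params[key]["kernel"] + sign * delta
    elif isinstance(sub, dict):
      apply_lora_delta(base_params[key], sub, scale, unapply)
  return base_params
```

Apply followed by unapply recovers the original kernels up to floating-point error, which is what the apply→unapply identity test exercises.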
Part 2: LoRA Dispatch in `setup_initial_lora_state` and `load_adapter`

Both top-level entry points in `lora_utils.py` branch on `config.pure_nnx`:

- NNX: `model_creation_utils.create_nnx_abstract_model` + `TrainStateNNX(model, optimizer)`
- Linen: the `init_initial_state` + `get_lora_abstract_state` path, untouched

Part 3: MaxEngine LoRA Carve-out Cleared

`src/maxtext/inference/maxengine/maxengine.py`:

- `load_single_adapter` no longer raises `NotImplementedError` on `pure_nnx`
- `apply_adapter` / `unapply_adapter` branch on `config.pure_nnx` to call the `_nnx` siblings

Part 4: GRPO Loss and Step Helpers
`src/maxtext/experimental/rl/grpo_trainer.py`:

- `grpo_loss_fn_nnx(policy_model, config, data, dropout_rng, params, reference_model, is_train)`. Signature matches Linen `grpo_loss_fn` so callers dispatch on the same shape. `dropout_rng` and `params` are unused on NNX; `reference_model` is a frozen `nnx.Module` and the reference forward is wrapped in `stop_gradient`. Returns `(loss, LossAux)`, same dataclass as Linen.
- `_train_step_nnx`: `nnx.merge(graphdef, state)` to reconstruct `TrainStateNNX`, `value_and_grad` over policy params, `state.apply_gradients(grads)`, return `nnx.state(new_state, nnx.Not(nnx.Intermediate))`. A sketch of this step shape follows below.
- `_eval_step_nnx`: same merge + loss-fn call, no state update.
- `train_step` / `eval_step` early-dispatch on `config.pure_nnx`; Linen branches verbatim.
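A hypothetical sketch of the `_train_step_nnx` shape above. The loss-function signature and `TrainStateNNX` fields are simplified, and `apply_gradients` is assumed to update only the policy `model`.

```python
from flax import nnx

def train_step_nnx_sketch(graphdef, state_leaves, data, loss_fn):
  # Rebuild the live TrainStateNNX (model + optimizer + reference_model)
  # from the graphdef/state pair carried through jit.
  state = nnx.merge(graphdef, state_leaves)

  # Differentiate w.r.t. the policy module only (argnums=0); the frozen
  # reference model receives no gradient.
  grad_fn = nnx.value_and_grad(loss_fn, argnums=0, has_aux=True)
  (loss, aux), grads = grad_fn(state.model, data, state.reference_model)

  new_state = state.apply_gradients(grads)

  # Return everything except sown Intermediate variables so the output
  # pytree structure matches the pre-computed state shardings.
  return nnx.state(new_state, nnx.Not(nnx.Intermediate)), loss, aux
```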
Part 5: GRPO setup_train_loop on NNX

`grpo_trainer.py::setup_train_loop`:

- Model construction via `mt.from_config(rngs=create_nnx_rngs(...))`
- Abstract state via `create_nnx_abstract_model` + `TrainStateNNX(model, optimizer, reference_model=...)`
- The reference model is not updated by `apply_gradients` (sibling field on `TrainStateNNX`, not embedded in `params`)
- The `WARNING: GRPO RL trainer does not yet support pure_nnx natively` log is removed

Part 6: GRPO train_loop NNX Branches
`grpo_trainer.py::train_loop` — three Linen-coupled spots branched on `pure_nnx`:

- State initialization (NNX `init_state_fn`)
- `metric_logger.write_setup_info_to_tensorboard` receives a flat `nnx.Param` state on NNX
- Checkpoint save uses the whole `TrainStateNNX` on NNX; the Linen `_split_grpo_state(state)[0]` strip is bypassed

The reshard call routes to `pathways_reshard_nnx` when `pure_nnx`. New helpers in `grpo_utils.py` (the log-prob helper is sketched below):

- `compute_log_probs_nnx`: NNX model is called directly; intermediates pulled via `nnx.state(model, nnx.Intermediate).to_pure_dict()`
- `pathways_reshard_nnx`: splits `state.model` to a flat `nnx.Param` state, reshards onto the inference mesh, calls `inference_engine.update_params(...)`
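A minimal, hypothetical sketch of the `compute_log_probs_nnx` shape: the model call signature and data handling are illustrative, but it shows calling the NNX model directly and reading sown intermediates without Linen's `mutable=['intermediates']` plumbing.

```python
import jax
import jax.numpy as jnp
from flax import nnx

def compute_log_probs_sketch(model, tokens, positions, segment_ids):
  # Call the NNX model directly — its parameters live on the module.
  logits = model(tokens, positions, segment_ids)           # [B, S, V], assumed signature
  intermediates = nnx.state(model, nnx.Intermediate).to_pure_dict()

  # Per-token log-prob of each next token: predict token t+1 from position t,
  # giving the [B, S] -> [B, S-1] shape noted in the tests below.
  log_probs = jax.nn.log_softmax(logits[:, :-1, :], axis=-1)
  next_tokens = tokens[:, 1:]
  token_log_probs = jnp.take_along_axis(
      log_probs, next_tokens[..., None], axis=-1)[..., 0]  # [B, S-1]
  return token_log_probs, intermediates
```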
Part 7: Carve-outs (NotImplementedError Sites)

- `gradient_accumulation_steps > 1`
- `scan_layers=False`

Tests
New unit tests (`tests/unit/lora_utils_nnx_test.py`, 10 tests):

- `get_lora_abstract_state_nnx`: q/k/v/o shape derivation, target-vs-non-target masking, sharding propagation, leaf type validation, error paths
- `apply_lora_on_base_params_nnx`: apply→unapply identity, target-only mutation, numerical parity vs Linen `apply_lora_on_base_params` on the same random inputs
- `apply_lora_on_base_params` and `unapply_lora_from_base_params` (no existing unit test for these helpers in the tree)

New unit tests (`tests/unit/grpo_nnx_test.py`, 8 tests):

- `grpo_loss_fn_nnx`: `LossAux` shape parity, signature compatibility, identical-policy/reference → zero KL, `grpo_beta=0` → `aux.avg_kl=None`, finite policy grads
- `compute_log_probs_nnx`: shape `[B, S]` → `[B, S-1]`
- `grpo_loss_fn` and `compute_log_probs` (the existing Linen integration test is TPU-only and currently `@pytest.mark.skip`)

Modified test: `tests/unit/maxengine_test.py` swaps `test_lora_raises_for_nnx` (asserted `NotImplementedError`) for `test_lora_load_single_adapter_reaches_loader_on_nnx` (asserts `FileNotFoundError` from the loader).

Existing Linen tests: untouched and still pass; `pure_nnx=False` stays default.

Test results: 198 passed, 1 skipped (pre-existing CPU-only skip) across the broader NNX regression sweep — `maxengine_test`, `dpo_nnx_test`, `train_nnx_test`, `lora_utils_nnx_test`, `grpo_nnx_test`, `train_state_nnx_test`, `train_utils_nnx_test`, `gradient_accumulation_nnx_test`, `linen_nnx_converter_test`, `compare_linen_nnx_checkpoint_test`.

Linting: `bash lint.sh` — pyink + pylint 10.00/10.

Checklist
Before submitting this PR, please make sure (put X in square brackets):
`gemini-review` label.