
[tx] Introduce optimization step metrics dataclass #1191

Merged

pcmoritz merged 3 commits into NovaSky-AI:main from pcmoritz:optim-step-metrics on Feb 20, 2026

Conversation

pcmoritz (Collaborator) commented Feb 20, 2026:

This is in preparation for merging #1008 and to make it easier to introduce metrics.
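
For context, a minimal sketch of what a metrics dataclass along these lines could look like, assuming scalar JAX arrays for the tracked values and a small helper that flattens them into the string-keyed metrics dict seen in the review below (the field and method names here are illustrative, not the actual implementation):

    import dataclasses
    import jax.numpy as jnp

    @dataclasses.dataclass
    class OptimStepMetrics:
        """Illustrative sketch: metrics produced by a single optimizer step."""
        grad_norm: jnp.ndarray      # global gradient norm (scalar array)
        learning_rate: jnp.ndarray  # learning rate used for this step (scalar array)

        def to_dict(self) -> dict[str, float]:
            # Flatten into the string-keyed dict returned in OptimStepOutput.metrics.
            return {
                "skyrl.ai/grad_norm": self.grad_norm.item(),
                "skyrl.ai/learning_rate": self.learning_rate.item(),
            }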


pcmoritz added the tx label on Feb 20, 2026
gemini-code-assist[bot]: this comment was marked as resolved.

devin-ai-integration[bot]: this comment was marked as resolved.

pcmoritz changed the title from "[tx] Introduce optimization step metrics dictionary" to "[tx] Introduce optimization step metrics dataclass" on Feb 20, 2026
pcmoritz (Collaborator, Author) commented:

/gemini review

gemini-code-assist bot (Contributor) left a comment:


Code Review

The pull request introduces a new OptimStepMetrics dataclass to standardize the reporting of optimization metrics in the JAX backend. It also includes a safety fix for gradient averaging and updates tests to reflect these changes. The most significant change is in the optim_step logic, which now proceeds with an optimizer update even when no gradients are accumulated, potentially applying weight decay unexpectedly.

Comment on lines +114 to +115:

    "skyrl.ai/grad_norm": self.grad_norm.item(),
    "skyrl.ai/learning_rate": self.learning_rate.item(),
gemini-code-assist bot (Contributor), severity: medium

The metric keys "skyrl.ai/grad_norm" and "skyrl.ai/learning_rate" are hardcoded as strings. It would be better to define these as constants (e.g., in tx.tinker.types) to ensure consistency across the codebase and tests, and to avoid potential typos.
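
For illustration, the suggestion could look roughly like the following; the constant names are hypothetical, and tx.tinker.types is only the module the bot proposes as a possible home:

    # Hypothetical constants, e.g. in tx/tinker/types.py, shared by optim_step and the tests.
    GRAD_NORM_METRIC_KEY = "skyrl.ai/grad_norm"
    LEARNING_RATE_METRIC_KEY = "skyrl.ai/learning_rate"

    # Usage at the lines commented on above:
    metrics = {
        GRAD_NORM_METRIC_KEY: self.grad_norm.item(),
        LEARNING_RATE_METRIC_KEY: self.learning_rate.item(),
    }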

     if self.accumulated_grads.counts[adapter_index] == 0:
-        logger.warning(f"No accumulated gradients for model {model_id}, skipping optimizer step")
-        return types.OptimStepOutput(metrics={"skyrl.ai/learning_rate": learning_rate})
+        logger.warning(f"No accumulated gradients for model {model_id}; applying step with zero gradients")
gemini-code-assist bot (Contributor), severity: medium

The removal of the early return when counts[adapter_index] == 0 changes the behavior of optim_step. Previously, the step was skipped entirely. Now, the code proceeds to call _compute_grads_and_update, which applies an optimizer update with zero gradients. For optimizers like AdamW, this will still apply weight decay to the parameters, which might be unintended if the user expects the step to be a no-op when no gradients are present. If the goal is to ensure metrics are always returned, consider restoring the early return but returning an OptimStepOutput with the expected metrics.

Suggested change:

-        logger.warning(f"No accumulated gradients for model {model_id}; applying step with zero gradients")
+        logger.warning(f"No accumulated gradients for model {model_id}, skipping optimizer step")
+        return types.OptimStepOutput(metrics={"skyrl.ai/grad_norm": 0.0, "skyrl.ai/learning_rate": learning_rate})
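
To illustrate the concern about weight decay, here is a small standalone optax example (not code from this repository) showing that an adamw step with all-zero gradients still moves the parameters when weight_decay is non-zero:

    import jax.numpy as jnp
    import optax

    params = {"w": jnp.ones((3,))}
    optimizer = optax.adamw(learning_rate=1e-2, weight_decay=0.1)
    opt_state = optimizer.init(params)

    # All-zero gradients, as in the "no accumulated gradients" case.
    zero_grads = {"w": jnp.zeros((3,))}
    updates, opt_state = optimizer.update(zero_grads, opt_state, params)
    new_params = optax.apply_updates(params, updates)

    # The Adam part of the update is zero, but the decoupled weight decay term
    # still shrinks the parameters by roughly lr * weight_decay * w per step.
    print(new_params["w"])  # ~[0.999, 0.999, 0.999] rather than [1.0, 1.0, 1.0]

Whether this is acceptable comes down to whether a "no gradients" step is meant to be a true no-op, which is the question the author addresses below.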

pcmoritz (Collaborator, Author) replied:

I don't think the semantics of an empty gradient step are well specified, but applying an empty optimizer step seems like a reasonable thing to do and is also in line with reporting a zero gradient norm. This case shouldn't be common, and handling everything uniformly (zero gradient norm, apply optimizer step) seems like good semantics to me.
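
To make that uniform handling concrete, a heavily hedged sketch of the fall-through path; the helper names (average_accumulated_grads, global_norm, apply_optimizer_update) and the OptimStepMetrics.to_dict helper are hypothetical stand-ins for the actual _compute_grads_and_update logic:

    # Illustrative only: the no-gradient case falls through to the normal path,
    # so the reported grad norm is 0.0 and the optimizer step (including any
    # weight decay) is still applied.
    if self.accumulated_grads.counts[adapter_index] == 0:
        logger.warning(f"No accumulated gradients for model {model_id}; applying step with zero gradients")

    grads = average_accumulated_grads(adapter_index)   # zeros when nothing was accumulated
    grad_norm = global_norm(grads)                     # 0.0 in that case
    params, opt_state = apply_optimizer_update(params, grads, opt_state)
    metrics = OptimStepMetrics(grad_norm=grad_norm, learning_rate=learning_rate)
    return types.OptimStepOutput(metrics=metrics.to_dict())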

pcmoritz merged commit 8b5fa9c into NovaSky-AI:main on Feb 20, 2026
1 of 2 checks passed
