7 changes: 7 additions & 0 deletions docs/features/tracking-metrics.mdx
@@ -8,6 +8,13 @@ icon: "chart-line"
ART writes a metrics row every time you call `model.log(...)`. Those rows go to
`history.jsonl` in the run directory and, if W&B logging is enabled, to W&B.
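
As a minimal sketch of inspecting those rows, assuming only that `history.jsonl` holds one JSON object per line: the helper name `read_history` and the demo keys (`step`, `reward`) are hypothetical and do not reflect ART's actual row schema.

```python
import json
import tempfile
from pathlib import Path


def read_history(path: Path) -> list[dict]:
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    rows = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                rows.append(json.loads(line))
    return rows


# Demo with made-up rows; the keys are hypothetical, not ART's real schema.
with tempfile.TemporaryDirectory() as run_dir:
    history = Path(run_dir) / "history.jsonl"
    history.write_text(
        '{"step": 0, "reward": 0.1}\n{"step": 1, "reward": 0.3}\n',
        encoding="utf-8",
    )
    rows = read_history(history)

print(len(rows))
```

In practice you would point `read_history` at the `history.jsonl` file inside your own run directory rather than a temporary one.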

Serverless training also creates W&B-backed artifacts and runs for each remote
training job so checkpoints can be traced back to their inputs. If W&B "Run
finished" notifications are enabled for your account, a multi-step
`ServerlessBackend` training loop can therefore trigger one notification per
`backend.train(...)` call. See [ART Backend](/fundamentals/art-backend#serverlessbackend)
for the serverless lifecycle notes and alert workaround.

Use this page for three things:

- understand the metrics ART emits automatically
17 changes: 17 additions & 0 deletions docs/fundamentals/art-backend.mdx
@@ -55,6 +55,23 @@ backend = ServerlessBackend(

As your training job progresses, `ServerlessBackend` automatically saves your LoRA checkpoints as W&B Artifacts and deploys them for production inference on W&B Inference.

Each `backend.train(...)` call submits one remote training job. ART stores the
job inputs and outputs in W&B so that every trained step has its own artifacts,
metrics, and provenance. If your W&B user settings send Slack notifications
when runs finish, a loop such as the following can produce one notification
per training step:

```python
for step in range(num_steps):
    groups = await art.gather_trajectory_groups(...)
    result = await backend.train(model, groups, learning_rate=1e-5)
```

This is expected for the current serverless training lifecycle. To reduce alert
noise, disable W&B "Run finished" notifications for the account or use a W&B
account/team whose notification settings are dedicated to ART training jobs.
ART still records checkpoints and provenance for each step.
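
To make the one-notification-per-step behavior concrete, here is a toy simulation. It does not call ART or W&B: `StubServerlessBackend` and `run_loop` are hypothetical stand-ins, and the only assumption modeled is the one stated above, that each `train` call corresponds to one remote job whose run finishes once.

```python
import asyncio


class StubServerlessBackend:
    """Hypothetical stand-in for ServerlessBackend; not the real ART class."""

    def __init__(self) -> None:
        self.finished_runs: list[int] = []

    async def train(self, step: int) -> None:
        # Assumption: one train call -> one remote job -> one finished run.
        await asyncio.sleep(0)
        self.finished_runs.append(step)


async def run_loop(num_steps: int) -> int:
    backend = StubServerlessBackend()
    for step in range(num_steps):
        await backend.train(step)
    # Each finished run is one potential "Run finished" notification.
    return len(backend.finished_runs)


notifications = asyncio.run(run_loop(5))
print(notifications)
```

Under this model the notification count grows linearly with the number of training steps, which is why the workaround targets the notification settings rather than the loop itself.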

### LocalBackend

The `LocalBackend` class runs a vLLM server and either an Unsloth or torchtune instance on whatever machine your agent itself is executing. This is a good fit if you're already running your agent on a machine with a GPU.