add eval capes to sdk by luke-e-schaefer · Pull Request #460 · scaleapi/nucleus-python-client

luke-e-schaefer · 2026-05-12T18:19:19Z

resolves https://linear.app/scale-epd/issue/DE-7460

tests won't pass until https://github.com/scaleapi/scaleapi/pull/142963 is merged

Greptile Summary

This PR adds a new Evaluations V2 feature to the SDK, exposing COCO-style mAP/confusion-matrix/PR-curve metrics for model runs via a new EvaluationV2 resource and supporting DTOs.

- `NucleusClient` gains `create_evaluation_v2`, `get_evaluation_v2`, and `list_evaluations_v2`; the `EvaluationV2` object exposes `wait_for_completion`, `charts`, `examples`, `delete`, and `refresh`, all following existing SDK conventions for raw-response deletes, `is not None` payload guards, and `DictCompatibleModel` DTOs.
- All previously nullable server fields (`iou`, `prediction_metadata`, `item_metadata`) are correctly typed as `Optional` in `EvaluationV2MatchExample`, preventing `ValidationError` on FN rows.
- The `_camelize_filter_value` helper intentionally skips recursion into `MetadataPredicate.value` payloads, which is verified by a dedicated unit test.
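To illustrate the last point, here is a minimal sketch of a key-camelizing helper that leaves a predicate's `value` payload untouched. It is not the SDK's `_camelize_filter_value`; the function names, the `preserve_key` parameter, and the sample filter shape are all assumptions made for illustration.

```python
from typing import Any


def to_camel(name: str) -> str:
    """Convert a snake_case key to camelCase."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camelize_filter_value(value: Any, preserve_key: str = "value") -> Any:
    """Recursively camelize dict keys, but never descend into `preserve_key`.

    Hypothetical helper: the payload stored under `preserve_key` is user data,
    so its keys must reach the server exactly as the caller wrote them.
    """
    if isinstance(value, list):
        return [camelize_filter_value(item, preserve_key) for item in value]
    if isinstance(value, dict):
        out = {}
        for key, inner in value.items():
            if key == preserve_key:
                out[key] = inner  # keep the payload as-is, no recursion
            else:
                out[to_camel(key)] = camelize_filter_value(inner, preserve_key)
        return out
    return value


# Hypothetical filter shape, used only to exercise the helper.
sample_filter = {
    "metadata_predicates": [
        {"field_name": "weather", "value": {"snake_key": 1}},
    ]
}
camelized = camelize_filter_value(sample_filter)
```

In this sketch the outer keys become `metadataPredicates` and `fieldName`, while `snake_key` inside `value` is left unchanged.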

Confidence Score: 5/5

Safe to merge; the new EvaluationV2 surface is additive, follows existing SDK patterns, and all previously flagged nullable-field bugs are resolved.

The implementation correctly handles Optional fields on FN/FP rows, uses is not None guards for empty-list payloads, delegates deletes through the established raw-response pathway (which still surfaces HTTP errors via handle_bad_response), and includes thorough unit-test coverage for every new method. The only finding is a wrong tag URL in the CHANGELOG heading.
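The reason an `is not None` guard matters for empty-list payloads can be shown with a small sketch. The function name and the `allowedLabelMatches` wire key below are assumptions for illustration, not the SDK's actual code.

```python
from typing import Any, Dict, List, Optional


def build_payload_truthy(allowed_label_matches: Optional[List[dict]] = None) -> Dict[str, Any]:
    """Buggy variant: a truthiness check silently drops an explicit empty list."""
    payload: Dict[str, Any] = {}
    if allowed_label_matches:
        payload["allowedLabelMatches"] = allowed_label_matches
    return payload


def build_payload(allowed_label_matches: Optional[List[dict]] = None) -> Dict[str, Any]:
    """Guarded variant: only an omitted (None) argument is left out of the payload."""
    payload: Dict[str, Any] = {}
    if allowed_label_matches is not None:
        payload["allowedLabelMatches"] = allowed_label_matches
    return payload


dropped = build_payload_truthy([])  # the empty list never reaches the server
kept = build_payload([])            # the empty list is sent explicitly
omitted = build_payload()           # nothing is sent when the caller passes None
```

With the `is not None` check, "no matches allowed" (an empty list) and "use the server default" (argument omitted) stay distinguishable.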

CHANGELOG.md — the 0.18.5 heading links to the v0.18.4 tag.

Important Files Changed

| Filename | Overview |
| --- | --- |
| nucleus/evaluation_v2.py | New EvaluationV2 resource with wait/charts/examples/delete/refresh; correct null checks, proper status enum comparisons, and raw-response delete pattern matching the rest of the SDK. |
| nucleus/data_transfer_object/evaluation_v2.py | New DTOs for filters, charts, and match examples; all nullable server-side fields correctly typed as Optional, _camelize_filter_value helper intentionally preserves predicate value payloads. |
| `nucleus/__init__.py` | Adds create_evaluation_v2, get_evaluation_v2, list_evaluations_v2 to NucleusClient; uses `is not None` for allowed_label_matches guard (correct), exports new types in `__all__`. |
| tests/test_evaluation_v2.py | Comprehensive unit tests for all new paths: filter serialization, from_json parsing, wait_for_completion polling, delete positional-arg passthrough, and charts/examples HTTP body construction. |
| CHANGELOG.md | Adds 0.18.5 entry; the release link in the heading incorrectly points to v0.18.4 instead of v0.18.5. |
| pyproject.toml | Version bumped from 0.18.4 to 0.18.5, no other changes. |
| docs/index.rst | Adds Evaluations V2 section to Sphinx docs with a correct end-to-end usage example. |

Sequence Diagram

sequenceDiagram
    participant U as User
    participant C as NucleusClient
    participant A as Nucleus API

    U->>C: create_evaluation_v2(model_run_id, ...)
    C->>A: "POST modelRun/{id}/evaluationsV2"
    A-->>C: "{evaluation_id: "evalv2_*"}"
    C->>A: "GET evaluationsV2/{evalv2_*}"
    A-->>C: EvaluationV2 payload
    C-->>U: EvaluationV2

    loop wait_for_completion
        U->>C: refresh()
        C->>A: "GET evaluationsV2/{id}"
        A-->>C: status
        C-->>U: updated EvaluationV2
    end

    U->>C: "charts(iou_threshold=0.5, filters=...)"
    C->>A: "GET evaluationsV2/{id}/charts?iouThreshold=0.5"
    A-->>C: EvaluationV2Charts JSON
    C-->>U: EvaluationV2Charts

    U->>C: "examples(match_type="TP", ...)"
    C->>A: "POST evaluationsV2/{id}/examples"
    A-->>C: "{rows: [...], total: N}"
    C-->>U: EvaluationV2ExamplesPage

    U->>C: delete()
    C->>A: "DELETE evaluationsV2/{id}"
    A-->>C: 204 No Content

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
CHANGELOG.md:8
The 0.18.5 release heading links to the `v0.18.4` tag instead of `v0.18.5`.

```suggestion
## [0.18.5](https://github.com/scaleapi/nucleus-python-client/releases/tag/v0.18.5) - 2026-05-28
```

Reviews (9): Last reviewed commit: "fix pyproject"

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

edwinpav

Overall nice work!

Two main things:

1. I'd make sure that the user-facing docs/descriptions are not overly complex. Not everyone will know or even care how the function works behind the scenes; they just care about the params, the return values, and the feature that the method provides.
2. If you want to deploy a new SDK version with these changes, two more files need to be changed and added to this PR:
   1. CHANGELOG.md should be updated. The tag link that the CHANGELOG references will be created after this PR is merged into master. You'd add a new release with a new tag here: https://github.com/scaleapi/nucleus-python-client/releases. Feel free to ping me with any questions! The process isn't super clear lol
   2. The SDK version under tool.poetry should be updated in pyproject.toml
   (see #457 as a reference PR)

edwinpav · 2026-05-27T15:46:48Z

+        self.__dict__.update(updated.__dict__)
+        return self
+
+    def wait_for_completion(


Is this needed because this is not integrated with NucleusJobs? I thought this type of functionality comes built in for the other async functions (dedup async also uses Temporal).

luke-e-schaefer · 2026-05-28

Correct, yeah, I don't have any ties back to the Nucleus jobs currently (since this stuff isn't "technically" in Nucleus)... I could set that up though, that would be simple.

edwinpav · 2026-05-28

Oh I see. I guess if it's in the Nucleus SDK it might be worth doing that if it's simple. If it shows up on the Nucleus jobs page UI that's probably fine, but that's a call you have more context to make.

luke-e-schaefer · 2026-05-28

Yeah, I think that's fine too. I'll do that in its own set of PRs after this one, though (I'll have to update scaleapi too).
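For context on the thread above, a client-side `wait_for_completion` without job integration usually comes down to a polling loop. The sketch below is a generic illustration, not the method added in this PR; the status strings, parameter names, and defaults are assumptions.

```python
import time
from typing import Callable, Collection


def wait_for_completion(
    fetch_status: Callable[[], str],
    terminal_statuses: Collection[str] = ("COMPLETED", "FAILED"),
    poll_interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> str:
    """Call `fetch_status` until it returns a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()  # stands in for a refresh() round trip
        if status in terminal_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s}s")
        time.sleep(poll_interval_s)


# Usage with a stand-in status source instead of a real API call.
statuses = iter(["PENDING", "RUNNING", "COMPLETED"])
final_status = wait_for_completion(lambda: next(statuses), poll_interval_s=0.0)
```

Hooking into an existing jobs system, as suggested in the thread, would replace this loop with whatever waiting mechanism that system already provides.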

edwinpav

Everything looks good, just one typo I saw and pyproject.toml still needs an update. After that should be good to go!

add eval capes to sdk

4c6083e

luke-e-schaefer requested review from edwinpav and vinay553 May 12, 2026 18:19

luke-e-schaefer self-assigned this May 12, 2026

greptile-apps Bot reviewed May 12, 2026

Comment thread nucleus/data_transfer_object/evaluation_v2.py Outdated

Comment thread nucleus/__init__.py Outdated

luke-e-schaefer and others added 2 commits May 12, 2026 13:49

Apply suggestion from @greptile-apps[bot]

36f6b4a

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Apply suggestion from @greptile-apps[bot]

3caaf8d

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

greptile-apps Bot reviewed May 12, 2026

Comment thread nucleus/data_transfer_object/evaluation_v2.py Outdated

luke-e-schaefer and others added 3 commits May 12, 2026 14:03

run hooks

13a91b2

merge remote

cce066e

Update nucleus/data_transfer_object/evaluation_v2.py

aced4aa

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

greptile-apps Bot reviewed May 12, 2026

Comment thread nucleus/data_transfer_object/evaluation_v2.py

fix p1

866ac71

edwinpav reviewed May 27, 2026

luke-e-schaefer added 2 commits May 28, 2026 17:31

address comments

6582163

Merge branch 'master' into add-eval-capabilities

ff6e671

luke-e-schaefer requested a review from edwinpav May 28, 2026 22:55

luke-e-schaefer added 2 commits May 28, 2026 18:21

fix lint

f88b665

Merge branch 'add-eval-capabilities' of https://github.com/scaleapi/nucleus-python-client into add-eval-capabilities

cd38ab6

edwinpav reviewed Jun 1, 2026

Comment thread CHANGELOG.md Outdated

update version

55a753f

luke-e-schaefer requested a review from edwinpav June 1, 2026 20:07

edwinpav approved these changes Jun 2, 2026

luke-e-schaefer added 2 commits June 10, 2026 09:45

Merge branch 'master' into add-eval-capabilities

33083cd

fix pyproject

087fa8b

luke-e-schaefer merged commit 9bda4ae into master Jun 10, 2026
9 checks passed

luke-e-schaefer deleted the add-eval-capabilities branch June 10, 2026 15:27
