add eval capes to sdk#460
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
edwinpav
left a comment
Overall nice work!
Two main things:
- I'd make sure that the user-facing docs/descriptions are not overly complex. Not everyone will know or even care how the function works behind the scenes; they just care what the params are, what the returns are, and what feature the method provides.
- If you want to deploy a new sdk version with these changes, two more files need to be changed and added to this pr:
  - CHANGELOG.md should be updated. The tag link that the CHANGELOG references will be created after this pr is merged into master. You'd add a new release with a new tag here: https://github.com/scaleapi/nucleus-python-client/releases. Feel free to ping for any questions! The process isn't super clear lol
  - The sdk version under tool.poetry should be updated in pyproject.toml (see #457 as a reference pr)
        self.__dict__.update(updated.__dict__)
        return self

    def wait_for_completion(
Is this needed because this is not integrated with NucleusJobs? I thought this type of functionality comes built in for the other async functions (dedup async also uses temporal)
correct yeah I don't have any ties back to the nuc jobs currently (since this stuff isn't "technically" in nucleus)...I could set that up tho that would be simple
oh i see, ig if it's in the nucleus sdk might be worth doing that if it's simple. if it shows up on the nucleus jobs page ui that's probably fine but that's probably a call you have more context on to make
yeah i think thats fine too. I'll run that in its own PR set tho after this one (i'll have to update scaleapi too)
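For context on the thread above, here is a minimal sketch of what a client-side polling helper like wait_for_completion typically looks like when there is no job integration to lean on. It is not the code in this PR: FakeEvaluation, the status strings ("pending", "running", "completed", "failed"), and the poll_interval/timeout parameters are all assumptions made for illustration.

```python
import time


class FakeEvaluation:
    """Hypothetical stand-in for an evaluation object; not the SDK class."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.status = "pending"  # assumed status name

    def refresh(self):
        # A real client would re-fetch state from the API here.
        self.status = next(self._statuses, self.status)
        return self


def wait_for_completion(evaluation, poll_interval=0.01, timeout=1.0,
                        terminal=("completed", "failed")):
    """Call refresh() until a terminal status is seen or the timeout expires."""
    deadline = time.monotonic() + timeout
    while evaluation.status not in terminal:
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"evaluation still {evaluation.status!r} after {timeout}s"
            )
        time.sleep(poll_interval)
        evaluation.refresh()
    return evaluation


done = wait_for_completion(FakeEvaluation(["running", "running", "completed"]))
print(done.status)
```

Using a monotonic deadline rather than counting iterations keeps the timeout meaningful even if individual refresh calls are slow.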
…ucleus-python-client into add-eval-capabilities
edwinpav
left a comment
Everything looks good, just one typo I saw and pyproject.toml still needs an update. After that should be good to go!
resolves https://linear.app/scale-epd/issue/DE-7460
tests won't pass until https://github.com/scaleapi/scaleapi/pull/142963 is merged
Greptile Summary
This PR adds a new Evaluations V2 feature to the SDK, exposing COCO-style mAP/confusion-matrix/PR-curve metrics for model runs via a new EvaluationV2 resource and supporting DTOs.
- NucleusClient gains create_evaluation_v2, get_evaluation_v2, and list_evaluations_v2; the EvaluationV2 object exposes wait_for_completion, charts, examples, delete, and refresh, all following existing SDK conventions for raw-response deletes, is not None payload guards, and DictCompatibleModel DTOs.
- The fields iou, prediction_metadata, and item_metadata are correctly typed as Optional in EvaluationV2MatchExample, preventing ValidationError on FN rows.
- The _camelize_filter_value helper intentionally skips recursion into MetadataPredicate.value payloads, which is verified by a dedicated unit test.
Confidence Score: 5/5
Safe to merge; the new EvaluationV2 surface is additive, follows existing SDK patterns, and all previously flagged nullable-field bugs are resolved.
The implementation correctly handles Optional fields on FN/FP rows, uses is not None guards for empty-list payloads, delegates deletes through the established raw-response pathway (which still surfaces HTTP errors via handle_bad_response), and includes thorough unit-test coverage for every new method. The only finding is a wrong tag URL in the CHANGELOG heading.
CHANGELOG.md — the 0.18.5 heading links to the v0.18.4 tag.
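The "is not None guards for empty-list payloads" point in the summary can be illustrated with a small, self-contained sketch. The function names and the class_names/classNames field are hypothetical and do not come from the SDK; the only thing shown is why a truthiness check silently drops an explicit empty list while an is not None check keeps it.

```python
from typing import Any, Dict, List, Optional


def build_payload_truthy(class_names: Optional[List[str]]) -> Dict[str, Any]:
    """Truthiness check: drops an explicit [] as well as None."""
    payload: Dict[str, Any] = {}
    if class_names:
        payload["classNames"] = class_names  # hypothetical field name
    return payload


def build_payload_guarded(class_names: Optional[List[str]]) -> Dict[str, Any]:
    """Identity check: only omits the field when the caller passed None."""
    payload: Dict[str, Any] = {}
    if class_names is not None:
        payload["classNames"] = class_names  # hypothetical field name
    return payload


print(build_payload_truthy([]))    # the empty list is lost
print(build_payload_guarded([]))   # the empty list is sent through
print(build_payload_guarded(None))  # the field is omitted
```

The distinction matters whenever "no value supplied" and "explicitly empty" mean different things to the server.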
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant U as User
    participant C as NucleusClient
    participant A as Nucleus API
    U->>C: create_evaluation_v2(model_run_id, ...)
    C->>A: "POST modelRun/{id}/evaluationsV2"
    A-->>C: "{evaluation_id: "evalv2_*"}"
    C->>A: "GET evaluationsV2/{evalv2_*}"
    A-->>C: EvaluationV2 payload
    C-->>U: EvaluationV2
    loop wait_for_completion
        U->>C: refresh()
        C->>A: "GET evaluationsV2/{id}"
        A-->>C: status
        C-->>U: updated EvaluationV2
    end
    U->>C: "charts(iou_threshold=0.5, filters=...)"
    C->>A: "GET evaluationsV2/{id}/charts?iouThreshold=0.5"
    A-->>C: EvaluationV2Charts JSON
    C-->>U: EvaluationV2Charts
    U->>C: "examples(match_type="TP", ...)"
    C->>A: "POST evaluationsV2/{id}/examples"
    A-->>C: "{rows: [...], total: N}"
    C-->>U: EvaluationV2ExamplesPage
    U->>C: delete()
    C->>A: "DELETE evaluationsV2/{id}"
    A-->>C: 204 No Content
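The diagram shows snake_case arguments such as iou_threshold reaching the API as camelCase (iouThreshold), and the summary mentions a helper that camelizes filters without recursing into MetadataPredicate.value payloads. A rough sketch of that idea follows. It is not the SDK's _camelize_filter_value: the function names, the filter shape, and the rule "leave anything under a key named value untouched" are simplifying assumptions for illustration.

```python
from typing import Any


def to_camel(name: str) -> str:
    """Convert snake_case to camelCase, e.g. iou_threshold -> iouThreshold."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camelize_filter_value(value: Any) -> Any:
    """Camelize dict keys recursively, but pass 'value' payloads through as-is."""
    if isinstance(value, list):
        return [camelize_filter_value(item) for item in value]
    if isinstance(value, dict):
        out = {}
        for key, inner in value.items():
            if key == "value":
                # Assumed convention: user-supplied payload, do not rewrite it.
                out[key] = inner
            else:
                out[to_camel(key)] = camelize_filter_value(inner)
        return out
    return value


# Hypothetical filter shape, for illustration only.
filters = {
    "match_type": "TP",
    "metadata_predicates": [
        {"key": "camera_id", "op": "eq", "value": {"camera_id": 3}},
    ],
}
print(camelize_filter_value(filters))
```

Skipping the value subtree keeps user metadata keys (here camera_id) exactly as the user wrote them, which is presumably why the real helper avoids recursing there.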
Reviews (9): Last reviewed commit: "fix pyproject"