Add CVDP benchmark resource server with apptainer instead of docker #928
Merged
cmunley1 merged 16 commits into NVIDIA-NeMo:main on Apr 7, 2026
Conversation
cmunley1
reviewed
Mar 23, 2026
cmunley1
requested changes
Mar 23, 2026
d0044b0 to
7bec70a
Compare
cmunley1
reviewed
Mar 24, 2026
jmabry
reviewed
Mar 24, 2026
jmabry
reviewed
Mar 24, 2026
jmabry
reviewed
Mar 24, 2026
Code review
Found 2 issues:
Gym/responses_api_agents/cvdp_agent/app.py Lines 105 to 113 in 5a0ffa9
Gym/responses_api_agents/cvdp_agent/client.py Lines 22 to 27 in 5a0ffa9
🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎.
roclark
previously approved these changes
Mar 24, 2026
6c42b52 to
be4eff7
Compare
roclark
previously approved these changes
Apr 6, 2026
Signed-off-by: Arti Jain <artij@nvidia.com>
Signed-off-by: Arti Jain <artij@nvidia.com>
Signed-off-by: Arti Jain <artij@nvidia.com>
Signed-off-by: Arti Jain <artij@nvidia.com>
Signed-off-by: Arti Jain <artij@nvidia.com>
Signed-off-by: Arti Jain <artij@nvidia.com>
Signed-off-by: Arti Jain <artij@nvidia.com>
Signed-off-by: Arti Jain <artij@nvidia.com>
…a copy of simple agent and irrelevant Signed-off-by: Arti Jain <artij@nvidia.com>
Signed-off-by: Arti Jain <artij@nvidia.com>
Signed-off-by: Arti Jain <artij@nvidia.com>
Signed-off-by: Arti Jain <artij@nvidia.com>
Signed-off-by: Arti Jain <artij@nvidia.com>
Route comprehension categories to BLEU/ROUGE scoring instead of docker-compose harness. Code-generation categories (2-5, 7, 12-14, 16) are unchanged. Also updates convert_to_gym.py to handle comprehension data and adds tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Arti Jain <artij@nvidia.com>
Mirrors CVDP's validate_commercial_eda_setup() — warns at startup if eda_sim_image is not set, since categories 12/13/14 will fail at runtime when harness files reference __VERIF_EDA_IMAGE__. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Arti Jain <artij@nvidia.com>
Signed-off-by: Arti Jain <artij@nvidia.com>
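The scoring split and the startup check described in the last two descriptive commit messages can be sketched roughly as below. This is an illustration only, not the code from this PR: the function names `pick_scorer` and `warn_if_eda_image_missing` are hypothetical, and treating every category outside the listed code-generation set as a comprehension category is an assumption. The category numbers and the `eda_sim_image` setting name come from the commit messages.

```python
import warnings
from typing import Optional

# Code-generation categories listed in the commit message (2-5, 7, 12-14, 16).
CODE_GEN_CATEGORIES = frozenset({2, 3, 4, 5, 7, 12, 13, 14, 16})
# Categories the commit message says fail without an EDA simulator image.
EDA_CATEGORIES = frozenset({12, 13, 14})


def pick_scorer(category: int) -> str:
    """Return "harness" for code-generation categories, else "bleu_rouge".

    Assumption: any category not listed as code generation is comprehension.
    """
    return "harness" if category in CODE_GEN_CATEGORIES else "bleu_rouge"


def warn_if_eda_image_missing(eda_sim_image: Optional[str]) -> bool:
    """Warn once at startup if no EDA image is configured.

    Returns True when a warning was issued, False otherwise.
    """
    if eda_sim_image:
        return False
    warnings.warn(
        "eda_sim_image is not set; categories "
        f"{sorted(EDA_CATEGORIES)} will fail at runtime",
        stacklevel=2,
    )
    return True
```

Warning rather than raising keeps the server usable for the categories that do not need the EDA image, which is the behavior the commit message describes.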
cmunley1
approved these changes
Apr 7, 2026
    class SimpleAgent(SimpleResponsesAPIAgent):
        config: SimpleAgentConfig

        async def responses(
Contributor
would be better to import rather than duplicate if you can
    @@ -0,0 +1,9 @@
    # Description
Contributor
could you add a sentence or two here at least pointing to the resources server readme
    # Licensing information
    Code: Apache 2.0
    Data: N/A
Contributor
and clarify data availability
Contributor
just pointing to resources server docs is fine i guess
cmunley1
pushed a commit
that referenced
this pull request
Apr 8, 2026
…928)
Part of Customer Eval Bench, this is adding CVDP (non-agentic, non-commercial) support to Gym. This is a single pass evaluation using vLLM as a backend. The current code matches the existing [public CVDP infra](https://github.com/NVlabs/cvdp_benchmark) as of 3/5.
resources_server/cvdp --> All the helper scripts and files needed to run the benchmark
resources_server/cvdp/cvdp_lib --> contain files and code straight from CVDP Public Github that are needed for the final report generation
resources_server/cvdp/scripts --> contain all the helper scripts to convert the dataset to gym, create the final report
responses_api_agents/cvdp_agent --> copy of Simple Agent except added support for retries
Instead of Docker, this runs with Apptainer
---------
Signed-off-by: Arti Jain <artij@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
This PR is part of Customer Eval Bench and adds CVDP (non-agentic, non-commercial) support to Gym. It is a single-pass evaluation that uses vLLM as the backend. The current code matches the existing public CVDP infra as of 3/5.
resources_server/cvdp --> all the helper scripts and files needed to run the benchmark
resources_server/cvdp/cvdp_lib --> files and code taken directly from the public CVDP GitHub repository that are needed for the final report generation
resources_server/cvdp/scripts --> the helper scripts that convert the dataset to Gym format and create the final report
responses_api_agents/cvdp_agent --> a copy of Simple Agent with added support for retries
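The description does not say how the agent's retries are implemented. As a rough sketch of one common approach, assuming an async call that can fail transiently, a retry wrapper might look like the following. The helper name `call_with_retries` and its parameters are hypothetical and are not taken from `cvdp_agent`.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_retries(
    make_call: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay_s: float = 0.0,
) -> T:
    """Await make_call(), retrying on any exception up to max_attempts times.

    The delay doubles after each failed attempt (base_delay_s, 2x, 4x, ...).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            await asyncio.sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A real agent would probably retry only on specific error types (for example, timeouts) rather than on every exception.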
Instead of Docker, this runs with Apptainer
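For readers unfamiliar with Apptainer, the sketch below shows how a harness command might be assembled for `apptainer exec`, which runs a command inside a container image and accepts `--bind host:container` for mounts. The helper name `build_apptainer_exec`, the image name, and the paths are made up for illustration; the PR's actual invocation may differ.

```python
from typing import List, Mapping, Optional, Sequence


def build_apptainer_exec(
    image: str,
    command: Sequence[str],
    binds: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Build an argv list for `apptainer exec`.

    image: a .sif file path or an image URI such as docker://...
    binds: host path -> container path, each passed as --bind host:container
    """
    argv = ["apptainer", "exec"]
    for host_path, container_path in (binds or {}).items():
        argv += ["--bind", f"{host_path}:{container_path}"]
    argv.append(image)
    argv.extend(command)
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)`. Building an argv list instead of a shell string avoids quoting problems in paths.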