Add CVDP benchmark resource server with apptainer instead of docker by arti4nvj · Pull Request #928 · NVIDIA-NeMo/Gym

arti4nvj · 2026-03-20T23:48:46Z

Part of Customer Eval Bench, this is adding CVDP (non-agentic, non-commercial) support to Gym. This is a single pass evaluation using vLLM as a backend. The current code matches the existing public CVDP infra as of 3/5.

resources_server/cvdp --> All the helper scripts and files needed to run the benchmark
resources_server/cvdp/cvdp_lib --> contain files and code straight from CVDP Public Github that are needed for the final report generation
resources_server/cvdp/scripts --> contain all the helper scripts to convert the dataset to gym, create the final report
responses_api_agents/cvdp_agent --> copy of Simple Agent except added support for retries

Instead of Docker, this runs with Apptainer

jmabry · 2026-03-24T04:55:41Z

Code review

Found 2 issues:

Usage double-counting in responses_api_agents/cvdp_agent/app.py — usage = model_response.usage on the first iteration stores a reference, then the if usage: block immediately adds model_response.usage.input_tokens to itself, doubling the first call's token counts. PR feat: Fix duplicated usage counting and errors on empty usage in subsequent model calls #939 (commit c7bb3191) fixed this exact pattern in simple_agent by adding model_response.usage = None immediately after capture and guarding with if usage and model_response.usage:. The cvdp_agent was copied from the pre-fix version.

Gym/responses_api_agents/cvdp_agent/app.py

Lines 105 to 113 in 5a0ffa9

    
           if not usage: 
        
               usage = model_response.usage 
        
           if usage: 
        
               usage.input_tokens += model_response.usage.input_tokens 
        
               usage.output_tokens += model_response.usage.output_tokens 
        
               usage.total_tokens += model_response.usage.total_tokens

responses_api_agents/cvdp_agent/client.py is an unmodified copy-paste from example_single_tool_call / simple_agent — it hardcodes server_name="example_single_tool_call_simple_agent", calls a get_weather tool, and uses "going out in sf tn" as example input. It has module-level executable code that runs on import and would connect to the wrong server. This file should be removed or updated for CVDP.

Gym/responses_api_agents/cvdp_agent/client.py

Lines 22 to 27 in 5a0ffa9

    
           server_client = ServerClient.load_from_global_config() 
        
           task = server_client.post( 
        
               server_name="example_single_tool_call_simple_agent", 
        
               url_path="/v1/responses", 
        
               json=NeMoGymResponseCreateParamsNonStreaming( 
        
                   input=[

🤖 Generated with Claude Code

_{- If this code review was useful, please react with 👍. Otherwise, react with 👎.}

jmabry

LG%C

roclark

LGTM, thanks @arti4nvj!

Signed-off-by: Arti Jain <artij@nvidia.com>

…a copy of simple agent and irrelevent Signed-off-by: Arti Jain <artij@nvidia.com>

Signed-off-by: Arti Jain <artij@nvidia.com>

Route comprehension categories to BLEU/ROUGE scoring instead of docker-compose harness. Code-generation categories (2-5, 7, 12-14, 16) are unchanged. Also updates convert_to_gym.py to handle comprehension data and adds tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Arti Jain <artij@nvidia.com>

Mirrors CVDP's validate_commercial_eda_setup() — warns at startup if eda_sim_image is not set, since categories 12/13/14 will fail at runtime when harness files reference __VERIF_EDA_IMAGE__. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Arti Jain <artij@nvidia.com>

Signed-off-by: Arti Jain <artij@nvidia.com>

cmunley1 · 2026-03-24T16:45:41Z

+class SimpleAgent(SimpleResponsesAPIAgent):
+    config: SimpleAgentConfig
+
+    async def responses(


would be better to import rather than duplicate if you can

cmunley1 · 2026-04-07T19:23:42Z

@@ -0,0 +1,9 @@
+# Description
+
+


could you add a sentence or two here at least pointing to the resources server readme

cmunley1 · 2026-04-07T19:24:14Z

+
+# Licensing information
+Code: Apache 2.0
+Data: N/A


and clarify data availability

just pointing to resources server docs is fine i guess

…928) Part of Customer Eval Bench, this is adding CVDP (non-agentic, non-commercial) support to Gym. This is a single pass evaluation using vLLM as a backend. The current code matches the existing [public CVDP infra](https://github.com/NVlabs/cvdp_benchmark) as of 3/5. resources_server/cvdp --> All the helper scripts and files needed to run the benchmark resources_server/cvdp/cvdp_lib --> contain files and code straight from CVDP Public Github that are needed for the final report generation resources_server/cvdp/scripts --> contain all the helper scripts to convert the dataset to gym, create the final report responses_api_agents/cvdp_agent --> copy of Simple Agent except added support for retries Instead of Docker, this runs with Apptainer --------- Signed-off-by: Arti Jain <artij@nvidia.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: cmunley1 <cmunley@nvidia.com>

cmunley1 reviewed Mar 23, 2026

View reviewed changes

Comment thread benchmarks/aime24/config.yaml Outdated

cmunley1 requested changes Mar 23, 2026

View reviewed changes

Comment thread resources_servers/cvdp/configs/cvdp.yaml

Comment thread resources_servers/cvdp/env.yaml.example Outdated

Comment thread resources_servers/cvdp/README.md Outdated

Comment thread responses_api_agents/cvdp_agent/app.py

arti4nvj force-pushed the arti/cvdp_v2 branch 3 times, most recently from d0044b0 to 7bec70a Compare March 24, 2026 03:52

arti4nvj requested review from cmunley1, jmabry and roclark March 24, 2026 04:07

cmunley1 reviewed Mar 24, 2026

View reviewed changes

Comment thread resources_servers/cvdp/README.md Outdated

jmabry reviewed Mar 24, 2026

View reviewed changes

Comment thread resources_servers/cvdp/README.md Outdated

jmabry reviewed Mar 24, 2026

View reviewed changes

Comment thread resources_servers/cvdp/README.md Outdated

jmabry reviewed Mar 24, 2026

View reviewed changes

Comment thread resources_servers/cvdp/README.md

jmabry previously approved these changes Mar 24, 2026

View reviewed changes

arti4nvj dismissed jmabry’s stale review via c1e6e94 March 24, 2026 05:03

arti4nvj force-pushed the arti/cvdp_v2 branch from c1e6e94 to 2231f18 Compare March 24, 2026 05:04

roclark previously approved these changes Mar 24, 2026

View reviewed changes

arti4nvj dismissed roclark’s stale review via be5643e March 30, 2026 02:04

arti4nvj force-pushed the arti/cvdp_v2 branch 2 times, most recently from 6c42b52 to be4eff7 Compare March 31, 2026 22:13

roclark requested a review from cmunley1 April 6, 2026 15:28

roclark previously approved these changes Apr 6, 2026

View reviewed changes

arti4nvj added 8 commits April 7, 2026 12:42

Add CVDP benchmark resource server with apptainer

b0a3343

Signed-off-by: Arti Jain <artij@nvidia.com>

added description + value to config yaml

98a3bca

Signed-off-by: Arti Jain <artij@nvidia.com>

got rid of env.yaml.example + updated readme

f91997e

Signed-off-by: Arti Jain <artij@nvidia.com>

updated readme with apptainer

149e88f

Signed-off-by: Arti Jain <artij@nvidia.com>

readme updated to fix resource server to resources server

39ed86a

Signed-off-by: Arti Jain <artij@nvidia.com>

readme updated to link cvdp source

c93b333

Signed-off-by: Arti Jain <artij@nvidia.com>

readme updated to include apptainer install

3a8d8e3

Signed-off-by: Arti Jain <artij@nvidia.com>

readme updated w/ disclaimer for being for eval purposes

cafe90d

Signed-off-by: Arti Jain <artij@nvidia.com>

arti4nvj and others added 8 commits April 7, 2026 12:42

updated app.py to match simple_agent. got rid of client.py since its …

5cc9634

…a copy of simple agent and irrelevent Signed-off-by: Arti Jain <artij@nvidia.com>

updates to app.py

08f31d8

Signed-off-by: Arti Jain <artij@nvidia.com>

edits to get app.py to get 100% on golden solutions

b556aa1

Signed-off-by: Arti Jain <artij@nvidia.com>

updates to killing the apptainer better

e10190c

Signed-off-by: Arti Jain <artij@nvidia.com>

updated config.yaml to use more up to date v1.0.4

28e9ad6

Signed-off-by: Arti Jain <artij@nvidia.com>

README updated for cvdp_agent to link it to resources_server

2f71320

Signed-off-by: Arti Jain <artij@nvidia.com>

arti4nvj dismissed roclark’s stale review via 2f71320 April 7, 2026 19:46

arti4nvj force-pushed the arti/cvdp_v2 branch from 1071053 to 2f71320 Compare April 7, 2026 19:46

cmunley1 approved these changes Apr 7, 2026

View reviewed changes

cmunley1 merged commit 3a9db6f into NVIDIA-NeMo:main Apr 7, 2026
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add CVDP benchmark resource server with apptainer instead of docker#928

Add CVDP benchmark resource server with apptainer instead of docker#928
cmunley1 merged 16 commits intoNVIDIA-NeMo:mainfrom
arti4nvj:arti/cvdp_v2

arti4nvj commented Mar 20, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jmabry commented Mar 24, 2026

Uh oh!

jmabry left a comment

Uh oh!

roclark left a comment

Uh oh!

cmunley1 Mar 24, 2026

Uh oh!

cmunley1 Apr 7, 2026

Uh oh!

cmunley1 Apr 7, 2026

Uh oh!

cmunley1 Apr 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

arti4nvj commented Mar 20, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jmabry commented Mar 24, 2026

Code review

Uh oh!

jmabry left a comment

Choose a reason for hiding this comment

Uh oh!

roclark left a comment

Choose a reason for hiding this comment

Uh oh!

cmunley1 Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

cmunley1 Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

cmunley1 Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

cmunley1 Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants