From 761127576842d865e4c3a5889ff109d671907122 Mon Sep 17 00:00:00 2001
From: Yamini
Date: Thu, 4 Jun 2026 01:32:55 -0400
Subject: [PATCH 1/2] docs(fern): rename mkdocs .md sources to .mdx (no content change)

Pure renames so git --follow tracks each page's history through the
mkdocs->fern conversion (the conversion lands in the next commit).
---
 .../release-notes/{current-release.md => current-release.mdx} | 0
 docs/about/release-notes/{index.md => index.mdx} | 0
 docs/acknowledgements/{index.md => index.mdx} | 0
 docs/agents/{index.md => index.mdx} | 0
 docs/agents/{optimization.md => optimization.mdx} | 0
 docs/agents/{plugins.md => plugins.mdx} | 0
 docs/agents/{security.md => security.mdx} | 0
 docs/anonymizer/{cli.md => cli.mdx} | 0
 docs/anonymizer/{index.md => index.mdx} | 0
 docs/anonymizer/{quickstart.md => quickstart.mdx} | 0
 docs/anonymizer/{sdk-resources.md => sdk-resources.mdx} | 0
 docs/anonymizer/tutorials/{index.md => index.mdx} | 0
 docs/anonymizer/tutorials/{preview.md => preview.mdx} | 0
 docs/anonymizer/tutorials/{run.md => run.mdx} | 0
 docs/api/{index.md => index.mdx} | 0
 docs/auditor/configs/{index.md => index.mdx} | 0
 docs/auditor/configs/{probes.md => probes.mdx} | 0
 docs/auditor/configs/{schema.md => schema.mdx} | 0
 docs/auditor/{index.md => index.mdx} | 0
 docs/auditor/{sdk-resources.md => sdk-resources.mdx} | 0
 docs/auditor/targets/{index.md => index.mdx} | 0
 .../targets/{inference-gateway.md => inference-gateway.mdx} | 0
 docs/auditor/targets/{schema.md => schema.mdx} | 0
 docs/auditor/tutorials/{index.md => index.mdx} | 0
 .../tutorials/{run-audit-locally.md => run-audit-locally.mdx} | 0
 docs/auth/authentication/{index.md => index.mdx} | 0
 docs/auth/authentication/{oidc.md => oidc.mdx} | 0
 docs/auth/authentication/providers/{azure-ad.md => azure-ad.mdx} | 0
 docs/auth/authentication/providers/{generic.md => generic.mdx} | 0
 docs/auth/authentication/providers/{index.md => index.mdx} | 0
 .../{using-authentication.md => using-authentication.mdx} | 0
docs/auth/authorization/{api-scopes.md => api-scopes.mdx} | 0 docs/auth/authorization/{index.md => index.mdx} | 0 .../authorization/{managing-access.md => managing-access.mdx} | 0 .../{permissions-reference.md => permissions-reference.mdx} | 0 docs/auth/authorization/{policy-engine.md => policy-engine.mdx} | 0 .../{roles-and-permissions.md => roles-and-permissions.mdx} | 0 docs/auth/{concepts.md => concepts.mdx} | 0 docs/auth/deployment/{configuration.md => configuration.mdx} | 0 .../{credential-propagation.md => credential-propagation.mdx} | 0 docs/auth/deployment/{gateway.md => gateway.mdx} | 0 docs/auth/deployment/{hardening.md => hardening.mdx} | 0 docs/auth/{index.md => index.mdx} | 0 docs/auth/{security-model.md => security-model.mdx} | 0 docs/auth/{troubleshooting.md => troubleshooting.mdx} | 0 docs/cli/{configuration.md => configuration.mdx} | 0 docs/cli/{index.md => index.mdx} | 0 docs/cli/{reference.md => reference.mdx} | 0 docs/cli/{troubleshooting.md => troubleshooting.mdx} | 0 .../cli/{working-with-resources.md => working-with-resources.mdx} | 0 docs/contributing/{skills-spec.md => skills-spec.mdx} | 0 docs/customizer/{about.md => about.mdx} | 0 docs/customizer/{index.md => index.mdx} | 0 .../manage-customization-jobs/{cancel-job.md => cancel-job.mdx} | 0 .../manage-customization-jobs/{create-job.md => create-job.mdx} | 0 ...omization-job-reference.md => customization-job-reference.mdx} | 0 .../{get-job-status.md => get-job-status.mdx} | 0 .../{hyperparameters.md => hyperparameters.mdx} | 0 docs/customizer/manage-customization-jobs/{index.md => index.mdx} | 0 .../{list-active-jobs.md => list-active-jobs.mdx} | 0 .../{create-fileset.md => create-fileset.mdx} | 0 .../{create-model-entity.md => create-model-entity.mdx} | 0 docs/customizer/manage-model-entities/{index.md => index.mdx} | 0 docs/customizer/models/{data-format.md => data-format.mdx} | 0 docs/customizer/models/{embedding.md => embedding.mdx} | 0 docs/customizer/models/{gpt-oss.md => 
gpt-oss.mdx} | 0 docs/customizer/models/{index.md => index.mdx} | 0 docs/customizer/models/{llama-nemotron.md => llama-nemotron.mdx} | 0 docs/customizer/models/{llama.md => llama.mdx} | 0 docs/customizer/models/{mistral.md => mistral.mdx} | 0 docs/customizer/models/{phi.md => phi.mdx} | 0 docs/customizer/models/{qwen.md => qwen.mdx} | 0 .../_snippets/{customizer-prereqs.md => customizer-prereqs.mdx} | 0 .../{format-training-dataset.md => format-training-dataset.mdx} | 0 .../tutorials/{import-hf-model.md => import-hf-model.mdx} | 0 docs/customizer/tutorials/{index.md => index.mdx} | 0 docs/customizer/tutorials/{metrics.md => metrics.mdx} | 0 ...ons-and-models.md => understand-configurations-and-models.mdx} | 0 docs/data-designer/_snippets/{job-results.md => job-results.mdx} | 0 .../_snippets/{preview-results.md => preview-results.mdx} | 0 docs/data-designer/{cli.md => cli.mdx} | 0 docs/data-designer/{execution-modes.md => execution-modes.mdx} | 0 docs/data-designer/{index.md => index.mdx} | 0 docs/data-designer/{migration.md => migration.mdx} | 0 docs/data-designer/{sdk-resources.md => sdk-resources.mdx} | 0 docs/data-designer/tutorials/{basics.md => basics.mdx} | 0 docs/data-designer/tutorials/{index.md => index.mdx} | 0 docs/data-designer/tutorials/{seeding.md => seeding.mdx} | 0 docs/{eula.md => eula.mdx} | 0 docs/evaluator/benchmarks/{agentic.md => agentic.mdx} | 0 docs/evaluator/benchmarks/{custom.md => custom.mdx} | 0 ...er-industry-benchmarks.md => discover-industry-benchmarks.mdx} | 0 docs/evaluator/benchmarks/{hf-secret.md => hf-secret.mdx} | 0 docs/evaluator/benchmarks/{index.md => index.mdx} | 0 docs/evaluator/benchmarks/{industry.md => industry.mdx} | 0 .../benchmarks/{job-management.md => job-management.mdx} | 0 .../benchmarks/{manage-benchmarks.md => manage-benchmarks.mdx} | 0 docs/evaluator/benchmarks/{results.md => results.mdx} | 0 docs/evaluator/{index.md => index.mdx} | 0 .../metrics/{agent-configuration.md => agent-configuration.mdx} | 0 
docs/evaluator/metrics/{agentic.md => agentic.mdx} | 0 docs/evaluator/metrics/{index.md => index.mdx} | 0 docs/evaluator/metrics/{job-management.md => job-management.mdx} | 0 docs/evaluator/metrics/{llm-as-a-judge.md => llm-as-a-judge.mdx} | 0 docs/evaluator/metrics/{manage-metrics.md => manage-metrics.mdx} | 0 .../metrics/{model-configuration.md => model-configuration.mdx} | 0 docs/evaluator/metrics/{rag.md => rag.mdx} | 0 docs/evaluator/metrics/{remote.md => remote.mdx} | 0 docs/evaluator/metrics/{results.md => results.mdx} | 0 docs/evaluator/metrics/{similarity.md => similarity.mdx} | 0 docs/evaluator/{sdk-resources.md => sdk-resources.mdx} | 0 docs/evaluator/tutorials/{index.md => index.mdx} | 0 .../{run-llm-judge-evaluation.md => run-llm-judge-evaluation.mdx} | 0 docs/example-applications/{about.md => about.mdx} | 0 docs/get-started/concepts/{entities.md => entities.mdx} | 0 .../concepts/{entity-references.md => entity-references.mdx} | 0 docs/get-started/concepts/{filtering.md => filtering.mdx} | 0 docs/get-started/concepts/{index.md => index.mdx} | 0 docs/get-started/concepts/{manage-files.md => manage-files.mdx} | 0 .../concepts/{manage-secrets.md => manage-secrets.mdx} | 0 docs/get-started/concepts/{projects.md => projects.mdx} | 0 docs/get-started/concepts/{workspaces.md => workspaces.mdx} | 0 docs/get-started/{setup.md => setup.mdx} | 0 docs/guardrails/concepts/{architecture.md => architecture.mdx} | 0 docs/guardrails/concepts/{checks.md => checks.mdx} | 0 .../{configuration-structure.md => configuration-structure.mdx} | 0 .../configurations/{default-configs.md => default-configs.mdx} | 0 docs/guardrails/concepts/configurations/{index.md => index.mdx} | 0 .../configurations/{manage-configs.md => manage-configs.mdx} | 0 docs/guardrails/concepts/{index.md => index.mdx} | 0 docs/guardrails/concepts/{inference.md => inference.mdx} | 0 docs/guardrails/{index.md => index.mdx} | 0 docs/guardrails/{observability.md => observability.mdx} | 0 
docs/guardrails/{terminology.md => terminology.mdx} | 0 .../tutorials/{content-safety.md => content-safety.mdx} | 0 .../{deploy-nemoguard-nims.md => deploy-nemoguard-nims.mdx} | 0 docs/guardrails/tutorials/{index.md => index.mdx} | 0 .../tutorials/{injection-detection.md => injection-detection.mdx} | 0 .../tutorials/{multimodal-data.md => multimodal-data.mdx} | 0 .../tutorials/{parallel-rails.md => parallel-rails.mdx} | 0 docs/helm/{index.md => index.mdx} | 0 docs/{index.md => index.mdx} | 0 docs/pysdk/client/{index.md => index.mdx} | 0 docs/pysdk/{index.md => index.mdx} | 0 docs/{requirements.md => requirements.mdx} | 0 docs/run-inference/{about.md => about.mdx} | 0 .../tutorials/{deploy-models.md => deploy-models.mdx} | 0 docs/run-inference/tutorials/{index.md => index.mdx} | 0 .../tutorials/{run-inference.md => run-inference.mdx} | 0 .../about/{data-synthesis.md => data-synthesis.mdx} | 0 docs/safe-synthesizer/about/{evaluation.md => evaluation.mdx} | 0 .../{host-local-development.md => host-local-development.mdx} | 0 docs/safe-synthesizer/about/{index.md => index.mdx} | 0 docs/safe-synthesizer/about/{jobs.md => jobs.mdx} | 0 .../about/{pii-replacement.md => pii-replacement.mdx} | 0 docs/safe-synthesizer/about/{reference.md => reference.mdx} | 0 docs/safe-synthesizer/{getting-started.md => getting-started.mdx} | 0 .../{differential-privacy.md => differential-privacy.mdx} | 0 docs/safe-synthesizer/tutorials/{index.md => index.mdx} | 0 .../{safe-synthesizer-101.md => safe-synthesizer-101.mdx} | 0 docs/set-up/{config-reference.md => config-reference.mdx} | 0 .../set-up/helm/{backup-and-restore.md => backup-and-restore.mdx} | 0 docs/set-up/helm/{database-setup.md => database-setup.mdx} | 0 docs/set-up/helm/{file-storage.md => file-storage.mdx} | 0 docs/set-up/helm/{index.md => index.mdx} | 0 docs/set-up/helm/{ingress.md => ingress.mdx} | 0 docs/set-up/helm/{install.md => install.mdx} | 0 .../helm/{multinode-networking.md => multinode-networking.mdx} | 0 
docs/set-up/helm/{openshift.md => openshift.mdx} | 0 .../set-up/helm/{persistent-volumes.md => persistent-volumes.mdx} | 0 docs/set-up/helm/{prerequisites.md => prerequisites.mdx} | 0 docs/set-up/{index.md => index.mdx} | 0 docs/set-up/{manage-jobs.md => manage-jobs.mdx} | 0 docs/set-up/{milvus.md => milvus.mdx} | 0 docs/set-up/{opentelemetry.md => opentelemetry.mdx} | 0 docs/set-up/{security.md => security.mdx} | 0 docs/studio/{agents.md => agents.mdx} | 0 docs/studio/{index.md => index.mdx} | 0 docs/studio/{monitor.md => monitor.mdx} | 0 docs/studio/{suggestions.md => suggestions.mdx} | 0 docs/{support-matrix.md => support-matrix.mdx} | 0 docs/troubleshooting/{cluster-setup.md => cluster-setup.mdx} | 0 docs/troubleshooting/{customizer.md => customizer.mdx} | 0 docs/troubleshooting/{data-designer.md => data-designer.mdx} | 0 docs/troubleshooting/{evaluator.md => evaluator.mdx} | 0 docs/troubleshooting/{guardrails.md => guardrails.mdx} | 0 docs/troubleshooting/{index.md => index.mdx} | 0 docs/troubleshooting/{studio.md => studio.mdx} | 0 188 files changed, 0 insertions(+), 0 deletions(-) rename docs/about/release-notes/{current-release.md => current-release.mdx} (100%) rename docs/about/release-notes/{index.md => index.mdx} (100%) rename docs/acknowledgements/{index.md => index.mdx} (100%) rename docs/agents/{index.md => index.mdx} (100%) rename docs/agents/{optimization.md => optimization.mdx} (100%) rename docs/agents/{plugins.md => plugins.mdx} (100%) rename docs/agents/{security.md => security.mdx} (100%) rename docs/anonymizer/{cli.md => cli.mdx} (100%) rename docs/anonymizer/{index.md => index.mdx} (100%) rename docs/anonymizer/{quickstart.md => quickstart.mdx} (100%) rename docs/anonymizer/{sdk-resources.md => sdk-resources.mdx} (100%) rename docs/anonymizer/tutorials/{index.md => index.mdx} (100%) rename docs/anonymizer/tutorials/{preview.md => preview.mdx} (100%) rename docs/anonymizer/tutorials/{run.md => run.mdx} (100%) rename docs/api/{index.md => 
index.mdx} (100%) rename docs/auditor/configs/{index.md => index.mdx} (100%) rename docs/auditor/configs/{probes.md => probes.mdx} (100%) rename docs/auditor/configs/{schema.md => schema.mdx} (100%) rename docs/auditor/{index.md => index.mdx} (100%) rename docs/auditor/{sdk-resources.md => sdk-resources.mdx} (100%) rename docs/auditor/targets/{index.md => index.mdx} (100%) rename docs/auditor/targets/{inference-gateway.md => inference-gateway.mdx} (100%) rename docs/auditor/targets/{schema.md => schema.mdx} (100%) rename docs/auditor/tutorials/{index.md => index.mdx} (100%) rename docs/auditor/tutorials/{run-audit-locally.md => run-audit-locally.mdx} (100%) rename docs/auth/authentication/{index.md => index.mdx} (100%) rename docs/auth/authentication/{oidc.md => oidc.mdx} (100%) rename docs/auth/authentication/providers/{azure-ad.md => azure-ad.mdx} (100%) rename docs/auth/authentication/providers/{generic.md => generic.mdx} (100%) rename docs/auth/authentication/providers/{index.md => index.mdx} (100%) rename docs/auth/authentication/{using-authentication.md => using-authentication.mdx} (100%) rename docs/auth/authorization/{api-scopes.md => api-scopes.mdx} (100%) rename docs/auth/authorization/{index.md => index.mdx} (100%) rename docs/auth/authorization/{managing-access.md => managing-access.mdx} (100%) rename docs/auth/authorization/{permissions-reference.md => permissions-reference.mdx} (100%) rename docs/auth/authorization/{policy-engine.md => policy-engine.mdx} (100%) rename docs/auth/authorization/{roles-and-permissions.md => roles-and-permissions.mdx} (100%) rename docs/auth/{concepts.md => concepts.mdx} (100%) rename docs/auth/deployment/{configuration.md => configuration.mdx} (100%) rename docs/auth/deployment/{credential-propagation.md => credential-propagation.mdx} (100%) rename docs/auth/deployment/{gateway.md => gateway.mdx} (100%) rename docs/auth/deployment/{hardening.md => hardening.mdx} (100%) rename docs/auth/{index.md => index.mdx} (100%) 
rename docs/auth/{security-model.md => security-model.mdx} (100%) rename docs/auth/{troubleshooting.md => troubleshooting.mdx} (100%) rename docs/cli/{configuration.md => configuration.mdx} (100%) rename docs/cli/{index.md => index.mdx} (100%) rename docs/cli/{reference.md => reference.mdx} (100%) rename docs/cli/{troubleshooting.md => troubleshooting.mdx} (100%) rename docs/cli/{working-with-resources.md => working-with-resources.mdx} (100%) rename docs/contributing/{skills-spec.md => skills-spec.mdx} (100%) rename docs/customizer/{about.md => about.mdx} (100%) rename docs/customizer/{index.md => index.mdx} (100%) rename docs/customizer/manage-customization-jobs/{cancel-job.md => cancel-job.mdx} (100%) rename docs/customizer/manage-customization-jobs/{create-job.md => create-job.mdx} (100%) rename docs/customizer/manage-customization-jobs/{customization-job-reference.md => customization-job-reference.mdx} (100%) rename docs/customizer/manage-customization-jobs/{get-job-status.md => get-job-status.mdx} (100%) rename docs/customizer/manage-customization-jobs/{hyperparameters.md => hyperparameters.mdx} (100%) rename docs/customizer/manage-customization-jobs/{index.md => index.mdx} (100%) rename docs/customizer/manage-customization-jobs/{list-active-jobs.md => list-active-jobs.mdx} (100%) rename docs/customizer/manage-model-entities/{create-fileset.md => create-fileset.mdx} (100%) rename docs/customizer/manage-model-entities/{create-model-entity.md => create-model-entity.mdx} (100%) rename docs/customizer/manage-model-entities/{index.md => index.mdx} (100%) rename docs/customizer/models/{data-format.md => data-format.mdx} (100%) rename docs/customizer/models/{embedding.md => embedding.mdx} (100%) rename docs/customizer/models/{gpt-oss.md => gpt-oss.mdx} (100%) rename docs/customizer/models/{index.md => index.mdx} (100%) rename docs/customizer/models/{llama-nemotron.md => llama-nemotron.mdx} (100%) rename docs/customizer/models/{llama.md => llama.mdx} (100%) rename 
docs/customizer/models/{mistral.md => mistral.mdx} (100%) rename docs/customizer/models/{phi.md => phi.mdx} (100%) rename docs/customizer/models/{qwen.md => qwen.mdx} (100%) rename docs/customizer/tutorials/_snippets/{customizer-prereqs.md => customizer-prereqs.mdx} (100%) rename docs/customizer/tutorials/{format-training-dataset.md => format-training-dataset.mdx} (100%) rename docs/customizer/tutorials/{import-hf-model.md => import-hf-model.mdx} (100%) rename docs/customizer/tutorials/{index.md => index.mdx} (100%) rename docs/customizer/tutorials/{metrics.md => metrics.mdx} (100%) rename docs/customizer/tutorials/{understand-configurations-and-models.md => understand-configurations-and-models.mdx} (100%) rename docs/data-designer/_snippets/{job-results.md => job-results.mdx} (100%) rename docs/data-designer/_snippets/{preview-results.md => preview-results.mdx} (100%) rename docs/data-designer/{cli.md => cli.mdx} (100%) rename docs/data-designer/{execution-modes.md => execution-modes.mdx} (100%) rename docs/data-designer/{index.md => index.mdx} (100%) rename docs/data-designer/{migration.md => migration.mdx} (100%) rename docs/data-designer/{sdk-resources.md => sdk-resources.mdx} (100%) rename docs/data-designer/tutorials/{basics.md => basics.mdx} (100%) rename docs/data-designer/tutorials/{index.md => index.mdx} (100%) rename docs/data-designer/tutorials/{seeding.md => seeding.mdx} (100%) rename docs/{eula.md => eula.mdx} (100%) rename docs/evaluator/benchmarks/{agentic.md => agentic.mdx} (100%) rename docs/evaluator/benchmarks/{custom.md => custom.mdx} (100%) rename docs/evaluator/benchmarks/{discover-industry-benchmarks.md => discover-industry-benchmarks.mdx} (100%) rename docs/evaluator/benchmarks/{hf-secret.md => hf-secret.mdx} (100%) rename docs/evaluator/benchmarks/{index.md => index.mdx} (100%) rename docs/evaluator/benchmarks/{industry.md => industry.mdx} (100%) rename docs/evaluator/benchmarks/{job-management.md => job-management.mdx} (100%) rename 
docs/evaluator/benchmarks/{manage-benchmarks.md => manage-benchmarks.mdx} (100%) rename docs/evaluator/benchmarks/{results.md => results.mdx} (100%) rename docs/evaluator/{index.md => index.mdx} (100%) rename docs/evaluator/metrics/{agent-configuration.md => agent-configuration.mdx} (100%) rename docs/evaluator/metrics/{agentic.md => agentic.mdx} (100%) rename docs/evaluator/metrics/{index.md => index.mdx} (100%) rename docs/evaluator/metrics/{job-management.md => job-management.mdx} (100%) rename docs/evaluator/metrics/{llm-as-a-judge.md => llm-as-a-judge.mdx} (100%) rename docs/evaluator/metrics/{manage-metrics.md => manage-metrics.mdx} (100%) rename docs/evaluator/metrics/{model-configuration.md => model-configuration.mdx} (100%) rename docs/evaluator/metrics/{rag.md => rag.mdx} (100%) rename docs/evaluator/metrics/{remote.md => remote.mdx} (100%) rename docs/evaluator/metrics/{results.md => results.mdx} (100%) rename docs/evaluator/metrics/{similarity.md => similarity.mdx} (100%) rename docs/evaluator/{sdk-resources.md => sdk-resources.mdx} (100%) rename docs/evaluator/tutorials/{index.md => index.mdx} (100%) rename docs/evaluator/tutorials/{run-llm-judge-evaluation.md => run-llm-judge-evaluation.mdx} (100%) rename docs/example-applications/{about.md => about.mdx} (100%) rename docs/get-started/concepts/{entities.md => entities.mdx} (100%) rename docs/get-started/concepts/{entity-references.md => entity-references.mdx} (100%) rename docs/get-started/concepts/{filtering.md => filtering.mdx} (100%) rename docs/get-started/concepts/{index.md => index.mdx} (100%) rename docs/get-started/concepts/{manage-files.md => manage-files.mdx} (100%) rename docs/get-started/concepts/{manage-secrets.md => manage-secrets.mdx} (100%) rename docs/get-started/concepts/{projects.md => projects.mdx} (100%) rename docs/get-started/concepts/{workspaces.md => workspaces.mdx} (100%) rename docs/get-started/{setup.md => setup.mdx} (100%) rename docs/guardrails/concepts/{architecture.md 
=> architecture.mdx} (100%) rename docs/guardrails/concepts/{checks.md => checks.mdx} (100%) rename docs/guardrails/concepts/configurations/{configuration-structure.md => configuration-structure.mdx} (100%) rename docs/guardrails/concepts/configurations/{default-configs.md => default-configs.mdx} (100%) rename docs/guardrails/concepts/configurations/{index.md => index.mdx} (100%) rename docs/guardrails/concepts/configurations/{manage-configs.md => manage-configs.mdx} (100%) rename docs/guardrails/concepts/{index.md => index.mdx} (100%) rename docs/guardrails/concepts/{inference.md => inference.mdx} (100%) rename docs/guardrails/{index.md => index.mdx} (100%) rename docs/guardrails/{observability.md => observability.mdx} (100%) rename docs/guardrails/{terminology.md => terminology.mdx} (100%) rename docs/guardrails/tutorials/{content-safety.md => content-safety.mdx} (100%) rename docs/guardrails/tutorials/{deploy-nemoguard-nims.md => deploy-nemoguard-nims.mdx} (100%) rename docs/guardrails/tutorials/{index.md => index.mdx} (100%) rename docs/guardrails/tutorials/{injection-detection.md => injection-detection.mdx} (100%) rename docs/guardrails/tutorials/{multimodal-data.md => multimodal-data.mdx} (100%) rename docs/guardrails/tutorials/{parallel-rails.md => parallel-rails.mdx} (100%) rename docs/helm/{index.md => index.mdx} (100%) rename docs/{index.md => index.mdx} (100%) rename docs/pysdk/client/{index.md => index.mdx} (100%) rename docs/pysdk/{index.md => index.mdx} (100%) rename docs/{requirements.md => requirements.mdx} (100%) rename docs/run-inference/{about.md => about.mdx} (100%) rename docs/run-inference/tutorials/{deploy-models.md => deploy-models.mdx} (100%) rename docs/run-inference/tutorials/{index.md => index.mdx} (100%) rename docs/run-inference/tutorials/{run-inference.md => run-inference.mdx} (100%) rename docs/safe-synthesizer/about/{data-synthesis.md => data-synthesis.mdx} (100%) rename docs/safe-synthesizer/about/{evaluation.md => evaluation.mdx} 
(100%) rename docs/safe-synthesizer/about/{host-local-development.md => host-local-development.mdx} (100%) rename docs/safe-synthesizer/about/{index.md => index.mdx} (100%) rename docs/safe-synthesizer/about/{jobs.md => jobs.mdx} (100%) rename docs/safe-synthesizer/about/{pii-replacement.md => pii-replacement.mdx} (100%) rename docs/safe-synthesizer/about/{reference.md => reference.mdx} (100%) rename docs/safe-synthesizer/{getting-started.md => getting-started.mdx} (100%) rename docs/safe-synthesizer/tutorials/{differential-privacy.md => differential-privacy.mdx} (100%) rename docs/safe-synthesizer/tutorials/{index.md => index.mdx} (100%) rename docs/safe-synthesizer/tutorials/{safe-synthesizer-101.md => safe-synthesizer-101.mdx} (100%) rename docs/set-up/{config-reference.md => config-reference.mdx} (100%) rename docs/set-up/helm/{backup-and-restore.md => backup-and-restore.mdx} (100%) rename docs/set-up/helm/{database-setup.md => database-setup.mdx} (100%) rename docs/set-up/helm/{file-storage.md => file-storage.mdx} (100%) rename docs/set-up/helm/{index.md => index.mdx} (100%) rename docs/set-up/helm/{ingress.md => ingress.mdx} (100%) rename docs/set-up/helm/{install.md => install.mdx} (100%) rename docs/set-up/helm/{multinode-networking.md => multinode-networking.mdx} (100%) rename docs/set-up/helm/{openshift.md => openshift.mdx} (100%) rename docs/set-up/helm/{persistent-volumes.md => persistent-volumes.mdx} (100%) rename docs/set-up/helm/{prerequisites.md => prerequisites.mdx} (100%) rename docs/set-up/{index.md => index.mdx} (100%) rename docs/set-up/{manage-jobs.md => manage-jobs.mdx} (100%) rename docs/set-up/{milvus.md => milvus.mdx} (100%) rename docs/set-up/{opentelemetry.md => opentelemetry.mdx} (100%) rename docs/set-up/{security.md => security.mdx} (100%) rename docs/studio/{agents.md => agents.mdx} (100%) rename docs/studio/{index.md => index.mdx} (100%) rename docs/studio/{monitor.md => monitor.mdx} (100%) rename docs/studio/{suggestions.md => 
suggestions.mdx} (100%) rename docs/{support-matrix.md => support-matrix.mdx} (100%) rename docs/troubleshooting/{cluster-setup.md => cluster-setup.mdx} (100%) rename docs/troubleshooting/{customizer.md => customizer.mdx} (100%) rename docs/troubleshooting/{data-designer.md => data-designer.mdx} (100%) rename docs/troubleshooting/{evaluator.md => evaluator.mdx} (100%) rename docs/troubleshooting/{guardrails.md => guardrails.mdx} (100%) rename docs/troubleshooting/{index.md => index.mdx} (100%) rename docs/troubleshooting/{studio.md => studio.mdx} (100%) diff --git a/docs/about/release-notes/current-release.md b/docs/about/release-notes/current-release.mdx similarity index 100% rename from docs/about/release-notes/current-release.md rename to docs/about/release-notes/current-release.mdx diff --git a/docs/about/release-notes/index.md b/docs/about/release-notes/index.mdx similarity index 100% rename from docs/about/release-notes/index.md rename to docs/about/release-notes/index.mdx diff --git a/docs/acknowledgements/index.md b/docs/acknowledgements/index.mdx similarity index 100% rename from docs/acknowledgements/index.md rename to docs/acknowledgements/index.mdx diff --git a/docs/agents/index.md b/docs/agents/index.mdx similarity index 100% rename from docs/agents/index.md rename to docs/agents/index.mdx diff --git a/docs/agents/optimization.md b/docs/agents/optimization.mdx similarity index 100% rename from docs/agents/optimization.md rename to docs/agents/optimization.mdx diff --git a/docs/agents/plugins.md b/docs/agents/plugins.mdx similarity index 100% rename from docs/agents/plugins.md rename to docs/agents/plugins.mdx diff --git a/docs/agents/security.md b/docs/agents/security.mdx similarity index 100% rename from docs/agents/security.md rename to docs/agents/security.mdx diff --git a/docs/anonymizer/cli.md b/docs/anonymizer/cli.mdx similarity index 100% rename from docs/anonymizer/cli.md rename to docs/anonymizer/cli.mdx diff --git a/docs/anonymizer/index.md 
b/docs/anonymizer/index.mdx similarity index 100% rename from docs/anonymizer/index.md rename to docs/anonymizer/index.mdx diff --git a/docs/anonymizer/quickstart.md b/docs/anonymizer/quickstart.mdx similarity index 100% rename from docs/anonymizer/quickstart.md rename to docs/anonymizer/quickstart.mdx diff --git a/docs/anonymizer/sdk-resources.md b/docs/anonymizer/sdk-resources.mdx similarity index 100% rename from docs/anonymizer/sdk-resources.md rename to docs/anonymizer/sdk-resources.mdx diff --git a/docs/anonymizer/tutorials/index.md b/docs/anonymizer/tutorials/index.mdx similarity index 100% rename from docs/anonymizer/tutorials/index.md rename to docs/anonymizer/tutorials/index.mdx diff --git a/docs/anonymizer/tutorials/preview.md b/docs/anonymizer/tutorials/preview.mdx similarity index 100% rename from docs/anonymizer/tutorials/preview.md rename to docs/anonymizer/tutorials/preview.mdx diff --git a/docs/anonymizer/tutorials/run.md b/docs/anonymizer/tutorials/run.mdx similarity index 100% rename from docs/anonymizer/tutorials/run.md rename to docs/anonymizer/tutorials/run.mdx diff --git a/docs/api/index.md b/docs/api/index.mdx similarity index 100% rename from docs/api/index.md rename to docs/api/index.mdx diff --git a/docs/auditor/configs/index.md b/docs/auditor/configs/index.mdx similarity index 100% rename from docs/auditor/configs/index.md rename to docs/auditor/configs/index.mdx diff --git a/docs/auditor/configs/probes.md b/docs/auditor/configs/probes.mdx similarity index 100% rename from docs/auditor/configs/probes.md rename to docs/auditor/configs/probes.mdx diff --git a/docs/auditor/configs/schema.md b/docs/auditor/configs/schema.mdx similarity index 100% rename from docs/auditor/configs/schema.md rename to docs/auditor/configs/schema.mdx diff --git a/docs/auditor/index.md b/docs/auditor/index.mdx similarity index 100% rename from docs/auditor/index.md rename to docs/auditor/index.mdx diff --git a/docs/auditor/sdk-resources.md 
b/docs/auditor/sdk-resources.mdx similarity index 100% rename from docs/auditor/sdk-resources.md rename to docs/auditor/sdk-resources.mdx diff --git a/docs/auditor/targets/index.md b/docs/auditor/targets/index.mdx similarity index 100% rename from docs/auditor/targets/index.md rename to docs/auditor/targets/index.mdx diff --git a/docs/auditor/targets/inference-gateway.md b/docs/auditor/targets/inference-gateway.mdx similarity index 100% rename from docs/auditor/targets/inference-gateway.md rename to docs/auditor/targets/inference-gateway.mdx diff --git a/docs/auditor/targets/schema.md b/docs/auditor/targets/schema.mdx similarity index 100% rename from docs/auditor/targets/schema.md rename to docs/auditor/targets/schema.mdx diff --git a/docs/auditor/tutorials/index.md b/docs/auditor/tutorials/index.mdx similarity index 100% rename from docs/auditor/tutorials/index.md rename to docs/auditor/tutorials/index.mdx diff --git a/docs/auditor/tutorials/run-audit-locally.md b/docs/auditor/tutorials/run-audit-locally.mdx similarity index 100% rename from docs/auditor/tutorials/run-audit-locally.md rename to docs/auditor/tutorials/run-audit-locally.mdx diff --git a/docs/auth/authentication/index.md b/docs/auth/authentication/index.mdx similarity index 100% rename from docs/auth/authentication/index.md rename to docs/auth/authentication/index.mdx diff --git a/docs/auth/authentication/oidc.md b/docs/auth/authentication/oidc.mdx similarity index 100% rename from docs/auth/authentication/oidc.md rename to docs/auth/authentication/oidc.mdx diff --git a/docs/auth/authentication/providers/azure-ad.md b/docs/auth/authentication/providers/azure-ad.mdx similarity index 100% rename from docs/auth/authentication/providers/azure-ad.md rename to docs/auth/authentication/providers/azure-ad.mdx diff --git a/docs/auth/authentication/providers/generic.md b/docs/auth/authentication/providers/generic.mdx similarity index 100% rename from docs/auth/authentication/providers/generic.md rename to 
docs/auth/authentication/providers/generic.mdx diff --git a/docs/auth/authentication/providers/index.md b/docs/auth/authentication/providers/index.mdx similarity index 100% rename from docs/auth/authentication/providers/index.md rename to docs/auth/authentication/providers/index.mdx diff --git a/docs/auth/authentication/using-authentication.md b/docs/auth/authentication/using-authentication.mdx similarity index 100% rename from docs/auth/authentication/using-authentication.md rename to docs/auth/authentication/using-authentication.mdx diff --git a/docs/auth/authorization/api-scopes.md b/docs/auth/authorization/api-scopes.mdx similarity index 100% rename from docs/auth/authorization/api-scopes.md rename to docs/auth/authorization/api-scopes.mdx diff --git a/docs/auth/authorization/index.md b/docs/auth/authorization/index.mdx similarity index 100% rename from docs/auth/authorization/index.md rename to docs/auth/authorization/index.mdx diff --git a/docs/auth/authorization/managing-access.md b/docs/auth/authorization/managing-access.mdx similarity index 100% rename from docs/auth/authorization/managing-access.md rename to docs/auth/authorization/managing-access.mdx diff --git a/docs/auth/authorization/permissions-reference.md b/docs/auth/authorization/permissions-reference.mdx similarity index 100% rename from docs/auth/authorization/permissions-reference.md rename to docs/auth/authorization/permissions-reference.mdx diff --git a/docs/auth/authorization/policy-engine.md b/docs/auth/authorization/policy-engine.mdx similarity index 100% rename from docs/auth/authorization/policy-engine.md rename to docs/auth/authorization/policy-engine.mdx diff --git a/docs/auth/authorization/roles-and-permissions.md b/docs/auth/authorization/roles-and-permissions.mdx similarity index 100% rename from docs/auth/authorization/roles-and-permissions.md rename to docs/auth/authorization/roles-and-permissions.mdx diff --git a/docs/auth/concepts.md b/docs/auth/concepts.mdx similarity index 
100% rename from docs/auth/concepts.md rename to docs/auth/concepts.mdx diff --git a/docs/auth/deployment/configuration.md b/docs/auth/deployment/configuration.mdx similarity index 100% rename from docs/auth/deployment/configuration.md rename to docs/auth/deployment/configuration.mdx diff --git a/docs/auth/deployment/credential-propagation.md b/docs/auth/deployment/credential-propagation.mdx similarity index 100% rename from docs/auth/deployment/credential-propagation.md rename to docs/auth/deployment/credential-propagation.mdx diff --git a/docs/auth/deployment/gateway.md b/docs/auth/deployment/gateway.mdx similarity index 100% rename from docs/auth/deployment/gateway.md rename to docs/auth/deployment/gateway.mdx diff --git a/docs/auth/deployment/hardening.md b/docs/auth/deployment/hardening.mdx similarity index 100% rename from docs/auth/deployment/hardening.md rename to docs/auth/deployment/hardening.mdx diff --git a/docs/auth/index.md b/docs/auth/index.mdx similarity index 100% rename from docs/auth/index.md rename to docs/auth/index.mdx diff --git a/docs/auth/security-model.md b/docs/auth/security-model.mdx similarity index 100% rename from docs/auth/security-model.md rename to docs/auth/security-model.mdx diff --git a/docs/auth/troubleshooting.md b/docs/auth/troubleshooting.mdx similarity index 100% rename from docs/auth/troubleshooting.md rename to docs/auth/troubleshooting.mdx diff --git a/docs/cli/configuration.md b/docs/cli/configuration.mdx similarity index 100% rename from docs/cli/configuration.md rename to docs/cli/configuration.mdx diff --git a/docs/cli/index.md b/docs/cli/index.mdx similarity index 100% rename from docs/cli/index.md rename to docs/cli/index.mdx diff --git a/docs/cli/reference.md b/docs/cli/reference.mdx similarity index 100% rename from docs/cli/reference.md rename to docs/cli/reference.mdx diff --git a/docs/cli/troubleshooting.md b/docs/cli/troubleshooting.mdx similarity index 100% rename from docs/cli/troubleshooting.md rename to 
docs/cli/troubleshooting.mdx diff --git a/docs/cli/working-with-resources.md b/docs/cli/working-with-resources.mdx similarity index 100% rename from docs/cli/working-with-resources.md rename to docs/cli/working-with-resources.mdx diff --git a/docs/contributing/skills-spec.md b/docs/contributing/skills-spec.mdx similarity index 100% rename from docs/contributing/skills-spec.md rename to docs/contributing/skills-spec.mdx diff --git a/docs/customizer/about.md b/docs/customizer/about.mdx similarity index 100% rename from docs/customizer/about.md rename to docs/customizer/about.mdx diff --git a/docs/customizer/index.md b/docs/customizer/index.mdx similarity index 100% rename from docs/customizer/index.md rename to docs/customizer/index.mdx diff --git a/docs/customizer/manage-customization-jobs/cancel-job.md b/docs/customizer/manage-customization-jobs/cancel-job.mdx similarity index 100% rename from docs/customizer/manage-customization-jobs/cancel-job.md rename to docs/customizer/manage-customization-jobs/cancel-job.mdx diff --git a/docs/customizer/manage-customization-jobs/create-job.md b/docs/customizer/manage-customization-jobs/create-job.mdx similarity index 100% rename from docs/customizer/manage-customization-jobs/create-job.md rename to docs/customizer/manage-customization-jobs/create-job.mdx diff --git a/docs/customizer/manage-customization-jobs/customization-job-reference.md b/docs/customizer/manage-customization-jobs/customization-job-reference.mdx similarity index 100% rename from docs/customizer/manage-customization-jobs/customization-job-reference.md rename to docs/customizer/manage-customization-jobs/customization-job-reference.mdx diff --git a/docs/customizer/manage-customization-jobs/get-job-status.md b/docs/customizer/manage-customization-jobs/get-job-status.mdx similarity index 100% rename from docs/customizer/manage-customization-jobs/get-job-status.md rename to docs/customizer/manage-customization-jobs/get-job-status.mdx diff --git 
a/docs/customizer/manage-customization-jobs/hyperparameters.md b/docs/customizer/manage-customization-jobs/hyperparameters.mdx similarity index 100% rename from docs/customizer/manage-customization-jobs/hyperparameters.md rename to docs/customizer/manage-customization-jobs/hyperparameters.mdx diff --git a/docs/customizer/manage-customization-jobs/index.md b/docs/customizer/manage-customization-jobs/index.mdx similarity index 100% rename from docs/customizer/manage-customization-jobs/index.md rename to docs/customizer/manage-customization-jobs/index.mdx diff --git a/docs/customizer/manage-customization-jobs/list-active-jobs.md b/docs/customizer/manage-customization-jobs/list-active-jobs.mdx similarity index 100% rename from docs/customizer/manage-customization-jobs/list-active-jobs.md rename to docs/customizer/manage-customization-jobs/list-active-jobs.mdx diff --git a/docs/customizer/manage-model-entities/create-fileset.md b/docs/customizer/manage-model-entities/create-fileset.mdx similarity index 100% rename from docs/customizer/manage-model-entities/create-fileset.md rename to docs/customizer/manage-model-entities/create-fileset.mdx diff --git a/docs/customizer/manage-model-entities/create-model-entity.md b/docs/customizer/manage-model-entities/create-model-entity.mdx similarity index 100% rename from docs/customizer/manage-model-entities/create-model-entity.md rename to docs/customizer/manage-model-entities/create-model-entity.mdx diff --git a/docs/customizer/manage-model-entities/index.md b/docs/customizer/manage-model-entities/index.mdx similarity index 100% rename from docs/customizer/manage-model-entities/index.md rename to docs/customizer/manage-model-entities/index.mdx diff --git a/docs/customizer/models/data-format.md b/docs/customizer/models/data-format.mdx similarity index 100% rename from docs/customizer/models/data-format.md rename to docs/customizer/models/data-format.mdx diff --git a/docs/customizer/models/embedding.md 
b/docs/customizer/models/embedding.mdx similarity index 100% rename from docs/customizer/models/embedding.md rename to docs/customizer/models/embedding.mdx diff --git a/docs/customizer/models/gpt-oss.md b/docs/customizer/models/gpt-oss.mdx similarity index 100% rename from docs/customizer/models/gpt-oss.md rename to docs/customizer/models/gpt-oss.mdx diff --git a/docs/customizer/models/index.md b/docs/customizer/models/index.mdx similarity index 100% rename from docs/customizer/models/index.md rename to docs/customizer/models/index.mdx diff --git a/docs/customizer/models/llama-nemotron.md b/docs/customizer/models/llama-nemotron.mdx similarity index 100% rename from docs/customizer/models/llama-nemotron.md rename to docs/customizer/models/llama-nemotron.mdx diff --git a/docs/customizer/models/llama.md b/docs/customizer/models/llama.mdx similarity index 100% rename from docs/customizer/models/llama.md rename to docs/customizer/models/llama.mdx diff --git a/docs/customizer/models/mistral.md b/docs/customizer/models/mistral.mdx similarity index 100% rename from docs/customizer/models/mistral.md rename to docs/customizer/models/mistral.mdx diff --git a/docs/customizer/models/phi.md b/docs/customizer/models/phi.mdx similarity index 100% rename from docs/customizer/models/phi.md rename to docs/customizer/models/phi.mdx diff --git a/docs/customizer/models/qwen.md b/docs/customizer/models/qwen.mdx similarity index 100% rename from docs/customizer/models/qwen.md rename to docs/customizer/models/qwen.mdx diff --git a/docs/customizer/tutorials/_snippets/customizer-prereqs.md b/docs/customizer/tutorials/_snippets/customizer-prereqs.mdx similarity index 100% rename from docs/customizer/tutorials/_snippets/customizer-prereqs.md rename to docs/customizer/tutorials/_snippets/customizer-prereqs.mdx diff --git a/docs/customizer/tutorials/format-training-dataset.md b/docs/customizer/tutorials/format-training-dataset.mdx similarity index 100% rename from 
docs/customizer/tutorials/format-training-dataset.md rename to docs/customizer/tutorials/format-training-dataset.mdx diff --git a/docs/customizer/tutorials/import-hf-model.md b/docs/customizer/tutorials/import-hf-model.mdx similarity index 100% rename from docs/customizer/tutorials/import-hf-model.md rename to docs/customizer/tutorials/import-hf-model.mdx diff --git a/docs/customizer/tutorials/index.md b/docs/customizer/tutorials/index.mdx similarity index 100% rename from docs/customizer/tutorials/index.md rename to docs/customizer/tutorials/index.mdx diff --git a/docs/customizer/tutorials/metrics.md b/docs/customizer/tutorials/metrics.mdx similarity index 100% rename from docs/customizer/tutorials/metrics.md rename to docs/customizer/tutorials/metrics.mdx diff --git a/docs/customizer/tutorials/understand-configurations-and-models.md b/docs/customizer/tutorials/understand-configurations-and-models.mdx similarity index 100% rename from docs/customizer/tutorials/understand-configurations-and-models.md rename to docs/customizer/tutorials/understand-configurations-and-models.mdx diff --git a/docs/data-designer/_snippets/job-results.md b/docs/data-designer/_snippets/job-results.mdx similarity index 100% rename from docs/data-designer/_snippets/job-results.md rename to docs/data-designer/_snippets/job-results.mdx diff --git a/docs/data-designer/_snippets/preview-results.md b/docs/data-designer/_snippets/preview-results.mdx similarity index 100% rename from docs/data-designer/_snippets/preview-results.md rename to docs/data-designer/_snippets/preview-results.mdx diff --git a/docs/data-designer/cli.md b/docs/data-designer/cli.mdx similarity index 100% rename from docs/data-designer/cli.md rename to docs/data-designer/cli.mdx diff --git a/docs/data-designer/execution-modes.md b/docs/data-designer/execution-modes.mdx similarity index 100% rename from docs/data-designer/execution-modes.md rename to docs/data-designer/execution-modes.mdx diff --git 
a/docs/data-designer/index.md b/docs/data-designer/index.mdx similarity index 100% rename from docs/data-designer/index.md rename to docs/data-designer/index.mdx diff --git a/docs/data-designer/migration.md b/docs/data-designer/migration.mdx similarity index 100% rename from docs/data-designer/migration.md rename to docs/data-designer/migration.mdx diff --git a/docs/data-designer/sdk-resources.md b/docs/data-designer/sdk-resources.mdx similarity index 100% rename from docs/data-designer/sdk-resources.md rename to docs/data-designer/sdk-resources.mdx diff --git a/docs/data-designer/tutorials/basics.md b/docs/data-designer/tutorials/basics.mdx similarity index 100% rename from docs/data-designer/tutorials/basics.md rename to docs/data-designer/tutorials/basics.mdx diff --git a/docs/data-designer/tutorials/index.md b/docs/data-designer/tutorials/index.mdx similarity index 100% rename from docs/data-designer/tutorials/index.md rename to docs/data-designer/tutorials/index.mdx diff --git a/docs/data-designer/tutorials/seeding.md b/docs/data-designer/tutorials/seeding.mdx similarity index 100% rename from docs/data-designer/tutorials/seeding.md rename to docs/data-designer/tutorials/seeding.mdx diff --git a/docs/eula.md b/docs/eula.mdx similarity index 100% rename from docs/eula.md rename to docs/eula.mdx diff --git a/docs/evaluator/benchmarks/agentic.md b/docs/evaluator/benchmarks/agentic.mdx similarity index 100% rename from docs/evaluator/benchmarks/agentic.md rename to docs/evaluator/benchmarks/agentic.mdx diff --git a/docs/evaluator/benchmarks/custom.md b/docs/evaluator/benchmarks/custom.mdx similarity index 100% rename from docs/evaluator/benchmarks/custom.md rename to docs/evaluator/benchmarks/custom.mdx diff --git a/docs/evaluator/benchmarks/discover-industry-benchmarks.md b/docs/evaluator/benchmarks/discover-industry-benchmarks.mdx similarity index 100% rename from docs/evaluator/benchmarks/discover-industry-benchmarks.md rename to 
docs/evaluator/benchmarks/discover-industry-benchmarks.mdx diff --git a/docs/evaluator/benchmarks/hf-secret.md b/docs/evaluator/benchmarks/hf-secret.mdx similarity index 100% rename from docs/evaluator/benchmarks/hf-secret.md rename to docs/evaluator/benchmarks/hf-secret.mdx diff --git a/docs/evaluator/benchmarks/index.md b/docs/evaluator/benchmarks/index.mdx similarity index 100% rename from docs/evaluator/benchmarks/index.md rename to docs/evaluator/benchmarks/index.mdx diff --git a/docs/evaluator/benchmarks/industry.md b/docs/evaluator/benchmarks/industry.mdx similarity index 100% rename from docs/evaluator/benchmarks/industry.md rename to docs/evaluator/benchmarks/industry.mdx diff --git a/docs/evaluator/benchmarks/job-management.md b/docs/evaluator/benchmarks/job-management.mdx similarity index 100% rename from docs/evaluator/benchmarks/job-management.md rename to docs/evaluator/benchmarks/job-management.mdx diff --git a/docs/evaluator/benchmarks/manage-benchmarks.md b/docs/evaluator/benchmarks/manage-benchmarks.mdx similarity index 100% rename from docs/evaluator/benchmarks/manage-benchmarks.md rename to docs/evaluator/benchmarks/manage-benchmarks.mdx diff --git a/docs/evaluator/benchmarks/results.md b/docs/evaluator/benchmarks/results.mdx similarity index 100% rename from docs/evaluator/benchmarks/results.md rename to docs/evaluator/benchmarks/results.mdx diff --git a/docs/evaluator/index.md b/docs/evaluator/index.mdx similarity index 100% rename from docs/evaluator/index.md rename to docs/evaluator/index.mdx diff --git a/docs/evaluator/metrics/agent-configuration.md b/docs/evaluator/metrics/agent-configuration.mdx similarity index 100% rename from docs/evaluator/metrics/agent-configuration.md rename to docs/evaluator/metrics/agent-configuration.mdx diff --git a/docs/evaluator/metrics/agentic.md b/docs/evaluator/metrics/agentic.mdx similarity index 100% rename from docs/evaluator/metrics/agentic.md rename to docs/evaluator/metrics/agentic.mdx diff --git 
a/docs/evaluator/metrics/index.md b/docs/evaluator/metrics/index.mdx similarity index 100% rename from docs/evaluator/metrics/index.md rename to docs/evaluator/metrics/index.mdx diff --git a/docs/evaluator/metrics/job-management.md b/docs/evaluator/metrics/job-management.mdx similarity index 100% rename from docs/evaluator/metrics/job-management.md rename to docs/evaluator/metrics/job-management.mdx diff --git a/docs/evaluator/metrics/llm-as-a-judge.md b/docs/evaluator/metrics/llm-as-a-judge.mdx similarity index 100% rename from docs/evaluator/metrics/llm-as-a-judge.md rename to docs/evaluator/metrics/llm-as-a-judge.mdx diff --git a/docs/evaluator/metrics/manage-metrics.md b/docs/evaluator/metrics/manage-metrics.mdx similarity index 100% rename from docs/evaluator/metrics/manage-metrics.md rename to docs/evaluator/metrics/manage-metrics.mdx diff --git a/docs/evaluator/metrics/model-configuration.md b/docs/evaluator/metrics/model-configuration.mdx similarity index 100% rename from docs/evaluator/metrics/model-configuration.md rename to docs/evaluator/metrics/model-configuration.mdx diff --git a/docs/evaluator/metrics/rag.md b/docs/evaluator/metrics/rag.mdx similarity index 100% rename from docs/evaluator/metrics/rag.md rename to docs/evaluator/metrics/rag.mdx diff --git a/docs/evaluator/metrics/remote.md b/docs/evaluator/metrics/remote.mdx similarity index 100% rename from docs/evaluator/metrics/remote.md rename to docs/evaluator/metrics/remote.mdx diff --git a/docs/evaluator/metrics/results.md b/docs/evaluator/metrics/results.mdx similarity index 100% rename from docs/evaluator/metrics/results.md rename to docs/evaluator/metrics/results.mdx diff --git a/docs/evaluator/metrics/similarity.md b/docs/evaluator/metrics/similarity.mdx similarity index 100% rename from docs/evaluator/metrics/similarity.md rename to docs/evaluator/metrics/similarity.mdx diff --git a/docs/evaluator/sdk-resources.md b/docs/evaluator/sdk-resources.mdx similarity index 100% rename from 
docs/evaluator/sdk-resources.md rename to docs/evaluator/sdk-resources.mdx diff --git a/docs/evaluator/tutorials/index.md b/docs/evaluator/tutorials/index.mdx similarity index 100% rename from docs/evaluator/tutorials/index.md rename to docs/evaluator/tutorials/index.mdx diff --git a/docs/evaluator/tutorials/run-llm-judge-evaluation.md b/docs/evaluator/tutorials/run-llm-judge-evaluation.mdx similarity index 100% rename from docs/evaluator/tutorials/run-llm-judge-evaluation.md rename to docs/evaluator/tutorials/run-llm-judge-evaluation.mdx diff --git a/docs/example-applications/about.md b/docs/example-applications/about.mdx similarity index 100% rename from docs/example-applications/about.md rename to docs/example-applications/about.mdx diff --git a/docs/get-started/concepts/entities.md b/docs/get-started/concepts/entities.mdx similarity index 100% rename from docs/get-started/concepts/entities.md rename to docs/get-started/concepts/entities.mdx diff --git a/docs/get-started/concepts/entity-references.md b/docs/get-started/concepts/entity-references.mdx similarity index 100% rename from docs/get-started/concepts/entity-references.md rename to docs/get-started/concepts/entity-references.mdx diff --git a/docs/get-started/concepts/filtering.md b/docs/get-started/concepts/filtering.mdx similarity index 100% rename from docs/get-started/concepts/filtering.md rename to docs/get-started/concepts/filtering.mdx diff --git a/docs/get-started/concepts/index.md b/docs/get-started/concepts/index.mdx similarity index 100% rename from docs/get-started/concepts/index.md rename to docs/get-started/concepts/index.mdx diff --git a/docs/get-started/concepts/manage-files.md b/docs/get-started/concepts/manage-files.mdx similarity index 100% rename from docs/get-started/concepts/manage-files.md rename to docs/get-started/concepts/manage-files.mdx diff --git a/docs/get-started/concepts/manage-secrets.md b/docs/get-started/concepts/manage-secrets.mdx similarity index 100% rename from 
docs/get-started/concepts/manage-secrets.md rename to docs/get-started/concepts/manage-secrets.mdx diff --git a/docs/get-started/concepts/projects.md b/docs/get-started/concepts/projects.mdx similarity index 100% rename from docs/get-started/concepts/projects.md rename to docs/get-started/concepts/projects.mdx diff --git a/docs/get-started/concepts/workspaces.md b/docs/get-started/concepts/workspaces.mdx similarity index 100% rename from docs/get-started/concepts/workspaces.md rename to docs/get-started/concepts/workspaces.mdx diff --git a/docs/get-started/setup.md b/docs/get-started/setup.mdx similarity index 100% rename from docs/get-started/setup.md rename to docs/get-started/setup.mdx diff --git a/docs/guardrails/concepts/architecture.md b/docs/guardrails/concepts/architecture.mdx similarity index 100% rename from docs/guardrails/concepts/architecture.md rename to docs/guardrails/concepts/architecture.mdx diff --git a/docs/guardrails/concepts/checks.md b/docs/guardrails/concepts/checks.mdx similarity index 100% rename from docs/guardrails/concepts/checks.md rename to docs/guardrails/concepts/checks.mdx diff --git a/docs/guardrails/concepts/configurations/configuration-structure.md b/docs/guardrails/concepts/configurations/configuration-structure.mdx similarity index 100% rename from docs/guardrails/concepts/configurations/configuration-structure.md rename to docs/guardrails/concepts/configurations/configuration-structure.mdx diff --git a/docs/guardrails/concepts/configurations/default-configs.md b/docs/guardrails/concepts/configurations/default-configs.mdx similarity index 100% rename from docs/guardrails/concepts/configurations/default-configs.md rename to docs/guardrails/concepts/configurations/default-configs.mdx diff --git a/docs/guardrails/concepts/configurations/index.md b/docs/guardrails/concepts/configurations/index.mdx similarity index 100% rename from docs/guardrails/concepts/configurations/index.md rename to 
docs/guardrails/concepts/configurations/index.mdx diff --git a/docs/guardrails/concepts/configurations/manage-configs.md b/docs/guardrails/concepts/configurations/manage-configs.mdx similarity index 100% rename from docs/guardrails/concepts/configurations/manage-configs.md rename to docs/guardrails/concepts/configurations/manage-configs.mdx diff --git a/docs/guardrails/concepts/index.md b/docs/guardrails/concepts/index.mdx similarity index 100% rename from docs/guardrails/concepts/index.md rename to docs/guardrails/concepts/index.mdx diff --git a/docs/guardrails/concepts/inference.md b/docs/guardrails/concepts/inference.mdx similarity index 100% rename from docs/guardrails/concepts/inference.md rename to docs/guardrails/concepts/inference.mdx diff --git a/docs/guardrails/index.md b/docs/guardrails/index.mdx similarity index 100% rename from docs/guardrails/index.md rename to docs/guardrails/index.mdx diff --git a/docs/guardrails/observability.md b/docs/guardrails/observability.mdx similarity index 100% rename from docs/guardrails/observability.md rename to docs/guardrails/observability.mdx diff --git a/docs/guardrails/terminology.md b/docs/guardrails/terminology.mdx similarity index 100% rename from docs/guardrails/terminology.md rename to docs/guardrails/terminology.mdx diff --git a/docs/guardrails/tutorials/content-safety.md b/docs/guardrails/tutorials/content-safety.mdx similarity index 100% rename from docs/guardrails/tutorials/content-safety.md rename to docs/guardrails/tutorials/content-safety.mdx diff --git a/docs/guardrails/tutorials/deploy-nemoguard-nims.md b/docs/guardrails/tutorials/deploy-nemoguard-nims.mdx similarity index 100% rename from docs/guardrails/tutorials/deploy-nemoguard-nims.md rename to docs/guardrails/tutorials/deploy-nemoguard-nims.mdx diff --git a/docs/guardrails/tutorials/index.md b/docs/guardrails/tutorials/index.mdx similarity index 100% rename from docs/guardrails/tutorials/index.md rename to docs/guardrails/tutorials/index.mdx diff 
--git a/docs/guardrails/tutorials/injection-detection.md b/docs/guardrails/tutorials/injection-detection.mdx similarity index 100% rename from docs/guardrails/tutorials/injection-detection.md rename to docs/guardrails/tutorials/injection-detection.mdx diff --git a/docs/guardrails/tutorials/multimodal-data.md b/docs/guardrails/tutorials/multimodal-data.mdx similarity index 100% rename from docs/guardrails/tutorials/multimodal-data.md rename to docs/guardrails/tutorials/multimodal-data.mdx diff --git a/docs/guardrails/tutorials/parallel-rails.md b/docs/guardrails/tutorials/parallel-rails.mdx similarity index 100% rename from docs/guardrails/tutorials/parallel-rails.md rename to docs/guardrails/tutorials/parallel-rails.mdx diff --git a/docs/helm/index.md b/docs/helm/index.mdx similarity index 100% rename from docs/helm/index.md rename to docs/helm/index.mdx diff --git a/docs/index.md b/docs/index.mdx similarity index 100% rename from docs/index.md rename to docs/index.mdx diff --git a/docs/pysdk/client/index.md b/docs/pysdk/client/index.mdx similarity index 100% rename from docs/pysdk/client/index.md rename to docs/pysdk/client/index.mdx diff --git a/docs/pysdk/index.md b/docs/pysdk/index.mdx similarity index 100% rename from docs/pysdk/index.md rename to docs/pysdk/index.mdx diff --git a/docs/requirements.md b/docs/requirements.mdx similarity index 100% rename from docs/requirements.md rename to docs/requirements.mdx diff --git a/docs/run-inference/about.md b/docs/run-inference/about.mdx similarity index 100% rename from docs/run-inference/about.md rename to docs/run-inference/about.mdx diff --git a/docs/run-inference/tutorials/deploy-models.md b/docs/run-inference/tutorials/deploy-models.mdx similarity index 100% rename from docs/run-inference/tutorials/deploy-models.md rename to docs/run-inference/tutorials/deploy-models.mdx diff --git a/docs/run-inference/tutorials/index.md b/docs/run-inference/tutorials/index.mdx similarity index 100% rename from 
docs/run-inference/tutorials/index.md rename to docs/run-inference/tutorials/index.mdx diff --git a/docs/run-inference/tutorials/run-inference.md b/docs/run-inference/tutorials/run-inference.mdx similarity index 100% rename from docs/run-inference/tutorials/run-inference.md rename to docs/run-inference/tutorials/run-inference.mdx diff --git a/docs/safe-synthesizer/about/data-synthesis.md b/docs/safe-synthesizer/about/data-synthesis.mdx similarity index 100% rename from docs/safe-synthesizer/about/data-synthesis.md rename to docs/safe-synthesizer/about/data-synthesis.mdx diff --git a/docs/safe-synthesizer/about/evaluation.md b/docs/safe-synthesizer/about/evaluation.mdx similarity index 100% rename from docs/safe-synthesizer/about/evaluation.md rename to docs/safe-synthesizer/about/evaluation.mdx diff --git a/docs/safe-synthesizer/about/host-local-development.md b/docs/safe-synthesizer/about/host-local-development.mdx similarity index 100% rename from docs/safe-synthesizer/about/host-local-development.md rename to docs/safe-synthesizer/about/host-local-development.mdx diff --git a/docs/safe-synthesizer/about/index.md b/docs/safe-synthesizer/about/index.mdx similarity index 100% rename from docs/safe-synthesizer/about/index.md rename to docs/safe-synthesizer/about/index.mdx diff --git a/docs/safe-synthesizer/about/jobs.md b/docs/safe-synthesizer/about/jobs.mdx similarity index 100% rename from docs/safe-synthesizer/about/jobs.md rename to docs/safe-synthesizer/about/jobs.mdx diff --git a/docs/safe-synthesizer/about/pii-replacement.md b/docs/safe-synthesizer/about/pii-replacement.mdx similarity index 100% rename from docs/safe-synthesizer/about/pii-replacement.md rename to docs/safe-synthesizer/about/pii-replacement.mdx diff --git a/docs/safe-synthesizer/about/reference.md b/docs/safe-synthesizer/about/reference.mdx similarity index 100% rename from docs/safe-synthesizer/about/reference.md rename to docs/safe-synthesizer/about/reference.mdx diff --git 
a/docs/safe-synthesizer/getting-started.md b/docs/safe-synthesizer/getting-started.mdx similarity index 100% rename from docs/safe-synthesizer/getting-started.md rename to docs/safe-synthesizer/getting-started.mdx diff --git a/docs/safe-synthesizer/tutorials/differential-privacy.md b/docs/safe-synthesizer/tutorials/differential-privacy.mdx similarity index 100% rename from docs/safe-synthesizer/tutorials/differential-privacy.md rename to docs/safe-synthesizer/tutorials/differential-privacy.mdx diff --git a/docs/safe-synthesizer/tutorials/index.md b/docs/safe-synthesizer/tutorials/index.mdx similarity index 100% rename from docs/safe-synthesizer/tutorials/index.md rename to docs/safe-synthesizer/tutorials/index.mdx diff --git a/docs/safe-synthesizer/tutorials/safe-synthesizer-101.md b/docs/safe-synthesizer/tutorials/safe-synthesizer-101.mdx similarity index 100% rename from docs/safe-synthesizer/tutorials/safe-synthesizer-101.md rename to docs/safe-synthesizer/tutorials/safe-synthesizer-101.mdx diff --git a/docs/set-up/config-reference.md b/docs/set-up/config-reference.mdx similarity index 100% rename from docs/set-up/config-reference.md rename to docs/set-up/config-reference.mdx diff --git a/docs/set-up/helm/backup-and-restore.md b/docs/set-up/helm/backup-and-restore.mdx similarity index 100% rename from docs/set-up/helm/backup-and-restore.md rename to docs/set-up/helm/backup-and-restore.mdx diff --git a/docs/set-up/helm/database-setup.md b/docs/set-up/helm/database-setup.mdx similarity index 100% rename from docs/set-up/helm/database-setup.md rename to docs/set-up/helm/database-setup.mdx diff --git a/docs/set-up/helm/file-storage.md b/docs/set-up/helm/file-storage.mdx similarity index 100% rename from docs/set-up/helm/file-storage.md rename to docs/set-up/helm/file-storage.mdx diff --git a/docs/set-up/helm/index.md b/docs/set-up/helm/index.mdx similarity index 100% rename from docs/set-up/helm/index.md rename to docs/set-up/helm/index.mdx diff --git 
a/docs/set-up/helm/ingress.md b/docs/set-up/helm/ingress.mdx similarity index 100% rename from docs/set-up/helm/ingress.md rename to docs/set-up/helm/ingress.mdx diff --git a/docs/set-up/helm/install.md b/docs/set-up/helm/install.mdx similarity index 100% rename from docs/set-up/helm/install.md rename to docs/set-up/helm/install.mdx diff --git a/docs/set-up/helm/multinode-networking.md b/docs/set-up/helm/multinode-networking.mdx similarity index 100% rename from docs/set-up/helm/multinode-networking.md rename to docs/set-up/helm/multinode-networking.mdx diff --git a/docs/set-up/helm/openshift.md b/docs/set-up/helm/openshift.mdx similarity index 100% rename from docs/set-up/helm/openshift.md rename to docs/set-up/helm/openshift.mdx diff --git a/docs/set-up/helm/persistent-volumes.md b/docs/set-up/helm/persistent-volumes.mdx similarity index 100% rename from docs/set-up/helm/persistent-volumes.md rename to docs/set-up/helm/persistent-volumes.mdx diff --git a/docs/set-up/helm/prerequisites.md b/docs/set-up/helm/prerequisites.mdx similarity index 100% rename from docs/set-up/helm/prerequisites.md rename to docs/set-up/helm/prerequisites.mdx diff --git a/docs/set-up/index.md b/docs/set-up/index.mdx similarity index 100% rename from docs/set-up/index.md rename to docs/set-up/index.mdx diff --git a/docs/set-up/manage-jobs.md b/docs/set-up/manage-jobs.mdx similarity index 100% rename from docs/set-up/manage-jobs.md rename to docs/set-up/manage-jobs.mdx diff --git a/docs/set-up/milvus.md b/docs/set-up/milvus.mdx similarity index 100% rename from docs/set-up/milvus.md rename to docs/set-up/milvus.mdx diff --git a/docs/set-up/opentelemetry.md b/docs/set-up/opentelemetry.mdx similarity index 100% rename from docs/set-up/opentelemetry.md rename to docs/set-up/opentelemetry.mdx diff --git a/docs/set-up/security.md b/docs/set-up/security.mdx similarity index 100% rename from docs/set-up/security.md rename to docs/set-up/security.mdx diff --git a/docs/studio/agents.md 
b/docs/studio/agents.mdx similarity index 100% rename from docs/studio/agents.md rename to docs/studio/agents.mdx diff --git a/docs/studio/index.md b/docs/studio/index.mdx similarity index 100% rename from docs/studio/index.md rename to docs/studio/index.mdx diff --git a/docs/studio/monitor.md b/docs/studio/monitor.mdx similarity index 100% rename from docs/studio/monitor.md rename to docs/studio/monitor.mdx diff --git a/docs/studio/suggestions.md b/docs/studio/suggestions.mdx similarity index 100% rename from docs/studio/suggestions.md rename to docs/studio/suggestions.mdx diff --git a/docs/support-matrix.md b/docs/support-matrix.mdx similarity index 100% rename from docs/support-matrix.md rename to docs/support-matrix.mdx diff --git a/docs/troubleshooting/cluster-setup.md b/docs/troubleshooting/cluster-setup.mdx similarity index 100% rename from docs/troubleshooting/cluster-setup.md rename to docs/troubleshooting/cluster-setup.mdx diff --git a/docs/troubleshooting/customizer.md b/docs/troubleshooting/customizer.mdx similarity index 100% rename from docs/troubleshooting/customizer.md rename to docs/troubleshooting/customizer.mdx diff --git a/docs/troubleshooting/data-designer.md b/docs/troubleshooting/data-designer.mdx similarity index 100% rename from docs/troubleshooting/data-designer.md rename to docs/troubleshooting/data-designer.mdx diff --git a/docs/troubleshooting/evaluator.md b/docs/troubleshooting/evaluator.mdx similarity index 100% rename from docs/troubleshooting/evaluator.md rename to docs/troubleshooting/evaluator.mdx diff --git a/docs/troubleshooting/guardrails.md b/docs/troubleshooting/guardrails.mdx similarity index 100% rename from docs/troubleshooting/guardrails.md rename to docs/troubleshooting/guardrails.mdx diff --git a/docs/troubleshooting/index.md b/docs/troubleshooting/index.mdx similarity index 100% rename from docs/troubleshooting/index.md rename to docs/troubleshooting/index.mdx diff --git a/docs/troubleshooting/studio.md 
b/docs/troubleshooting/studio.mdx
similarity index 100%
rename from docs/troubleshooting/studio.md
rename to docs/troubleshooting/studio.mdx

From 619cf024fe42375cd570268a052b4685e848a2a4 Mon Sep 17 00:00:00 2001
From: Yamini
Date: Thu, 4 Jun 2026 01:39:31 -0400
Subject: [PATCH 2/2] docs(fern): convert mkdocs content to Fern MDX, scaffold docs/fern

- Convert 188 pages from mkdocs Material/pymdown to Fern MDX (admonitions,
  tabs, snippets inlined, macro variables substituted, MDX-safe escaping).
- Scaffold docs/fern: docs.yml, fern.config.json (CLI 5.44.7),
  versions/latest.yml with explicit slugs, NVIDIA-green theme (global-theme:
  nvidia documented toggle, pending org theme access), redirects carried over
  from the mkdocs redirects plugin.
- Native Fern API reference from the OpenAPI spec
  (docs/fern/apis/nemo-platform).
- Preserve launch gating: sections hidden via the same rules as the mkdocs
  hide_unready_docs hook (hidden in nav, still routable so links resolve).
- Hoist converted MDX to docs/ top-level; remove mkdocs.yml, hooks, overrides,
  theme assets, and excluded template/work/notebooks trees.

fern check + fern docs broken-links both pass.
--- docs/CONTRIBUTING.md | 225 - docs/Makefile | 126 - docs/README.md | 100 - docs/_hooks/hide_unready_docs.py | 214 - docs/_overrides/.gitkeep | 0 docs/_overrides/main.html | 10 - docs/_scripts/README.md | 59 - docs/_scripts/format_code_blocks.py | 272 - docs/_scripts/lint_notebooks.py | 568 - docs/_scripts/run_notebooks.py | 66 - docs/_scripts/setup_mkdocs_env.sh | 20 - docs/_scripts/test_format_code_blocks.py | 240 - docs/_scripts/test_lint_notebooks.py | 27 - docs/_snippets/cli-summary.md | 19 - docs/_snippets/naming-rules.md | 9 - docs/_snippets/nvidia-build-model-provider.md | 8 - docs/_snippets/tutorials/cli-sdk-setup.md | 19 - docs/_snippets/tutorials/prereqs.md | 12 - docs/about/release-notes/current-release.mdx | 16 +- docs/about/release-notes/index.mdx | 10 +- docs/acknowledgements/index.mdx | 8 +- docs/agents/index.mdx | 44 +- docs/agents/optimization.mdx | 385 +- docs/agents/plugins.mdx | 24 +- docs/agents/security.mdx | 258 +- docs/anonymizer/cli.mdx | 16 +- docs/anonymizer/index.mdx | 37 +- docs/anonymizer/quickstart.mdx | 90 +- docs/anonymizer/sdk-resources.mdx | 40 +- docs/anonymizer/tutorials/index.mdx | 18 +- docs/anonymizer/tutorials/preview.mdx | 42 +- docs/anonymizer/tutorials/run.mdx | 24 +- docs/api/index.mdx | 25 +- docs/auditor/configs/index.mdx | 10 +- docs/auditor/configs/probes.mdx | 4 + docs/auditor/configs/schema.mdx | 10 +- docs/auditor/index.mdx | 48 +- docs/auditor/sdk-resources.mdx | 16 +- docs/auditor/targets/index.mdx | 14 +- docs/auditor/targets/inference-gateway.mdx | 8 +- docs/auditor/targets/schema.mdx | 8 +- docs/auditor/tutorials/index.mdx | 14 +- docs/auditor/tutorials/run-audit-locally.mdx | 53 +- docs/auth/authentication/index.mdx | 28 +- docs/auth/authentication/oidc.mdx | 50 +- .../authentication/providers/azure-ad.mdx | 28 +- .../auth/authentication/providers/generic.mdx | 16 +- docs/auth/authentication/providers/index.mdx | 8 +- .../authentication/using-authentication.mdx | 38 +- 
docs/auth/authorization/api-scopes.mdx | 23 +- docs/auth/authorization/index.mdx | 20 +- docs/auth/authorization/managing-access.mdx | 510 +- .../authorization/permissions-reference.mdx | 21 +- docs/auth/authorization/policy-engine.mdx | 20 +- .../authorization/roles-and-permissions.mdx | 25 +- docs/auth/concepts.mdx | 30 +- docs/auth/deployment/configuration.mdx | 26 +- .../deployment/credential-propagation.mdx | 25 +- docs/auth/deployment/gateway.mdx | 157 +- docs/auth/deployment/hardening.mdx | 36 +- docs/auth/index.mdx | 144 +- docs/auth/security-model.mdx | 62 +- docs/auth/troubleshooting.mdx | 20 +- docs/cli/configuration.mdx | 6 +- docs/cli/index.mdx | 53 +- docs/cli/reference.mdx | 590 +- docs/cli/troubleshooting.mdx | 10 +- docs/cli/working-with-resources.mdx | 6 +- docs/contributing/skills-spec.mdx | 22 +- docs/customizer/about.mdx | 50 +- docs/customizer/index.mdx | 52 +- .../manage-customization-jobs/cancel-job.mdx | 76 +- .../manage-customization-jobs/create-job.mdx | 91 +- .../customization-job-reference.mdx | 168 +- .../get-job-status.mdx | 26 +- .../hyperparameters.mdx | 52 +- .../manage-customization-jobs/index.mdx | 27 +- .../list-active-jobs.mdx | 119 +- .../manage-model-entities/create-fileset.mdx | 418 +- .../create-model-entity.mdx | 82 +- .../manage-model-entities/index.mdx | 15 +- docs/customizer/models/data-format.mdx | 6 +- docs/customizer/models/embedding.mdx | 22 +- docs/customizer/models/gpt-oss.mdx | 18 +- docs/customizer/models/index.mdx | 33 +- docs/customizer/models/llama-nemotron.mdx | 23 +- docs/customizer/models/llama.mdx | 8 +- docs/customizer/models/mistral.mdx | 12 +- docs/customizer/models/phi.mdx | 8 +- docs/customizer/models/qwen.mdx | 6 +- .../_snippets/customizer-prereqs.mdx | 65 +- .../tutorials/format-training-dataset.mdx | 211 +- docs/customizer/tutorials/import-hf-model.mdx | 885 +- docs/customizer/tutorials/index.mdx | 28 +- docs/customizer/tutorials/metrics.mdx | 65 +- .../understand-configurations-and-models.mdx | 
120 +- docs/data-designer/_snippets/job-results.mdx | 19 +- .../_snippets/preview-results.mdx | 13 +- docs/data-designer/cli.mdx | 6 +- docs/data-designer/execution-modes.mdx | 9 +- docs/data-designer/index.mdx | 23 +- docs/data-designer/migration.mdx | 12 +- docs/data-designer/sdk-resources.mdx | 10 +- docs/data-designer/tutorials/basics.mdx | 39 +- docs/data-designer/tutorials/index.mdx | 24 +- docs/data-designer/tutorials/seeding.mdx | 37 +- docs/eula.mdx | 6 +- docs/evaluator/benchmarks/agentic.mdx | 92 +- docs/evaluator/benchmarks/custom.mdx | 61 +- .../discover-industry-benchmarks.mdx | 14 +- docs/evaluator/benchmarks/hf-secret.mdx | 4 + docs/evaluator/benchmarks/index.mdx | 75 +- docs/evaluator/benchmarks/industry.mdx | 1272 +- docs/evaluator/benchmarks/job-management.mdx | 13 +- .../benchmarks/manage-benchmarks.mdx | 17 +- docs/evaluator/benchmarks/results.mdx | 4 + docs/evaluator/index.mdx | 55 +- .../evaluator/metrics/agent-configuration.mdx | 31 +- docs/evaluator/metrics/agentic.mdx | 1295 +- docs/evaluator/metrics/index.mdx | 43 +- docs/evaluator/metrics/job-management.mdx | 17 +- docs/evaluator/metrics/llm-as-a-judge.mdx | 146 +- docs/evaluator/metrics/manage-metrics.mdx | 24 +- .../evaluator/metrics/model-configuration.mdx | 17 +- docs/evaluator/metrics/rag.mdx | 710 +- docs/evaluator/metrics/remote.mdx | 191 +- docs/evaluator/metrics/results.mdx | 12 +- docs/evaluator/metrics/similarity.mdx | 879 +- docs/evaluator/sdk-resources.mdx | 24 +- docs/evaluator/tutorials/index.mdx | 12 +- .../tutorials/run-llm-judge-evaluation.mdx | 95 +- docs/example-applications/about.mdx | 8 +- .../_images/nemo-platform-architecture.svg | 1 + docs/fern/apis/nemo-platform/generators.yml | 3 + docs/fern/apis/nemo-platform/openapi.yaml | 28781 ++++++++++++++++ docs/fern/assets/favicon.ico | 3 + .../assets/images/agent_eval_framework.png | 3 + .../images/attribute-inference-protection.png | 3 + docs/fern/assets/images/car-audio-theft.jpg | 3 + 
.../images/column-correlation-stability.png | 3 + .../images/column-distribution-stability.png | 3 + docs/fern/assets/images/customizations.png | 3 + docs/fern/assets/images/dashboard.png | 3 + docs/fern/assets/images/datasets.png | 3 + .../images/deep-structure-stability.png | 3 + docs/fern/assets/images/dps-subscores.png | 3 + docs/fern/assets/images/dps-usability.png | 3 + .../assets/images/evaluation_result_dir.png | 3 + docs/fern/assets/images/evaluations.png | 3 + .../assets/images/evaluator_interactions.png | 3 + docs/fern/assets/images/favicon.ico | 3 + docs/fern/assets/images/gpu_memory.png | 3 + docs/fern/assets/images/gpu_utilization.png | 3 + docs/fern/assets/images/jobs.png | 3 + .../assets/images/main_nv_logo_square.png | 3 + .../membership-inference-protection.png | 3 + .../assets/images/nemo-eval-diagram-2510.png | 3 + .../images/nemo-platform-architecture.svg | 3 + docs/fern/assets/images/nemo-wordmark.svg | 3 + docs/fern/assets/images/nvidia-logo-white.png | 3 + docs/fern/assets/images/overall-sqs-dps.png | 3 + .../images/packed_vs_not_packed_tables.png | 3 + .../images/packed_vs_not_packed_val_loss.png | 3 + docs/fern/assets/images/pca.png | 3 + docs/fern/assets/images/pii-replay.png | 3 + docs/fern/assets/images/pod_logs.png | 3 + .../assets/images/prompt-tuned-models.png | 3 + docs/fern/assets/images/run-evaluation.png | 3 + docs/fern/assets/images/runtime.png | 3 + docs/fern/assets/images/safe-synthesizer.png | 3 + docs/fern/assets/images/secrets.png | 3 + docs/fern/assets/images/seq_packing_stats.png | 3 + docs/fern/assets/images/sqs-subscores.png | 3 + docs/fern/assets/images/sqs-usability.png | 3 + docs/fern/assets/images/street-scene.jpg | 3 + .../images/text-semantic-similarity.png | 3 + .../images/text-structure-similarity.png | 3 + .../assets/images/wandb_charts_example.png | 3 + docs/fern/assets/images/workflow_logs.png | 3 + docs/fern/assets/images/workspaces.png | 3 + docs/fern/assets/nemo-wordmark.svg | 13 + 
docs/fern/assets/nvidia-logo-white.png | 3 + docs/fern/components/Authors.tsx | 56 + docs/fern/components/BadgeLinks.tsx | 37 + docs/fern/components/CustomCard.tsx | 34 + docs/fern/components/MetricsTable.tsx | 106 + docs/fern/components/NotebookViewer.tsx | 399 + docs/fern/components/Tag.tsx | 63 + docs/fern/components/TrajectoryViewer.tsx | 144 + docs/fern/customizer/_images/gpu_memory.png | 3 + .../customizer/_images/gpu_utilization.png | 3 + .../_images/packed_vs_not_packed_tables.png | 3 + .../_images/packed_vs_not_packed_val_loss.png | 3 + docs/fern/customizer/_images/runtime.png | 3 + .../customizer/_images/seq_packing_stats.png | 3 + .../_images/wandb_charts_example.png | 3 + docs/fern/docs.yml | 41 + .../evaluator/images/agent_eval_framework.png | 3 + .../images/evaluation_result_dir.png | 3 + .../images/evaluator_interactions.png | 3 + .../images/nemo-eval-diagram-2510.png | 3 + docs/fern/evaluator/images/pod_logs.png | 3 + docs/fern/evaluator/images/workflow_logs.png | 3 + docs/fern/fern.config.json | 4 + .../_snippets/input/car-audio-theft.jpg | 3 + .../_snippets/input/street-scene.jpg | 3 + docs/fern/images/favicon.ico | 3 + docs/fern/images/main_nv_logo_square.png | 3 + docs/fern/images/nvidia-logo-white.png | 3 + .../attribute-inference-protection.png | 3 + .../_images/column-correlation-stability.png | 3 + .../_images/column-distribution-stability.png | 3 + .../_images/deep-structure-stability.png | 3 + .../_images/dps-subscores.png | 3 + .../_images/dps-usability.png | 3 + .../membership-inference-protection.png | 3 + .../_images/overall-sqs-dps.png | 3 + docs/fern/safe-synthesizer/_images/pca.png | 3 + .../safe-synthesizer/_images/pii-replay.png | 3 + .../_images/sqs-subscores.png | 3 + .../_images/sqs-usability.png | 3 + .../_images/text-semantic-similarity.png | 3 + .../_images/text-structure-similarity.png | 3 + docs/fern/studio/_images/customizations.png | 3 + docs/fern/studio/_images/dashboard.png | 3 + docs/fern/studio/_images/datasets.png | 
3 + docs/fern/studio/_images/evaluations.png | 3 + docs/fern/studio/_images/jobs.png | 3 + .../studio/_images/prompt-tuned-models.png | 3 + docs/fern/studio/_images/run-evaluation.png | 3 + docs/fern/studio/_images/safe-synthesizer.png | 3 + docs/fern/studio/_images/secrets.png | 3 + docs/fern/studio/_images/workspaces.png | 3 + docs/fern/versions/_nav_order.yml | 226 + docs/fern/versions/_nav_titles.yml | 226 + docs/get-started/concepts/entities.mdx | 12 +- .../concepts/entity-references.mdx | 4 + docs/get-started/concepts/filtering.mdx | 21 +- docs/get-started/concepts/index.mdx | 22 +- docs/get-started/concepts/manage-files.mdx | 977 +- docs/get-started/concepts/manage-secrets.mdx | 82 +- docs/get-started/concepts/projects.mdx | 241 +- docs/get-started/concepts/workspaces.mdx | 231 +- docs/get-started/setup.mdx | 22 +- docs/guardrails/concepts/architecture.mdx | 25 +- docs/guardrails/concepts/checks.mdx | 273 +- .../configuration-structure.mdx | 52 +- .../configurations/default-configs.mdx | 110 +- .../concepts/configurations/index.mdx | 12 +- .../configurations/manage-configs.mdx | 381 +- docs/guardrails/concepts/index.mdx | 14 +- docs/guardrails/concepts/inference.mdx | 271 +- docs/guardrails/index.mdx | 26 +- docs/guardrails/observability.mdx | 68 +- docs/guardrails/terminology.mdx | 100 +- docs/guardrails/tutorials/content-safety.mdx | 160 +- .../tutorials/deploy-nemoguard-nims.mdx | 352 +- docs/guardrails/tutorials/index.mdx | 20 +- .../tutorials/injection-detection.mdx | 171 +- docs/guardrails/tutorials/multimodal-data.mdx | 146 +- docs/guardrails/tutorials/parallel-rails.mdx | 147 +- docs/helm/index.mdx | 26 +- docs/index.mdx | 22 +- docs/javascripts/api-filter.js | 185 - docs/notebooks/ndd_evaluator.md | 655 - docs/pysdk/client/index.mdx | 14 +- docs/pysdk/index.mdx | 29 +- docs/requirements-mkdocs.txt | 41 - docs/requirements.mdx | 16 +- docs/run-inference/about.mdx | 282 +- .../run-inference/tutorials/deploy-models.mdx | 1397 +- 
docs/run-inference/tutorials/index.mdx | 8 +- .../run-inference/tutorials/run-inference.mdx | 364 +- .../safe-synthesizer/about/data-synthesis.mdx | 26 +- docs/safe-synthesizer/about/evaluation.mdx | 18 +- .../about/host-local-development.mdx | 18 +- docs/safe-synthesizer/about/index.mdx | 57 +- docs/safe-synthesizer/about/jobs.mdx | 26 +- .../about/pii-replacement.mdx | 16 +- docs/safe-synthesizer/about/reference.mdx | 26 +- docs/safe-synthesizer/getting-started.mdx | 34 +- .../tutorials/differential-privacy.mdx | 18 +- docs/safe-synthesizer/tutorials/index.mdx | 16 +- .../tutorials/safe-synthesizer-101.mdx | 49 +- docs/set-up/config-reference.mdx | 6 +- docs/set-up/helm/backup-and-restore.mdx | 8 +- docs/set-up/helm/database-setup.mdx | 8 +- docs/set-up/helm/file-storage.mdx | 15 +- docs/set-up/helm/index.mdx | 38 +- docs/set-up/helm/ingress.mdx | 22 +- docs/set-up/helm/install.mdx | 51 +- docs/set-up/helm/multinode-networking.mdx | 14 +- docs/set-up/helm/openshift.mdx | 16 +- docs/set-up/helm/persistent-volumes.mdx | 19 +- docs/set-up/helm/prerequisites.mdx | 14 +- docs/set-up/index.mdx | 32 +- docs/set-up/manage-jobs.mdx | 32 +- docs/set-up/milvus.mdx | 10 +- docs/set-up/opentelemetry.mdx | 12 +- docs/set-up/security.mdx | 51 +- docs/studio/agents.mdx | 24 +- docs/studio/index.mdx | 27 +- docs/studio/monitor.mdx | 14 +- docs/studio/suggestions.mdx | 22 +- docs/stylesheets/nvidia.css | 433 - docs/support-matrix.mdx | 16 +- docs/template/EULA.md | 4 - docs/template/acknowledgements.md | 7 - .../template/getting_started/deploy-docker.md | 38 - docs/template/getting_started/deploy-helm.md | 35 - docs/template/models.md | 20 - docs/template/overview.md | 12 - docs/template/playbooks/playbook.md | 31 - docs/template/reference/api-reference.md | 81 - docs/template/release-notes.md | 64 - docs/template/support-matrix.md | 34 - docs/template/using-ms.md | 33 - docs/troubleshooting/cluster-setup.mdx | 16 +- docs/troubleshooting/customizer.mdx | 8 +- 
docs/troubleshooting/data-designer.mdx | 18 +- docs/troubleshooting/evaluator.mdx | 81 +- docs/troubleshooting/guardrails.mdx | 55 +- docs/troubleshooting/index.mdx | 22 +- docs/troubleshooting/studio.mdx | 8 +- docs/work/guardrails/README.md | 7 - .../work/guardrails/content-safety/config.yml | 20 - mkdocs.yml | 513 - 327 files changed, 40178 insertions(+), 12679 deletions(-) delete mode 100644 docs/CONTRIBUTING.md delete mode 100644 docs/Makefile delete mode 100644 docs/README.md delete mode 100644 docs/_hooks/hide_unready_docs.py delete mode 100644 docs/_overrides/.gitkeep delete mode 100644 docs/_overrides/main.html delete mode 100644 docs/_scripts/README.md delete mode 100644 docs/_scripts/format_code_blocks.py delete mode 100644 docs/_scripts/lint_notebooks.py delete mode 100644 docs/_scripts/run_notebooks.py delete mode 100755 docs/_scripts/setup_mkdocs_env.sh delete mode 100644 docs/_scripts/test_format_code_blocks.py delete mode 100644 docs/_scripts/test_lint_notebooks.py delete mode 100644 docs/_snippets/cli-summary.md delete mode 100644 docs/_snippets/naming-rules.md delete mode 100644 docs/_snippets/nvidia-build-model-provider.md delete mode 100644 docs/_snippets/tutorials/cli-sdk-setup.md delete mode 100644 docs/_snippets/tutorials/prereqs.md create mode 100644 docs/fern/_images/nemo-platform-architecture.svg create mode 100644 docs/fern/apis/nemo-platform/generators.yml create mode 100644 docs/fern/apis/nemo-platform/openapi.yaml create mode 100644 docs/fern/assets/favicon.ico create mode 100644 docs/fern/assets/images/agent_eval_framework.png create mode 100644 docs/fern/assets/images/attribute-inference-protection.png create mode 100644 docs/fern/assets/images/car-audio-theft.jpg create mode 100644 docs/fern/assets/images/column-correlation-stability.png create mode 100644 docs/fern/assets/images/column-distribution-stability.png create mode 100644 docs/fern/assets/images/customizations.png create mode 100644 docs/fern/assets/images/dashboard.png 
create mode 100644 docs/fern/assets/images/datasets.png create mode 100644 docs/fern/assets/images/deep-structure-stability.png create mode 100644 docs/fern/assets/images/dps-subscores.png create mode 100644 docs/fern/assets/images/dps-usability.png create mode 100644 docs/fern/assets/images/evaluation_result_dir.png create mode 100644 docs/fern/assets/images/evaluations.png create mode 100644 docs/fern/assets/images/evaluator_interactions.png create mode 100644 docs/fern/assets/images/favicon.ico create mode 100644 docs/fern/assets/images/gpu_memory.png create mode 100644 docs/fern/assets/images/gpu_utilization.png create mode 100644 docs/fern/assets/images/jobs.png create mode 100644 docs/fern/assets/images/main_nv_logo_square.png create mode 100644 docs/fern/assets/images/membership-inference-protection.png create mode 100644 docs/fern/assets/images/nemo-eval-diagram-2510.png create mode 100644 docs/fern/assets/images/nemo-platform-architecture.svg create mode 100644 docs/fern/assets/images/nemo-wordmark.svg create mode 100644 docs/fern/assets/images/nvidia-logo-white.png create mode 100644 docs/fern/assets/images/overall-sqs-dps.png create mode 100644 docs/fern/assets/images/packed_vs_not_packed_tables.png create mode 100644 docs/fern/assets/images/packed_vs_not_packed_val_loss.png create mode 100644 docs/fern/assets/images/pca.png create mode 100644 docs/fern/assets/images/pii-replay.png create mode 100644 docs/fern/assets/images/pod_logs.png create mode 100644 docs/fern/assets/images/prompt-tuned-models.png create mode 100644 docs/fern/assets/images/run-evaluation.png create mode 100644 docs/fern/assets/images/runtime.png create mode 100644 docs/fern/assets/images/safe-synthesizer.png create mode 100644 docs/fern/assets/images/secrets.png create mode 100644 docs/fern/assets/images/seq_packing_stats.png create mode 100644 docs/fern/assets/images/sqs-subscores.png create mode 100644 docs/fern/assets/images/sqs-usability.png create mode 100644 
docs/fern/assets/images/street-scene.jpg create mode 100644 docs/fern/assets/images/text-semantic-similarity.png create mode 100644 docs/fern/assets/images/text-structure-similarity.png create mode 100644 docs/fern/assets/images/wandb_charts_example.png create mode 100644 docs/fern/assets/images/workflow_logs.png create mode 100644 docs/fern/assets/images/workspaces.png create mode 100644 docs/fern/assets/nemo-wordmark.svg create mode 100644 docs/fern/assets/nvidia-logo-white.png create mode 100644 docs/fern/components/Authors.tsx create mode 100644 docs/fern/components/BadgeLinks.tsx create mode 100644 docs/fern/components/CustomCard.tsx create mode 100644 docs/fern/components/MetricsTable.tsx create mode 100644 docs/fern/components/NotebookViewer.tsx create mode 100644 docs/fern/components/Tag.tsx create mode 100644 docs/fern/components/TrajectoryViewer.tsx create mode 100644 docs/fern/customizer/_images/gpu_memory.png create mode 100644 docs/fern/customizer/_images/gpu_utilization.png create mode 100644 docs/fern/customizer/_images/packed_vs_not_packed_tables.png create mode 100644 docs/fern/customizer/_images/packed_vs_not_packed_val_loss.png create mode 100644 docs/fern/customizer/_images/runtime.png create mode 100644 docs/fern/customizer/_images/seq_packing_stats.png create mode 100644 docs/fern/customizer/_images/wandb_charts_example.png create mode 100644 docs/fern/docs.yml create mode 100644 docs/fern/evaluator/images/agent_eval_framework.png create mode 100644 docs/fern/evaluator/images/evaluation_result_dir.png create mode 100644 docs/fern/evaluator/images/evaluator_interactions.png create mode 100644 docs/fern/evaluator/images/nemo-eval-diagram-2510.png create mode 100644 docs/fern/evaluator/images/pod_logs.png create mode 100644 docs/fern/evaluator/images/workflow_logs.png create mode 100644 docs/fern/fern.config.json create mode 100644 docs/fern/guardrails/_snippets/input/car-audio-theft.jpg create mode 100644 
docs/fern/guardrails/_snippets/input/street-scene.jpg create mode 100644 docs/fern/images/favicon.ico create mode 100644 docs/fern/images/main_nv_logo_square.png create mode 100644 docs/fern/images/nvidia-logo-white.png create mode 100644 docs/fern/safe-synthesizer/_images/attribute-inference-protection.png create mode 100644 docs/fern/safe-synthesizer/_images/column-correlation-stability.png create mode 100644 docs/fern/safe-synthesizer/_images/column-distribution-stability.png create mode 100644 docs/fern/safe-synthesizer/_images/deep-structure-stability.png create mode 100644 docs/fern/safe-synthesizer/_images/dps-subscores.png create mode 100644 docs/fern/safe-synthesizer/_images/dps-usability.png create mode 100644 docs/fern/safe-synthesizer/_images/membership-inference-protection.png create mode 100644 docs/fern/safe-synthesizer/_images/overall-sqs-dps.png create mode 100644 docs/fern/safe-synthesizer/_images/pca.png create mode 100644 docs/fern/safe-synthesizer/_images/pii-replay.png create mode 100644 docs/fern/safe-synthesizer/_images/sqs-subscores.png create mode 100644 docs/fern/safe-synthesizer/_images/sqs-usability.png create mode 100644 docs/fern/safe-synthesizer/_images/text-semantic-similarity.png create mode 100644 docs/fern/safe-synthesizer/_images/text-structure-similarity.png create mode 100644 docs/fern/studio/_images/customizations.png create mode 100644 docs/fern/studio/_images/dashboard.png create mode 100644 docs/fern/studio/_images/datasets.png create mode 100644 docs/fern/studio/_images/evaluations.png create mode 100644 docs/fern/studio/_images/jobs.png create mode 100644 docs/fern/studio/_images/prompt-tuned-models.png create mode 100644 docs/fern/studio/_images/run-evaluation.png create mode 100644 docs/fern/studio/_images/safe-synthesizer.png create mode 100644 docs/fern/studio/_images/secrets.png create mode 100644 docs/fern/studio/_images/workspaces.png create mode 100644 docs/fern/versions/_nav_order.yml create mode 100644 
docs/fern/versions/_nav_titles.yml delete mode 100644 docs/javascripts/api-filter.js delete mode 100644 docs/notebooks/ndd_evaluator.md delete mode 100644 docs/requirements-mkdocs.txt delete mode 100644 docs/stylesheets/nvidia.css delete mode 100644 docs/template/EULA.md delete mode 100644 docs/template/acknowledgements.md delete mode 100644 docs/template/getting_started/deploy-docker.md delete mode 100644 docs/template/getting_started/deploy-helm.md delete mode 100644 docs/template/models.md delete mode 100644 docs/template/overview.md delete mode 100644 docs/template/playbooks/playbook.md delete mode 100644 docs/template/reference/api-reference.md delete mode 100644 docs/template/release-notes.md delete mode 100644 docs/template/support-matrix.md delete mode 100644 docs/template/using-ms.md delete mode 100644 docs/work/guardrails/README.md delete mode 100644 docs/work/guardrails/content-safety/config.yml delete mode 100644 mkdocs.yml diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md deleted file mode 100644 index 1f89376285..0000000000 --- a/docs/CONTRIBUTING.md +++ /dev/null @@ -1,225 +0,0 @@ -# Documentation Contributions - -This repository builds the NeMo Platform documentation with MkDocs. The current -source of truth is the root [`mkdocs.yml`](../mkdocs.yml), the documentation -source tree under [`docs/`](.), and the targets in [`docs/Makefile`](Makefile). - -Do not recreate the old Sphinx release flow as part of normal documentation -work. In particular, `docs/conf.py` and `docs/versions1.json` are gone, and -`docs/requirements-docs.txt` has been removed. - -## Current Stack - -- **Source:** Markdown, notebooks, snippets, images, and assets under `docs/`. -- **Config:** `mkdocs.yml` at the repository root. -- **Theme:** Material for MkDocs with local overrides in `docs/_overrides/`. -- **Environment:** `docs/.venv-mkdocs`, created by `make -C docs env`. -- **Dependencies:** `docs/requirements-mkdocs.txt`. -- **Build output:** `site/`. 
-- **Versioning and deploys:** `mike` publishes versioned docs to the `gh-pages` - branch through `docs/Makefile` targets. -- **CI:** `.github/workflows/docs.yaml` builds docs when `docs/**`, - `mkdocs.yml`, or `openapi/openapi.yaml` changes. - -The docs environment is intentionally separate from the main Python workspace. -The setup script uses `uv --no-config` to create and populate `.venv-mkdocs`. -Use the Make targets rather than installing MkDocs packages into the repo -environment by hand. - -## Directory Layout - -- `api/`: REST API landing page. -- `assets/`, `images/`, `stylesheets/`, `javascripts/`: MkDocs assets. -- `_hooks/`: MkDocs hook modules. -- `_overrides/`: Material for MkDocs theme overrides. -- `_scripts/`: docs helper scripts. -- `_snippets/`: reusable Markdown fragments that are included by pages. -- Feature directories such as `get-started/`, `guardrails/`, `evaluator/`, - `customizer/`, and `safe-synthesizer/`: published documentation content. - -## Local Commands - -Run docs commands from the repository root unless otherwise noted. - -```bash -make -C docs env -make -C docs live -make -C docs html -``` - -Useful variants: - -```bash -# Use another port if 8000 is busy. -LIVE_DOCS_PORT=8001 make -C docs live - -# Build or serve pages that are currently hidden from normal output. -make -C docs html-with-unready -make -C docs live-with-unready - -# Run the same strict MkDocs build used by CI. -make -C docs publish - -# Remove generated docs output and the MkDocs virtualenv. 
-make -C docs clean -``` - -Before opening or updating a PR, run the strict build: - -```bash -make -C docs publish -``` - -For changes that include JSON or Python fenced code blocks, also run: - -```bash -make -C docs check-code-blocks -``` - -To apply the supported code-block formatter: - -```bash -make -C docs format-code-blocks -``` - -For changes that touch notebooks or Python examples in the linted docs areas, -run: - -```bash -make -C docs lint-python -``` - -## Authoring Guidelines - -Add pages under the relevant `docs/` section and update the explicit `nav:` -block in `mkdocs.yml`. Pages that are not listed in `nav:` can still build, but -they will not appear in the published navigation unless linked from another -page. - -Use `index.md` for section landing pages. Keep section names and file names -stable when possible because redirects and external links may depend on them. -If you rename or move a published page, add a redirect in the `redirects` plugin -configuration in `mkdocs.yml`. - -Use MkDocs-supported Markdown, not Sphinx-only directives. The active extensions -include admonitions, details blocks, tabbed content, Mermaid fences, tables, -footnotes, definition lists, task lists, snippets, and syntax highlighting. - -Use substitutions from the `extra:` block in `mkdocs.yml`, such as -`{{ platform_name }}` or `{{ release }}`. Add new shared product names and -versions there instead of hard-coding them across many pages. - -Place reusable Markdown fragments under `_snippets/`. Snippet files are excluded -as standalone pages and are intended to be included from real pages with the -configured snippets extension. - -Put static assets under the existing docs asset directories. Image assets under -`docs/images/**` are fetched with Git LFS in CI. If local images render as tiny -text pointer files, run: - -```bash -git lfs pull --include="docs/images/**" --exclude="" -``` - -Notebooks are rendered by `mkdocs-jupyter` with execution disabled. 
Keep checked -in notebook output deliberate, small, and reviewable. - -## Generated Material - -Some docs are generated from code or OpenAPI output. Regenerate them from the -repo root instead of hand-editing generated files. - -```bash -make generate-cli-reference-docs -make generate-config-reference-docs -``` - -The API reference page uses `docs/api/openapi.yaml`, a tracked symlink to -`openapi/openapi.yaml`. When API routes, schemas, or service models change, -follow the repository SDK/OpenAPI workflow and regenerate the OpenAPI spec before -building the docs. - -The Python SDK reference is rendered through `mkdocstrings` from -`sdk/python/nemo-platform/src`. If SDK-facing API changes affect the docs, follow -the repository SDK generation process rather than editing generated SDK output -by hand. - -## Hidden Docs - -Some pages are temporarily gated by `extra.hidden_docs` in `mkdocs.yml` and the -`docs/_hooks/hide_unready_docs.py` hook. In normal builds, the hook removes -matching nav entries, source files, inline links, and API filter chips. - -Use the `with-unready` targets to inspect hidden docs locally: - -```bash -make -C docs live-with-unready -make -C docs html-with-unready -``` - -When a hidden area becomes ready, update `extra.hidden_docs` in `mkdocs.yml` -rather than deleting files or working around the hook from individual pages. - -## CI And Publishing - -The Documentation workflow runs `make -C docs publish` and uploads the `site/` -directory as a `docs-site` artifact. - -For pull requests from branches in the same repository, the workflow also -deploys a PR preview under GitHub Pages by using `make -C docs deploy-pr-preview`. -Pull requests from forks receive the build artifact but do not deploy a preview. - -Pushes to `main` deploy the `main` docs version with the `latest` alias. 
Tag -builds derive the docs version from the tag name, stripping an optional `docs/` -prefix and an optional leading `v`, then deploy that version with the `latest` -alias. - -Manual publishing uses these targets: - -```bash -make -C docs deploy-pages -PR_NUMBER= make -C docs deploy-pr-preview -PR_NUMBER= make -C docs delete-pr-preview -``` - -Deployment targets push to `gh-pages`; do not run them casually from a local -branch. - -## PR Expectations - -Keep docs updates in the same PR as the user-facing code change when that is the -fastest way to keep behavior and documentation in sync. For larger features, -draft the initial technical content near the code change and ask the docs owners -for review. - -For navigation moves, new top-level sections, broad terminology changes, or -anything that changes the information architecture, coordinate with the docs -owners before reshaping the tree. - -For small fixes, update the relevant Markdown directly and include the local -`make -C docs publish` result in the PR description. - -## Troubleshooting - -If `make -C docs live` fails because the port is already in use, set -`LIVE_DOCS_PORT`: - -```bash -LIVE_DOCS_PORT=8001 make -C docs live -``` - -If the docs virtualenv appears stale, rebuild it: - -```bash -make -C docs clean -make -C docs env -``` - -If a strict build reports missing files, check the `nav:` entries, redirects, -snippet includes, and hidden-doc patterns in `mkdocs.yml`. - -If API docs look stale, regenerate `openapi/openapi.yaml` through the repository -OpenAPI workflow and confirm that `docs/api/openapi.yaml` still points to it. - -If CI reports that a docs image is a Git LFS pointer, fetch the image content -with Git LFS and recommit the real asset state. 
diff --git a/docs/Makefile b/docs/Makefile deleted file mode 100644 index 65ab373ae3..0000000000 --- a/docs/Makefile +++ /dev/null @@ -1,126 +0,0 @@ -# Makefile targets for MkDocs documentation - -.PHONY: help env mkdocs-env html html-fast html-with-unready mkdocs-html \ - publish mkdocs-publish live live-fast live-with-unready mkdocs-live \ - clean mkdocs-clean format-code-blocks check-code-blocks sync lint-python deploy-pages \ - deploy-pr-preview delete-pr-preview mkdocs-version require-pr-number - -# Fix locale if current setting is unsupported (common in containers). -ifeq ($(shell locale 2>&1 | grep -q "Cannot set" && echo broken),broken) -export LC_ALL=C -endif - -# Usage: -# make html # Build documentation with MkDocs -# make html-with-unready # Build with temporarily hidden docs included -# make live # Live server with auto-reload -# make live-with-unready # Live server with temporarily hidden docs included -# make publish # Production build (strict) - -MAKEFILE_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST)))) -MKDOCS_ROOT = $(MAKEFILE_DIR).. 
-SITE_DIR = $(MKDOCS_ROOT)/site -VENV_MKDOCS = $(MAKEFILE_DIR).venv-mkdocs -VENV_MKDOCS_BIN = $(VENV_MKDOCS)/bin -VENV_PYTHON = $(VENV_MKDOCS_BIN)/python -MKDOCS = $(VENV_MKDOCS_BIN)/mkdocs -MIKE = $(VENV_MKDOCS_BIN)/mike -LIVE_DOCS_HOST ?= 127.0.0.1 -LIVE_DOCS_PORT ?= 8000 -PAGES_BRANCH ?= gh-pages -DOCS_SITE_URL ?= https://nvidia-nemo.github.io/nemo-platform/latest/ -DOCS_CANONICAL_VERSION ?= latest -DOCS_VERSION ?= main -DOCS_ALIAS ?= latest -PR_PREVIEW_VERSION = pr-$(PR_NUMBER) -PR_PREVIEW_PREFIX = pr-preview/$(PR_PREVIEW_VERSION) -PR_PREVIEW_SITE_URL ?= $(DOCS_SITE_URL)$(PR_PREVIEW_PREFIX)/ -PR_PREVIEW_CANONICAL_VERSION ?= $(PR_PREVIEW_VERSION) -export PATH := $(VENV_MKDOCS_BIN):$(HOME)/.local/bin:$(PATH) - -help: - @echo "Makefile commands:" - @grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}' - -env: mkdocs-env ## Set up the documentation virtualenv - -mkdocs-env: ## Set up the MkDocs virtualenv - cd $(MAKEFILE_DIR) && bash _scripts/setup_mkdocs_env.sh - -html-fast: html ## Build documentation with MkDocs - -html: mkdocs-html ## Build documentation with MkDocs - -html-with-unready: mkdocs-env ## Build documentation with temporarily hidden docs included - @echo "Building with MkDocs, including temporarily hidden docs..." - cd $(MKDOCS_ROOT) && NMP_HIDE_UNREADY_DOCS=false $(MKDOCS) build --strict --config-file mkdocs.yml - -mkdocs-html: mkdocs-env ## Build documentation with MkDocs - @echo "Building with MkDocs..." - cd $(MKDOCS_ROOT) && $(MKDOCS) build --strict --config-file mkdocs.yml - -publish: mkdocs-publish ## Build production documentation with MkDocs - -mkdocs-publish: mkdocs-env ## Build production documentation with MkDocs in strict mode - @echo "Publishing with MkDocs..." 
- cd $(MKDOCS_ROOT) && $(MKDOCS) build --config-file mkdocs.yml --strict - -live-fast: live ## Start the MkDocs live-reload server - -live: mkdocs-live ## Start the MkDocs live-reload server - -live-with-unready: mkdocs-env ## Start the live server with temporarily hidden docs included - @echo "Starting MkDocs live server, including temporarily hidden docs..." - cd $(MKDOCS_ROOT) && NMP_HIDE_UNREADY_DOCS=false $(MKDOCS) serve --config-file mkdocs.yml --dev-addr $(LIVE_DOCS_HOST):$(LIVE_DOCS_PORT) - -mkdocs-live: mkdocs-env ## Start the MkDocs live-reload server - @echo "Starting MkDocs live server..." - cd $(MKDOCS_ROOT) && $(MKDOCS) serve --config-file mkdocs.yml --dev-addr $(LIVE_DOCS_HOST):$(LIVE_DOCS_PORT) - -clean: mkdocs-clean ## Remove documentation build artifacts and virtualenv - -mkdocs-clean: ## Remove MkDocs build artifacts and virtualenv - rm -rf $(SITE_DIR) $(VENV_MKDOCS) - -format-code-blocks: mkdocs-env ## Autoformat supported fenced code blocks - @echo "Formatting supported code blocks in markdown files..." - cd $(MAKEFILE_DIR) && $(VENV_PYTHON) _scripts/format_code_blocks.py --docs-dir $(MAKEFILE_DIR) - -check-code-blocks: mkdocs-env ## Check supported fenced code blocks are formatted - @echo "Checking supported code block formatting..." - cd $(MAKEFILE_DIR) && $(VENV_PYTHON) _scripts/format_code_blocks.py --docs-dir $(MAKEFILE_DIR) --check - -sync: ## Sync files from documentation/docs/ into docs/ - @echo "Syncing files from documentation/docs/ to docs/..." - @rsync -av $(MAKEFILE_DIR)/../documentation/docs/ $(MAKEFILE_DIR) - @rm -rf $(MAKEFILE_DIR)/../documentation/docs/ - @echo "Sync complete" - -lint-python: mkdocs-env ## Lint selected notebooks and run type checking - @echo "Linting Python code in notebooks..." 
- cd $(MAKEFILE_DIR) && $(VENV_PYTHON) _scripts/lint_notebooks.py run-inference/ safe-synthesizer/ --type-check - -deploy-pages: mkdocs-env ## Deploy main docs to GitHub Pages with mike - @echo "Deploying $(DOCS_VERSION) docs to $(PAGES_BRANCH) with alias $(DOCS_ALIAS)..." - cd $(MKDOCS_ROOT) && NMP_DOCS_SITE_URL="$(DOCS_SITE_URL)" NMP_DOCS_CANONICAL_VERSION="$(DOCS_CANONICAL_VERSION)" $(MIKE) deploy --push --update-aliases --alias-type redirect \ - --config-file mkdocs.yml --branch $(PAGES_BRANCH) $(DOCS_VERSION) $(DOCS_ALIAS) - cd $(MKDOCS_ROOT) && NMP_DOCS_SITE_URL="$(DOCS_SITE_URL)" NMP_DOCS_CANONICAL_VERSION="$(DOCS_CANONICAL_VERSION)" $(MIKE) set-default --push --config-file mkdocs.yml \ - --branch $(PAGES_BRANCH) $(DOCS_ALIAS) - -deploy-pr-preview: require-pr-number mkdocs-env ## Deploy a PR preview to GitHub Pages with mike - @echo "Deploying PR preview $(PR_PREVIEW_VERSION) under $(PR_PREVIEW_PREFIX)..." - cd $(MKDOCS_ROOT) && NMP_DOCS_SITE_URL="$(PR_PREVIEW_SITE_URL)" NMP_DOCS_CANONICAL_VERSION="$(PR_PREVIEW_CANONICAL_VERSION)" $(MIKE) deploy --push --update-aliases --alias-type redirect \ - --config-file mkdocs.yml --branch $(PAGES_BRANCH) --deploy-prefix $(PR_PREVIEW_PREFIX) \ - --title "PR #$(PR_NUMBER)" $(PR_PREVIEW_VERSION) - cd $(MKDOCS_ROOT) && NMP_DOCS_SITE_URL="$(PR_PREVIEW_SITE_URL)" NMP_DOCS_CANONICAL_VERSION="$(PR_PREVIEW_CANONICAL_VERSION)" $(MIKE) set-default --push --config-file mkdocs.yml \ - --branch $(PAGES_BRANCH) --deploy-prefix $(PR_PREVIEW_PREFIX) $(PR_PREVIEW_VERSION) - -delete-pr-preview: require-pr-number mkdocs-env ## Delete a PR preview from GitHub Pages - @echo "Deleting PR preview $(PR_PREVIEW_VERSION) from $(PR_PREVIEW_PREFIX)..." 
- cd $(MKDOCS_ROOT) && $(MIKE) delete --push --config-file mkdocs.yml \ - --branch $(PAGES_BRANCH) --deploy-prefix $(PR_PREVIEW_PREFIX) --all || true - -mkdocs-version: deploy-pages ## Deploy a versioned release with mike - -require-pr-number: - @test -n "$(PR_NUMBER)" || (echo "PR_NUMBER is required" && exit 1) diff --git a/docs/README.md b/docs/README.md deleted file mode 100644 index 7607b58aba..0000000000 --- a/docs/README.md +++ /dev/null @@ -1,100 +0,0 @@ -# NeMo Platform Docs - -The documentation site is built with MkDocs from the repository-root -`mkdocs.yml` and the source files in this `docs/` directory. The generated site -is written to `../site/`. - -For contribution process details, see [`CONTRIBUTING.md`](CONTRIBUTING.md). - -## Prerequisites - -- Python 3.11 or later. Verify with `python3 --version`. See the - [Python downloads](https://www.python.org/downloads/) page for installation - options. -- [uv](https://docs.astral.sh/uv/). Verify with `uv --version`. -- GNU Make. Verify with `make --version`. -- Bash and a POSIX-like shell environment for the helper scripts. - -MkDocs and the required plugins are installed into `docs/.venv-mkdocs` from -`docs/requirements-mkdocs.txt`: - -```bash -make -C docs env -``` - -Node.js and npm are not required for the current MkDocs workflow. 
- -## Local Development - -Set up the docs-local virtual environment: - -```bash -make -C docs env -``` - -Serve the site with live reload: - -```bash -make -C docs live -``` - -Build the site with the same strict MkDocs mode used by CI: - -```bash -make -C docs publish -``` - -Use another local port if `8000` is already in use: - -```bash -LIVE_DOCS_PORT=8001 make -C docs live -``` - -To include pages that are temporarily hidden by `extra.hidden_docs` in -`mkdocs.yml`, use: - -```bash -make -C docs live-with-unready -make -C docs html-with-unready -``` - -## Useful Checks - -Check supported JSON and Python fenced code blocks: - -```bash -make -C docs check-code-blocks -``` - -Apply the formatter for supported code blocks: - -```bash -make -C docs format-code-blocks -``` - -Lint selected notebook-style docs: - -```bash -make -C docs lint-python -``` - -## Generated References - -Regenerate CLI and config reference docs from the repository root: - -```bash -make generate-cli-reference-docs -make generate-config-reference-docs -``` - -The REST API page at `docs/api/index.md` renders -`docs/api/openapi.yaml`, which is a tracked symlink to -`../openapi/openapi.yaml`. - -## Directory Layout - -For the docs tree reference, see -[`CONTRIBUTING.md#directory-layout`](CONTRIBUTING.md#directory-layout). - -MkDocs excludes this README and other docs-maintenance files from the published -site through `exclude_docs` in `mkdocs.yml`. diff --git a/docs/_hooks/hide_unready_docs.py b/docs/_hooks/hide_unready_docs.py deleted file mode 100644 index b32813204c..0000000000 --- a/docs/_hooks/hide_unready_docs.py +++ /dev/null @@ -1,214 +0,0 @@ -# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -# SPDX-License-Identifier: Apache-2.0 - -"""Hide temporarily unavailable docs from MkDocs builds. - -Configured by the `extra.hidden_docs` block in mkdocs.yml. 
-""" - -from __future__ import annotations - -import posixpath -import re -from collections.abc import Sequence -from fnmatch import fnmatchcase -from html import escape -from typing import Any - - -INLINE_LINK_RE = re.compile(r"(?]*\bclass="api-chip\b[^"]*")(?=[^>]*\bdata-tag="(?P[^"]*)")[^>]*>[^<]+[ \t]*\n?', - re.MULTILINE, -) - - -def on_config(config: Any, **_: Any) -> Any: - hidden_docs = _hidden_docs_config(config) - if not hidden_docs: - return config - - sections, path_patterns, _ = hidden_docs - nav = config.get("nav") - if nav: - config["nav"] = _filter_nav(nav, sections, path_patterns) - - return config - - -def on_files(files: Any, config: Any, **_: Any) -> Any: - hidden_docs = _hidden_docs_config(config) - if not hidden_docs: - return files - - _, path_patterns, _ = hidden_docs - for file in list(files): - src_path = _file_src_path(file) - if src_path and _matches_hidden_path(src_path, path_patterns): - files.remove(file) - - return files - - -def on_page_markdown(markdown: str, page: Any, config: Any, **_: Any) -> str: - hidden_docs = _hidden_docs_config(config) - if not hidden_docs: - return markdown - - _, path_patterns, api_tags = hidden_docs - page_src_path = _file_src_path(page.file) - - def replace_link(match: re.Match[str]) -> str: - label = match.group(1) - target = match.group(2) - target_path = _resolve_link_target(target, page_src_path) - if target_path and _matches_hidden_path(target_path, path_patterns): - return label - return match.group(0) - - markdown = INLINE_LINK_RE.sub(replace_link, markdown) - if page_src_path == "api/index.md": - markdown = _filter_api_reference(markdown, api_tags) - - return markdown - - -def _hidden_docs_config(config: Any) -> tuple[frozenset[str], tuple[str, ...], frozenset[str]] | None: - extra = config.get("extra", {}) - hidden_docs = extra.get("hidden_docs", {}) - if not isinstance(hidden_docs, dict) or not _is_enabled(hidden_docs.get("enabled")): - return None - - sections_val = 
_hidden_docs_sequence(hidden_docs, "sections") - paths_val = _hidden_docs_sequence(hidden_docs, "paths") - api_tags_val = _hidden_docs_sequence(hidden_docs, "api_tags") - - sections = frozenset(str(section) for section in sections_val) - path_patterns = tuple(_normalize_pattern(pattern) for pattern in paths_val if str(pattern).strip()) - api_tags = frozenset(str(tag) for tag in api_tags_val) - if not sections and not path_patterns and not api_tags: - return None - - return sections, path_patterns, api_tags - - -def _hidden_docs_sequence(hidden_docs: dict[str, Any], key: str) -> Sequence[Any]: - value = hidden_docs.get(key, ()) - if isinstance(value, (str, bytes)) or not isinstance(value, Sequence): - raise ValueError(f"extra.hidden_docs.{key} must be a sequence, not {type(value).__name__}") - return value - - -def _is_enabled(value: Any) -> bool: - if isinstance(value, str): - return value.strip().lower() not in {"", "0", "false", "no", "off"} - return bool(value) - - -def _filter_nav(nav: list[Any], sections: frozenset[str], path_patterns: tuple[str, ...]) -> list[Any]: - filtered_items: list[Any] = [] - for item in nav: - filtered_item = _filter_nav_item(item, sections, path_patterns) - if filtered_item is not None: - filtered_items.append(filtered_item) - return filtered_items - - -def _filter_nav_item(item: Any, sections: frozenset[str], path_patterns: tuple[str, ...]) -> Any | None: - if isinstance(item, str): - return None if _matches_hidden_path(item, path_patterns) else item - - if not isinstance(item, dict): - return item - - filtered_item: dict[str, Any] = {} - for title, value in item.items(): - if title in sections: - continue - - if isinstance(value, str): - if not _matches_hidden_path(value, path_patterns): - filtered_item[title] = value - elif isinstance(value, list): - filtered_children = _filter_nav(value, sections, path_patterns) - if filtered_children: - filtered_item[title] = filtered_children - else: - filtered_item[title] = value - - return 
filtered_item or None - - -def _matches_hidden_path(path: str, patterns: tuple[str, ...]) -> bool: - normalized_path = _normalize_path(path) - for pattern in patterns: - if pattern.endswith("/**"): - directory = pattern[:-3] - if normalized_path == directory or normalized_path.startswith(f"{directory}/"): - return True - - if fnmatchcase(normalized_path, pattern): - return True - - return False - - -def _filter_api_reference(markdown: str, api_tags: frozenset[str]) -> str: - if not api_tags: - return markdown - - markdown = API_CHIP_RE.sub( - lambda match: "" if match.group("tag") in api_tags else match.group(0), - markdown, - ) - - if "data-hidden-tags=" in markdown: - return markdown - - hidden_tags = ",".join(escape(tag, quote=True) for tag in sorted(api_tags)) - return markdown.replace( - '
', - f'
', - 1, - ) - - -def _resolve_link_target(target: str, page_src_path: str) -> str | None: - target_path = target.strip() - if not target_path: - return None - - if target_path.startswith("<") and ">" in target_path: - target_path = target_path[1 : target_path.index(">")] - else: - target_path = target_path.split(maxsplit=1)[0] - - if ( - not target_path - or target_path.startswith("#") - or target_path.startswith("//") - or re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*:", target_path) - ): - return None - - target_path = target_path.split("#", 1)[0].split("?", 1)[0] - if not target_path: - return None - - if target_path.startswith("/"): - return _normalize_path(target_path) - - page_directory = posixpath.dirname(_normalize_path(page_src_path)) - return _normalize_path(posixpath.join(page_directory, target_path)) - - -def _file_src_path(file: Any) -> str: - return _normalize_path(getattr(file, "src_uri", "") or getattr(file, "src_path", "")) - - -def _normalize_pattern(pattern: Any) -> str: - return _normalize_path(str(pattern).strip()) - - -def _normalize_path(path: str) -> str: - return posixpath.normpath(path.strip().replace("\\", "/")).lstrip("./") diff --git a/docs/_overrides/.gitkeep b/docs/_overrides/.gitkeep deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/docs/_overrides/main.html b/docs/_overrides/main.html deleted file mode 100644 index edf4a86bf3..0000000000 --- a/docs/_overrides/main.html +++ /dev/null @@ -1,10 +0,0 @@ -{% extends "base.html" %} - -{% block content %} - {{ super() }} - -{% endblock %} diff --git a/docs/_scripts/README.md b/docs/_scripts/README.md deleted file mode 100644 index 9bdaffc2af..0000000000 --- a/docs/_scripts/README.md +++ /dev/null @@ -1,59 +0,0 @@ -# Documentation Scripts - -This directory contains helper scripts used by the active MkDocs documentation -workflow. Prefer calling these through `docs/Makefile` unless you are debugging a -script directly. 
- -## Active Scripts - -### `setup_mkdocs_env.sh` - -Creates or updates the docs-local `.venv-mkdocs` virtual environment with `uv` -and installs `docs/requirements-mkdocs.txt`. - -```bash -make -C docs env -``` - -### `format_code_blocks.py` - -Formats supported fenced code blocks in Markdown files. JSON blocks are parsed -and reserialized with stable indentation. Python blocks are formatted with Ruff -from the MkDocs virtual environment. - -```bash -make -C docs check-code-blocks -make -C docs format-code-blocks -``` - -### `lint_notebooks.py` - -Lints selected Markdown notebook sources and performs type checks for supported -Python code blocks. - -```bash -make -C docs lint-python -``` - -### `run_notebooks.py` - -Runs docs notebooks or Markdown notebook sources that use the `@nemo-nb` -markers. This is a direct utility rather than a Makefile target. - -```bash -uv run python docs/_scripts/run_notebooks.py docs/ -``` - -## Tests - -The `test_*.py` files validate the active docs helper scripts and can be run -with `uv run pytest` from the repository root. - -## Adding Scripts - -When adding a script: - -1. Put it in `docs/_scripts/`. -2. Wire it into `docs/Makefile` if it is part of the normal docs workflow. -3. Add or update focused tests when the behavior is nontrivial. -4. Document the Make target or direct command in this README. diff --git a/docs/_scripts/format_code_blocks.py b/docs/_scripts/format_code_blocks.py deleted file mode 100644 index 79cd7ba786..0000000000 --- a/docs/_scripts/format_code_blocks.py +++ /dev/null @@ -1,272 +0,0 @@ -#!/usr/bin/env python3 -# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
-# SPDX-License-Identifier: Apache-2.0 - -"""Format fenced code blocks in Markdown documentation.""" - -from __future__ import annotations - -import argparse -import importlib.util -import json -import os -import re -import subprocess -import sys -from dataclasses import dataclass -from functools import cache -from pathlib import Path -from typing import Iterable - -FENCE_RE = re.compile(r"^([ \t]*)(`{3,}|~{3,})(.*)$") -JSON_LANGUAGES = {"json", "output"} -PYTHON_LANGUAGES = {"python", "py"} -RAW_START = "{% raw %}" -RAW_END = "{% endraw %}" -RUFF_FORMAT_TIMEOUT_SECONDS = 30 -SKIP_DIRS = { - ".git", - ".venv-docs", - ".venv-mkdocs", - "_build", - "_test_serve", - "site", -} - - -@dataclass -class FileResult: - path: Path - changed: bool - blocks_changed: int - - -def find_markdown_files(paths: Iterable[Path], docs_dir: Path) -> list[Path]: - markdown_files: list[Path] = [] - for path in paths: - if not path.exists(): - raise FileNotFoundError(f"{path} does not exist") - if path.is_file(): - if path.suffix == ".md": - markdown_files.append(path) - continue - for root, dirs, files in os.walk(path): - dirs[:] = [directory for directory in dirs if directory not in SKIP_DIRS] - for filename in files: - file_path = Path(root) / filename - if file_path.suffix == ".md" and file_path.is_relative_to(docs_dir): - markdown_files.append(file_path) - return sorted(set(markdown_files)) - - -def fence_closes(line: str, fence_marker: str) -> bool: - stripped = line.lstrip(" \t") - marker_char = fence_marker[0] - marker_len = len(fence_marker) - if not stripped.startswith(marker_char * marker_len): - return False - return stripped.strip(marker_char).strip() == "" - - -def get_language(info_string: str) -> str: - stripped = info_string.strip() - if not stripped: - return "" - return stripped.split(maxsplit=1)[0].lower() - - -def remove_markdown_indent(lines: list[str], indent: str) -> str: - if not indent: - return "\n".join(lines) - return "\n".join(line[len(indent) :] if 
line.startswith(indent) else line for line in lines) - - -def format_json_lines(content_lines: list[str], indent: str, language: str) -> tuple[list[str], bool]: - if language not in JSON_LANGUAGES: - return content_lines, False - if any("{% raw %}" in line or "{% endraw %}" in line for line in content_lines): - return content_lines, False - - raw_content = remove_markdown_indent(content_lines, indent) - candidate = raw_content.strip() - if not candidate or candidate[0] not in "[{": - return content_lines, False - - try: - parsed = json.loads(candidate) - except json.JSONDecodeError: - return content_lines, False - - if not isinstance(parsed, (dict, list)): - return content_lines, False - - formatted = json.dumps(parsed, ensure_ascii=False, indent=2) - formatted_lines = [add_markdown_indent(line, indent) for line in formatted.splitlines()] - return formatted_lines, formatted_lines != content_lines - - -def format_python_lines(content_lines: list[str], indent: str, language: str) -> tuple[list[str], bool]: - if language not in PYTHON_LANGUAGES: - return content_lines, False - if not ruff_available(): - raise RuntimeError("Ruff is required to format Python code blocks. 
Run `make -C docs mkdocs-env` first.") - - raw_lines = [line[len(indent) :] if indent and line.startswith(indent) else line for line in content_lines] - raw_content = "\n".join(raw_lines).rstrip("\n") - if not raw_content.strip(): - return content_lines, False - - try: - result = subprocess.run( - [sys.executable, "-m", "ruff", "format", "--stdin-filename", "docs-code-block.py", "-"], - input=f"{raw_content}\n", - text=True, - capture_output=True, - check=False, - timeout=RUFF_FORMAT_TIMEOUT_SECONDS, - ) - except (OSError, subprocess.TimeoutExpired): - return content_lines, False - - if result.returncode != 0: - return content_lines, False - - formatted_lines = [add_markdown_indent(line, indent) for line in result.stdout.rstrip("\n").splitlines()] - return formatted_lines, formatted_lines != content_lines - - -def add_markdown_indent(line: str, indent: str) -> str: - if not line: - return line - return f"{indent}{line}" - - -@cache -def ruff_available() -> bool: - return importlib.util.find_spec("ruff") is not None - - -def format_code_lines( - content_lines: list[str], - indent: str, - language: str, - inside_raw_block: bool, -) -> tuple[list[str], bool]: - python_lines, python_changed = format_python_lines(content_lines, indent, language) - if python_changed: - return python_lines, True - if inside_raw_block: - return content_lines, False - return format_json_lines(content_lines, indent, language) - - -def format_markdown(text: str) -> tuple[str, int]: - trailing_newline = text.endswith("\n") - lines = text.splitlines() - formatted_lines: list[str] = [] - blocks_changed = 0 - index = 0 - inside_raw_block = False - - while index < len(lines): - line = lines[index] - fence_match = FENCE_RE.match(line) - if not fence_match: - formatted_lines.append(line) - inside_raw_block = update_raw_block_state(line, inside_raw_block) - index += 1 - continue - - indent, fence_marker, info_string = fence_match.groups() - language = get_language(info_string) - 
formatted_lines.append(line) - index += 1 - - content_lines: list[str] = [] - while index < len(lines) and not fence_closes(lines[index], fence_marker): - content_lines.append(lines[index]) - index += 1 - - new_content_lines, changed = format_code_lines(content_lines, indent, language, inside_raw_block) - if changed: - blocks_changed += 1 - formatted_lines.extend(new_content_lines) - - if index < len(lines): - formatted_lines.append(lines[index]) - index += 1 - - formatted = "\n".join(formatted_lines) - if trailing_newline: - formatted += "\n" - return formatted, blocks_changed - - -def update_raw_block_state(line: str, inside_raw_block: bool) -> bool: - raw_start = RAW_START in line - raw_end = RAW_END in line - if raw_start and not raw_end: - return True - if raw_end and not raw_start: - return False - return inside_raw_block - - -def process_file(path: Path, check: bool) -> FileResult: - original = path.read_text(encoding="utf-8") - formatted, blocks_changed = format_markdown(original) - changed = formatted != original - if changed and not check: - path.write_text(formatted, encoding="utf-8") - return FileResult(path=path, changed=changed, blocks_changed=blocks_changed) - - -def display_path(path: Path, docs_dir: Path) -> Path: - if path.is_relative_to(docs_dir): - return path.relative_to(docs_dir) - return path - - -def main() -> int: - parser = argparse.ArgumentParser(description="Format fenced code blocks in Markdown docs.") - parser.add_argument("paths", nargs="*", type=Path, help="Markdown files or directories to process.") - parser.add_argument( - "--docs-dir", type=Path, default=Path("."), help="Docs directory. Defaults to current directory." 
- ) - parser.add_argument("--check", action="store_true", help="Exit non-zero if any files would be changed.") - args = parser.parse_args() - - docs_dir = args.docs_dir.resolve() - paths = [path.resolve() for path in args.paths] if args.paths else [docs_dir] - - try: - markdown_files = find_markdown_files(paths, docs_dir) - except FileNotFoundError as exc: - print(f"ERROR: {exc}", file=sys.stderr) - return 2 - - changed_results: list[FileResult] = [] - total_blocks = 0 - try: - for path in markdown_files: - result = process_file(path, args.check) - if result.changed: - changed_results.append(result) - total_blocks += result.blocks_changed - except RuntimeError as exc: - print(f"ERROR: {exc}", file=sys.stderr) - return 2 - - if changed_results: - action = "Would format" if args.check else "Formatted" - for result in changed_results: - print(f"{action} {display_path(result.path, docs_dir)} ({result.blocks_changed} code block(s))") - print(f"{action} {len(changed_results)} file(s), {total_blocks} code block(s)") - return 1 if args.check else 0 - - print("All supported code blocks are already formatted.") - return 0 - - -if __name__ == "__main__": - raise SystemExit(main()) diff --git a/docs/_scripts/lint_notebooks.py b/docs/_scripts/lint_notebooks.py deleted file mode 100644 index 1b62f8aaef..0000000000 --- a/docs/_scripts/lint_notebooks.py +++ /dev/null @@ -1,568 +0,0 @@ -#!/usr/bin/env python3 -# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -# SPDX-License-Identifier: Apache-2.0 - -"""Lint notebook documentation files for syntax errors. - -This script validates markdown notebooks without executing them, making it -suitable for CI/CD pipelines and pre-commit hooks. - -Usage: - uv run python docs/_scripts/lint_notebooks.py [ ...] 
- uv run python docs/_scripts/lint_notebooks.py docs/run-inference/ - uv run python docs/_scripts/lint_notebooks.py docs/run-inference/ docs/safe-synthesizer/ - uv run python docs/_scripts/lint_notebooks.py docs/run-inference/ --type-check - uv run python docs/_scripts/lint_notebooks.py docs/run-inference/ --markers-only # Only files with marker -""" - -import argparse -import ast -import re -import subprocess -import sys -import tempfile -from pathlib import Path -from typing import List, Tuple - -from nemo_nb import ( - MarkdownToNotebookConverter, - expand_includes, - find_processable_notebooks, - is_excluded_file, - is_in_excluded_dir, - print_conflicts_error, -) - -# Pattern to detect Python code blocks in markdown -RE_PYTHON_CODE_BLOCK = re.compile(r"^```python\s*$", re.MULTILINE) -SKIP_TYPE_CHECK_MARKER = "" - -# Pattern to match code fence opening (```python) with optional leading whitespace -RE_CODE_FENCE_OPEN = re.compile(r"^(\s*)(`{3,})python\s*$") -# Pattern to match code fence closing -RE_CODE_FENCE_CLOSE = re.compile(r"^(\s*)`{3,}\s*$") - - -def has_python_code_blocks(md_path: Path) -> bool: - """Check if a markdown file has Python code blocks. - - Args: - md_path: Path to the markdown file - - Returns: - True if the file contains at least one ```python code block - """ - try: - content = md_path.read_text(encoding="utf-8") - return bool(RE_PYTHON_CODE_BLOCK.search(content)) - except Exception: - return False - - -def extract_python_blocks_with_line_numbers(md_path: Path) -> List[Tuple[int, str]]: - """Extract Python code blocks from markdown with their starting line numbers. - - Args: - md_path: Path to the markdown file - - Returns: - List of (start_line_in_markdown, code_content) tuples. - start_line_in_markdown is 1-indexed and points to the first line of code - (the line after the opening ```python fence). 
- """ - try: - content = md_path.read_text(encoding="utf-8") - # Expand include directives before extracting code blocks - content = expand_includes(content, md_path.parent) - except Exception: - return [] - - lines = content.split("\n") - blocks: List[Tuple[int, str]] = [] - skip_next_python_block = False - - i = 0 - while i < len(lines): - line = lines[i] - if line.strip() == SKIP_TYPE_CHECK_MARKER: - skip_next_python_block = True - i += 1 - continue - - fence_match = RE_CODE_FENCE_OPEN.match(line) - - if fence_match: - fence_delimiter = fence_match.group(2) - fence_len = len(fence_delimiter) - indentation = fence_match.group(1) - indent_len = len(indentation) - - # Code starts on the next line (1-indexed for display) - code_start_line = i + 2 # +1 for 0-indexed to 1-indexed, +1 to skip fence line - - # Find closing fence - code_lines = [] - j = i + 1 - while j < len(lines): - close_match = RE_CODE_FENCE_CLOSE.match(lines[j]) - if close_match: - # Check if this fence closes our block (same or more backticks) - close_fence = lines[j].lstrip() - if len(close_fence.rstrip()) >= fence_len: - break - # Remove indentation from code line if present - code_line = lines[j] - if indent_len > 0 and code_line.startswith(indentation): - code_line = code_line[indent_len:] - code_lines.append(code_line) - j += 1 - - if code_lines and not skip_next_python_block: - blocks.append((code_start_line, "\n".join(code_lines))) - - skip_next_python_block = False - - i = j + 1 - else: - i += 1 - - return blocks - - -def prepare_notebook_for_type_check(notebook_path: Path) -> Tuple[str, List[int]]: - """ - Extract and combine Python code blocks from a markdown file for type checking. - - Args: - notebook_path: Path to the markdown file - - Returns: - Tuple of (combined_source, line_mapping) where line_mapping maps - 1-indexed line numbers in the combined source to 1-indexed line numbers - in the original markdown file. Returns ("", []) if extraction fails. 
- """ - blocks = extract_python_blocks_with_line_numbers(notebook_path) - - if not blocks: - return "", [] - - combined_lines: List[str] = [] - # line_mapping[i] = markdown line number for combined source line i+1 (1-indexed) - line_mapping: List[int] = [] - - for md_start_line, source in blocks: - source_lines = source.splitlines() - for offset, line in enumerate(source_lines): - combined_lines.append(line) - line_mapping.append(md_start_line + offset) - # Add blank line between blocks (maps to last line of previous block) - combined_lines.append("") - if source_lines: - line_mapping.append(md_start_line + len(source_lines) - 1) - else: - line_mapping.append(md_start_line) - - return "\n".join(combined_lines), line_mapping - - -def translate_line_number(line_in_combined: int, line_mapping: List[int]) -> int: - """Translate a line number from combined source to original markdown. - - Args: - line_in_combined: 1-indexed line number in the combined Python source - line_mapping: Mapping from combined line numbers to markdown line numbers - - Returns: - 1-indexed line number in the original markdown file - """ - if not line_mapping: - return line_in_combined - - # line_mapping is 0-indexed (line_mapping[0] = markdown line for combined line 1) - idx = line_in_combined - 1 - if 0 <= idx < len(line_mapping): - return line_mapping[idx] - - # Fallback: return original line number if out of range - return line_in_combined - - -def batch_type_check(notebook_paths: List[Path]) -> dict[Path, Tuple[int, str]]: - """ - Run ty type checker on multiple notebooks in a single batch. - - This is much faster than running ty separately for each notebook, - as it eliminates the repeated uv startup overhead. 
- - Args: - notebook_paths: List of notebook paths to type check - - Returns: - Dictionary mapping notebook path to (exit_code, output) tuple - """ - if not notebook_paths: - return {} - - # Prepare all notebooks and create temp files - temp_files: dict[Path, str] = {} # notebook_path -> temp_file_path - temp_to_notebook: dict[str, Path] = {} # temp_file_path -> notebook_path - line_mappings: dict[Path, List[int]] = {} # notebook_path -> line mapping - - try: - for nb_path in notebook_paths: - combined_source, line_mapping = prepare_notebook_for_type_check(nb_path) - if not combined_source: - continue - - f = tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) - f.write(combined_source) - f.close() - - temp_files[nb_path] = f.name - temp_to_notebook[f.name] = nb_path - line_mappings[nb_path] = line_mapping - - if not temp_files: - return {nb: (0, "") for nb in notebook_paths} - - # Run ty check once on all files - temp_paths = list(temp_files.values()) - result = subprocess.run( - ["uv", "run", "ty", "check", "--output-format", "concise"] + temp_paths, - capture_output=True, - text=True, - timeout=120, - ) - - # Parse output to map errors back to notebooks - output_by_notebook: dict[Path, List[str]] = {nb: [] for nb in notebook_paths} - - # Warnings to filter out (common SDK false positives) - ignored_warnings = ["possibly-unbound-attribute"] - - # Pattern to parse ty output: "path:line:col: message" - ty_output_pattern = re.compile(r"^(.+?):(\d+):(\d+): (.+)$") - - for line in result.stdout.splitlines(): - # Skip ignored warning types - if any(f"[{warning}]" in line for warning in ignored_warnings): - continue - - # ty output format: "path/to/file.py:line:col: error message" - for temp_path, nb_path in temp_to_notebook.items(): - if line.startswith(temp_path): - # Parse and translate line number - match = ty_output_pattern.match(line) - if match: - combined_line = int(match.group(2)) - col = match.group(3) - message = match.group(4) - md_line = 
translate_line_number(combined_line, line_mappings.get(nb_path, [])) - cleaned_line = f"{nb_path.name}:{md_line}:{col}: {message}" - else: - # Fallback: just replace path - cleaned_line = line.replace(temp_path, str(nb_path.name)) - output_by_notebook[nb_path].append(cleaned_line) - break - - # Build result dictionary - results: dict[Path, Tuple[int, str]] = {} - - # Check if ty produced any per-file output at all - has_any_output = any(output_by_notebook[nb] for nb in notebook_paths if nb in output_by_notebook) - - for nb_path in notebook_paths: - if nb_path in temp_files: - # Had Python code and was checked - nb_output = "\n".join(output_by_notebook[nb_path]) - # If this notebook had errors, return non-zero exit code - if output_by_notebook[nb_path]: - exit_code = 1 - elif result.returncode != 0 and not has_any_output: - # ty failed but produced no per-file output - ty itself failed - exit_code = 1 - nb_output = "Type checking failed: ty exited with non-zero status but produced no output" - else: - exit_code = 0 - results[nb_path] = (exit_code, nb_output) - else: - # No Python code or conversion failed - results[nb_path] = (0, "") - - return results - - except subprocess.TimeoutExpired: - return {nb: (1, "Type checking timed out") for nb in notebook_paths} - except FileNotFoundError: - return {nb: (0, "") for nb in notebook_paths} - except Exception as e: - return {nb: (1, f"Type checking failed: {e}") for nb in notebook_paths} - finally: - # Clean up temp files - for temp_path in temp_files.values(): - Path(temp_path).unlink(missing_ok=True) - - -def lint_notebook_syntax_only(notebook_path: Path) -> List[Tuple[str, str]]: - """ - Lint a notebook file for syntax errors only (fast). 
- - Args: - notebook_path: Path to the notebook file - - Returns: - List of (error_type, message) tuples - """ - errors = [] - - try: - # Read and expand includes before conversion - content = notebook_path.read_text(encoding="utf-8") - expanded_content = expand_includes(content, notebook_path.parent) - - # Write expanded content to temporary file for conversion - with tempfile.NamedTemporaryFile(mode="w", suffix=".md", delete=False, encoding="utf-8") as temp_file: - temp_file.write(expanded_content) - temp_path = Path(temp_file.name) - - try: - converter = MarkdownToNotebookConverter() - notebook = converter.convert(temp_path) - finally: - temp_path.unlink(missing_ok=True) - except Exception as e: - return [("conversion", f"Failed to convert notebook: {e}")] - - # Check Python cells for syntax errors - python_cells = [ - (i, c) - for i, c in enumerate(notebook["cells"]) - if c.get("cell_type") == "code" and c.get("metadata", {}).get("language", "python") == "python" - ] - - for cell_idx, cell in python_cells: - source = cell.get("source", []) - if isinstance(source, list): - source = "".join(source) - - if not source.strip(): - continue - - # Syntax check (fast, always enabled) - try: - ast.parse(source) - except SyntaxError as e: - errors.append(("syntax", f"Cell {cell_idx + 1}: {e.msg} at line {e.lineno}")) - - return errors - - -def lint_notebooks( - notebooks_dir: str, - enable_type_check: bool = False, - markers_only: bool = False, -) -> int: - """ - Lint all notebooks in a directory. - - Args: - notebooks_dir: Directory to search for notebooks - enable_type_check: Enable ty type checking - markers_only: If True, only lint files with @nemo-nb: process marker. - If False (default), lint any markdown file with Python code blocks. 
- - Returns: - Exit code: 0 if all valid, 1 if errors found - """ - path_obj = Path(notebooks_dir) - - # Handle single file case - if path_obj.is_file() and path_obj.suffix == ".md": - if is_excluded_file(path_obj): - print(f"Skipping {path_obj}: excluded filename") - return 0 - - content = path_obj.read_text(encoding="utf-8") - has_marker = "@nemo-nb: process" in content - has_python = has_python_code_blocks(path_obj) - - if markers_only and not has_marker: - print(f"Skipping {path_obj}: no @nemo-nb: process marker (use without --markers-only to lint anyway)") - return 0 - - if not has_marker and not has_python: - print(f"Skipping {path_obj}: no @nemo-nb: process marker and no Python code blocks") - return 0 - - return _lint_single_file(path_obj, enable_type_check) - - # Directory case - if markers_only: - # Only use marked files - result = find_processable_notebooks(notebooks_dir) - - if result.conflicts: - print_conflicts_error(result.conflicts) - return 1 - - md_files = list(result.md_files) - ipynb_files = list(result.ipynb_files) - else: - # Find all markdown files with Python code blocks - ipynb_files: List[Path] = [] - md_files: List[Path] = [] - - if path_obj.is_dir(): - all_md = list(path_obj.rglob("*.md")) - # Filter out excluded directories (_generated, _build), excluded filenames, and check for Python code - md_files = [ - md - for md in all_md - if not is_in_excluded_dir(md) and not is_excluded_file(md) and has_python_code_blocks(md) - ] - - # Also include .ipynb files with marker (for completeness) - result = find_processable_notebooks(notebooks_dir) - if result.conflicts: - print_conflicts_error(result.conflicts) - return 1 - ipynb_files = list(result.ipynb_files) - - total_count = len(ipynb_files) + len(md_files) - - if total_count == 0: - print(f"No markdown files with Python code blocks found in {notebooks_dir}") - if path_obj.is_dir(): - all_md = list(path_obj.rglob("*.md")) - if all_md: - print(f"(Found {len(all_md)} .md files total, but none 
had Python code blocks)") - return 0 - - print(f"Linting {total_count} file(s) with Python code...\n") - - failed = [] - - # Lint .ipynb files - for nb in ipynb_files: - # For .ipynb, we need to check if it's already a valid notebook - # This is a placeholder - you might want to add actual ipynb linting - print(f"ℹ {nb} (skipping .ipynb linting for now)") - - # Step 1: Syntax check all files (fast, always enabled) - print("Running syntax checks...") - syntax_results: dict[Path, List[Tuple[str, str]]] = {} - for md in md_files: - syntax_results[md] = lint_notebook_syntax_only(md) - - # Step 2: Batch type check all files (if enabled) - type_results: dict[Path, Tuple[int, str]] = {} - if enable_type_check: - print("Running batch type checking...") - type_results = batch_type_check(md_files) - else: - type_results = {md: (0, "") for md in md_files} - - # Step 3: Print results - print("\nResults:\n") - for md in md_files: - syntax_errors = syntax_results[md] - ty_exit_code, ty_output = type_results.get(md, (0, "")) - - has_errors = bool(syntax_errors) or ty_exit_code != 0 - - if not has_errors: - print(f"✓ {md}") - continue - - print(f"✗ {md}") - - for error_type, message in syntax_errors: - print(f" SYNTAX ERROR: {message}") - - if ty_exit_code != 0 and ty_output: - print(f" TYPE CHECK FAILED (exit code {ty_exit_code}):") - for line in ty_output.strip().splitlines(): - print(f" {line}") - - failed.append(md) - - print("\n" + "=" * 60) - if failed: - print(f"✗ FAILED: {len(failed)} page(s) have errors") - for nb in failed: - print(f" - {nb}") - return 1 - else: - print("✓ All pages passed") - return 0 - - -def _lint_single_file(md_path: Path, enable_type_check: bool) -> int: - """Lint a single markdown file. 
- - Args: - md_path: Path to the markdown file - enable_type_check: Enable ty type checking - - Returns: - Exit code: 0 if valid, 1 if errors found - """ - print(f"Linting {md_path}...\n") - - # Syntax check - syntax_errors = lint_notebook_syntax_only(md_path) - - # Type check - ty_exit_code, ty_output = 0, "" - if enable_type_check: - type_results = batch_type_check([md_path]) - ty_exit_code, ty_output = type_results.get(md_path, (0, "")) - - has_errors = bool(syntax_errors) or ty_exit_code != 0 - - if not has_errors: - print(f"✓ {md_path}") - print("\n" + "=" * 60) - print("✓ All notebooks passed") - return 0 - - print(f"✗ {md_path}") - - for error_type, message in syntax_errors: - print(f" SYNTAX ERROR: {message}") - - if ty_exit_code != 0 and ty_output: - print(f" TYPE CHECK FAILED (exit code {ty_exit_code}):") - for line in ty_output.strip().splitlines(): - print(f" {line}") - - print("\n" + "=" * 60) - print("✗ FAILED: 1 page has errors") - print(f" - {md_path}") - return 1 - - -if __name__ == "__main__": - parser = argparse.ArgumentParser( - description="Lint notebook files for syntax and type errors without executing them" - ) - parser.add_argument("paths", nargs="+", help="Directories or files to lint") - parser.add_argument( - "--type-check", - action="store_true", - help="Enable ty type checking (slower but catches parameter errors, type mismatches, etc.)", - ) - parser.add_argument( - "--markers-only", - action="store_true", - help="Only lint files with @nemo-nb: process marker. 
" - "By default, any markdown file with Python code blocks is linted.", - ) - args = parser.parse_args() - - overall_exit = 0 - for path in args.paths: - result = lint_notebooks(path, args.type_check, args.markers_only) - if result != 0: - overall_exit = result - sys.exit(overall_exit) diff --git a/docs/_scripts/run_notebooks.py b/docs/_scripts/run_notebooks.py deleted file mode 100644 index 473923dad9..0000000000 --- a/docs/_scripts/run_notebooks.py +++ /dev/null @@ -1,66 +0,0 @@ -# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -# SPDX-License-Identifier: Apache-2.0 - -# run via root directory: -# uv run python docs/_scripts/run_notebooks.py docs/ -# -# Options: -# --language python Run only Python cells (default, skip shell/curl) -# --language shell Run only shell cells (skip Python) -# --use-temporary-venv Create and use a temporary virtualenv for running notebooks -# --requirements FILE Install packages from requirements file using uv (requires --use-temporary-venv) -# -# This script finds and runs notebooks with the @nemo-nb: process marker. -# It supports both: -# - .ipynb files (standard Jupyter notebooks) -# - .md files (markdown notebooks using nemo-nb format) -# -# For .md files, the script converts them to .ipynb first, runs them, -# then cleans up the generated notebook. -# -# Core logic lives in nmp.testing.notebooks and can be imported directly. 
- -import argparse -import sys - -from dotenv import load_dotenv -from nmp.testing.notebooks import run_notebooks - -# Load environment variables from .env file if it exists -load_dotenv() - -if __name__ == "__main__": - parser = argparse.ArgumentParser(description="Run notebooks in a directory.") - parser.add_argument("dir", help="Directory containing notebooks to run") - parser.add_argument( - "--language", - choices=["all", "python", "shell"], - default="python", - help="Which cells to execute: python only (default), all of them, or shell only", - ) - parser.add_argument( - "--keep-temp-files", - action="store_true", - help="Keep the temporary files generated by the notebooks", - ) - parser.add_argument( - "--use-temporary-venv", - action="store_true", - help="Create a temporary virtualenv for running notebooks. " - "This allows testing library installations from scratch in a clean environment.", - ) - parser.add_argument( - "--requirements", - type=str, - default=None, - metavar="FILE", - help="Path to a requirements.txt file to install in the temporary virtualenv. " - "Requires --use-temporary-venv. Packages are installed using 'uv pip install -r'.", - ) - args = parser.parse_args() - - sys.exit( - run_notebooks( - args.dir, args.language, args.keep_temp_files, args.use_temporary_venv, args.requirements - ) - ) diff --git a/docs/_scripts/setup_mkdocs_env.sh b/docs/_scripts/setup_mkdocs_env.sh deleted file mode 100755 index 96b0d6009b..0000000000 --- a/docs/_scripts/setup_mkdocs_env.sh +++ /dev/null @@ -1,20 +0,0 @@ -#!/usr/bin/env bash -# Sets up the MkDocs virtualenv using uv -set -euo pipefail - -SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -DOCS_DIR="$(cd "$SCRIPT_DIR/.." && pwd)" -VENV_DIR="$DOCS_DIR/.venv-mkdocs" -PYTHON_BIN="${PYTHON:-python3}" - -if [ ! -d "$VENV_DIR" ]; then - echo "Creating MkDocs virtualenv at $VENV_DIR ..." 
- UV_PYTHON_DOWNLOADS=never uv --no-config venv --python "$PYTHON_BIN" "$VENV_DIR" -fi - -echo "Installing MkDocs dependencies..." -UV_PYTHON_DOWNLOADS=never uv --no-config pip install --python "$VENV_DIR" -r "$DOCS_DIR/requirements-mkdocs.txt" - -echo "✓ MkDocs environment ready: $VENV_DIR" -echo " Activate with: source $VENV_DIR/bin/activate" -echo " Or use directly: $VENV_DIR/bin/mkdocs" diff --git a/docs/_scripts/test_format_code_blocks.py b/docs/_scripts/test_format_code_blocks.py deleted file mode 100644 index 92ab8a4f7b..0000000000 --- a/docs/_scripts/test_format_code_blocks.py +++ /dev/null @@ -1,240 +0,0 @@ -# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -# SPDX-License-Identifier: Apache-2.0 - -from pathlib import Path - -from docs._scripts.format_code_blocks import display_path, format_markdown - - -def test_format_markdown_formats_json_fence() -> None: - markdown = """Before -```json -{"b": [1,2], "a": true} -``` -""" - - formatted, blocks_changed = format_markdown(markdown) - - assert blocks_changed == 1 - assert ( - formatted - == """Before -```json -{ - "b": [ - 1, - 2 - ], - "a": true -} -``` -""" - ) - - -def test_format_markdown_preserves_markdown_indent() -> None: - markdown = """??? "Output" - ```json - {"a":1} - ``` -""" - - formatted, blocks_changed = format_markdown(markdown) - - assert blocks_changed == 1 - assert ( - formatted - == """??? 
"Output" - ```json - { - "a": 1 - } - ``` -""" - ) - - -def test_format_markdown_skips_invalid_json() -> None: - markdown = """```json -{"a": 1,} -``` -""" - - formatted, blocks_changed = format_markdown(markdown) - - assert blocks_changed == 0 - assert formatted == markdown - - -def test_format_markdown_skips_raw_blocks() -> None: - markdown = """{% raw %} -```json -{"a":1} -``` -{% endraw %} -""" - - formatted, blocks_changed = format_markdown(markdown) - - assert blocks_changed == 0 - assert formatted == markdown - - -def test_format_markdown_formats_compact_python_indent() -> None: - markdown = """```python -model_configs = [ - dd.ModelConfig( - provider="system/nvidia-build", - ) -] -``` -""" - - formatted, blocks_changed = format_markdown(markdown) - - assert blocks_changed == 1 - assert ( - formatted - == """```python -model_configs = [ - dd.ModelConfig( - provider="system/nvidia-build", - ) -] -``` -""" - ) - - -def test_format_markdown_formats_compact_python_indent_in_raw_block() -> None: - markdown = """{% raw %} -```python -config_builder.add_column( - dd.ExpressionColumnConfig( - expr="{{ item.value }}", - ) -) -``` -{% endraw %} -""" - - formatted, blocks_changed = format_markdown(markdown) - - assert blocks_changed == 1 - assert ( - formatted - == """{% raw %} -```python -config_builder.add_column( - dd.ExpressionColumnConfig( - expr="{{ item.value }}", - ) -) -``` -{% endraw %} -""" - ) - - -def test_format_markdown_preserves_markdown_indent_for_python() -> None: - markdown = """=== "Python" - - ```python - if enabled: - print("enabled") - ``` -""" - - formatted, blocks_changed = format_markdown(markdown) - - assert blocks_changed == 1 - assert ( - formatted - == """=== "Python" - - ```python - if enabled: - print("enabled") - ``` -""" - ) - - -def test_format_markdown_does_not_indent_python_blank_lines() -> None: - markdown = ( - """!!! 
note - - ```python - import os -""" - " \n" - """ value=1 - ``` -""" - ) - - formatted, blocks_changed = format_markdown(markdown) - - assert blocks_changed == 1 - assert ( - formatted - == """!!! note - - ```python - import os - - value = 1 - ``` -""" - ) - - -def test_format_markdown_does_not_indent_json_blank_lines() -> None: - markdown = ( - """!!! note - - ```json - { -""" - " \n" - """ "a": 1 - } - ``` -""" - ) - - formatted, blocks_changed = format_markdown(markdown) - - assert blocks_changed == 1 - assert ( - formatted - == """!!! note - - ```json - { - "a": 1 - } - ``` -""" - ) - - -def test_format_markdown_skips_python_with_triple_quoted_strings() -> None: - markdown = '''```python -prompt = """ - keep this leading space -""" -``` -''' - - formatted, blocks_changed = format_markdown(markdown) - - assert blocks_changed == 0 - assert formatted == markdown - - -def test_display_path_handles_paths_outside_docs_dir() -> None: - docs_dir = Path("/repo/docs") - - assert display_path(Path("/repo/docs/page.md"), docs_dir) == Path("page.md") - assert display_path(Path("/tmp/page.md"), docs_dir) == Path("/tmp/page.md") diff --git a/docs/_scripts/test_lint_notebooks.py b/docs/_scripts/test_lint_notebooks.py deleted file mode 100644 index 8273345853..0000000000 --- a/docs/_scripts/test_lint_notebooks.py +++ /dev/null @@ -1,27 +0,0 @@ -# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
-# SPDX-License-Identifier: Apache-2.0 - -from pathlib import Path - -from docs._scripts.lint_notebooks import extract_python_blocks_with_line_numbers - - -def test_extract_python_blocks_skips_next_block_marked_for_type_check(tmp_path: Path) -> None: - markdown = """ - - -```python -from litellm import completion -completion(model="demo", messages=[]) -``` - -```python -print("kept") -``` -""" - notebook = tmp_path / "notebook.md" - notebook.write_text(markdown, encoding="utf-8") - - blocks = extract_python_blocks_with_line_numbers(notebook) - - assert blocks == [(10, 'print("kept")')] diff --git a/docs/_snippets/cli-summary.md b/docs/_snippets/cli-summary.md deleted file mode 100644 index d669de6609..0000000000 --- a/docs/_snippets/cli-summary.md +++ /dev/null @@ -1,19 +0,0 @@ -**Global options** apply to all commands: - -| Option | Description | -|--------|-------------| -| `--base-url` | Base URL for the NeMo Platform API | -| `--output-format, -f ` | Output format for how results are printed. [possible values: table, json, yaml, markdown, csv, raw, code] | -| `--no-truncate` | Don't truncate long values in table/markdown/csv output | -| `--timestamp-format ` | Timestamp format for table/markdown/csv output [possible values: relative, iso8601] | -| `--verbose, -v` | Enable verbose messaging. This only impacts logs that are visible, it doesn't change any data outputs. | -| `--agent-mode, -A` | Enable agent-friendly output mode with extra context for coding agents. 
| - -**Commands** are organized into categories: - -| Category | Commands | Description | -|----------|----------|-------------| -| Setup | `setup`, `services`, `skills` | Set up and run local platform components | -| CLI functions | `chat`, `docs`, `wait`, `agent`, `plugins` | Interactive, documentation, and agent-oriented workflows | -| Core plugins | `files`, `inference`, `jobs`, `models`, `secrets`, `workspaces` | Core platform resources | -| Functional plugins | `guardrail` | Functional service and plugin commands | diff --git a/docs/_snippets/naming-rules.md b/docs/_snippets/naming-rules.md deleted file mode 100644 index 6f8b2e8db5..0000000000 --- a/docs/_snippets/naming-rules.md +++ /dev/null @@ -1,9 +0,0 @@ -Resource names must follow these rules: - -- Must start with a lowercase letter (`a`-`z`) -- 2-63 characters long -- Allowed characters: lowercase letters, digits, hyphens, and temporarily `@`, `.`, `+`, `_` -- No consecutive hyphens (`--`) -- Cannot end with a hyphen - -**Example valid names:** `my-model`, `llama-3.2-3b`, `test-config-v1` diff --git a/docs/_snippets/nvidia-build-model-provider.md b/docs/_snippets/nvidia-build-model-provider.md deleted file mode 100644 index ed54b6e50d..0000000000 --- a/docs/_snippets/nvidia-build-model-provider.md +++ /dev/null @@ -1,8 +0,0 @@ -!!! note - The platform pre-configures a `system/nvidia-build` model provider during startup. - This provider routes inference requests to models hosted on `build.nvidia.com` using the API base URL `https://integrate.api.nvidia.com` - and the NGC API key with `Public API Endpoints` permissions provided during deployment (automatically saved as the built-in `system/ngc-api-key` secret). - - You can verify this provider exists by running `nemo inference providers list --workspace system`. - - The tutorials in these docs use this provider for inference, but you can alternatively create your own and use it instead. 
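
The resource naming rules listed in `naming-rules.md` above can be checked locally before a name is sent to the platform. The following is a minimal sketch, not the platform's own validator: the function name `is_valid_resource_name` is hypothetical, and the rules are taken literally from the snippet (including the temporarily allowed `@`, `.`, `+`, and `_`).

```python
import re

# Assumption: this pattern mirrors the documented rules only; the platform's
# real validation may differ. First character must be a lowercase letter,
# the rest may be lowercase letters, digits, hyphens, or @ . + _
_NAME_PATTERN = re.compile(r"[a-z][a-z0-9@.+_-]*")


def is_valid_resource_name(name: str) -> bool:
    """Return True if `name` satisfies the documented naming rules."""
    # 2-63 characters long
    if not 2 <= len(name) <= 63:
        return False
    # Starts with a lowercase letter and uses only allowed characters
    if _NAME_PATTERN.fullmatch(name) is None:
        return False
    # No consecutive hyphens
    if "--" in name:
        return False
    # Cannot end with a hyphen
    return not name.endswith("-")
```

The three example names from the snippet (`my-model`, `llama-3.2-3b`, `test-config-v1`) pass this check, while names such as `My-model`, `a--b`, or `ab-` do not.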
diff --git a/docs/_snippets/tutorials/cli-sdk-setup.md b/docs/_snippets/tutorials/cli-sdk-setup.md deleted file mode 100644 index f59a1bedd4..0000000000 --- a/docs/_snippets/tutorials/cli-sdk-setup.md +++ /dev/null @@ -1,19 +0,0 @@ - -=== "CLI" - - ```bash - # Configure CLI (if not already done) - nemo config set --base-url "$NMP_BASE_URL" --workspace default - ``` - -=== "Python SDK" - - ```python - import os - from nemo_platform import NeMoPlatform - - client = NeMoPlatform( - base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), - workspace="default", - ) - ``` diff --git a/docs/_snippets/tutorials/prereqs.md b/docs/_snippets/tutorials/prereqs.md deleted file mode 100644 index 2d8ee45194..0000000000 --- a/docs/_snippets/tutorials/prereqs.md +++ /dev/null @@ -1,12 +0,0 @@ -# Platform Prerequisites - -??? "New to using {{platform_name}}?" - :icon: info - - All platform resources—models, datasets, and more—must belong to a **workspace**. Workspaces provide organizational and authorization boundaries for your work. Within a workspace, you can optionally use **projects** to group related resources. - - **If you're new to the platform**, start with the **[Setup guide](../../get-started/setup.md)** to learn how to deploy and evaluate models, and optimize agents using the platform end-to-end. - - **If you're already familiar** with workspaces and how to upload datasets to the platform, you can proceed directly with this tutorial. - - For more information, see [Workspaces](../../get-started/concepts/workspaces.md) and [Projects](../../get-started/concepts/projects.md). diff --git a/docs/about/release-notes/current-release.mdx b/docs/about/release-notes/current-release.mdx index 592a60694d..ccf9c7ff0f 100644 --- a/docs/about/release-notes/current-release.mdx +++ b/docs/about/release-notes/current-release.mdx @@ -1,6 +1,8 @@ -# v{{ release }} — 2026-05-20 - -First public release of {{platform_name}}. 
This release establishes the +--- +title: "Current Release" +description: "" +--- +First public release of NeMo Platform. This release establishes the local-first OSS distribution and the core agent build, optimize, and secure flow. @@ -61,11 +63,11 @@ flow. - Workspace browser, agent list, and chat playground for deployed agents and models. The Studio agent build, optimize, and secure flow is in progress - and not part of {{ release }}. + and not part of 0.1.0. ### Coding-agent skills -- `nemo skills install --agent ` installs +- `nemo skills install --agent <claude|codex|cursor|opencode>` installs platform skills for the supported coding agents. - Agent-flow skills: `nemo-setup`, `nemo-explore`, `nemo-spec`, `nemo-build-agent`, `nemo-try-agent`, `agents-optimize`, `agents-secure`, @@ -86,7 +88,7 @@ make bootstrap nemo setup ``` -See [Setup](../../get-started/setup.md) for prerequisites and provider +See [Setup](/get-started/setup) for prerequisites and provider configuration. ## Compatibility @@ -99,7 +101,7 @@ configuration. ## Current constraints - **Local-only.** Cluster and scale-out deployment are on the roadmap for - enterprise use; {{ release }} supports a single local install. + enterprise use; 0.1.0 supports a single local install. - **Studio agent flow.** Studio ships with chat and workspace browsing today. The agent build, optimize, and security flows in Studio are in progress. diff --git a/docs/about/release-notes/index.mdx b/docs/about/release-notes/index.mdx index 8dcc771684..0e12af47d8 100644 --- a/docs/about/release-notes/index.mdx +++ b/docs/about/release-notes/index.mdx @@ -1,5 +1,7 @@ -# Release Notes for {{platform_name}} +--- +title: "Overview" +description: "" +--- +Check out the latest release notes for NeMo Platform. -Check out the latest release notes for {{platform_name}}. 
- -- [Release {{ release }}](current-release.md) +- [Release 0.1.0](/reference/release-notes/current-release) diff --git a/docs/acknowledgements/index.mdx b/docs/acknowledgements/index.mdx index b66eaa2474..50f2c6d2db 100644 --- a/docs/acknowledgements/index.mdx +++ b/docs/acknowledgements/index.mdx @@ -1,5 +1,7 @@ -# OSS License Acknowledgements - -The NVIDIA {{platform_name}} source repository includes open source license information in the [`LICENSES/`](https://github.com/NVIDIA-NeMo/nemo-platform/tree/main/LICENSES) directory. +--- +title: "Acknowledgements" +description: "" +--- +The NVIDIA NeMo Platform source repository includes open source license information in the [`LICENSES/`](https://github.com/NVIDIA-NeMo/nemo-platform/tree/main/LICENSES) directory. Review the license files in that directory, together with any license or acknowledgement files included in NVIDIA-distributed containers, model artifacts, and other separately distributed materials. Those artifacts may include additional components or terms that are not part of the source repository. diff --git a/docs/agents/index.mdx b/docs/agents/index.mdx index 7108a240c4..2277ca1ff3 100644 --- a/docs/agents/index.mdx +++ b/docs/agents/index.mdx @@ -1,8 +1,10 @@ -# About Agents - +--- +title: "About" +description: "" +--- -An agent on {{platform_name}} is a workflow that calls tools, talks to models +An agent on NeMo Platform is a workflow that calls tools, talks to models through the local platform, and runs as a managed service you can deploy, invoke, evaluate, and optimize as a unit. Agents are defined as NeMo Agent Toolkit (NAT) workflows and are managed through the `nemo agents` command @@ -14,7 +16,7 @@ NVIDIA NeMo Agent Toolkit is a flexible, lightweight, and unifying library that allows you to easily connect existing enterprise agents to data sources and tools across any framework. 
-{{platform_name}} uses NAT as the runtime wrapper around your agent so the +NeMo Platform uses NAT as the runtime wrapper around your agent so the platform can deploy it, evaluate it, optimize it, and route its model traffic through shared infrastructure. For the toolkit itself, see the [NeMo Agent Toolkit documentation](https://docs.nvidia.com/nemo/agent-toolkit/latest/). @@ -53,14 +55,14 @@ workflow: llm_name: llm ``` -The [Inference Gateway](../run-inference/about.md) is the local platform's +The [Inference Gateway](/models-and-inference/about) is the local platform's model proxy. Agents send model requests to it instead of to provider APIs directly, so the platform can resolve model names, attach credentials, and route through middleware on the agent's behalf. Two conventions apply when a config targets a deployed agent: - **Model names use the Inference Gateway entity form**, with slashes and dots converted to hyphens (`nvidia/nemotron-3-nano-30b-a3b` becomes `nvidia-nemotron-3-nano-30b-a3b`). The Inference Gateway resolves the entity to the upstream provider that owns it. -- **Leave `base_url` and `api_key` unset on `openai` and `nim` LLMs.** {{platform_name}} injects an Inference Gateway URL when it deploys the agent, and the gateway retrieves upstream credentials from the secrets service. Setting `base_url` explicitly bypasses both. +- **Leave `base_url` and `api_key` unset on `openai` and `nim` LLMs.** NeMo Platform injects an Inference Gateway URL when it deploys the agent, and the gateway retrieves upstream credentials from the secrets service. Setting `base_url` explicitly bypasses both. ## Agent Lifecycle @@ -68,15 +70,15 @@ Agents are managed end-to-end through the `nemo agents` command group: | Stage | Command | What it does | |-------|---------|--------------| -| Register | `nemo agents create --name --agent-config ` | Store the workflow YAML as an `agent` entity in a workspace. 
| -| Deploy | `nemo agents deploy --agent ` | Start a running service from the stored config. | -| Wait | `nemo agents deployments wait --agent ` | Block until the deployment is `running` or `failed`. | -| Invoke | `nemo agents invoke --agent --input "..."` or `nemo agents invoke --agent-config --input "..."` | Send a single request through the Agents gateway or run a local config directly. | -| Evaluate | `nemo agents evaluate run --eval-config --agent ` | Run a NAT evaluation against the deployed agent. | -| Optimize | `nemo agents optimize run --optimize-config --agent ` | Run NAT parameter or prompt tuning trials against the agent's stored config. | -| Tear down | `nemo agents undeploy --agent ` then `nemo agents delete ` | Stop the running service and remove the agent entity. | +| Register | `nemo agents create --name <name> --agent-config <path>` | Store the workflow YAML as an `agent` entity in a workspace. | +| Deploy | `nemo agents deploy --agent <name>` | Start a running service from the stored config. | +| Wait | `nemo agents deployments wait --agent <name>` | Block until the deployment is `running` or `failed`. | +| Invoke | `nemo agents invoke --agent <name> --input "..."` or `nemo agents invoke --agent-config <path> --input "..."` | Send a single request through the Agents gateway or run a local config directly. | +| Evaluate | `nemo agents evaluate run --eval-config <path> --agent <name>` | Run a NAT evaluation against the deployed agent. | +| Optimize | `nemo agents optimize run --optimize-config <path> --agent <name>` | Run NAT parameter or prompt tuning trials against the agent's stored config. | +| Tear down | `nemo agents undeploy --agent <name>` then `nemo agents delete <name>` | Stop the running service and remove the agent entity. | -To run a workflow YAML directly without registering it on the platform, pass `--agent-config ` to `nemo agents invoke` or `nemo agents run`. 
+To run a workflow YAML directly without registering it on the platform, pass `--agent-config <path>` to `nemo agents invoke` or `nemo agents run`. ## How It Works @@ -84,13 +86,13 @@ A platform-managed agent consists of three components: 1. **The agent entity.** `nemo agents create` stores the workflow YAML as an entity in the workspace. The same configuration can be redeployed, evaluated, or optimized without re-registering it. 1. **The deployment controller.** `nemo agents deploy` passes the stored config to the Agents service controller, which launches a `nat start fastapi` process for it, assigns a port, watches its health, and tears it down on `nemo agents undeploy`. -1. **The Agents gateway.** Client requests reach the agent at `/apis/agents/v2/workspaces//agents//-/`. The gateway resolves the agent to its current running deployment and proxies the request, including streaming responses. From a client's perspective, the agent is an OpenAI-compatible endpoint owned by {{platform_name}}. +1. **The Agents gateway.** Client requests reach the agent at `/apis/agents/v2/workspaces/<workspace>/agents/<name>/-/<path>`. The gateway resolves the agent to its current running deployment and proxies the request, including streaming responses. From a client's perspective, the agent is an OpenAI-compatible endpoint owned by NeMo Platform. Model traffic from inside the agent process routes back through the Inference Gateway, which resolves model entity names to upstream providers and supplies their credentials. This is why agent configs do not carry `base_url` or `api_key` values — the deployment injects the gateway URL automatically, and the gateway looks up the rest. A virtual model is a platform-managed wrapper around one or more backend models. 
An agent can point at a virtual model entity while the virtual model -handles routing, format translation, and [guardrails](../guardrails/index.md) +handles routing, format translation, and [guardrails](/guardrails) behind the scenes — no changes to the agent's workflow YAML required. ### Applying Changes Through Candidate Agents @@ -108,11 +110,11 @@ undeploying the candidate. ## Common Tasks -- [Optimize Agents](optimization.md): analyze deployed agents for model routing, +- [Optimize Agents](/agents/optimize-agents): analyze deployed agents for model routing, skill, prompt, and new-model opportunities. -- [Secure Agents](security.md): check guardrail coverage and scan recent +- [Secure Agents](/agents/secure-agents): check guardrail coverage and scan recent telemetry for sensitive data. -- [Plugins and Skills](plugins.md): understand how agent, middleware, and +- [Plugins and Skills](/agents/plugins-and-skills): understand how agent, middleware, and coding-agent integrations extend the local platform. -- [Agentic Metrics](../evaluator/metrics/agentic.md): evaluate tool use, goal completion, topic adherence, answer accuracy, and trajectories. -- [Agent Configuration](../evaluator/metrics/agent-configuration.md): use agents as online evaluation targets. +- [Agentic Metrics](/evaluation/metrics/agentic-metrics): evaluate tool use, goal completion, topic adherence, answer accuracy, and trajectories. +- [Agent Configuration](/evaluation/metrics/agent-configuration): use agents as online evaluation targets. 
diff --git a/docs/agents/optimization.mdx b/docs/agents/optimization.mdx index a08f90ae6c..2fcda4f298 100644 --- a/docs/agents/optimization.mdx +++ b/docs/agents/optimization.mdx @@ -1,5 +1,7 @@ -# Optimize Agents - +--- +title: "Optimize Agents" +description: "" +--- Use the Agent Optimizer to analyze a deployed agent and act on improvement @@ -27,7 +29,7 @@ Optimizer state is stored in the `nemo-agent-optimizer` fileset: - `optimizer_snapshot.json`: model and agent names from the latest run. Security-oriented suggestions such as missing guardrails, PII exposure, or -leaked secrets are covered in [Secure Agents](security.md). +leaked secrets are covered in [Secure Agents](/agents/secure-agents). ## Prerequisites @@ -75,157 +77,155 @@ smaller compatible model. Switchyard is the inference middleware that lets a virtual model split traffic across multiple backend models. The common optimization pattern is to create a -[virtual model](../run-inference/about.md) with a strong model and a weaker, +[virtual model](/models-and-inference/about) with a strong model and a weaker, cheaper model, then evaluate whether the route split preserves application quality. Run `nemo models list` first and replace the placeholders below with model entity names from your workspace that use the `OPENAI_CHAT` backend format. -=== "CLI" - - The command below creates a virtual model that sends 80% of traffic to the - strong model and 20% to the weak one. 
- - ```bash - nemo inference virtual-models create routed-agent-model \ - --workspace default \ - --models '[ - {"model":"default/","backend_format":"OPENAI_CHAT"}, - {"model":"default/","backend_format":"OPENAI_CHAT"} - ]' \ - --request-middleware '[{ - "name":"nemo-switchyard", - "config_type":"random_routing", - "config":{ - "strong":{"model":"default/"}, - "weak":{"model":"default/"}, - "strong_probability":0.8, - "enable_stats":false - } - }]' - ``` - - Before wiring the virtual model to an agent, smoke-test the route by - making several minimal chat-completions calls and checking the returned - model name. The observed split should roughly match `strong_probability`. - -=== "Skill" - - Ask your coding agent: - - > Optimize my deployed agent. - - The `agents-optimize` skill picks a deployed agent, establishes an - evaluation baseline, runs the analysis steps below, and surfaces - suggestions for you to apply. - - Verify the skill is installed: - - ```bash - nemo skills show agents-optimize - ``` - - What it does under the hood: - - - Lists deployed agents and prompts you to choose one. - - Inspects the agent's `llms[*].model_name` and looks for cheaper compatible - models in the workspace catalog. - - Creates a Switchyard `random_routing` virtual model with an 80% strong / - 20% weak split and smoke-tests the route before wiring it to a sibling - agent. - - Suggests skill optimization, prompt tuning, and new-model evaluations - where the agent qualifies. - - Persists suggestions to the `nemo-agent-optimizer` fileset. 
- -=== "Python SDK" - - ```python - import os - from nemo_platform import NeMoPlatform - - client = NeMoPlatform( - base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), - workspace="default", - ) - - client.inference.virtual_models.create( - name="routed-agent-model", - workspace="default", - models=[ - {"model": "default/", "backend_format": "OPENAI_CHAT"}, - {"model": "default/", "backend_format": "OPENAI_CHAT"}, - ], - request_middleware=[{ - "name": "nemo-switchyard", - "config_type": "random_routing", - "config": { - "strong": {"model": "default/"}, - "weak": {"model": "default/"}, - "strong_probability": 0.8, - "enable_stats": False, - }, - }], - ) - ``` - -## Optimize Skills + + +The command below creates a virtual model that sends 80% of traffic to the +strong model and 20% to the weak one. -Skill optimization applies when the agent depends on local skill files and has -an evaluation suite. The loop runs evaluations, analyzes failures, lets the -coding agent edit only the configured skills directory, reruns verification, -and keeps the change only when the evaluation result improves. - -=== "CLI" - - ```bash - nemo agents optimize-skills run --spec-file .agent-improver.yml - ``` +```bash +nemo inference virtual-models create routed-agent-model \ + --workspace default \ + --models '[ + {"model":"default/","backend_format":"OPENAI_CHAT"}, + {"model":"default/","backend_format":"OPENAI_CHAT"} + ]' \ + --request-middleware '[{ + "name":"nemo-switchyard", + "config_type":"random_routing", + "config":{ + "strong":{"model":"default/"}, + "weak":{"model":"default/"}, + "strong_probability":0.8, + "enable_stats":false + } + }]' +``` - Set `open_pr: true` in the YAML when you want the loop to prepare a - reviewable branch. +Before wiring the virtual model to an agent, smoke-test the route by +making several minimal chat-completions calls and checking the returned +model name. The observed split should roughly match `strong_probability`. 
+ + +Ask your coding agent: - A sample `.agent-improver.yml` is in - `plugins/nemo-agents/examples/agent-improver.example.yml`. +> Optimize my deployed agent. -=== "Skill" +The `agents-optimize` skill picks a deployed agent, establishes an +evaluation baseline, runs the analysis steps below, and surfaces +suggestions for you to apply. - Ask your coding agent: +Verify the skill is installed: - > Optimize the skills used by my agent and keep the changes that improve evaluation scores. +```bash +nemo skills show agents-optimize +``` - The `agents-optimize` skill drives the skill-optimization loop when the - selected agent has skills and an evaluation suite. Verify it is installed: +What it does under the hood: + +- Lists deployed agents and prompts you to choose one. +- Inspects the agent's `llms[*].model_name` and looks for cheaper compatible + models in the workspace catalog. +- Creates a Switchyard `random_routing` virtual model with an 80% strong / + 20% weak split and smoke-tests the route before wiring it to a sibling + agent. +- Suggests skill optimization, prompt tuning, and new-model evaluations + where the agent qualifies. +- Persists suggestions to the `nemo-agent-optimizer` fileset. 
+ + +```python +import os +from nemo_platform import NeMoPlatform + +client = NeMoPlatform( + base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), + workspace="default", +) + +client.inference.virtual_models.create( + name="routed-agent-model", + workspace="default", + models=[ + {"model": "default/", "backend_format": "OPENAI_CHAT"}, + {"model": "default/", "backend_format": "OPENAI_CHAT"}, + ], + request_middleware=[{ + "name": "nemo-switchyard", + "config_type": "random_routing", + "config": { + "strong": {"model": "default/"}, + "weak": {"model": "default/"}, + "strong_probability": 0.8, + "enable_stats": False, + }, + }], +) +``` + + +## Optimize Skills - ```bash - nemo skills show agents-optimize - ``` +Skill optimization applies when the agent depends on local skill files and has +an evaluation suite. The loop runs evaluations, analyzes failures, lets the +coding agent edit only the configured skills directory, reruns verification, +and keeps the change only when the evaluation result improves. - What it does under the hood: + + +```bash +nemo agents optimize-skills run --spec-file .agent-improver.yml +``` - - Confirms the agent uses skills (a `--skills-path`, a `.agent-improver.yml`, - or skill files referenced from the config). - - Runs `nemo agents optimize-skills` against the configured skills directory. - - Re-runs evaluation and keeps the change only when scores improve. - - Persists outcomes to the `nemo-agent-optimizer` fileset. +Set `open_pr: true` in the YAML when you want the loop to prepare a +reviewable branch. -=== "Python SDK" +A sample `.agent-improver.yml` is in +`plugins/nemo-agents/examples/agent-improver.example.yml`. + + +Ask your coding agent: - ```python - from nemo_agents_plugin.jobs.optimize_skills import OptimizeSkillsJob - from nemo_platform_plugin.scheduler import NemoJobScheduler +> Optimize the skills used by my agent and keep the changes that improve evaluation scores. 
- import yaml - from pathlib import Path +The `agents-optimize` skill drives the skill-optimization loop when the +selected agent has skills and an evaluation suite. Verify it is installed: - spec = yaml.safe_load(Path(".agent-improver.yml").read_text()) - NemoJobScheduler().run_local( - OptimizeSkillsJob, - spec, - workspace="default", - ) - ``` +```bash +nemo skills show agents-optimize +``` +What it does under the hood: + +- Confirms the agent uses skills (a `--skills-path`, a `.agent-improver.yml`, + or skill files referenced from the config). +- Runs `nemo agents optimize-skills` against the configured skills directory. +- Re-runs evaluation and keeps the change only when scores improve. +- Persists outcomes to the `nemo-agent-optimizer` fileset. + + +```python +from nemo_agents_plugin.jobs.optimize_skills import OptimizeSkillsJob +from nemo_platform_plugin.scheduler import NemoJobScheduler + +import yaml +from pathlib import Path + +spec = yaml.safe_load(Path(".agent-improver.yml").read_text()) +NemoJobScheduler().run_local( + OptimizeSkillsJob, + spec, + workspace="default", +) +``` + + ## Inspect Saved Results Use the Files service to inspect what the optimizer saved: @@ -258,66 +258,65 @@ YAML and want to run `nat optimize` through the Agents plugin. For the ReAct example: -=== "CLI" - - ```bash - nemo agents optimize run \ - --optimize-config plugins/nemo-agents/examples/react-agent/react-optimize.yml \ - --agent react-agent - ``` - -=== "Skill" - - Ask your coding agent: - - > Run prompt tuning on my deployed agent against this optimization config. - - The `agents-optimize` skill suggests `nemo agents optimize run` when the - agent has an optimization config and a baseline dataset. Verify it is - installed: - - ```bash - nemo skills show agents-optimize - ``` - - What it does under the hood: - - - Confirms the agent has a NAT optimization YAML. - - Runs `nemo agents optimize run` (or `submit` for platform jobs). 
- - Compares results against the evaluation baseline and surfaces deltas - for review. - -=== "Python SDK" - - ```python - import os - from pathlib import Path - - from nemo_agents_plugin.jobs.optimize_agent import OptimizeAgentJob - from nemo_platform import NeMoPlatform - from nemo_platform_plugin.scheduler import NemoJobScheduler + + +```bash +nemo agents optimize run \ + --optimize-config plugins/nemo-agents/examples/react-agent/react-optimize.yml \ + --agent react-agent +``` + + +Ask your coding agent: - WORKSPACE = "default" - optimize_config = Path("plugins/nemo-agents/examples/react-agent/react-optimize.yml") +> Run prompt tuning on my deployed agent against this optimization config. - client = NeMoPlatform( - base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), - workspace=WORKSPACE, - ) +The `agents-optimize` skill suggests `nemo agents optimize run` when the +agent has an optimization config and a baseline dataset. Verify it is +installed: - result = NemoJobScheduler().run_local( - OptimizeAgentJob, - { - "optimize_config": str(optimize_config), - "agent": "react-agent", - "workspace": WORKSPACE, - }, - workspace=WORKSPACE, - sdk=client, - ) - print(result) - ``` +```bash +nemo skills show agents-optimize +``` +What it does under the hood: + +- Confirms the agent has a NAT optimization YAML. +- Runs `nemo agents optimize run` (or `submit` for platform jobs). +- Compares results against the evaluation baseline and surfaces deltas + for review. 
+ + +```python +import os +from pathlib import Path + +from nemo_agents_plugin.jobs.optimize_agent import OptimizeAgentJob +from nemo_platform import NeMoPlatform +from nemo_platform_plugin.scheduler import NemoJobScheduler + +WORKSPACE = "default" +optimize_config = Path("plugins/nemo-agents/examples/react-agent/react-optimize.yml") + +client = NeMoPlatform( + base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), + workspace=WORKSPACE, +) + +result = NemoJobScheduler().run_local( + OptimizeAgentJob, + { + "optimize_config": str(optimize_config), + "agent": "react-agent", + "workspace": WORKSPACE, + }, + workspace=WORKSPACE, + sdk=client, +) +print(result) +``` + + When `--agent` is a platform-managed agent name, the job fetches the stored agent config, merges it with the optimization config, injects the Inference Gateway URL, and runs trials locally. When `--agent` is a raw HTTP endpoint, @@ -328,12 +327,12 @@ do not change the remote agent behavior. **No suggestions appear.** Confirm the workspace has agents, model entities, and a model catalog entry smaller than the agent's current model. New-model suggestions require a previous optimizer snapshot, so they do not appear on the first run. -**The model evaluation fails.** Confirm the judge model in the eval config is available through the workspace Inference Gateway. You can replace the eval files in `-eval` with your own evaluation config and dataset. +**The model evaluation fails.** Confirm the judge model in the eval config is available through the workspace Inference Gateway. You can replace the eval files in `<agent-name>-eval` with your own evaluation config and dataset. **Data safety suggestions do not appear.** Telemetry is optional. The optimizer only scans `nemo-agent-telemetry` when that fileset exists and contains JSONL trace files. ## Next steps -- [Agent overview](index.md): review how platform-managed agents are registered, deployed, invoked, evaluated, and optimized. 
-- [Agent evaluation](../evaluator/metrics/agent-configuration.md): configure agents as online evaluation targets and choose the right agent response mapping. -- [CLI reference](../cli/reference.md): look up complete command options and global CLI flags for scripted workflows. +- [Agent overview](/agents): review how platform-managed agents are registered, deployed, invoked, evaluated, and optimized. +- [Agent evaluation](/evaluation/metrics/agent-configuration): configure agents as online evaluation targets and choose the right agent response mapping. +- [CLI reference](/reference/cli-reference/full-cli-reference): look up complete command options and global CLI flags for scripted workflows. diff --git a/docs/agents/plugins.mdx b/docs/agents/plugins.mdx index 299f4727c1..780ebbd83e 100644 --- a/docs/agents/plugins.mdx +++ b/docs/agents/plugins.mdx @@ -1,8 +1,10 @@ -# Plugins and Skills - +--- +title: "Plugins and Skills" +description: "" +--- -{{platform_name}} {{ release }} is extended through Python plugins and +NeMo Platform 0.1.0 is extended through Python plugins and coding-agent skills. Plugins add runtime capabilities to the local platform. Skills teach your coding agent how to operate those capabilities on your behalf. 
@@ -13,8 +15,8 @@ can contribute one or more surfaces: | Surface | Entry-point group | What it adds | |---------|-------------------|--------------| -| HTTP service | `nemo.services` | FastAPI routers mounted under `/apis//...` | -| CLI | `nemo.cli` | `nemo ...` commands | +| HTTP service | `nemo.services` | FastAPI routers mounted under `/apis/<name>/...` | +| CLI | `nemo.cli` | `nemo <plugin> ...` commands | | Job | `nemo.jobs` | Local or submitted jobs with generated `run`, `submit`, and `explain` verbs | | Controller | `nemo.controllers` | Background reconciliation loops | | Inference middleware | `nemo.inference_middleware` | Virtual-model request or response middleware | @@ -60,13 +62,13 @@ The skills that drive the agent lifecycle are: |----------|---------| | `nemo-setup` | Verifies the platform is installed and running. Walks the user through `make bootstrap`, `nemo services run`, the local data directory (`NMP_DATA_DIR` / `XDG_DATA_HOME`), and DB reset. Use this when the user asks to set up the platform or hits a startup issue. | | `nemo-explore` | Design conversation that feeds into an agent spec. Use before scaffolding an agent. | -| `nemo-spec` | Writes an agent spec at `agents/.spec.md` from the explore output. | +| `nemo-spec` | Writes an agent spec at `agents/<name>.spec.md` from the explore output. | | `nemo-build-agent` | Scaffolds a NAT workflow YAML from the spec and deploys it. | | `nemo-try-agent` | Sends a query to a deployed agent. | | `nemo-status` | Read-only platform health dashboard. | | `nemo-teardown` | Guided shutdown with confirmation. | -| `agents-optimize` | Selects a deployed agent, establishes an evaluation baseline, and suggests Switchyard routing, model swaps, skill optimization, prompt tuning, and new-model evaluations. See [Optimize Agents](optimization.md). | -| `agents-secure` | Selects a deployed agent, checks guardrail coverage, and scans recent telemetry for sensitive data. See [Secure Agents](security.md). 
| +| `agents-optimize` | Selects a deployed agent, establishes an evaluation baseline, and suggests Switchyard routing, model swaps, skill optimization, prompt tuning, and new-model evaluations. See [Optimize Agents](/agents/optimize-agents). | +| `agents-secure` | Selects a deployed agent, checks guardrail coverage, and scans recent telemetry for sensitive data. See [Secure Agents](/agents/secure-agents). | Plugin-owned skills cover guardrails, evaluations, optimization, data designer, anonymizer, and auditor. They are installed with their plugin and @@ -92,8 +94,8 @@ Common middleware patterns: - **Switchyard translation** for cross-format model providers. - **Guardrails middleware** for request and response checks. -For command details, see the [Inference Gateway API Reference](../api/index.md#tag-inference-gateway) -and the [CLI reference](../cli/reference.md). +For command details, see the [Inference Gateway API Reference](/reference/api-reference#tag-inference-gateway) +and the [CLI reference](/reference/cli-reference/full-cli-reference). ## Writing Plugins @@ -101,5 +103,5 @@ Plugin authors use the `nemo-platform-plugin` package. A minimal plugin declares points in `pyproject.toml`, implements the matching class, and restarts `nemo services run` so the local platform discovers it. -Keep plugins local-first for OSS {{ release }}: avoid requiring a cluster, +Keep plugins local-first for OSS 0.1.0: avoid requiring a cluster, container runtime, or external control plane for the documented launch path. 
diff --git a/docs/agents/security.mdx b/docs/agents/security.mdx index c6a59ebeba..bd71e7c595 100644 --- a/docs/agents/security.mdx +++ b/docs/agents/security.mdx @@ -1,5 +1,7 @@ -# Secure Agents - +--- +title: "Secure Agents" +description: "" +--- Use the agent security workflow to check a deployed agent for guardrail @@ -47,94 +49,93 @@ with `nemo models list`): - `nvidia-llama-3-1-nemotron-safety-guard-8b-v3` For how to create the guardrail config that the virtual model references, see -the [Guardrails documentation](../guardrails/index.md). +the [Guardrails documentation](/guardrails). There are two steps: create the guarded VirtualModel, then update the agent's `llms` block to reference it. ### 1. Create a Guarded VirtualModel -=== "CLI" - - ```bash - nemo inference virtual-models create guarded-agent-model \ - --workspace default \ - --models '[{"model":"default/","backend_format":"OPENAI_CHAT"}]' \ - --request-middleware '[{ - "name":"nemo-guardrails", - "config_type":"guardrail_config", - "config_id":"default/" - }]' \ - --response-middleware '[{ - "name":"nemo-guardrails", - "config_type":"guardrail_config", - "config_id":"default/" - }]' - ``` - - Wire the same `` on both `--request-middleware` (for - input rails) and `--response-middleware` (for output rails). Omit a side - if the config defines no flows for it. For the full middleware schema, - entity-backed vs inline configs, and caching behavior, refer to - [Guardrails Architecture](../guardrails/concepts/architecture.md). - -=== "Skill" - - Ask your coding agent: - - > Secure my deployed agent. - - The `agents-secure` skill picks a deployed agent, checks guardrail - coverage, samples recent telemetry, and writes suggestions to the - `nemo-agent-security` fileset. - - Verify the skill is installed: - - ```bash - nemo skills show agents-secure - ``` - - What it does under the hood: - - - Lists deployed agents and prompts you to choose one. - - Inspects each LLM's `base_url`. 
If it does not route through a - guardrails virtual model, suggests creating one with a content-safety, - topic-control, or safety-guard backend. - - Names the recommended guardrails catalog model and walks you through - creating the guarded virtual model. - - Persists suggestions to the `nemo-agent-security` fileset. - -=== "Python SDK" - - ```python - import os - from nemo_platform import NeMoPlatform - - client = NeMoPlatform( - base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), - workspace="default", - ) - - guardrail_mw = { - "name": "nemo-guardrails", - "config_type": "guardrail_config", - "config_id": "default/", - } - - client.inference.virtual_models.create( - name="guarded-agent-model", - workspace="default", - models=[{"model": "default/", "backend_format": "OPENAI_CHAT"}], - request_middleware=[guardrail_mw], - response_middleware=[guardrail_mw], - ) - ``` + + +```bash +nemo inference virtual-models create guarded-agent-model \ + --workspace default \ + --models '[{"model":"default/","backend_format":"OPENAI_CHAT"}]' \ + --request-middleware '[{ + "name":"nemo-guardrails", + "config_type":"guardrail_config", + "config_id":"default/" + }]' \ + --response-middleware '[{ + "name":"nemo-guardrails", + "config_type":"guardrail_config", + "config_id":"default/" + }]' +``` + +Wire the same `<guardrail-config>` on both `--request-middleware` (for +input rails) and `--response-middleware` (for output rails). Omit a side +if the config defines no flows for it. For the full middleware schema, +entity-backed vs inline configs, and caching behavior, refer to +[Guardrails Architecture](/guardrails/core-concepts/architecture). + + +Ask your coding agent: + +> Secure my deployed agent. + +The `agents-secure` skill picks a deployed agent, checks guardrail +coverage, samples recent telemetry, and writes suggestions to the +`nemo-agent-security` fileset. 
+ +Verify the skill is installed: +```bash +nemo skills show agents-secure +``` + +What it does under the hood: + +- Lists deployed agents and prompts you to choose one. +- Inspects each LLM's `base_url`. If it does not route through a + guardrails virtual model, suggests creating one with a content-safety, + topic-control, or safety-guard backend. +- Names the recommended guardrails catalog model and walks you through + creating the guarded virtual model. +- Persists suggestions to the `nemo-agent-security` fileset. + + +```python +import os +from nemo_platform import NeMoPlatform + +client = NeMoPlatform( + base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), + workspace="default", +) + +guardrail_mw = { + "name": "nemo-guardrails", + "config_type": "guardrail_config", + "config_id": "default/", +} + +client.inference.virtual_models.create( + name="guarded-agent-model", + workspace="default", + models=[{"model": "default/", "backend_format": "OPENAI_CHAT"}], + request_middleware=[guardrail_mw], + response_middleware=[guardrail_mw], +) +``` + + ### 2. Point the Agent at the Guarded VirtualModel In the agent's workflow YAML, set `model_name` on the relevant `llms` entry to the guarded VirtualModel's entity reference, with slashes converted to hyphens -(per the [agent configuration conventions](index.md#agent-definition)): +(per the [agent configuration conventions](/agents#agent-definition)): ```yaml llms: @@ -149,7 +150,7 @@ unchanged and unaware of the rails. For the end-to-end request flow, streaming behavior, header forwarding, and the `guardrails` request options, refer to -[Running Inference with Guardrails](../guardrails/concepts/inference.md). +[Running Inference with Guardrails](/guardrails/core-concepts/running-inference). Redeploy the agent, re-run evaluation, and compare quality, cost, latency, and safety signals against the baseline before promoting. 
@@ -166,55 +167,54 @@ For leaked secrets, rotate the credential before doing any other cleanup. For PII in trace data, redact or regenerate affected traces before using them for evaluation or optimization. -=== "CLI" - - ```bash - nemo files list nemo-agent-telemetry - ``` - - Inspect the most recent trace files manually, or use the security skill - to do this for you. Cap the data you download — telemetry can be large. - -=== "Skill" - - Ask your coding agent: - - > Scan recent telemetry for my agent for PII and leaked secrets. - - The `agents-secure` skill caps downloaded telemetry at 1 GB and walks the - most recent traces first. Verify it is installed: - - ```bash - nemo skills show agents-secure - ``` - - What it does under the hood: - - - Samples recent files from `nemo-agent-telemetry` until the 1 GB cap. - - Runs a high-confidence regex pass for PII and leaked credentials. - - Writes findings with masked previews and follow-up actions to - `nemo-agent-security/security_suggestions.jsonl`. - - Optionally suggests a higher-recall GLiNER or NemoGuard scan on a - subset of traces when regex coverage is not enough. - -=== "Python SDK" + + +```bash +nemo files list nemo-agent-telemetry +``` - The skill drives this workflow directly. To inspect what was written, - list and download files from the security fileset: +Inspect the most recent trace files manually, or use the security skill +to do this for you. Cap the data you download — telemetry can be large. + + +Ask your coding agent: - ```python - import os - from nemo_platform import NeMoPlatform +> Scan recent telemetry for my agent for PII and leaked secrets. - client = NeMoPlatform( - base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), - workspace="default", - ) +The `agents-secure` skill caps downloaded telemetry at 1 GB and walks the +most recent traces first. 
Verify it is installed: - for f in client.files.list(fileset="nemo-agent-security"): - print(f.remote_path) - ``` +```bash +nemo skills show agents-secure +``` +What it does under the hood: + +- Samples recent files from `nemo-agent-telemetry` until the 1 GB cap. +- Runs a high-confidence regex pass for PII and leaked credentials. +- Writes findings with masked previews and follow-up actions to + `nemo-agent-security/security_suggestions.jsonl`. +- Optionally suggests a higher-recall GLiNER or NemoGuard scan on a + subset of traces when regex coverage is not enough. + + +The skill drives this workflow directly. To inspect what was written, +list and download files from the security fileset: + +```python +import os +from nemo_platform import NeMoPlatform + +client = NeMoPlatform( + base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), + workspace="default", +) + +for f in client.files.list(fileset="nemo-agent-security"): + print(f.remote_path) +``` + + ## Review Results Use the Files service to inspect saved suggestions: @@ -233,15 +233,15 @@ nemo files download nemo-agent-security \ **No data safety findings appear.** Data safety scans require telemetry. Confirm the `nemo-agent-telemetry` fileset exists with `nemo files list nemo-agent-telemetry`. If the fileset is empty, the agent is not exporting traces — verify the agent uses the `nemo_files` telemetry exporter and that recent invocations have completed. -**The `agents-secure` skill is not available.** Run `nemo skills list` to confirm the skill is installed. If it is missing, install it with `nemo skills install --agent `. +**The `agents-secure` skill is not available.** Run `nemo skills list` to confirm the skill is installed. If it is missing, install it with `nemo skills install --agent <claude|codex|cursor|opencode>`. -**Guardrail virtual model creation fails with an unknown model.** Confirm the backend model entity exists with `nemo models list`. 
The `` and `` placeholders must reference entities the workspace can resolve. +**Guardrail virtual model creation fails with an unknown model.** Confirm the backend model entity exists with `nemo models list`. The `<main-model-entity>` and `<guardrail-config>` placeholders must reference entities the workspace can resolve. ## Next Steps -- [Optimize Agents](optimization.md): reduce cost and improve quality after +- [Optimize Agents](/agents/optimize-agents): reduce cost and improve quality after security coverage is in place. -- [Guardrails](../guardrails/index.md): create and manage guardrail +- [Guardrails](/guardrails): create and manage guardrail configurations. -- [Models and Inference](../run-inference/about.md): manage model providers, +- [Models and Inference](/models-and-inference/about): manage model providers, model entities, and virtual models. diff --git a/docs/anonymizer/cli.mdx b/docs/anonymizer/cli.mdx index 73855f1e3a..8f762306a0 100644 --- a/docs/anonymizer/cli.mdx +++ b/docs/anonymizer/cli.mdx @@ -1,7 +1,11 @@ +--- +title: "CLI Reference" +description: "" +--- # CLI Reference -This reference covers the `nemo anonymizer` commands exposed by the Anonymizer plugin. For end-to-end walkthroughs, see the [tutorials](tutorials/index.md). +This reference covers the `nemo anonymizer` commands exposed by the Anonymizer plugin. For end-to-end walkthroughs, see the [tutorials](/anonymize-data/tutorials/overview). ## Command Surface @@ -11,7 +15,7 @@ This reference covers the `nemo anonymizer` commands exposed by the Anonymizer p | `nemo anonymizer preview run` | Generated from `NemoFunction` | Local streaming preview. | | `nemo anonymizer preview submit` | Generated from `NemoFunction` | Remote streaming preview against the plugin service. | | `nemo anonymizer run run` | Generated from `NemoJob` | Local job execution in the CLI process. 
| -| `nemo anonymizer run submit` | Generated from `NemoJob` | Submit an `anonymizer.run` job to the {{platform_name}} Jobs worker. | +| `nemo anonymizer run submit` | Generated from `NemoJob` | Submit an `anonymizer.run` job to the NeMo Platform Jobs worker. | | `nemo anonymizer run explain` | Generated from `NemoJob` | Print the job key, submit endpoint, and JSON schemas. | ## `nemo anonymizer validate` @@ -49,7 +53,7 @@ nemo anonymizer preview submit \ | Flag | Description | |------------------|---------------------------------------------------------------------------------------------------| | `--spec-file` | Path to the `PreviewRequest` YAML. | -| `--workspace` | {{platform_name}} workspace. Used to resolve fileset references and Inference Gateway providers. | +| `--workspace` | NeMo Platform workspace. Used to resolve fileset references and Inference Gateway providers. | | `--base-url` | Override the platform base URL (typically auto-populated from the CLI config). | ### Preview source kinds @@ -70,7 +74,7 @@ nemo anonymizer preview run --spec-file /tmp/anonymizer-preview.yaml > /tmp/prev jq -R 'fromjson? | select(.kind == "preview_dataset") | .records' /tmp/preview.ndjson ``` -Frame kinds: `log`, `preview_dataset`, `trace_dataset`, `failed_records`, `heartbeat`, `done`, `error`. See the [preview tutorial](tutorials/preview.md) for details. +Frame kinds: `log`, `preview_dataset`, `trace_dataset`, `failed_records`, `heartbeat`, `done`, `error`. See the [preview tutorial](/anonymize-data/tutorials/preview-a-config) for details. ## `nemo anonymizer run` @@ -137,7 +141,7 @@ results = job.download_artifacts() dataset = results.load_dataset() ``` -See [SDK Resources](sdk-resources.md) for the full `AnonymizerJobResource` / `AnonymizerJobResults` surface. +See [SDK Resources](/anonymize-data/sdk-resources) for the full `AnonymizerJobResource` / `AnonymizerJobResults` surface. 
Compared to `run run`, `run submit` rejects local file paths in `data.source` (use a fileset reference or `http(s)` URL) and requires explicit `model_configs` because the job runs outside the CLI process. @@ -171,4 +175,4 @@ fileset:///# # ``` -The `#` fragment must resolve to a single `.csv` or `.parquet` file. The plugin downloads the file before constructing the Anonymizer library input. +The `#<path>` fragment must resolve to a single `.csv` or `.parquet` file. The plugin downloads the file before constructing the Anonymizer library input. diff --git a/docs/anonymizer/index.mdx b/docs/anonymizer/index.mdx index dfbd0c07e5..3d39864188 100644 --- a/docs/anonymizer/index.mdx +++ b/docs/anonymizer/index.mdx @@ -1,18 +1,23 @@ +--- +title: "About" +description: "" +--- # Anonymizer Service -The Anonymizer service detects personally identifiable information (PII) in text data on the {{platform_name}} and replaces or rewrites it. +The Anonymizer service detects personally identifiable information (PII) in text data on the NeMo Platform and replaces or rewrites it. ## Overview -The service wraps the open-source [NVIDIA NeMo Anonymizer library](https://github.com/NVIDIA-NeMo/Anonymizer) and exposes it through the {{platform_name}}'s Python SDK and CLI. The library still owns PII detection, replacement, rewrite, and config validation. The platform adds inference routing through the Inference Gateway, fileset-backed inputs, plugin-service execution for streaming preview, and a Jobs-worker path for full anonymization runs. +The service wraps the open-source [NVIDIA NeMo Anonymizer library](https://github.com/NVIDIA-NeMo/Anonymizer) and exposes it through the NeMo Platform's Python SDK and CLI. The library still owns PII detection, replacement, rewrite, and config validation. The platform adds inference routing through the Inference Gateway, fileset-backed inputs, plugin-service execution for streaming preview, and a Jobs-worker path for full anonymization runs. 
## How It Works: Library + Platform The library defines **what** to anonymize and **how**. The platform decides **where the work runs** and **how models are reached**. -!!! note - The code snippets below are for conceptual demonstration purposes only. For runnable examples, see the [quickstart](quickstart.md) and [tutorials](tutorials/index.md). + +The code snippets below are for conceptual demonstration purposes only. For runnable examples, see the [quickstart](/anonymize-data/quickstart) and [tutorials](/anonymize-data/tutorials/overview). + ### 1. Build a config with the library @@ -33,7 +38,7 @@ config = AnonymizerConfig( ### 2. Execute on the platform -Submit the config to the Anonymizer service with the {{platform_name}} SDK: +Submit the config to the Anonymizer service with the NeMo Platform SDK: ```python from nemo_anonymizer_plugin.app.task_config import PreviewRequest @@ -67,15 +72,15 @@ The SDK equivalent of `run submit` is `sdk.anonymizer.run(request)`, which retur ## Key Differences from Standalone Library -When using Anonymizer as a {{platform_name}} service: +When using Anonymizer as a NeMo Platform service: -| Feature | Standalone Library | {{platform_name}} Service | +| Feature | Standalone Library | NeMo Platform Service | |-------------------|-----------------------------------------------------|-----------------------------------------------------------------------------------------------------------------| | **Inference** | Direct calls to NVIDIA Build defaults | Routes through the Inference Gateway via `model_configs` | | **Execution** | Local Python process | Streaming preview runs in the plugin service; full runs execute either in the local CLI (`run run`) or on the Jobs worker (`run submit`) | -| **Input sources** | Local file, `http(s)` URL | Local file (`run run` only), `http(s)` URL, or {{platform_name}} Fileset | -| **Artifacts** | Local filesystem | Local artifact directory (`persistent/results/artifacts`) for `run run`; 
{{platform_name}} job artifact storage for `run submit` | -| **Authentication**| Direct API keys | {{platform_name}} Secrets service | +| **Input sources** | Local file, `http(s)` URL | Local file (`run run` only), `http(s)` URL, or NeMo Platform Fileset | +| **Artifacts** | Local filesystem | Local artifact directory (`persistent/results/artifacts`) for `run run`; NeMo Platform job artifact storage for `run submit` | +| **Authentication**| Direct API keys | NeMo Platform Secrets service | ## Replacement Strategies @@ -98,33 +103,33 @@ This package is a thin wrapper around the [NVIDIA NeMo Anonymizer library](https - A `nemo anonymizer` CLI with `validate`, `preview`, and `run` command groups. - An `sdk.anonymizer` SDK accessor (`AnonymizerResource`, `AsyncAnonymizerResource`). - A streaming `anonymizer.preview` function that emits `preview_dataset`, `trace_dataset`, and `failed_records` frames from the plugin service. -- An `anonymizer.run` job that writes `dataset.parquet`, `trace.parquet`, `metadata.json`, and optional `failed_records.json`. The job can execute in the local CLI process (`nemo anonymizer run run`) or on the {{platform_name}} Jobs worker (`nemo anonymizer run submit` / `sdk.anonymizer.run`). -- Fileset input handling (`fileset:///#`). +- An `anonymizer.run` job that writes `dataset.parquet`, `trace.parquet`, `metadata.json`, and optional `failed_records.json`. The job can execute in the local CLI process (`nemo anonymizer run run`) or on the NeMo Platform Jobs worker (`nemo anonymizer run submit` / `sdk.anonymizer.run`). +- Fileset input handling (`fileset://<workspace>/<fileset>#<path>`). - Inference Gateway routing for model providers referenced from `model_configs`. ## Next Steps
-- **[Quick Start](quickstart.md)** +- **[Quick Start](/anonymize-data/quickstart)** --- Install the plugin, configure inference, and run your first preview and job. -- **[Tutorials](tutorials/index.md)** +- **[Tutorials](/anonymize-data/tutorials/overview)** --- Walk through preview (`anonymizer.preview`) and job execution (`anonymizer.run`) end to end. -- **[SDK Resources](sdk-resources.md)** +- **[SDK Resources](/anonymize-data/sdk-resources)** --- Reference for the `anonymizer` SDK accessor, preview result, and job result objects. -- **[CLI Reference](cli.md)** +- **[CLI Reference](/anonymize-data/cli-reference)** --- diff --git a/docs/anonymizer/quickstart.mdx b/docs/anonymizer/quickstart.mdx index e98f563585..79f5e17a60 100644 --- a/docs/anonymizer/quickstart.mdx +++ b/docs/anonymizer/quickstart.mdx @@ -1,16 +1,20 @@ +--- +title: "Quickstart" +description: "" +--- # Quick Start -This guide walks through previewing and running an Anonymizer job on {{platform_name}}. +This guide walks through previewing and running an Anonymizer job on NeMo Platform. ## Prerequisites -- Access to a {{platform_name}} deployment with the `anonymizer` plugin service enabled. +- Access to a NeMo Platform deployment with the `anonymizer` plugin service enabled. - An API key for a model provider used by the Anonymizer pipeline. ## Step 1: Install the Plugin -Follow the [Setup guide](../get-started/setup.md) to install {{platform_name}} and complete `nemo setup`. From a repo checkout, run `uv sync` at the repo root; the root workspace includes the Anonymizer plugin, so no separate editable plugin install step is needed. `nemo services run` then picks up the plugin automatically and mounts `/apis/anonymizer/...` on the gateway. +Follow the [Setup guide](/get-started/setup) to install NeMo Platform and complete `nemo setup`. From a repo checkout, run `uv sync` at the repo root; the root workspace includes the Anonymizer plugin, so no separate editable plugin install step is needed. 
`nemo services run` then picks up the plugin automatically and mounts `/apis/anonymizer/...` on the gateway. Verify the CLI is registered: @@ -34,11 +38,19 @@ anonymizer = sdk.anonymizer ## Step 3: Configure Inference -Anonymizer routes inference through the [Inference Gateway service](../run-inference/about.md). You need a model provider configured before running anything that uses `model_configs`. +Anonymizer routes inference through the [Inference Gateway service](/models-and-inference/about). You need a model provider configured before running anything that uses `model_configs`. -`nemo setup` walks you through creating a provider secret and registering an Inference Gateway provider as part of the install flow. If you skipped that step or want to add another provider, re-run `nemo setup` — see the [Setup guide](../get-started/setup.md) for details. +`nemo setup` walks you through creating a provider secret and registering an Inference Gateway provider as part of the install flow. If you skipped that step or want to add another provider, re-run `nemo setup` — see the [Setup guide](/get-started/setup) for details. ---8<-- "_snippets/nvidia-build-model-provider.md" + +The platform pre-configures a `system/nvidia-build` model provider during startup. +This provider routes inference requests to models hosted on `build.nvidia.com` using the API base URL `https://integrate.api.nvidia.com` +and the NGC API key with `Public API Endpoints` permissions provided during deployment (automatically saved as the built-in `system/ngc-api-key` secret). + +You can verify this provider exists by running `nemo inference providers list --workspace system`. + +The tutorials in these docs use this provider for inference, but you can alternatively create your own and use it instead. 
+ ## Step 4: Upload an Input Fileset @@ -131,38 +143,40 @@ preview.display_record(0) # render a record with entity highlights `preview.dataset` is a regular pandas DataFrame, so you can persist it with `to_csv` or `to_parquet`. -??? "Run preview from the CLI instead" - The same flow is available from the CLI. Write the spec to YAML: + +The same flow is available from the CLI. Write the spec to YAML: - ```python - import yaml - from pathlib import Path +```python +import yaml +from pathlib import Path - preview_spec_path = Path("/tmp/anonymizer-preview.yaml") - preview_spec_path.write_text(yaml.safe_dump(request.model_dump(mode="json", exclude_none=True))) - ``` +preview_spec_path = Path("/tmp/anonymizer-preview.yaml") +preview_spec_path.write_text(yaml.safe_dump(request.model_dump(mode="json", exclude_none=True))) +``` - Then run either of: +Then run either of: - ```bash - nemo anonymizer preview run \ - --spec-file /tmp/anonymizer-preview.yaml \ - --workspace "${NMP_WORKSPACE:-default}" +```bash +nemo anonymizer preview run \ + --spec-file /tmp/anonymizer-preview.yaml \ + --workspace "${NMP_WORKSPACE:-default}" - nemo anonymizer preview submit \ - --spec-file /tmp/anonymizer-preview.yaml \ - --workspace "${NMP_WORKSPACE:-default}" \ - --base-url "${NMP_BASE_URL:-http://localhost:8080}" - ``` +nemo anonymizer preview submit \ + --spec-file /tmp/anonymizer-preview.yaml \ + --workspace "${NMP_WORKSPACE:-default}" \ + --base-url "${NMP_BASE_URL:-http://localhost:8080}" +``` - The CLI streams newline-delimited JSON frames (`preview_dataset`, `trace_dataset`, `failed_records`, ...) to stdout. See the [preview tutorial](tutorials/preview.md) for the frame schema and `jq` recipes. +The CLI streams newline-delimited JSON frames (`preview_dataset`, `trace_dataset`, `failed_records`, ...) to stdout. See the [preview tutorial](/anonymize-data/tutorials/preview-a-config) for the frame schema and `jq` recipes. + -!!! 
note - `anonymizer.preview` calls the plugin service, so it rejects local file paths in `data.source` and requires `model_configs`. The fileset reference and `model_configs` in the example above satisfy both constraints. + +`anonymizer.preview` calls the plugin service, so it rejects local file paths in `data.source` and requires `model_configs`. The fileset reference and `model_configs` in the example above satisfy both constraints. + ## Step 6: Run a Full Job -When the preview looks correct, run the full pipeline. The `anonymizer.run` job can execute either locally in the CLI process (`run run`) or on the {{platform_name}} Jobs worker (`run submit` / `sdk.anonymizer.run()`). +When the preview looks correct, run the full pipeline. The `anonymizer.run` job can execute either locally in the CLI process (`run run`) or on the NeMo Platform Jobs worker (`run submit` / `sdk.anonymizer.run()`). Build an `AnonymizerRequest`: @@ -191,7 +205,7 @@ print(dataset.head()) print(f"records={len(dataset)} failures={len(results.load_failed_records())}") ``` -`sdk.anonymizer.run()` returns an `AnonymizerJobResource`. `wait_until_done=True` blocks until the job reaches a terminal state; `download_artifacts()` fetches the job artifacts and returns an `AnonymizerJobResults` for in-memory access. See [SDK Resources](sdk-resources.md) for the full surface. +`sdk.anonymizer.run()` returns an `AnonymizerJobResource`. `wait_until_done=True` blocks until the job reaches a terminal state; `download_artifacts()` fetches the job artifacts and returns an `AnonymizerJobResults` for in-memory access. See [SDK Resources](/anonymize-data/sdk-resources) for the full surface. The CLI equivalent submits the same spec. First write it to YAML: @@ -212,7 +226,7 @@ nemo anonymizer run submit \ --base-url "${NMP_BASE_URL:-http://localhost:8080}" ``` -Track the submitted job with `nemo jobs get-status --workspace "${NMP_WORKSPACE:-default}"` and `nemo jobs get-logs --workspace "${NMP_WORKSPACE:-default}"`. 
+Track the submitted job with `nemo jobs get-status <job-name> --workspace "${NMP_WORKSPACE:-default}"` and `nemo jobs get-logs <job-name> --workspace "${NMP_WORKSPACE:-default}"`. **Option B — run locally in the CLI process:** @@ -235,8 +249,10 @@ The CLI prints `{"exit_code": 0}` on success and logs the artifact directory (`f - `metadata.json`: run metadata. - `failed_records.json`: per-record failures, only when there were failures. -!!! note "Differences between `run run` and `run submit`" - `run submit` rejects local file paths in `data.source` (use a fileset reference or `http(s)` URL) and requires explicit `model_configs` referencing Inference Gateway providers. `run run` accepts local paths and can run without `model_configs` when the library defaults suffice. + +Differences between `run run` and `run submit` +`run submit` rejects local file paths in `data.source` (use a fileset reference or `http(s)` URL) and requires explicit `model_configs` referencing Inference Gateway providers. `run run` accepts local paths and can run without `model_configs` when the library defaults suffice. + ## Step 7: Inspect Artifacts @@ -257,7 +273,7 @@ trace = pd.read_parquet(ARTIFACTS_DIR / "trace.parquet", dtype_backend="pyar print(dataset.head()) ``` -The trace dataset (and the dataset itself for `annotate` / `substitute` strategies) contains pyarrow-backed `struct>` columns. Use `pyarrow.parquet.read_table(...).to_pylist()` if you need plain Python `dict`/`list` values for JSON output. +The trace dataset (and the dataset itself for `annotate` / `substitute` strategies) contains pyarrow-backed `struct<entities: list<...>>` columns. Use `pyarrow.parquet.read_table(...).to_pylist()` if you need plain Python `dict`/`list` values for JSON output. 
## Troubleshooting @@ -266,12 +282,12 @@ The trace dataset (and the dataset itself for `annotate` / `substitute` strategi | `nemo anonymizer preview submit` returns 404 | The `anonymizer` plugin service isn't mounted on the gateway | Confirm `uv sync` ran successfully at the repo root and re-run `nemo services run` so the plugin is discovered. See [Step 1](#step-1-install-the-plugin). | | `model_configs are required for remote execution` | `anonymizer.preview` / `preview submit` requires explicit `model_configs` | Add `model_configs` referencing an Inference Gateway provider. | | `Input source ... is a local path` | Plugin-service execution rejects local paths | Use an `http(s)` URL or a fileset reference. | -| `Fileset input ... must resolve to a .csv or .parquet file` | Fileset path is a directory or wrong extension | Point the `#` fragment at a single `.csv` or `.parquet` file. | +| `Fileset input ... must resolve to a .csv or .parquet file` | Fileset path is a directory or wrong extension | Point the `#<path>` fragment at a single `.csv` or `.parquet` file. | | `provider not found` | Inference provider missing | Inspect or create the provider using the inference/model-provider docs, then reference it in `model_configs`. | ## Next Steps -- **Tutorials:** Walk through preview and run flows in detail in the [tutorials](tutorials/index.md). -- **SDK reference:** See [SDK Resources](sdk-resources.md) for the `anonymizer` accessor, preview result, and job result types. -- **CLI reference:** See [CLI Reference](cli.md) for spec-file fields and command flags. +- **Tutorials:** Walk through preview and run flows in detail in the [tutorials](/anonymize-data/tutorials/overview). +- **SDK reference:** See [SDK Resources](/anonymize-data/sdk-resources) for the `anonymizer` accessor, preview result, and job result types. +- **CLI reference:** See [CLI Reference](/anonymize-data/cli-reference) for spec-file fields and command flags. 
- **Library docs:** Detection, replacement strategy parameters, and rewrite mode are documented in the [open-source library](https://github.com/NVIDIA-NeMo/Anonymizer/tree/main/docs). diff --git a/docs/anonymizer/sdk-resources.mdx b/docs/anonymizer/sdk-resources.mdx index d6882a817e..f28e9b98f6 100644 --- a/docs/anonymizer/sdk-resources.mdx +++ b/docs/anonymizer/sdk-resources.mdx @@ -1,11 +1,15 @@ +--- +title: "SDK Resources" +description: "" +--- -# Anonymizer {{platform_name}} SDK Resources +# Anonymizer NeMo Platform SDK Resources -The `anonymizer.config` module (from the [NVIDIA NeMo Anonymizer library](https://github.com/NVIDIA-NeMo/Anonymizer)) builds `AnonymizerConfig` objects in a context-agnostic way. Once you are ready to execute that config against the {{platform_name}} Anonymizer service, you use objects from the `nemo_platform` SDK. This page describes the {{platform_name}}-specific objects. +The `anonymizer.config` module (from the [NVIDIA NeMo Anonymizer library](https://github.com/NVIDIA-NeMo/Anonymizer)) builds `AnonymizerConfig` objects in a context-agnostic way. Once you are ready to execute that config against the NeMo Platform Anonymizer service, you use objects from the `nemo_platform` SDK. This page describes the NeMo Platform-specific objects. ## AnonymizerResource -The `AnonymizerResource` is the entry point for working with Anonymizer on {{platform_name}}. It wraps the streaming preview endpoint and job submission for the plugin service. +The `AnonymizerResource` is the entry point for working with Anonymizer on NeMo Platform. It wraps the streaming preview endpoint and job submission for the plugin service. 
A `AnonymizerResource` is accessed directly from a `NeMoPlatform` instance: @@ -25,7 +29,7 @@ An `AsyncAnonymizerResource` with the same surface is available via `AsyncNeMoPl | Method | Description | |----------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------| | `preview(request, *, workspace=None)` | Runs a streaming preview against the plugin service and returns an `AnonymizerPreviewResult` after the stream completes. | -| `run(request, *, workspace=None, wait_until_done=False)` | Submits an `anonymizer.run` job to the {{platform_name}} Jobs worker. Returns an `AnonymizerJobResource`. When `wait_until_done=True`, blocks until the job reaches a terminal state. | +| `run(request, *, workspace=None, wait_until_done=False)` | Submits an `anonymizer.run` job to the NeMo Platform Jobs worker. Returns an `AnonymizerJobResource`. When `wait_until_done=True`, blocks until the job reaches a terminal state. | | `get_job_resource(job_name, workspace=None)` | Returns an `AnonymizerJobResource` for an existing job (by job name). | `request` is a `PreviewRequest` or `AnonymizerRequest` instance from `nemo_anonymizer_plugin.app.task_config`. Both accept the same `config`, `data`, `model_configs`, and `selected_models` fields; `PreviewRequest` adds `num_records`. @@ -43,8 +47,9 @@ Both `preview` and `run` call the plugin service, so they require `model_configs | `failed_records` | `list[dict]` of per-record failures with reasons. Empty when nothing failed. | | `display_record(index=None)` | Renders a single trace record as HTML in a notebook. When `index` is omitted, cycles through records. | -??? "More about preview results" - `AnonymizerPreviewResult` holds everything in memory; nothing is persisted to disk by default. The `dataset` and `trace_dataset` fields are regular pandas DataFrames and can be saved with `to_csv` / `to_parquet`. 
+ +`AnonymizerPreviewResult` holds everything in memory; nothing is persisted to disk by default. The `dataset` and `trace_dataset` fields are regular pandas DataFrames and can be saved with `to_csv` / `to_parquet`. + ## AnonymizerJobResource @@ -87,18 +92,19 @@ dataset = results.load_dataset() | `load_failed_records()` | Returns `failed_records.json` as `list[dict]`. Returns `[]` when the file isn't present. | | `display_record(index=None)` | Renders a single trace record as HTML in a notebook. When `index` is omitted, cycles through records. | -??? "More about job results" - `AnonymizerJobResults` reads files lazily — methods load the corresponding parquet or JSON only when called. The underlying directory layout is: + +`AnonymizerJobResults` reads files lazily — methods load the corresponding parquet or JSON only when called. The underlying directory layout is: - ```text - / - dataset.parquet - trace.parquet - metadata.json - failed_records.json # only when there were failures - ``` +```text +/ + dataset.parquet + trace.parquet + metadata.json + failed_records.json # only when there were failures +``` - By default, `download_artifacts` saves the tarball contents to a local directory named after the job; pass `path=` to override. +By default, `download_artifacts` saves the tarball contents to a local directory named after the job; pass `path=` to override. + ## Request Models @@ -136,7 +142,7 @@ The plugin-owned API-boundary input spec: | `id_column` | `str \| None` | Optional record identifier column. | | `data_summary` | `str \| None` | Optional short description of the data passed to Anonymizer library prompts. | -Fileset references can take any of the three forms `fileset:///#`, `/#`, or `#`, and must resolve to a single `.csv` or `.parquet` file. +Fileset references can take any of the three forms `fileset://<workspace>/<fileset>#<path>`, `<workspace>/<fileset>#<path>`, or `<fileset>#<path>`, and must resolve to a single `.csv` or `.parquet` file. 
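As a rough illustration of the three reference forms listed above, the sketch below splits a reference into workspace, fileset, and path, then checks the file extension. `FilesetRef` and `parse_fileset_ref` are hypothetical names made up for this example; they are not part of the plugin or SDK, and the plugin's actual parsing rules and error messages may differ.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_SUFFIXES = (".csv", ".parquet")


@dataclass(frozen=True)
class FilesetRef:
    workspace: Optional[str]  # None when the reference omits the workspace
    fileset: str
    path: str


def parse_fileset_ref(ref: str) -> FilesetRef:
    """Hypothetical helper: split a fileset reference into its parts."""
    # Drop the optional scheme so all three forms share one code path.
    body = ref[len("fileset://"):] if ref.startswith("fileset://") else ref
    location, sep, path = body.partition("#")
    if not sep or not path:
        raise ValueError(f"missing '#<path>' fragment: {ref!r}")
    if not path.endswith(ALLOWED_SUFFIXES):
        raise ValueError(f"path must end in .csv or .parquet: {ref!r}")
    parts = location.split("/")
    if len(parts) == 2 and all(parts):
        workspace, fileset = parts
    elif len(parts) == 1 and parts[0]:
        workspace, fileset = None, parts[0]
    else:
        raise ValueError(f"expected '<workspace>/<fileset>' or '<fileset>': {ref!r}")
    return FilesetRef(workspace, fileset, path)
```

For example, `parse_fileset_ref("anonymizer-inputs#anonymizer-input.csv")` yields a reference with no workspace, which a caller would presumably resolve against its default workspace.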
### SelectedModelsOverrides diff --git a/docs/anonymizer/tutorials/index.mdx b/docs/anonymizer/tutorials/index.mdx index 0ba4ffe897..0b4ed8bd8a 100644 --- a/docs/anonymizer/tutorials/index.mdx +++ b/docs/anonymizer/tutorials/index.mdx @@ -1,3 +1,7 @@ +--- +title: "Overview" +description: "" +--- # Tutorials @@ -9,7 +13,7 @@ Anonymizer separates **configuration** (what to detect and how to replace it) fr **Part 1: Build the config (library)** -Use [`anonymizer.config`](https://github.com/NVIDIA-NeMo/Anonymizer/tree/main/docs) to define the rewrite or replacement strategy and detection options. This code is identical whether you run Anonymizer standalone or through the {{platform_name}} service. +Use [`anonymizer.config`](https://github.com/NVIDIA-NeMo/Anonymizer/tree/main/docs) to define the rewrite or replacement strategy and detection options. This code is identical whether you run Anonymizer standalone or through the NeMo Platform service. ```python from anonymizer.config.anonymizer_config import AnonymizerConfig @@ -66,17 +70,17 @@ preview = anonymizer.preview(PreviewRequest( ## Service-Specific Considerations -When using Anonymizer as a {{platform_name}} service: +When using Anonymizer as a NeMo Platform service: | Feature | Difference | Details | |----------------|---------------------------------------------------------|----------------------------------------------------------------------------------------| | **Inference** | Routes through the Inference Gateway | Configure providers once and reference them by name from `model_configs`. | -| **Input data** | Filesets and HTTP(S) URLs (local paths only in local CLI execution) | Use `sdk.files.filesets.create` / `sdk.files.upload`, then reference with `#`. | -| **Artifacts** | Local or platform-managed | `run run` writes to `persistent/results/artifacts` locally; `run submit` stores artifacts in {{platform_name}} job storage. 
| +| **Input data** | Filesets and HTTP(S) URLs (local paths only in local CLI execution) | Use `sdk.files.filesets.create` / `sdk.files.upload`, then reference with `#<path>`. | +| **Artifacts** | Local or platform-managed | `run run` writes to `persistent/results/artifacts` locally; `run submit` stores artifacts in NeMo Platform job storage. | ## Prerequisites -Before starting these tutorials, complete the [Quick Start](../quickstart.md) to: +Before starting these tutorials, complete the [Quick Start](/anonymize-data/quickstart) to: - Install the plugin and verify the `nemo anonymizer` CLI. - Configure an inference provider used in `model_configs`. @@ -86,7 +90,7 @@ Before starting these tutorials, complete the [Quick Start](../quickstart.md) to
-- **[Preview a Config](preview.md)** +- **[Preview a Config](/anonymize-data/tutorials/preview-a-config)** --- @@ -94,7 +98,7 @@ Before starting these tutorials, complete the [Quick Start](../quickstart.md) to beginner anonymizer -- **[Run an Anonymizer Job](run.md)** +- **[Run an Anonymizer Job](/anonymize-data/tutorials/run-an-anonymizer-job)** --- diff --git a/docs/anonymizer/tutorials/preview.mdx b/docs/anonymizer/tutorials/preview.mdx index d5a9a9d41a..0141b69a14 100644 --- a/docs/anonymizer/tutorials/preview.mdx +++ b/docs/anonymizer/tutorials/preview.mdx @@ -1,3 +1,7 @@ +--- +title: "Preview a Config" +description: "" +--- # Preview a Config @@ -7,7 +11,7 @@ For detection and replacement strategy details, see the [open-source library doc ## Prerequisites -- The Anonymizer plugin installed and the `nemo anonymizer` CLI available. See the [Quick Start](../quickstart.md). +- The Anonymizer plugin installed and the `nemo anonymizer` CLI available. See the [Quick Start](/anonymize-data/quickstart). - An inference provider configured (default examples use `nvidia-build`). - A fileset named `anonymizer-inputs` with `anonymizer-input.csv` uploaded (created in the Quick Start). @@ -103,19 +107,20 @@ preview.display_record(0) # render record 0 with entity highlights in a noteboo `AnonymizerResource.preview` calls the plugin service, so the same constraints as `preview submit` apply: use a fileset or `http(s)` source, and include `model_configs`. -??? 
"Async client" - `AsyncAnonymizerResource.preview` has the same signature and return type: + +`AsyncAnonymizerResource.preview` has the same signature and return type: - ```python - import os - from nemo_platform import AsyncNeMoPlatform +```python +import os +from nemo_platform import AsyncNeMoPlatform - async_sdk = AsyncNeMoPlatform( - base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), - workspace=WORKSPACE, - ) - preview = await async_sdk.anonymizer.preview(request) - ``` +async_sdk = AsyncNeMoPlatform( + base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), + workspace=WORKSPACE, +) +preview = await async_sdk.anonymizer.preview(request) +``` + ## Step 3: Persist Preview Records @@ -126,8 +131,9 @@ preview.dataset.to_csv("anonymized-preview.csv", index=False) preview.dataset.to_parquet("anonymized-preview.parquet", index=False) ``` -??? "More about preview results" - `AnonymizerPreviewResult` stores everything in memory; nothing is persisted to disk by default. The `dataset` field is a regular pandas DataFrame and can be saved with `to_csv` or `to_parquet`. + +`AnonymizerPreviewResult` stores everything in memory; nothing is persisted to disk by default. The `dataset` field is a regular pandas DataFrame and can be saved with `to_csv` or `to_parquet`. + ## Step 4: Run from the CLI (Alternative) @@ -172,7 +178,7 @@ jq -R 'fromjson? | select(.kind == "preview_dataset") | .records' \ /tmp/anonymizer-preview.ndjson ``` -If `preview submit` returns 404 against the gateway, the plugin service isn't mounted. Confirm the plugin is installed and restart `nemo services run`; see [Quick Start — Step 1](../quickstart.md#step-1-install-the-plugin). +If `preview submit` returns 404 against the gateway, the plugin service isn't mounted. Confirm the plugin is installed and restart `nemo services run`; see [Quick Start — Step 1](/anonymize-data/quickstart#step-1-install-the-plugin). 
## Input Source Forms @@ -192,7 +198,7 @@ fileset:///# # ``` -The `#` fragment must resolve to a single `.csv` or `.parquet` file. The plugin downloads the file before constructing the Anonymizer library input and cleans up the temp directory when the preview completes. +The `#<path>` fragment must resolve to a single `.csv` or `.parquet` file. The plugin downloads the file before constructing the Anonymizer library input and cleans up the temp directory when the preview completes. ## Validate the Config Independently @@ -215,6 +221,6 @@ nemo anonymizer validate --config /tmp/anonymizer-config.yaml ## Next Steps -- Run a full job and inspect parquet output in the [run tutorial](run.md). -- Refer to [SDK Resources](../sdk-resources.md) for `AnonymizerPreviewResult` and `AnonymizerResource.preview` details. +- Run a full job and inspect parquet output in the [run tutorial](/anonymize-data/tutorials/run-an-anonymizer-job). +- Refer to [SDK Resources](/anonymize-data/sdk-resources) for `AnonymizerPreviewResult` and `AnonymizerResource.preview` details. - Learn about rewrite and replacement strategy parameters in the [library docs](https://github.com/NVIDIA-NeMo/Anonymizer/tree/main/docs). diff --git a/docs/anonymizer/tutorials/run.mdx b/docs/anonymizer/tutorials/run.mdx index 16f3c52626..87346f8151 100644 --- a/docs/anonymizer/tutorials/run.mdx +++ b/docs/anonymizer/tutorials/run.mdx @@ -1,13 +1,17 @@ +--- +title: "Run an Anonymizer Job" +description: "" +--- # Run an Anonymizer Job -This tutorial walks through the `anonymizer.run` job: defining a run spec, executing it locally or on the {{platform_name}} Jobs worker, and loading the parquet artifacts it produces. +This tutorial walks through the `anonymizer.run` job: defining a run spec, executing it locally or on the NeMo Platform Jobs worker, and loading the parquet artifacts it produces. 
For detection, rewrite, and replacement strategy details, see the [open-source library documentation](https://github.com/NVIDIA-NeMo/Anonymizer/tree/main/docs). ## Prerequisites -- The Anonymizer plugin installed and the `nemo anonymizer` CLI available. See the [Quick Start](../quickstart.md). +- The Anonymizer plugin installed and the `nemo anonymizer` CLI available. See the [Quick Start](/anonymize-data/quickstart). - An inference provider configured (default examples use `nvidia-build`). - A fileset named `anonymizer-inputs` with `anonymizer-input.csv` uploaded (created in the Quick Start). @@ -20,7 +24,7 @@ There are three run commands: | Command | Where it runs | Local paths | `model_configs` required | Artifacts | |-------------------------------|----------------------------------------------|-------------|--------------------------|--------------------------------------------------------| | `nemo anonymizer run run` | Local CLI process via generated `is_local` path | Allowed | Optional | Written under `persistent/results/artifacts` locally | -| `nemo anonymizer run submit` | {{platform_name}} Jobs worker | Rejected | Required | Stored in {{platform_name}} job artifact storage; pull with `download_artifacts()` | +| `nemo anonymizer run submit` | NeMo Platform Jobs worker | Rejected | Required | Stored in NeMo Platform job artifact storage; pull with `download_artifacts()` | | `nemo anonymizer run explain` | Local schema introspection | n/a | n/a | Prints job key, submit endpoint, and input/spec schemas | Job artifacts (under the `artifacts/` directory): @@ -82,7 +86,7 @@ spec_path.write_text(yaml.safe_dump(request.model_dump(mode="json", exclude_none ## Step 3: Run the Job -Choose one execution path. Option A runs in the local CLI process. Option B submits the same request to the {{platform_name}} Jobs worker. +Choose one execution path. Option A runs in the local CLI process. Option B submits the same request to the NeMo Platform Jobs worker. 
### Option A: Run Locally @@ -108,7 +112,7 @@ Use that path in the next step. ### Option B: Submit to the Jobs Worker -To execute the same spec on the {{platform_name}} Jobs worker instead of in the CLI process, use `run submit`: +To execute the same spec on the NeMo Platform Jobs worker instead of in the CLI process, use `run submit`: ```bash nemo anonymizer run submit \ @@ -134,7 +138,7 @@ job = sdk.anonymizer.run(request) Compared to `run run`, the submit path: -- Rejects local file paths in `data.source` — use a fileset reference (`#`) or `http(s)` URL. +- Rejects local file paths in `data.source` — use a fileset reference (`<fileset>#<path>`) or `http(s)` URL. - Requires explicit `model_configs` referencing Inference Gateway providers, because the job runs outside the CLI process and cannot inherit Data Designer's locally-defined providers. ## Step 4: Get Results @@ -169,7 +173,7 @@ print(dataset.head()) print(f"records={len(dataset)} failures={len(failed_records)}") ``` -The trace dataset (and the dataset itself for `annotate` / `substitute` strategies) contains pyarrow-backed `struct>` columns. If you need plain Python `dict`/`list` values for JSON output, use `pyarrow.parquet`: +The trace dataset (and the dataset itself for `annotate` / `substitute` strategies) contains pyarrow-backed `struct<entities: list<...>>` columns. If you need plain Python `dict`/`list` values for JSON output, use `pyarrow.parquet`: ```python import pyarrow.parquet as pq @@ -231,7 +235,7 @@ trace = results.load_trace() failed = results.load_failed_records() ``` -`AnonymizerJobResults` exposes `load_dataset()`, `load_trace()`, `load_failed_records()`, and `display_record()` over the same underlying files. See [SDK Resources](../sdk-resources.md#anonymizerjobresults). +`AnonymizerJobResults` exposes `load_dataset()`, `load_trace()`, `load_failed_records()`, and `display_record()` over the same underlying files. See [SDK Resources](/anonymize-data/sdk-resources#anonymizerjobresults). 
## Inspect the Schema Without Running @@ -259,6 +263,6 @@ For `run submit`, provider endpoints are re-resolved at runtime so the job uses ## Next Steps -- Iterate faster with [preview](preview.md) before scaling to a full job. -- Refer to [SDK Resources](../sdk-resources.md) for `AnonymizerJobResource` and `AnonymizerJobResults` details. +- Iterate faster with [preview](/anonymize-data/tutorials/preview-a-config) before scaling to a full job. +- Refer to [SDK Resources](/anonymize-data/sdk-resources) for `AnonymizerJobResource` and `AnonymizerJobResults` details. - Replacement strategy parameters and rewrite mode are documented in the [library docs](https://github.com/NVIDIA-NeMo/Anonymizer/tree/main/docs). diff --git a/docs/api/index.mdx b/docs/api/index.mdx index ae3f0034ca..10d63efa9b 100644 --- a/docs/api/index.mdx +++ b/docs/api/index.mdx @@ -1,17 +1,12 @@ -# {{platform_name}} API Reference +--- +title: "API Reference" +description: "Explore the full NeMo Platform REST API." +--- -Explore the full {{platform_name}} API. Use the service buttons to filter by microservice. +Explore the full NeMo Platform REST API. The **REST API** reference is generated from the +OpenAPI specification and grouped by service (Auditor, Customizer, Data Designer, Entity +Store, Evaluator, Guardrails, Inference Gateway, Safe Synthesizer, and more). -
- - +Use the navigation to browse endpoints by service, or the search bar to find a specific +operation. Each endpoint page includes request and response schemas and an interactive +playground. diff --git a/docs/auditor/configs/index.mdx b/docs/auditor/configs/index.mdx index 824a11d933..722173d224 100644 --- a/docs/auditor/configs/index.mdx +++ b/docs/auditor/configs/index.mdx @@ -1,7 +1,11 @@ +--- +title: "Overview" +description: "" +--- # Manage Audit Configurations -An `AuditConfig` selects which garak probes and detectors run during an audit, how many generations to make per probe, and where the reports land. Configurations are persisted in the {{platform_name}} entity store and referenced by name from `client.auditor.run(...)`. +An `AuditConfig` selects which garak probes and detectors run during an audit, how many generations to make per probe, and where the reports land. Configurations are persisted in the NeMo Platform entity store and referenced by name from `client.auditor.run(...)`. ## What an AuditConfig Holds @@ -14,7 +18,7 @@ A configuration has four sub-blocks plus a free-form description. Each sub-block | `plugins` | The probes and detectors to run (`probe_spec`, `detector_spec`) and any plugin-specific options. | | `reporting` | Output file prefix and directory, optional taxonomy, and which modules are surfaced in summaries. | -The full field reference is in [Configuration Schema](schema.md). +The full field reference is in [Configuration Schema](/vulnerability-scanning/configurations/schema). ## The Seeded Default @@ -51,7 +55,7 @@ config = client.auditor.configs.create( ) ``` -The `probe_spec` field takes a comma-separated list of probe categories or fully-qualified probe classes. See [Selecting Probes](probes.md) for the syntax and worked examples. +The `probe_spec` field takes a comma-separated list of probe categories or fully-qualified probe classes. 
See [Selecting Probes](/vulnerability-scanning/configurations/selecting-probes) for the syntax and worked examples. ## List Configurations diff --git a/docs/auditor/configs/probes.mdx b/docs/auditor/configs/probes.mdx index ae742b238b..24a9ef7afe 100644 --- a/docs/auditor/configs/probes.mdx +++ b/docs/auditor/configs/probes.mdx @@ -1,3 +1,7 @@ +--- +title: "Selecting Probes" +description: "" +--- # Selecting Probes diff --git a/docs/auditor/configs/schema.mdx b/docs/auditor/configs/schema.mdx index bfa4d5399d..393070715e 100644 --- a/docs/auditor/configs/schema.mdx +++ b/docs/auditor/configs/schema.mdx @@ -1,7 +1,11 @@ +--- +title: "Schema" +description: "" +--- # Configuration Schema -This page lists every field on `AuditConfig` and its four sub-models. Defaults match the pydantic definitions in `nemo_auditor.entities`; the {{platform_name}} entity store validates writes against these schemas. +This page lists every field on `AuditConfig` and its four sub-models. Defaults match the pydantic definitions in `nemo_auditor.entities`; the NeMo Platform entity store validates writes against these schemas. ## `AuditConfig` @@ -43,7 +47,7 @@ Controls how each probe is executed. | `deprefix` | `bool` | `True` | Strip prompt prefixes from responses before detection. | | `eval_threshold` | `float` (0–1) | `0.5` | Detector score threshold above which a response is flagged as a hit. | | `generations` | `int` (≥1) | `5` | Generations per probe. Lower for fast iteration; raise for statistical confidence. | -| `probe_tags` | `str \| None` | `None` | Comma-separated tag filter (`"owasp:llm06,payload:hallucination"`). Intersects with `plugins.probe_spec`. See [Selecting Probes](probes.md). | +| `probe_tags` | `str \| None` | `None` | Comma-separated tag filter (`"owasp:llm06,payload:hallucination"`). Intersects with `plugins.probe_spec`. See [Selecting Probes](/vulnerability-scanning/configurations/selecting-probes). 
| | `user_agent` | `str` | `"garak/{version} (LLM vulnerability scanner https://garak.ai)"` | HTTP `User-Agent` garak sends. The `{version}` placeholder is substituted at runtime. | ## `AuditPluginsData` @@ -54,7 +58,7 @@ Controls probe, detector, generator, buff, and harness selection. |-------|------|---------|-------------| | `model_type` | `str \| None` | `None` | Garak generator type override. Normally set on the target instead. | | `model_name` | `str \| None` | `None` | Garak model name override. Normally set on the target instead. | -| `probe_spec` | `str` | `"all"` | Comma-separated probe selection. See [Selecting Probes](probes.md). | +| `probe_spec` | `str` | `"all"` | Comma-separated probe selection. See [Selecting Probes](/vulnerability-scanning/configurations/selecting-probes). | | `detector_spec` | `str` | `"auto"` | Detector selection — `"auto"` lets each probe pick its detectors. | | `extended_detectors` | `bool` | `False` | Enable garak's extended detector set. | | `buff_spec` | `str \| None` | `None` | Comma-separated buff selection (prompt mutations applied before sending). | diff --git a/docs/auditor/index.mdx b/docs/auditor/index.mdx index 20b3e1a332..42b4a0668a 100644 --- a/docs/auditor/index.mdx +++ b/docs/auditor/index.mdx @@ -1,17 +1,19 @@ --- -description: Scan and audit large language models for jailbreaks, prompt injection, encoding bypasses, and other safety failures using {{__auditor_short_name}}, powered by garak. +title: "About" +description: "Scan and audit large language models for jailbreaks, prompt injection, encoding bypasses, and other safety failures using NeMo Auditor, powered by garak." --- - # About Auditing Models -!!! important "{{__auditor_long_name}} is released with _early access_ availability and is subject to limited support and potential API changes in future releases." + +NVIDIA NeMo Auditor is released with _early access_ availability and is subject to limited support and potential API changes in future releases. 
+ -{{__auditor_long_name}} audits LLMs by probing them with adversarial prompts and detecting failures such as jailbreaks, prompt injection, encoding bypasses, and unsafe output generation. -It is powered by [garak](https://github.com/NVIDIA/garak), NVIDIA's open-source LLM vulnerability scanner, and integrates with {{platform_name}} so audits can target any model reachable through the Inference Gateway. +NVIDIA NeMo Auditor audits LLMs by probing them with adversarial prompts and detecting failures such as jailbreaks, prompt injection, encoding bypasses, and unsafe output generation. +It is powered by [garak](https://github.com/NVIDIA/garak), NVIDIA's open-source LLM vulnerability scanner, and integrates with NeMo Platform so audits can target any model reachable through the Inference Gateway. -[**Tutorials**](tutorials/index.md){ .md-button } -[**SDK Resources**](sdk-resources.md){ .md-button } +[**Tutorials**](/vulnerability-scanning/tutorials/overview) +[**SDK Resources**](/vulnerability-scanning/sdk-resources) --- @@ -19,21 +21,21 @@ It is powered by [garak](https://github.com/NVIDIA/garak), NVIDIA's open-source A typical audit looks like the following: -1. Create an [audit target](targets/index.md) for the model you want to test. -1. Create an [audit configuration](configs/index.md) that selects which garak probes and detectors to run, along with reporting settings. -1. [Run the audit](tutorials/run-audit-locally.md) and inspect the resulting JSONL, HTML, and hitlog reports. +1. Create an [audit target](/vulnerability-scanning/targets/overview) for the model you want to test. +1. Create an [audit configuration](/vulnerability-scanning/configurations/overview) that selects which garak probes and detectors to run, along with reporting settings. +1. [Run the audit](/vulnerability-scanning/tutorials/run-an-audit-locally) and inspect the resulting JSONL, HTML, and hitlog reports. 
-The plugin exposes both [synchronous and asynchronous](sdk-resources.md) Python entry points for each step. +The plugin exposes both [synchronous and asynchronous](/vulnerability-scanning/sdk-resources) Python entry points for each step. --- ## Setup -Before you can run audits, you need a working {{platform_name}} install with the auditor plugin enabled and a garak interpreter on disk. +Before you can run audits, you need a working NeMo Platform install with the auditor plugin enabled and a garak interpreter on disk. -- Follow [Setup](../get-started/setup.md) to install the platform and start local services. +- Follow [Setup](/get-started/setup) to install the platform and start local services. - Install garak in a Python virtual environment. By default the plugin invokes `~/.auditor/.venv/bin/python -m garak`; override the interpreter path with `NEMO_AUDITOR_GARAK_PYTHON` if you installed it elsewhere. -- Configure at least one [Inference Gateway provider](../run-inference/about.md) so audits can route requests to the model under test. +- Configure at least one [Inference Gateway provider](/models-and-inference/about) so audits can route requests to the model under test. --- @@ -41,25 +43,25 @@ Before you can run audits, you need a working {{platform_name}} install with the
-- **[Audit Targets](targets/index.md)** +- **[Audit Targets](/vulnerability-scanning/targets/overview)** --- Define the model under test — generator type, model identifier, and inference endpoint. -- **[Audit Configurations](configs/index.md)** +- **[Audit Configurations](/vulnerability-scanning/configurations/overview)** --- Choose probes, detectors, and reporting settings for the audit. -- **[Run an Audit Locally](tutorials/run-audit-locally.md)** +- **[Run an Audit Locally](/vulnerability-scanning/tutorials/run-an-audit-locally)** --- End-to-end walkthrough: create entities, run the audit in-process, read the report artifacts. -- **[SDK Resources](sdk-resources.md)** +- **[SDK Resources](/vulnerability-scanning/sdk-resources)** --- @@ -71,28 +73,28 @@ Before you can run audits, you need a working {{platform_name}} install with the
-- **[Configuration Schema](configs/schema.md)** +- **[Configuration Schema](/vulnerability-scanning/configurations/schema)** --- Field reference for `AuditConfig` and its `system`, `run`, `plugins`, and `reporting` sub-models. -- **[Target Schema](targets/schema.md)** +- **[Target Schema](/vulnerability-scanning/targets/schema)** --- Field reference for `AuditTarget` (`type`, `model`, `options`). -- **[Selecting Probes](configs/probes.md)** +- **[Selecting Probes](/vulnerability-scanning/configurations/selecting-probes)** --- `probe_spec`, `probe_tags`, and `detector_spec` syntax with worked examples. -- **[Inference Gateway](targets/inference-gateway.md)** +- **[Inference Gateway](/vulnerability-scanning/targets/inference-gateway)** --- - How `nmp_uri_spec` resolves a target's URI through a {{platform_name}} provider. + How `nmp_uri_spec` resolves a target's URI through a NeMo Platform provider.
diff --git a/docs/auditor/sdk-resources.mdx b/docs/auditor/sdk-resources.mdx index e356f2c50c..8c09b25fe4 100644 --- a/docs/auditor/sdk-resources.mdx +++ b/docs/auditor/sdk-resources.mdx @@ -1,16 +1,20 @@ +--- +title: "SDK Resources" +description: "" +--- -# {{__auditor_short_name}} {{platform_name}} SDK Resources +# NeMo Auditor NeMo Platform SDK Resources -The {{__auditor_short_name}} plugin mounts a Python SDK surface on the `nemo_platform` client at `client.auditor`. +The NeMo Auditor plugin mounts a Python SDK surface on the `nemo_platform` client at `client.auditor`. This page documents that surface: how to manage audit configurations and targets in the entity store, and how to run an audit in-process using the local execution path. -The CRUD methods exposed on `client.auditor.configs` and `client.auditor.targets` are 1:1 mirrors of the [audit configuration](configs/index.md) and [audit target](targets/index.md) lifecycle and use the same `AuditConfig` and `AuditTarget` pydantic schemas the entity store persists. +The CRUD methods exposed on `client.auditor.configs` and `client.auditor.targets` are 1:1 mirrors of the [audit configuration](/vulnerability-scanning/configurations/overview) and [audit target](/vulnerability-scanning/targets/overview) lifecycle and use the same `AuditConfig` and `AuditTarget` pydantic schemas the entity store persists. ## AuditorPluginResource -The `AuditorPluginResource` is the sync SDK object for working with the {{__auditor_short_name}} plugin. +The `AuditorPluginResource` is the sync SDK object for working with the NeMo Auditor plugin. It is accessed directly from a `NeMoPlatform` instance: ```python @@ -34,7 +38,7 @@ auditor = client.auditor # AuditorPluginResource ### `configs` sub-resource -Five CRUD methods for `AuditConfig` entities. The full field reference is in [Configuration Schema](configs/schema.md). +Five CRUD methods for `AuditConfig` entities. 
The full field reference is in [Configuration Schema](/vulnerability-scanning/configurations/schema). | Method | Description | Returns | |--------|-------------|---------| @@ -46,7 +50,7 @@ Five CRUD methods for `AuditConfig` entities. The full field reference is in [Co ### `targets` sub-resource -Five CRUD methods for `AuditTarget` entities. The full field reference is in [Target Schema](targets/schema.md). +Five CRUD methods for `AuditTarget` entities. The full field reference is in [Target Schema](/vulnerability-scanning/targets/schema). | Method | Description | Returns | |--------|-------------|---------| diff --git a/docs/auditor/targets/index.mdx b/docs/auditor/targets/index.mdx index 434457a9e8..d41029bb71 100644 --- a/docs/auditor/targets/index.mdx +++ b/docs/auditor/targets/index.mdx @@ -1,7 +1,11 @@ +--- +title: "Overview" +description: "" +--- # Manage Audit Targets -An `AuditTarget` identifies the model under test. It pairs a garak generator class (`type`) with a model identifier (`model`) and a generator-specific options dict (`options`). Targets are persisted in the {{platform_name}} entity store and referenced by name from `client.auditor.run(...)`. +An `AuditTarget` identifies the model under test. It pairs a garak generator class (`type`) with a model identifier (`model`) and a generator-specific options dict (`options`). Targets are persisted in the NeMo Platform entity store and referenced by name from `client.auditor.run(...)`. ## What an AuditTarget Holds @@ -9,16 +13,16 @@ An `AuditTarget` identifies the model under test. It pairs a garak generator cla |-------|-------------| | `type` | A fully-qualified [garak generator class](https://reference.garak.ai/en/latest/generators.html), such as `nim.NVOpenAIChat`, `openai.OpenAIGenerator`, `rest.RestGenerator`, or `test.Blank`. | | `model` | The provider's model identifier passed through to garak (for example, `meta/llama-3.1-8b-instruct`). 
| -| `options` | A nested dict whose top-level key is the generator namespace (`nim`, `openai`, `rest`, ...). Contents are passed through to garak unchanged, with one exception: the [`nmp_uri_spec`](inference-gateway.md) sentinel is resolved to a concrete `uri` at run time. | +| `options` | A nested dict whose top-level key is the generator namespace (`nim`, `openai`, `rest`, ...). Contents are passed through to garak unchanged, with one exception: the [`nmp_uri_spec`](/vulnerability-scanning/targets/inference-gateway) sentinel is resolved to a concrete `uri` at run time. | | `description` | Optional free-form description shown in listings. | -See [Target Schema](schema.md) for the full field reference. +See [Target Schema](/vulnerability-scanning/targets/schema) for the full field reference. ## Common Target Types ### NIM via Inference Gateway -Audit a NIM (or any OpenAI-compatible chat endpoint) routed through a {{platform_name}} provider: +Audit a NIM (or any OpenAI-compatible chat endpoint) routed through a NeMo Platform provider: ```python target = client.auditor.targets.create( @@ -40,7 +44,7 @@ target = client.auditor.targets.create( ) ``` -The `nmp_uri_spec` block resolves to a concrete `uri` at run time. See [Inference Gateway](inference-gateway.md) for details. +The `nmp_uri_spec` block resolves to a concrete `uri` at run time. See [Inference Gateway](/vulnerability-scanning/targets/inference-gateway) for details. 
### OpenAI-compatible Endpoint diff --git a/docs/auditor/targets/inference-gateway.mdx b/docs/auditor/targets/inference-gateway.mdx index fe113ccf64..4320096c97 100644 --- a/docs/auditor/targets/inference-gateway.mdx +++ b/docs/auditor/targets/inference-gateway.mdx @@ -1,11 +1,15 @@ +--- +title: "Inference Gateway" +description: "" +--- # Routing Targets Through the Inference Gateway -`AuditTarget.options` is passed through to garak unchanged, with one exception: the plugin recognizes a `nmp_uri_spec` sentinel and resolves it to a concrete URI at run time using the {{platform_name}} Inference Gateway. This keeps target definitions portable — you store provider references instead of host-specific URLs. +`AuditTarget.options` is passed through to garak unchanged, with one exception: the plugin recognizes a `nmp_uri_spec` sentinel and resolves it to a concrete URI at run time using the NeMo Platform Inference Gateway. This keeps target definitions portable — you store provider references instead of host-specific URLs. ## What `nmp_uri_spec` Is -`nmp_uri_spec` is a nested dict placed inside `options.` that names an Inference Gateway provider. When `client.auditor.run(...)` starts, the plugin walks the options tree, looks each `nmp_uri_spec` block up via `sdk.inference.providers.retrieve(...)`, replaces it with a `uri` key whose value is the provider's resolved OpenAI-compatible URL, and hands the rewritten options dict to garak. +`nmp_uri_spec` is a nested dict placed inside `options.<generator>` that names an Inference Gateway provider. When `client.auditor.run(...)` starts, the plugin walks the options tree, looks each `nmp_uri_spec` block up via `sdk.inference.providers.retrieve(...)`, replaces it with a `uri` key whose value is the provider's resolved OpenAI-compatible URL, and hands the rewritten options dict to garak. The original `AuditTarget` entity stays untouched — only the in-memory copy garak receives is rewritten. 
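
The resolution step described above can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's implementation: `resolve_uri_specs`, `fake_lookup`, the `workspace`/`provider` keys inside the `nmp_uri_spec` block, and the example gateway URL are all assumptions made for the sketch. The real plugin looks providers up through `sdk.inference.providers.retrieve(...)`, and its conflict rules (for example, when a `uri` is already set) are not modeled here.

```python
from typing import Any, Callable


def resolve_uri_specs(options: Any, lookup: Callable[[Any], str]) -> Any:
    """Return a rewritten copy of `options`; the input is never mutated.

    Every `nmp_uri_spec` key found while walking the tree is replaced by a
    `uri` key whose value comes from `lookup` (a hypothetical stand-in for
    the provider lookup the plugin performs).
    """
    if isinstance(options, dict):
        resolved = {}
        for key, value in options.items():
            if key == "nmp_uri_spec":
                # Swap the sentinel block for a concrete URI.
                resolved["uri"] = lookup(value)
            else:
                resolved[key] = resolve_uri_specs(value, lookup)
        return resolved
    if isinstance(options, list):
        return [resolve_uri_specs(item, lookup) for item in options]
    # Scalars are passed through unchanged.
    return options


def fake_lookup(spec: Any) -> str:
    # Hypothetical resolver: builds a placeholder URL from assumed keys.
    return f"https://gateway.example/{spec['workspace']}/{spec['provider']}/v1"


original = {
    "nim": {
        "nmp_uri_spec": {"workspace": "default", "provider": "build"},
        "max_tokens": 256,
    }
}
rewritten = resolve_uri_specs(original, fake_lookup)
print(rewritten)
```

Building new dicts instead of editing in place mirrors the behavior the page describes: the stored target keeps its `nmp_uri_spec` block, and only the copy handed to garak carries the resolved `uri`.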
diff --git a/docs/auditor/targets/schema.mdx b/docs/auditor/targets/schema.mdx index 243587980b..571309887b 100644 --- a/docs/auditor/targets/schema.mdx +++ b/docs/auditor/targets/schema.mdx @@ -1,7 +1,11 @@ +--- +title: "Schema" +description: "" +--- # Target Schema -This page lists every field on `AuditTarget`. Defaults match the pydantic definition in `nemo_auditor.entities`; the {{platform_name}} entity store validates writes against this schema. +This page lists every field on `AuditTarget`. Defaults match the pydantic definition in `nemo_auditor.entities`; the NeMo Platform entity store validates writes against this schema. ## `AuditTarget` @@ -20,4 +24,4 @@ The entity store adds the standard `NemoEntity` fields on retrieval: `id`, `enti `options` is intentionally opaque to the plugin — its contents are passed through to garak as the generator's `--generator_option_file` payload. Refer to garak's [generator documentation](https://reference.garak.ai/en/latest/generators.html) for the options each generator class accepts. -The plugin recognizes one sentinel inside `options.`: an `nmp_uri_spec` block is resolved at run time to a concrete `uri` value via the {{platform_name}} Inference Gateway. See [Inference Gateway](inference-gateway.md) for the resolution rules and conflict semantics. +The plugin recognizes one sentinel inside `options.<generator>`: an `nmp_uri_spec` block is resolved at run time to a concrete `uri` value via the NeMo Platform Inference Gateway. See [Inference Gateway](/vulnerability-scanning/targets/inference-gateway) for the resolution rules and conflict semantics. 
diff --git a/docs/auditor/tutorials/index.mdx b/docs/auditor/tutorials/index.mdx index 43fe5e1b1e..7349a4def5 100644 --- a/docs/auditor/tutorials/index.mdx +++ b/docs/auditor/tutorials/index.mdx @@ -1,15 +1,19 @@ +--- +title: "Overview" +description: "" +--- -# {{__auditor_short_name}} Tutorials +# NeMo Auditor Tutorials -Use these tutorials to get hands-on with [{{__auditor_short_name}}](../index.md). +Use these tutorials to get hands-on with [NeMo Auditor](/vulnerability-scanning/about). ## Before You Start -Set up [a local instance of the platform](../../get-started/setup.md), install garak in a Python virtual environment (the plugin invokes `~/.auditor/.venv/bin/python` by default), and configure at least one [Inference Gateway provider](../../run-inference/about.md) before running the tutorial below. +Set up [a local instance of the platform](/get-started/setup), install garak in a Python virtual environment (the plugin invokes `~/.auditor/.venv/bin/python` by default), and configure at least one [Inference Gateway provider](/models-and-inference/about) before running the tutorial below.
-- **[Run an Audit Locally](run-audit-locally.md)** +- **[Run an Audit Locally](/vulnerability-scanning/tutorials/run-an-audit-locally)** --- @@ -21,4 +25,4 @@ Set up [a local instance of the platform](../../get-started/setup.md), install g ## How It Works -For the conceptual overview of the audit workflow, see [About Auditing Models](../index.md). For the full SDK surface, see [SDK Resources](../sdk-resources.md). +For the conceptual overview of the audit workflow, see [About Auditing Models](/vulnerability-scanning/about). For the full SDK surface, see [SDK Resources](/vulnerability-scanning/sdk-resources). diff --git a/docs/auditor/tutorials/run-audit-locally.mdx b/docs/auditor/tutorials/run-audit-locally.mdx index 326671f408..bee28dfa0f 100644 --- a/docs/auditor/tutorials/run-audit-locally.mdx +++ b/docs/auditor/tutorials/run-audit-locally.mdx @@ -1,24 +1,29 @@ +--- +title: "Run an Audit Locally" +description: "" +--- # Run an Audit Locally -This tutorial walks through running a single audit end-to-end with the {{__auditor_short_name}} plugin SDK. You will persist an audit configuration and a target, run the audit in-process, and inspect the resulting report artifacts. +This tutorial walks through running a single audit end-to-end with the NeMo Auditor plugin SDK. You will persist an audit configuration and a target, run the audit in-process, and inspect the resulting report artifacts. **What you will learn:** -- Initialize the {{platform_name}} SDK and reach the `client.auditor` resource. +- Initialize the NeMo Platform SDK and reach the `client.auditor` resource. - Create an `AuditConfig` selecting which garak probes to run. - Create an `AuditTarget` pointing at a model through the Inference Gateway. - Execute the audit locally with `client.auditor.run(...)`. - Read the JSONL, HTML, and hitlog report artifacts the run produces. -!!! 
tip - This tutorial takes approximately **10 minutes** to complete, plus however long garak takes to run the selected probes against your target. + +This tutorial takes approximately **10 minutes** to complete, plus however long garak takes to run the selected probes against your target. + ## Prerequisites -- Install and start {{platform_name}} using the [Setup guide](../../get-started/setup.md). +- Install and start NeMo Platform using the [Setup guide](/get-started/setup). - Install garak in a Python virtual environment so the plugin can shell out to it. By default the plugin invokes `~/.auditor/.venv/bin/python -m garak`. Override the interpreter path with `NEMO_AUDITOR_GARAK_PYTHON` if your install lives elsewhere. -- Configure at least one Inference Gateway provider — this tutorial uses a `build` provider named `build` that routes to NVIDIA-build models, but any chat-completion-compatible provider works. See [Inference Gateway](../targets/inference-gateway.md) for details on the `nmp_uri_spec` block used below. +- Configure at least one Inference Gateway provider — this tutorial uses a `build` provider named `build` that routes to NVIDIA-build models, but any chat-completion-compatible provider works. See [Inference Gateway](/vulnerability-scanning/targets/inference-gateway) for details on the `nmp_uri_spec` block used below. --- @@ -50,7 +55,7 @@ print(auditor.plugin_status()) # {'plugin': 'auditor', 'status': 'ok', ...} ``` -If `plugin_status()` raises a connection error, the platform is not running or `NMP_BASE_URL` is misconfigured. Refer back to [Setup](../../get-started/setup.md). +If `plugin_status()` raises a connection error, the platform is not running or `NMP_BASE_URL` is misconfigured. Refer back to [Setup](/get-started/setup). --- @@ -79,7 +84,7 @@ config = auditor.configs.create( print(config.model_dump_json(indent=2)) ``` -For the full set of options on each sub-block, see [Configuration Schema](../configs/schema.md). 
For probe selection syntax, see [Selecting Probes](../configs/probes.md). +For the full set of options on each sub-block, see [Configuration Schema](/vulnerability-scanning/configurations/schema). For probe selection syntax, see [Selecting Probes](/vulnerability-scanning/configurations/selecting-probes). --- @@ -108,7 +113,7 @@ target = auditor.targets.create( print(target.model_dump_json(indent=2)) ``` -The `nmp_uri_spec` sentinel inside `options.nim` tells the plugin to resolve a concrete URI from the Inference Gateway provider at run time. See [Inference Gateway](../targets/inference-gateway.md) for the full conflict rules and resolution behavior. +The `nmp_uri_spec` sentinel inside `options.nim` tells the plugin to resolve a concrete URI from the Inference Gateway provider at run time. See [Inference Gateway](/vulnerability-scanning/targets/inference-gateway) for the full conflict rules and resolution behavior. --- @@ -198,17 +203,25 @@ Report artifacts are left in place under the scheduler's temporary directory; th ## Troubleshooting -**`FileNotFoundError: garak interpreter not found at ...`** -: The plugin couldn't find a garak install. Either install garak at `~/.auditor/.venv/bin/python`, or set `NEMO_AUDITOR_GARAK_PYTHON` to the absolute path of a Python interpreter that has garak installed. +****`FileNotFoundError: garak interpreter not found at ...`**** + +- The plugin couldn't find a garak install. Either install garak at `~/.auditor/.venv/bin/python`, or set `NEMO_AUDITOR_GARAK_PYTHON` to the absolute path of a Python interpreter that has garak installed. + + +****`RuntimeError: Failed to resolve inference gateway provider '<workspace>/<provider>'`**** + +- The `nmp_uri_spec` block in your target's options references an Inference Gateway provider that doesn't exist. List your providers with `client.inference.providers.list(workspace="default")` and update the target to reference an existing one. 
+ + +****`returncode != 0` with empty `results`**** + +- garak started but failed early, usually because the target endpoint is unreachable. Inspect `result["stderr_tail"]` for the error message, and verify the model is reachable through the provider you configured. + -**`RuntimeError: Failed to resolve inference gateway provider '/'`** -: The `nmp_uri_spec` block in your target's options references an Inference Gateway provider that doesn't exist. List your providers with `client.inference.providers.list(workspace="default")` and update the target to reference an existing one. +****Where do the report files live?**** -**`returncode != 0` with empty `results`** -: garak started but failed early, usually because the target endpoint is unreachable. Inspect `result["stderr_tail"]` for the error message, and verify the model is reachable through the provider you configured. +- The scheduler writes garak output under `<temp-dir>/garak/<reporting.report_dir>/<reporting.report_prefix>.*`, then copies the produced files into the local results directory referenced by each `artifact_url`. For a local run this is a temporary directory under the system's `$TMPDIR`. -**Where do the report files live?** -: The scheduler writes garak output under `/garak//.*`, then copies the produced files into the local results directory referenced by each `artifact_url`. For a local run this is a temporary directory under the system's `$TMPDIR`. --- @@ -216,6 +229,6 @@ Report artifacts are left in place under the scheduler's temporary directory; th You created an `AuditConfig` and `AuditTarget`, ran a single audit locally with `client.auditor.run(...)`, and loaded the resulting reports. -- For the full SDK surface — including async variants — see [SDK Resources](../sdk-resources.md). -- For more probe selection options, see [Selecting Probes](../configs/probes.md). -- For other generator types (OpenAI-compatible endpoints, REST targets, test targets), see [Audit Targets](../targets/index.md). 
+- For the full SDK surface — including async variants — see [SDK Resources](/vulnerability-scanning/sdk-resources). +- For more probe selection options, see [Selecting Probes](/vulnerability-scanning/configurations/selecting-probes). +- For other generator types (OpenAI-compatible endpoints, REST targets, test targets), see [Audit Targets](/vulnerability-scanning/targets/overview). diff --git a/docs/auth/authentication/index.mdx b/docs/auth/authentication/index.mdx index c0ef413948..e7a7ad1106 100644 --- a/docs/auth/authentication/index.mdx +++ b/docs/auth/authentication/index.mdx @@ -1,32 +1,34 @@ -# Authentication +--- +title: "Authentication" +description: "" +--- +NeMo Platform authenticates requests using **OpenID Connect (OIDC)**. You register an OAuth application in your identity provider, configure NeMo Platform with the issuer and client ID, and users sign in via the CLI, SDK, or browser. NeMo Platform validates the JWT on every request and extracts the user's identity for authorization. -{{platform_name}} authenticates requests using **OpenID Connect (OIDC)**. You register an OAuth application in your identity provider, configure {{platform_name}} with the issuer and client ID, and users sign in via the CLI, SDK, or browser. {{platform_name}} validates the JWT on every request and extracts the user's identity for authorization. - -For the quickstart (no IdP), see the [email-based shortcut](../index.md). For the authorization model, see [Authorization Concepts](../concepts.md). +For the quickstart (no IdP), see the [email-based shortcut](/platform/authentication-authorization/overview). For the authorization model, see [Authorization Concepts](/platform/authentication-authorization/concepts). 
## Connect Your Identity Provider -Start here — register an OAuth application in your IdP and configure {{platform_name}}: +Start here — register an OAuth application in your IdP and configure NeMo Platform: -- [OIDC Setup](oidc.md) — Step-by-step: register an app, configure {{platform_name}}, verify login. -- [Azure AD (Entra ID)](providers/azure-ad.md) — Azure-specific walkthrough (app registration, scopes, claim mapping). -- [Generic OIDC Provider](providers/generic.md) — Checklist for any OIDC-compliant IdP. +- [OIDC Setup](/platform/authentication-authorization/authentication/oidc-setup) — Step-by-step: register an app, configure NeMo Platform, verify login. +- [Azure AD (Entra ID)](/platform/authentication-authorization/authentication/providers/azure-ad-entra-id) — Azure-specific walkthrough (app registration, scopes, claim mapping). +- [Generic OIDC Provider](/platform/authentication-authorization/authentication/providers/generic-oidc) — Checklist for any OIDC-compliant IdP. ## Log In and Make API Calls -Once your IdP is connected, see [Using Authentication](using-authentication.md) for the full walkthrough: device flow login, SDK and curl examples, token management, and config file reference. +Once your IdP is connected, see [Using Authentication](/platform/authentication-authorization/authentication/using-authentication) for the full walkthrough: device flow login, SDK and curl examples, token management, and config file reference. 
| Method | Command / Action | Best For | |--------|-----------------|----------| | **Device flow** (browser) | `nemo auth login` | Interactive use — opens browser to sign in with your IdP | -| **Password grant** | `nemo auth login --username --password ` | CI/CD pipelines — non-interactive | +| **Password grant** | `nemo auth login --username <user> --password <pass>` | CI/CD pipelines — non-interactive | | **Direct from IdP** | Use your IdP's token endpoint or workload identity | Custom integrations, service accounts | The CLI stores the token and auto-refreshes it before expiry. The SDK reads the stored token from the CLI config automatically — after `nemo auth login`, `NeMoPlatform()` works with no arguments. ## Discovery Endpoint -{{platform_name}} exposes an unauthenticated endpoint that clients and the SDK use to discover OIDC settings: +NeMo Platform exposes an unauthenticated endpoint that clients and the SDK use to discover OIDC settings: ```text GET {BASE_URL}/apis/auth/discovery @@ -52,5 +54,5 @@ The CLI and SDK call this endpoint automatically during `nemo auth login` or whe ## Related -- [Using Authentication](using-authentication.md) — Log in, make API calls, and manage tokens. -- [Security Model](../security-model.md) — Trust boundaries and the principal model. +- [Using Authentication](/platform/authentication-authorization/authentication/using-authentication) — Log in, make API calls, and manage tokens. +- [Security Model](/platform/authentication-authorization/security-model) — Trust boundaries and the principal model. diff --git a/docs/auth/authentication/oidc.mdx b/docs/auth/authentication/oidc.mdx index 886836b61a..7c779630ef 100644 --- a/docs/auth/authentication/oidc.mdx +++ b/docs/auth/authentication/oidc.mdx @@ -1,29 +1,32 @@ -# OIDC Setup - -Connect {{platform_name}} to an external OIDC identity provider (IdP) so users sign in with your organization's identity and receive OAuth2 access tokens for API access. 
+--- +title: "OIDC Setup" +description: "" +--- +Connect NeMo Platform to an external OIDC identity provider (IdP) so users sign in with your organization's identity and receive OAuth2 access tokens for API access. **Prerequisites**: You need an OAuth application registered in your IdP with the client ID, issuer URL, and device flow grant enabled. See the [Minimum IdP Checklist](#minimum-idp-checklist) below. -For login after setup, see [Using Authentication](using-authentication.md). For the full config reference, see [Auth Configuration](../deployment/configuration.md). +For login after setup, see [Using Authentication](/platform/authentication-authorization/authentication/using-authentication). For the full config reference, see [Auth Configuration](/platform/authentication-authorization/deployment/configuration). -## How OIDC Fits in {{platform_name}} +## How OIDC Fits in NeMo Platform When OIDC is enabled: 1. **Platform configuration** points to your IdP (issuer URL, client ID, optional scope prefix and claim names). -2. **Discovery**: Clients and the platform discover token and device-auth endpoints from the IdP's `.well-known/openid-configuration` (or from {{platform_name}}'s aggregated discovery at `{BASE_URL}/apis/auth/discovery`). +2. **Discovery**: Clients and the platform discover token and device-auth endpoints from the IdP's `.well-known/openid-configuration` (or from NeMo Platform's aggregated discovery at `{BASE_URL}/apis/auth/discovery`). 3. **Login**: Users obtain access tokens via **device flow** (browser) or **password grant** (CI) using the CLI. -4. **API calls**: Requests send the access token in the `Authorization: Bearer ` header. {{platform_name}} validates the JWT (signature, issuer, audience, expiry) and extracts the principal and scopes for authorization. +4. **API calls**: Requests send the access token in the `Authorization: Bearer <token>` header. 
NeMo Platform validates the JWT (signature, issuer, audience, expiry) and extracts the principal and scopes for authorization. OIDC provides the identity; authorization (workspace roles, scopes) works the same as with any other auth method. ## Flows Supported - **Device authorization flow**: User runs `nemo auth login`; the CLI shows a code and opens the IdP page; after the user signs in and consents, the CLI receives an access token (and optionally a refresh token). Best for interactive use. -- **Resource owner password grant**: For non-interactive environments (CI), `nemo auth login --username --password ` exchanges credentials for a token. Your IdP must support this grant type. +- **Resource owner password grant**: For non-interactive environments (CI), `nemo auth login --username <user> --password <pass>` exchanges credentials for a token. Your IdP must support this grant type. -!!! warning - Password grant sends user credentials directly to the IdP. It bypasses MFA and is disabled by many production IdPs. Use it only for CI/testing. Prefer device flow for interactive users. + +Password grant sends user credentials directly to the IdP. It bypasses MFA and is disabled by many production IdPs. Use it only for CI/testing. Prefer device flow for interactive users. + ## Step-by-Step Configuration @@ -34,10 +37,10 @@ In your IdP (Azure AD, Okta, Keycloak, etc.): 1. Create a new application registration 2. Note the **client ID** 3. Enable **device flow** (device authorization grant) -4. *(Optional)* Create custom API scopes (`platform:read`, `platform:write`) if you want token-level scope restrictions. See [API Scopes](../authorization/api-scopes.md). +4. *(Optional)* Create custom API scopes (`platform:read`, `platform:write`) if you want token-level scope restrictions. See [API Scopes](/platform/authentication-authorization/authorization/api-scopes). 5. 
If you created custom scopes, grant admin consent for them (if required by your IdP) -### Step 2: Configure {{platform_name}} +### Step 2: Configure NeMo Platform Set the OIDC settings in your platform config under `auth.oidc`: @@ -59,7 +62,7 @@ auth: ### Step 3: Configure Scopes (Optional) -This step is only needed if you created custom API scopes in Step 1. If you skip scopes, {{platform_name}} still enforces RBAC — scopes add an additional token-level restriction on top. See [API Scopes](../authorization/api-scopes.md). +This step is only needed if you created custom API scopes in Step 1. If you skip scopes, NeMo Platform still enforces RBAC — scopes add an additional token-level restriction on top. See [API Scopes](/platform/authentication-authorization/authorization/api-scopes). If you are using scopes, configure the default scopes requested during login: @@ -69,7 +72,7 @@ auth: default_scopes: "openid profile email offline_access platform:read platform:write" ``` -If your IdP prefixes scopes (e.g., Azure AD uses `api://client-id/platform:read`), set `scope_prefix` so {{platform_name}} strips it before authorization: +If your IdP prefixes scopes (e.g., Azure AD uses `api://client-id/platform:read`), set `scope_prefix` so NeMo Platform strips it before authorization: ```yaml auth: @@ -98,8 +101,8 @@ nemo workspaces list | Field | Type | Description | |-------|------|-------------| | `enabled` | bool | Enable OIDC token validation. | -| `issuer` | string | IdP issuer URL (e.g., `https://login.microsoftonline.com//v2.0`). | -| `client_id` | string | OAuth client ID for {{platform_name}}. | +| `issuer` | string | IdP issuer URL (e.g., `https://login.microsoftonline.com/<tenant>/v2.0`). | +| `client_id` | string | OAuth client ID for NeMo Platform. | | `additional_issuers` | list | Extra issuer URLs to accept (e.g., Azure AD v1 format). | | `audience` | string | Expected token audience (defaults to `client_id`). 
| | `email_claim` | string | JWT claim for email (default: `email`). | @@ -122,7 +125,7 @@ Regardless of IdP, ensure: ## Claim mapping -{{platform_name}} maps JWT claims to [trusted identity headers](../security-model.md#trusted-identity-headers) using `email_claim`, `subject_claim`, and `groups_claim` (defaults: `email`, `sub`, `groups`). Override them when your IdP uses different claim names — values must match what you use for workspace members and authorization. +NeMo Platform maps JWT claims to [trusted identity headers](/platform/authentication-authorization/security-model#trusted-identity-headers) using `email_claim`, `subject_claim`, and `groups_claim` (defaults: `email`, `sub`, `groups`). Override them when your IdP uses different claim names — values must match what you use for workspace members and authorization. | Purpose | Config key | Default | Header | |---------|------------|---------|--------| @@ -131,17 +134,18 @@ Regardless of IdP, ensure: | Groups | `groups_claim` | `groups` | `X-NMP-Principal-Groups` | -!!! note - Group values are extracted from the JWT and included in the `X-NMP-Principal-Groups` header and OPA policy data, but **automatic group-to-role mapping is not yet implemented**. Role assignments must be made explicitly via the members API. See [Managing Access](../authorization/managing-access.md). + +Group values are extracted from the JWT and included in the `X-NMP-Principal-Groups` header and OPA policy data, but **automatic group-to-role mapping is not yet implemented**. Role assignments must be made explicitly via the members API. See [Managing Access](/platform/authentication-authorization/authorization/managing-access). + ## Provider-Specific Notes -- **Azure AD**: See [Azure AD Setup](providers/azure-ad.md) (issuer, `additional_issuers`, device flow, claim overrides). 
+- **Azure AD**: See [Azure AD Setup](/platform/authentication-authorization/authentication/providers/azure-ad-entra-id) (issuer, `additional_issuers`, device flow, claim overrides). - **Okta**: Choose the correct authorization server (custom vs org). Ensure device flow is enabled. - **Keycloak**: Configure realm, client, and role mappers. Enable device flow in client settings. -- **Other providers**: See [Generic OIDC Provider](providers/generic.md) for a provider-agnostic checklist. +- **Other providers**: See [Generic OIDC Provider](/platform/authentication-authorization/authentication/providers/generic-oidc) for a provider-agnostic checklist. ## Related -- [Using Authentication](using-authentication.md) — Log in, make API calls, and manage tokens. -- [Auth Configuration](../deployment/configuration.md) — Full auth config reference. +- [Using Authentication](/platform/authentication-authorization/authentication/using-authentication) — Log in, make API calls, and manage tokens. +- [Auth Configuration](/platform/authentication-authorization/deployment/configuration) — Full auth config reference. diff --git a/docs/auth/authentication/providers/azure-ad.mdx b/docs/auth/authentication/providers/azure-ad.mdx index 35df44ebf2..56a07f9c1a 100644 --- a/docs/auth/authentication/providers/azure-ad.mdx +++ b/docs/auth/authentication/providers/azure-ad.mdx @@ -1,13 +1,15 @@ -# Azure AD (Entra ID) Setup +--- +title: "Azure AD (Entra ID) Setup" +description: "" +--- +Complete walkthrough for connecting NeMo Platform to Azure AD (Entra ID), from app registration to first successful login. -Complete walkthrough for connecting {{platform_name}} to Azure AD (Entra ID), from app registration to first successful login. - -**Prerequisites**: Access to Azure Portal with permission to create app registrations. Familiarity with [OIDC Setup](../oidc.md). +**Prerequisites**: Access to Azure Portal with permission to create app registrations. 
Familiarity with [OIDC Setup](/platform/authentication-authorization/authentication/oidc-setup). ## App Registration 1. In Azure Portal, go to **Azure Active Directory** → **App registrations** → **New registration**. -2. Name the application (e.g., "{{platform_name}}"). +2. Name the application (e.g., "NeMo Platform"). 3. Set **Supported account types** to your tenant configuration. 4. No redirect URI is needed for device flow, but setting one is good practice. 5. Note the **Application (client) ID** and **Directory (tenant) ID**. @@ -21,10 +23,10 @@ Complete walkthrough for connecting {{platform_name}} to Azure AD (Entra ID), fr ## Expose API Scopes 1. Go to **Expose an API**. -2. Set the **Application ID URI** (e.g., `api://`). +2. Set the **Application ID URI** (e.g., `api://<client-id>`). 3. Add scopes: - - `platform:read` — "Read access to {{platform_name}} platform resources" - - `platform:write` — "Write access to {{platform_name}} platform resources" + - `platform:read` — "Read access to NeMo Platform platform resources" + - `platform:write` — "Write access to NeMo Platform platform resources" 4. Go to **API permissions** → **Add a permission** → **My APIs** → select your app → add the scopes. 5. Click **Grant admin consent** for the scopes. @@ -34,7 +36,7 @@ Complete walkthrough for connecting {{platform_name}} to Azure AD (Entra ID), fr 2. Select **Security groups** (or the group types your organization uses). 3. For the **Access token**, select **Group ID**. 
-## {{platform_name}} Configuration +## NeMo Platform Configuration ```yaml auth: @@ -73,11 +75,11 @@ nemo auth status |-------|-------|-----| | AADSTS70011 | Scope not configured or no admin consent | Add scopes in "Expose an API" and grant admin consent | | AADSTS50011 | Reply URL mismatch | Not typically needed for device flow; check Authentication settings | -| Audience mismatch | `audience` doesn't match token's `aud` claim | Set `audience: "api://"` | +| Audience mismatch | `audience` doesn't match token's `aud` claim | Set `audience: "api://<client-id>"` | | Empty email claim | Azure AD didn't populate `email` | Use `email_claim: "upn"` instead | ## Related -- [OIDC Setup](../oidc.md) — Generic OIDC configuration. -- [OIDC Setup — Claim mapping](../oidc.md#claim-mapping) — JWT claims vs config defaults. -- [Auth Configuration](../../deployment/configuration.md) — Full config reference. +- [OIDC Setup](/platform/authentication-authorization/authentication/oidc-setup) — Generic OIDC configuration. +- [OIDC Setup — Claim mapping](/platform/authentication-authorization/authentication/oidc-setup#claim-mapping) — JWT claims vs config defaults. +- [Auth Configuration](/platform/authentication-authorization/deployment/configuration) — Full config reference. diff --git a/docs/auth/authentication/providers/generic.mdx b/docs/auth/authentication/providers/generic.mdx index 5a98c34422..41a63620ee 100644 --- a/docs/auth/authentication/providers/generic.mdx +++ b/docs/auth/authentication/providers/generic.mdx @@ -1,8 +1,10 @@ -# Generic OIDC Provider +--- +title: "Generic OIDC Provider" +description: "" +--- +A checklist for connecting NeMo Platform to any OIDC-compliant identity provider not covered by the [Azure AD](/platform/authentication-authorization/authentication/providers/azure-ad-entra-id) page. -A checklist for connecting {{platform_name}} to any OIDC-compliant identity provider not covered by the [Azure AD](azure-ad.md) page. 
- -**Prerequisites**: Familiarity with [OIDC Setup](../oidc.md). +**Prerequisites**: Familiarity with [OIDC Setup](/platform/authentication-authorization/authentication/oidc-setup). ## Provider Checklist @@ -46,6 +48,6 @@ auth: ## Related -- [OIDC Setup](../oidc.md) — Full OIDC configuration guide. -- [OIDC Setup — Claim mapping](../oidc.md#claim-mapping) — JWT claims vs config defaults. -- [Auth Configuration](../../deployment/configuration.md) — Full config reference. +- [OIDC Setup](/platform/authentication-authorization/authentication/oidc-setup) — Full OIDC configuration guide. +- [OIDC Setup — Claim mapping](/platform/authentication-authorization/authentication/oidc-setup#claim-mapping) — JWT claims vs config defaults. +- [Auth Configuration](/platform/authentication-authorization/deployment/configuration) — Full config reference. diff --git a/docs/auth/authentication/providers/index.mdx b/docs/auth/authentication/providers/index.mdx index 809ed1aa3c..9346fc45a1 100644 --- a/docs/auth/authentication/providers/index.mdx +++ b/docs/auth/authentication/providers/index.mdx @@ -1,3 +1,5 @@ -# Provider Guides - -Step-by-step setup instructions for specific identity providers. If your provider is not listed here, see the [Generic OIDC Provider](generic.md) checklist. +--- +title: "Provider Guides" +description: "" +--- +Step-by-step setup instructions for specific identity providers. If your provider is not listed here, see the [Generic OIDC Provider](/platform/authentication-authorization/authentication/providers/generic-oidc) checklist. 
diff --git a/docs/auth/authentication/using-authentication.mdx b/docs/auth/authentication/using-authentication.mdx index bece0ef611..a34acf8186 100644 --- a/docs/auth/authentication/using-authentication.mdx +++ b/docs/auth/authentication/using-authentication.mdx @@ -1,8 +1,10 @@ -# Using Authentication - +--- +title: "Using Authentication" +description: "" +--- How to log in, make authenticated API calls, and manage tokens with the CLI and SDK. -**Prerequisites**: OIDC must be configured on the platform. See [OIDC Setup](oidc.md). +**Prerequisites**: OIDC must be configured on the platform. See [OIDC Setup](/platform/authentication-authorization/authentication/oidc-setup). ## Log In @@ -42,14 +44,15 @@ By default, the CLI requests the scopes configured in `auth.oidc.default_scopes` nemo auth login --scope "platform:read" ``` -See [API Scopes](../authorization/api-scopes.md) for the full list of available scopes. +See [API Scopes](/platform/authentication-authorization/authorization/api-scopes) for the full list of available scopes. ### Non-Interactive Login (CI/CD) -For CI pipelines, use the password grant to obtain a token without a browser: `nemo auth login --username --password ` (or set `NMP_OIDC_USERNAME` / `NMP_OIDC_PASSWORD` environment variables). If your CI system can obtain tokens directly (e.g., workload identity federation), pass the token via `access_token` as shown in [Make API Calls](#make-api-calls) below. +For CI pipelines, use the password grant to obtain a token without a browser: `nemo auth login --username <user> --password <pass>` (or set `NMP_OIDC_USERNAME` / `NMP_OIDC_PASSWORD` environment variables). If your CI system can obtain tokens directly (e.g., workload identity federation), pass the token via `access_token` as shown in [Make API Calls](#make-api-calls) below. -!!! warning - Password grant sends credentials directly to the IdP and **bypasses MFA**. Many production IdPs disable it. 
Use a dedicated service account with minimal scopes where possible. + +Password grant sends credentials directly to the IdP and **bypasses MFA**. Many production IdPs disable it. Use a dedicated service account with minimal scopes where possible. + ## Make API Calls @@ -149,17 +152,18 @@ users: The OIDC token endpoint is **not** stored — it is discovered at runtime from your cluster's `/apis/auth/discovery` endpoint. This keeps the config portable across environments. -!!! warning - **Token storage security** — Access and refresh tokens are stored in plaintext. Protect this file: + +**Token storage security** — Access and refresh tokens are stored in plaintext. Protect this file: - - **File permissions**: Ensure `0600` (owner read/write only). The CLI sets this by default — verify after manual edits: `chmod 600 ~/.config/nmp/config.yaml`. - - **Shared directories**: Do not store in cloud-synced folders (Dropbox, OneDrive, Google Drive) or shared home directories. - - **Refresh token rotation**: Configure your IdP to rotate refresh tokens on each use. A stolen refresh token becomes invalid after the legitimate client uses it once. - - **Logout when done**: Run `nemo auth logout` on shared or temporary machines. +- **File permissions**: Ensure `0600` (owner read/write only). The CLI sets this by default — verify after manual edits: `chmod 600 ~/.config/nmp/config.yaml`. +- **Shared directories**: Do not store in cloud-synced folders (Dropbox, OneDrive, Google Drive) or shared home directories. +- **Refresh token rotation**: Configure your IdP to rotate refresh tokens on each use. A stolen refresh token becomes invalid after the legitimate client uses it once. +- **Logout when done**: Run `nemo auth logout` on shared or temporary machines. + ## Related -- [OIDC Setup](oidc.md) — Configure your identity provider. -- [API Scopes](../authorization/api-scopes.md) — Scope model and available scopes. 
-- [Security Model](../security-model.md) — Trust boundaries and the principal model. -- [Troubleshooting](../troubleshooting.md) — Fix common 401/403 errors and login failures. +- [OIDC Setup](/platform/authentication-authorization/authentication/oidc-setup) — Configure your identity provider. +- [API Scopes](/platform/authentication-authorization/authorization/api-scopes) — Scope model and available scopes. +- [Security Model](/platform/authentication-authorization/security-model) — Trust boundaries and the principal model. +- [Troubleshooting](/platform/authentication-authorization/troubleshooting) — Fix common 401/403 errors and login failures. diff --git a/docs/auth/authorization/api-scopes.mdx b/docs/auth/authorization/api-scopes.mdx index 7eff6ff969..7411e417eb 100644 --- a/docs/auth/authorization/api-scopes.mdx +++ b/docs/auth/authorization/api-scopes.mdx @@ -1,8 +1,10 @@ -# API Scopes - +--- +title: "API Scopes" +description: "" +--- API scopes are token-level access restrictions that sit on top of role-based permissions. They control which parts of the API a token can access, independent of the user's role. -For role-based permissions, see [Roles & Permissions](roles-and-permissions.md). For the RBAC model, see [Authorization Concepts](../concepts.md). +For role-based permissions, see [Roles & Permissions](/platform/authentication-authorization/authorization/roles-and-permissions). For the RBAC model, see [Authorization Concepts](/platform/authentication-authorization/concepts). ## How Scopes Work @@ -36,8 +38,9 @@ Each API has a read and write scope. A token with an API-specific scope can only | Safe Synthesizer | `safe-synthesizer:read` | `safe-synthesizer:write` | `/apis/safe-synthesizer/` | | Secrets | `secrets:read` | `secrets:write` | `/apis/secrets/` | -!!! note - The `entities:read` and `entities:write` scopes are for internal service-to-service usage. 
The generic entities API is not accessible to regular users — only PlatformAdmin and service principals can access it. Users interact with entities through feature-specific APIs (Models, Evaluation, etc.) which use their own scopes. + +The `entities:read` and `entities:write` scopes are for internal service-to-service usage. The generic entities API is not accessible to regular users — only PlatformAdmin and service principals can access it. Users interact with entities through feature-specific APIs (Models, Evaluation, etc.) which use their own scopes. + ### Platform Scopes @@ -87,7 +90,7 @@ This enables least-privilege tokens: an Editor can create a read-only token (`pl Scopes must be registered in your IdP as custom API scopes. The CLI requests them during the OAuth flow, and the IdP includes granted scopes in the access token. -If your IdP prefixes scopes (e.g., Azure AD uses `api://client-id/platform:read`), configure `scope_prefix` in the {{platform_name}} OIDC settings so the platform strips the prefix before authorization: +If your IdP prefixes scopes (e.g., Azure AD uses `api://client-id/platform:read`), configure `scope_prefix` in the NeMo Platform OIDC settings so the platform strips the prefix before authorization: ```yaml auth: @@ -116,7 +119,7 @@ The PDP distinguishes between OIDC standard scopes and platform scopes: ## Related -- [Roles & Permissions](roles-and-permissions.md) — Role-based permission model. -- [Using Authentication](../authentication/using-authentication.md) — Log in, requesting scopes, and token management. -- [OIDC Setup](../authentication/oidc.md) — Configuring scope prefix and default scopes. -- [Security Model](../security-model.md) — Trust boundaries and two-layer authorization. +- [Roles & Permissions](/platform/authentication-authorization/authorization/roles-and-permissions) — Role-based permission model. 
+- [Using Authentication](/platform/authentication-authorization/authentication/using-authentication) — Log in, requesting scopes, and token management. +- [OIDC Setup](/platform/authentication-authorization/authentication/oidc-setup) — Configuring scope prefix and default scopes. +- [Security Model](/platform/authentication-authorization/security-model) — Trust boundaries and two-layer authorization. diff --git a/docs/auth/authorization/index.mdx b/docs/auth/authorization/index.mdx index 10e43e3444..d73829b03c 100644 --- a/docs/auth/authorization/index.mdx +++ b/docs/auth/authorization/index.mdx @@ -1,6 +1,8 @@ -# Authorization - -{{platform_name}} authorization controls what authenticated users can do. Every API request is evaluated against the user's token scopes and role bindings before it is allowed. +--- +title: "Authorization" +description: "" +--- +NeMo Platform authorization controls what authenticated users can do. Every API request is evaluated against the user's token scopes and role bindings before it is allowed. The authorization model has four building blocks: @@ -15,37 +17,37 @@ Request → PDP → Scope check → Role binding check → Allow / Deny For a request to succeed, both the scope check (does the token allow it?) and the role check (does the user have permission?) must pass. -For the full conceptual background, see [Authorization Concepts](../concepts.md). For the security architecture, see [Security Model](../security-model.md). +For the full conceptual background, see [Authorization Concepts](/platform/authentication-authorization/concepts). For the security architecture, see [Security Model](/platform/authentication-authorization/security-model). ## Key Pages
-- **[Roles & Permissions](roles-and-permissions.md)** +- **[Roles & Permissions](/platform/authentication-authorization/authorization/roles-and-permissions)** --- Complete permission matrix — what each role can do. -- **[Managing Access](managing-access.md)** +- **[Managing Access](/platform/authentication-authorization/authorization/managing-access)** --- Add users to workspaces, assign roles, manage members. -- **[API Scopes](api-scopes.md)** +- **[API Scopes](/platform/authentication-authorization/authorization/api-scopes)** --- Token-level scope model and two-layer authorization. -- **[Permissions Reference](permissions-reference.md)** +- **[Permissions Reference](/platform/authentication-authorization/authorization/permissions-reference)** --- Complete list of all permissions with role assignments. -- **[Policy Engine](policy-engine.md)** +- **[Policy Engine](/platform/authentication-authorization/authorization/policy-engine)** --- diff --git a/docs/auth/authorization/managing-access.mdx b/docs/auth/authorization/managing-access.mdx index 9c373c7110..d30f34dcf5 100644 --- a/docs/auth/authorization/managing-access.mdx +++ b/docs/auth/authorization/managing-access.mdx @@ -1,38 +1,41 @@ -# Managing Access +--- +title: "Managing Access" +description: "" +--- +Add users to workspaces, assign roles, and control who can access your resources. For background on the authorization model, refer to [Authorization Concepts](/platform/authentication-authorization/concepts). For what each role can do, refer to [Roles & Permissions](/platform/authentication-authorization/authorization/roles-and-permissions). -Add users to workspaces, assign roles, and control who can access your resources. For background on the authorization model, refer to [Authorization Concepts](../concepts.md). For what each role can do, refer to [Roles & Permissions](roles-and-permissions.md). - -!!! 
note - The SDK examples on this page use `NeMoPlatform()` with no arguments so that the client reads your active CLI context (set by `nemo auth login`). That is the right pattern for authorization workflows: you act as your logged-in identity and pass the workspace explicitly in each API call. For the standard local initialization pattern, see [CLI and SDK initialization](../../get-started/setup.md#setup-init). + +The SDK examples on this page use `NeMoPlatform()` with no arguments so that the client reads your active CLI context (set by `nemo auth login`). That is the right pattern for authorization workflows: you act as your logged-in identity and pass the workspace explicitly in each API call. For the standard local initialization pattern, see [CLI and SDK initialization](/get-started/setup#setup-init). + ## Creating Workspaces Workspaces are the primary authorization boundary — all resources belong to a workspace, and access is controlled at the workspace level. When you create a workspace, you automatically become its Admin. -Create separate workspaces to isolate teams (`ml-research`, `nlp-team`), environments (`dev`, `staging`, `prod`), or projects. For detailed workspace management, refer to [Workspaces](../../get-started/concepts/workspaces.md). - - -=== "CLI" - - ```bash - nemo workspaces create ml-team - - # Set the workspace as your default for subsequent commands - nemo config set --workspace ml-team - ``` +Create separate workspaces to isolate teams (`ml-research`, `nlp-team`), environments (`dev`, `staging`, `prod`), or projects. For detailed workspace management, refer to [Workspaces](/get-started/core-concepts/workspaces). 
-=== "Python SDK" - ```python - from nemo_platform import NeMoPlatform + + +```bash +nemo workspaces create ml-team - client = NeMoPlatform() +# Set the workspace as your default for subsequent commands +nemo config set --workspace ml-team +``` + + +```python +from nemo_platform import NeMoPlatform - workspace = client.workspaces.create( - name="ml-team", description="Machine learning team workspace" - ) - ``` +client = NeMoPlatform() +workspace = client.workspaces.create( + name="ml-team", description="Machine learning team workspace" +) +``` + + ## Managing Workspace Members Members are users who have been granted access to a workspace. Each member has one of three roles: @@ -41,154 +44,155 @@ Members are users who have been granted access to a workspace. Each member has o - **Editor** — Can create, modify, and delete resources - **Admin** — Full control, including managing members -!!! note - **Role Propagation** + +**Role Propagation** - When you add or change a member, the CLI and SDK wait for the change to propagate to the authorization engine before returning (up to 30 seconds). The member can use their new permissions immediately after the command completes. +When you add or change a member, the CLI and SDK wait for the change to propagate to the authorization engine before returning (up to 30 seconds). The member can use their new permissions immediately after the command completes. + ### Add a Member Grant someone access to a workspace by adding them as a member with a specific role. The principal is typically an email address that identifies the user in your identity provider. 
-=== "CLI" - - ```bash - nemo workspaces members create --principal alice@example.com --roles Editor --workspace ml-team - ``` - - ```json - { - "principal": "alice@example.com", - "roles": [ - "Editor" - ], - "granted_at": "2026-01-20T10:00:00Z", - "granted_by": "admin@example.com" - } - ``` - -=== "Python SDK" - - ```python - from nemo_platform import NeMoPlatform - - client = NeMoPlatform() - - # Add a member with Editor role - client.workspaces.members.create( - workspace="ml-team", principal="alice@example.com", roles=["Editor"] - ) - - # Add a member with Viewer role (read-only) - client.workspaces.members.create( - workspace="ml-team", principal="bob@example.com", roles=["Viewer"] - ) - - # Add a member with Admin role (full control) - client.workspaces.members.create( - workspace="ml-team", principal="charlie@example.com", roles=["Admin"] - ) - ``` + + +```bash +nemo workspaces members create --principal alice@example.com --roles Editor --workspace ml-team +``` +```json +{ + "principal": "alice@example.com", + "roles": [ + "Editor" + ], + "granted_at": "2026-01-20T10:00:00Z", + "granted_by": "admin@example.com" +} +``` + + +```python +from nemo_platform import NeMoPlatform + +client = NeMoPlatform() + +# Add a member with Editor role +client.workspaces.members.create( + workspace="ml-team", principal="alice@example.com", roles=["Editor"] +) + +# Add a member with Viewer role (read-only) +client.workspaces.members.create( + workspace="ml-team", principal="bob@example.com", roles=["Viewer"] +) + +# Add a member with Admin role (full control) +client.workspaces.members.create( + workspace="ml-team", principal="charlie@example.com", roles=["Admin"] +) +``` + + ### List Members View all members of a workspace to audit access or verify permissions. The response includes each member's principal, roles, and when access was granted. 
-=== "CLI" - - ```bash - nemo workspaces members list --workspace ml-team - ``` - - ```json - [ - { - "principal": "alice@example.com", - "roles": [ - "Editor" - ], - "granted_at": "2026-01-20T10:00:00Z", - "granted_by": "admin@example.com" - }, - { - "principal": "bob@example.com", - "roles": [ - "Viewer" - ], - "granted_at": "2026-01-20T10:01:00Z", - "granted_by": "admin@example.com" - }, - { - "principal": "charlie@example.com", - "roles": [ - "Admin" - ], - "granted_at": "2026-01-20T10:02:00Z", - "granted_by": "admin@example.com" - } - ] - ``` - -=== "Python SDK" - - ```python - from nemo_platform import NeMoPlatform - - client = NeMoPlatform() - - members = client.workspaces.members.list(workspace="ml-team") - - for member in members.data: - print(f"{member.principal}: {member.roles}") - ``` - -### Update Member Roles + + +```bash +nemo workspaces members list --workspace ml-team +``` -Change a member role to adjust their permissions, for example, promoting a Viewer to Editor when they need to create resources. 
+```json +[ + { + "principal": "alice@example.com", + "roles": [ + "Editor" + ], + "granted_at": "2026-01-20T10:00:00Z", + "granted_by": "admin@example.com" + }, + { + "principal": "bob@example.com", + "roles": [ + "Viewer" + ], + "granted_at": "2026-01-20T10:01:00Z", + "granted_by": "admin@example.com" + }, + { + "principal": "charlie@example.com", + "roles": [ + "Admin" + ], + "granted_at": "2026-01-20T10:02:00Z", + "granted_by": "admin@example.com" + } +] +``` + + +```python +from nemo_platform import NeMoPlatform +client = NeMoPlatform() -=== "CLI" +members = client.workspaces.members.list(workspace="ml-team") - ```bash - nemo workspaces members update bob@example.com --roles Editor --workspace ml-team - ``` +for member in members.data: + print(f"{member.principal}: {member.roles}") +``` + + +### Update Member Roles -=== "Python SDK" +Change a member role to adjust their permissions, for example, promoting a Viewer to Editor when they need to create resources. - ```python - from nemo_platform import NeMoPlatform - client = NeMoPlatform() + + +```bash +nemo workspaces members update bob@example.com --roles Editor --workspace ml-team +``` + + +```python +from nemo_platform import NeMoPlatform - # Promote a Viewer to Editor - client.workspaces.members.update( - workspace="ml-team", principal_id="bob@example.com", roles=["Editor"] - ) - ``` +client = NeMoPlatform() +# Promote a Viewer to Editor +client.workspaces.members.update( + workspace="ml-team", principal_id="bob@example.com", roles=["Editor"] +) +``` + + ### Remove a Member Revoke a member's access by removing them from the workspace. This removes all their role bindings in the workspace — they will no longer be able to access any resources unless re-added. 
-=== "CLI" - - ```bash - nemo workspaces members delete alice@example.com --workspace ml-team - ``` - -=== "Python SDK" - - ```python - from nemo_platform import NeMoPlatform - - client = NeMoPlatform() + + +```bash +nemo workspaces members delete alice@example.com --workspace ml-team +``` + + +```python +from nemo_platform import NeMoPlatform - client.workspaces.members.delete(workspace="ml-team", principal_id="alice@example.com") - ``` +client = NeMoPlatform() +client.workspaces.members.delete(workspace="ml-team", principal_id="alice@example.com") +``` + + ## Granting Access to All Users Use the wildcard principal `*` to grant a role to all authenticated users. This is useful for shared workspaces where you want broad access without adding each user individually. @@ -204,84 +208,85 @@ Common use cases: Grant the Viewer role to `*` so all authenticated users can view resources. -=== "CLI" - - ```bash - nemo workspaces members create --principal "*" --roles Viewer --workspace shared-models - ``` - - ```json - { - "principal": "*", - "roles": [ - "Viewer" - ], - "granted_at": "2026-01-20T10:05:00Z", - "granted_by": "admin@example.com" - } - ``` - -=== "Python SDK" - - ```python - from nemo_platform import NeMoPlatform + + +```bash +nemo workspaces members create --principal "*" --roles Viewer --workspace shared-models +``` - client = NeMoPlatform() +```json +{ + "principal": "*", + "roles": [ + "Viewer" + ], + "granted_at": "2026-01-20T10:05:00Z", + "granted_by": "admin@example.com" +} +``` + + +```python +from nemo_platform import NeMoPlatform - client.workspaces.members.create(workspace="shared-models", principal="*", roles=["Viewer"]) - ``` +client = NeMoPlatform() +client.workspaces.members.create(workspace="shared-models", principal="*", roles=["Viewer"]) +``` + + ### Make a Workspace Editable by Everyone Grant the Editor role to `*` so all authenticated users can create and modify resources. 
-=== "CLI" - - ```bash - nemo workspaces members create --principal "*" --roles Editor --workspace shared-datasets - ``` - -=== "Python SDK" - - ```python - from nemo_platform import NeMoPlatform - - client = NeMoPlatform() + + +```bash +nemo workspaces members create --principal "*" --roles Editor --workspace shared-datasets +``` + + +```python +from nemo_platform import NeMoPlatform - client.workspaces.members.create(workspace="shared-datasets", principal="*", roles=["Editor"]) - ``` +client = NeMoPlatform() +client.workspaces.members.create(workspace="shared-datasets", principal="*", roles=["Editor"]) +``` + + ### Remove Public Access Remove the wildcard binding to restrict the workspace to explicit members only. -=== "CLI" - - ```bash - nemo workspaces members delete "*" --workspace ml-team - ``` - -=== "Python SDK" - - ```python - from nemo_platform import NeMoPlatform - - client = NeMoPlatform() + + +```bash +nemo workspaces members delete "*" --workspace ml-team +``` + + +```python +from nemo_platform import NeMoPlatform - client.workspaces.members.delete(workspace="ml-team", principal_id="*") - ``` +client = NeMoPlatform() -!!! note - **Default Workspace Access** +client.workspaces.members.delete(workspace="ml-team", principal_id="*") +``` + + + +**Default Workspace Access** - The platform automatically grants wildcard access to built-in workspaces: +The platform automatically grants wildcard access to built-in workspaces: - - `default` workspace: All users have **Editor** access - - `system` workspace: All users have **Viewer** access (read-only) +- `default` workspace: All users have **Editor** access +- `system` workspace: All users have **Viewer** access (read-only) - This allows users to start working immediately without explicit role assignment. +This allows users to start working immediately without explicit role assignment. + ## Admin Protection @@ -293,37 +298,37 @@ Every workspace must have at least one Admin to prevent orphaned workspaces. 
The If you need to leave a workspace where you are the only Admin, add another Admin first: -=== "CLI" - - ```bash - # Add another admin first - nemo workspaces members create --principal charlie@example.com --roles Admin --workspace ml-team - - # Now you can remove yourself - nemo workspaces members delete alice@example.com --workspace ml-team - ``` - -=== "Python SDK" - - ```python - from nemo_platform import NeMoPlatform + + +```bash +# Add another admin first +nemo workspaces members create --principal charlie@example.com --roles Admin --workspace ml-team - client = NeMoPlatform() +# Now you can remove yourself +nemo workspaces members delete alice@example.com --workspace ml-team +``` + + +```python +from nemo_platform import NeMoPlatform - # Add another admin first - client.workspaces.members.create( - workspace="ml-team", principal="charlie@example.com", roles=["Admin"] - ) +client = NeMoPlatform() - # Now you can remove yourself - client.workspaces.members.delete(workspace="ml-team", principal_id="alice@example.com") - ``` +# Add another admin first +client.workspaces.members.create( + workspace="ml-team", principal="charlie@example.com", roles=["Admin"] +) +# Now you can remove yourself +client.workspaces.members.delete(workspace="ml-team", principal_id="alice@example.com") +``` + + ## Platform Admin Access The **PlatformAdmin** role (set using `admin_email` in config) has full access to all workspaces and bypasses authorization checks. PlatformAdmin is typically used for initial platform setup, creating the first workspaces and granting Admin roles to team leads. After bootstrap, day-to-day access management should use workspace-level members (above). -For details on configuring the platform admin, refer to [Auth Configuration](../deployment/configuration.md). For the full security implications, refer to [Security Model](../security-model.md). 
+For details on configuring the platform admin, refer to [Auth Configuration](/platform/authentication-authorization/deployment/configuration). For the full security implications, refer to [Security Model](/platform/authentication-authorization/security-model). ## Deleting Workspaces @@ -338,32 +343,33 @@ Admins can delete workspaces they manage. However, a workspace cannot be deleted Delete all resources in the workspace before deleting the workspace itself: -=== "CLI" - - ```bash - # List and delete projects first - nemo projects list --workspace ml-team - nemo projects delete my-project --workspace ml-team - - # Then delete the workspace - nemo workspaces delete ml-team - ``` - -=== "Python SDK" + + +```bash +# List and delete projects first +nemo projects list --workspace ml-team +nemo projects delete my-project --workspace ml-team - ```python - from nemo_platform import NeMoPlatform, ConflictError - - client = NeMoPlatform() - - try: - client.workspaces.delete("ml-team") - except ConflictError as e: - print(f"Cannot delete workspace: {e}") - # Delete resources first, then retry - projects = client.projects.list(workspace="ml-team") - for project in projects.data: - client.projects.delete(project.name, workspace="ml-team") - # Now delete the workspace - client.workspaces.delete("ml-team") - ``` +# Then delete the workspace +nemo workspaces delete ml-team +``` + + +```python +from nemo_platform import NeMoPlatform, ConflictError + +client = NeMoPlatform() + +try: +    client.workspaces.delete("ml-team") +except ConflictError as e: +    print(f"Cannot delete workspace: {e}") +    # Delete resources first, then retry +    projects = client.projects.list(workspace="ml-team") +    for project in projects.data: +        client.projects.delete(project.name, workspace="ml-team") +    # Now delete the workspace +    client.workspaces.delete("ml-team") +``` + + \ No newline at end of file diff --git a/docs/auth/authorization/permissions-reference.mdx b/docs/auth/authorization/permissions-reference.mdx
index 615e7a3966..3ef6194594 100644 --- a/docs/auth/authorization/permissions-reference.mdx +++ b/docs/auth/authorization/permissions-reference.mdx @@ -1,13 +1,16 @@ -(permissions-reference)= +--- +title: "Permissions Reference" +description: "" +--- + # Permissions Reference -Complete reference of all permissions across the NeMo Platform APIs. Each permission controls access to a specific operation within an individual API. Permissions are assigned to users through [roles](roles-and-permissions.md). +Complete reference of all permissions across the NeMo Platform APIs. Each permission controls access to a specific operation within an individual API. Permissions are assigned to users through [roles](/platform/authentication-authorization/authorization/roles-and-permissions). -For token-level access restrictions, see [API Scopes](api-scopes.md). For the RBAC model, see [Authorization Concepts](../concepts.md). +For token-level access restrictions, see [API Scopes](/platform/authentication-authorization/authorization/api-scopes). For the RBAC model, see [Authorization Concepts](/platform/authentication-authorization/concepts). -!!! note - PlatformAdmin is omitted — it bypasses permission checks entirely at the policy level. +PlatformAdmin is omitted — it bypasses permission checks entirely at the policy level. ## Entities API @@ -137,7 +140,7 @@ For token-level access restrictions, see [API Scopes](api-scopes.md). For the RB ## Related -- [Roles & Permissions](roles-and-permissions.md) — Role descriptions and hierarchy. -- [API Scopes](api-scopes.md) — Token-level scope restrictions. -- [Authorization Concepts](../concepts.md) — Workspaces, roles, bindings, and the RBAC model. -- [Security Model](../security-model.md) — Trust boundaries and authorization layers. +- [Roles & Permissions](/platform/authentication-authorization/authorization/roles-and-permissions) — Role descriptions and hierarchy. 
+- [API Scopes](/platform/authentication-authorization/authorization/api-scopes) — Token-level scope restrictions. +- [Authorization Concepts](/platform/authentication-authorization/concepts) — Workspaces, roles, bindings, and the RBAC model. +- [Security Model](/platform/authentication-authorization/security-model) — Trust boundaries and authorization layers. diff --git a/docs/auth/authorization/policy-engine.mdx b/docs/auth/authorization/policy-engine.mdx index a78ab2f4ad..4faf28bb3b 100644 --- a/docs/auth/authorization/policy-engine.mdx +++ b/docs/auth/authorization/policy-engine.mdx @@ -1,8 +1,10 @@ -# Policy Engine +--- +title: "Policy Engine" +description: "" +--- +The Policy Decision Point (PDP) evaluates every authorization request in NeMo Platform. It checks role bindings and scopes against the operation's requirements and returns allow or deny. This page covers the PDP internals, configuration, and operational details. -The Policy Decision Point (PDP) evaluates every authorization request in {{platform_name}}. It checks role bindings and scopes against the operation's requirements and returns allow or deny. This page covers the PDP internals, configuration, and operational details. - -For the conceptual overview, see [Authorization Concepts](../concepts.md). For configuration, see [Auth Configuration](../deployment/configuration.md). +For the conceptual overview, see [Authorization Concepts](/platform/authentication-authorization/concepts). For configuration, see [Auth Configuration](/platform/authentication-authorization/deployment/configuration). 
## How the PDP Works @@ -84,7 +86,7 @@ Use external OPA when you: - Already run OPA for other services and want a single policy engine - Need gateway-level auth via Envoy `ext_authz` with gRPC -- Want to add custom policy rules alongside {{platform_name}} authorization +- Want to add custom policy rules alongside NeMo Platform authorization - Prefer to manage OPA's lifecycle separately ### Bundle Caching and Propagation Delay @@ -152,7 +154,7 @@ Loaded from the entity store: ## Related -- [Auth Configuration](../deployment/configuration.md) — PDP provider settings and environment variables. -- [Authorization Concepts](../concepts.md) — RBAC model and role propagation. -- [API Scopes](api-scopes.md) — Scope checking in the PDP. -- [Security Model](../security-model.md) — Architecture and trust boundaries. +- [Auth Configuration](/platform/authentication-authorization/deployment/configuration) — PDP provider settings and environment variables. +- [Authorization Concepts](/platform/authentication-authorization/concepts) — RBAC model and role propagation. +- [API Scopes](/platform/authentication-authorization/authorization/api-scopes) — Scope checking in the PDP. +- [Security Model](/platform/authentication-authorization/security-model) — Architecture and trust boundaries. diff --git a/docs/auth/authorization/roles-and-permissions.mdx b/docs/auth/authorization/roles-and-permissions.mdx index 5a63a74ce6..0dd8a5df2c 100644 --- a/docs/auth/authorization/roles-and-permissions.mdx +++ b/docs/auth/authorization/roles-and-permissions.mdx @@ -1,10 +1,12 @@ -# Roles and Permissions - -The authoritative reference for {{platform_name}} roles and their permissions. For background on how RBAC works, see [Authorization Concepts](../concepts.md). For managing workspace members, see [Managing Access](managing-access.md). +--- +title: "Roles and Permissions" +description: "" +--- +The authoritative reference for NeMo Platform roles and their permissions. 
For background on how RBAC works, see [Authorization Concepts](/platform/authentication-authorization/concepts). For managing workspace members, see [Managing Access](/platform/authentication-authorization/authorization/managing-access). ## Role Descriptions -{{platform_name}} provides four predefined roles, each designed for a specific user persona: +NeMo Platform provides four predefined roles, each designed for a specific user persona: **Viewer** — For stakeholders who need visibility into resources but should not modify them. @@ -28,7 +30,7 @@ The authoritative reference for {{platform_name}} roles and their permissions. F - Grant wildcard access (`*`) to the workspace - Change workspace visibility -**PlatformAdmin** — For platform operators who manage the entire {{platform_name}} deployment. This role bypasses all workspace-level authorization. +**PlatformAdmin** — For platform operators who manage the entire NeMo Platform deployment. This role bypasses all workspace-level authorization. - All Admin permissions across every workspace - Access all workspaces regardless of role bindings @@ -64,8 +66,7 @@ Rows are operations; columns are roles. A checkmark indicates the role has permi | Add / remove members | | | ✓ | ✓ | | Change workspace visibility | | | ✓ | ✓ | -!!! note - All authenticated users can create workspaces. The creator automatically becomes Admin. +All authenticated users can create workspaces. The creator automatically becomes Admin. ### Resource Operations (Models, Datasets, Projects) @@ -115,7 +116,7 @@ Example: ## Default Workspace Bindings -{{platform_name}} automatically provisions wildcard bindings on built-in workspaces: +NeMo Platform automatically provisions wildcard bindings on built-in workspaces: | Workspace | Wildcard Role | Effect | |-----------|--------------|--------| @@ -133,7 +134,7 @@ To leave a workspace where you are the only Admin, add another Admin first. 
## Related -- [Authorization Concepts](../concepts.md) — Workspaces, roles, bindings, and the RBAC model. -- [Managing Access](managing-access.md) — Add users, assign roles, manage workspace members. -- [API Scopes](api-scopes.md) — Token-level scope restrictions. -- [Security Model](../security-model.md) — Trust boundaries and authorization layers. +- [Authorization Concepts](/platform/authentication-authorization/concepts) — Workspaces, roles, bindings, and the RBAC model. +- [Managing Access](/platform/authentication-authorization/authorization/managing-access) — Add users, assign roles, manage workspace members. +- [API Scopes](/platform/authentication-authorization/authorization/api-scopes) — Token-level scope restrictions. +- [Security Model](/platform/authentication-authorization/security-model) — Trust boundaries and authorization layers. diff --git a/docs/auth/concepts.mdx b/docs/auth/concepts.mdx index 6922e1b2ba..98af1e7d91 100644 --- a/docs/auth/concepts.mdx +++ b/docs/auth/concepts.mdx @@ -1,8 +1,10 @@ -# Authorization Concepts +--- +title: "Authorization Concepts" +description: "" +--- +NeMo Platform uses **workspace-scoped RBAC**: resources live in workspaces, users get roles (Viewer, Editor, Admin) per workspace, and a policy engine evaluates every request against role bindings and token scopes. -{{platform_name}} uses **workspace-scoped RBAC**: resources live in workspaces, users get roles (Viewer, Editor, Admin) per workspace, and a policy engine evaluates every request against role bindings and token scopes. - -For the security architecture and trust boundaries, see [Security Model](security-model.md). For hands-on setup, see [Managing Access](authorization/managing-access.md). +For the security architecture and trust boundaries, see [Security Model](/platform/authentication-authorization/security-model). For hands-on setup, see [Managing Access](/platform/authentication-authorization/authorization/managing-access). 
## Authorization Model Overview @@ -11,7 +13,7 @@ The model has four building blocks: 1. **Workspaces** — containers that own resources (models, datasets, jobs) 2. **Roles** — permission bundles (Viewer, Editor, Admin) granted to users per workspace 3. **Role bindings** — the link between a user, a role, and a workspace -4. **Scopes** — token-level restrictions on top of role permissions (see [API Scopes](authorization/api-scopes.md)) +4. **Scopes** — token-level restrictions on top of role permissions (see [API Scopes](/platform/authentication-authorization/authorization/api-scopes)) ## Workspaces @@ -68,7 +70,7 @@ PlatformAdmin (all operations across all workspaces) Viewer (list, read, run inference) ``` -Custom roles can be defined at deployment time with arbitrary permission sets — they do not need to follow this hierarchy. For the complete permission matrix, see [Roles & Permissions](authorization/roles-and-permissions.md). +Custom roles can be defined at deployment time with arbitrary permission sets — they do not need to follow this hierarchy. For the complete permission matrix, see [Roles & Permissions](/platform/authentication-authorization/authorization/roles-and-permissions). ### Predefined Roles @@ -104,7 +106,7 @@ Platform admins can: - Create and delete any workspace - Bypass all authorization checks -The PlatformAdmin role is granted via the `admin_email` config setting. See [Auth Configuration](deployment/configuration.md). +The PlatformAdmin role is granted via the `admin_email` config setting. See [Auth Configuration](/platform/authentication-authorization/deployment/configuration). 
## Role Bindings @@ -144,7 +146,7 @@ Creating a workspace is a special operation: ### Default Workspaces -{{platform_name}} automatically provisions role bindings for the wildcard principal `*` on certain workspaces: +NeMo Platform automatically provisions role bindings for the wildcard principal `*` on certain workspaces: - **`default`**: All authenticated users have **Editor** role, allowing everyone to create and manage resources immediately - **`system`**: All authenticated users have **Viewer** role, providing read-only access to system-level resources @@ -162,7 +164,7 @@ Every authorized request is evaluated by the PDP, which checks role bindings and - **Embedded** (default): A WASM-based policy engine built into the auth service. No external dependencies. - **External OPA**: An Open Policy Agent instance (sidecar or standalone service) fetches policy bundles from the auth service. -For technical details on the policy engine, see [Policy Engine](authorization/policy-engine.md). For configuration, see [Auth Configuration](deployment/configuration.md). +For technical details on the policy engine, see [Policy Engine](/platform/authentication-authorization/authorization/policy-engine). For configuration, see [Auth Configuration](/platform/authentication-authorization/deployment/configuration). ### Role Propagation Delay @@ -170,8 +172,8 @@ When you add or remove a member, the change takes effect after the next policy d ## Related -- [Security Model](security-model.md) — Architecture, trust boundaries, and authorization layers. -- [Roles & Permissions](authorization/roles-and-permissions.md) — Complete permission matrix. -- [API Scopes](authorization/api-scopes.md) — Token-level scope restrictions. -- [Managing Access](authorization/managing-access.md) — Add users to workspaces and assign roles. -- [Policy Engine](authorization/policy-engine.md) — OPA/WASM policy engine internals. 
+- [Security Model](/platform/authentication-authorization/security-model) — Architecture, trust boundaries, and authorization layers. +- [Roles & Permissions](/platform/authentication-authorization/authorization/roles-and-permissions) — Complete permission matrix. +- [API Scopes](/platform/authentication-authorization/authorization/api-scopes) — Token-level scope restrictions. +- [Managing Access](/platform/authentication-authorization/authorization/managing-access) — Add users to workspaces and assign roles. +- [Policy Engine](/platform/authentication-authorization/authorization/policy-engine) — OPA/WASM policy engine internals. diff --git a/docs/auth/deployment/configuration.mdx b/docs/auth/deployment/configuration.mdx index 2cba801685..03b8ed639d 100644 --- a/docs/auth/deployment/configuration.mdx +++ b/docs/auth/deployment/configuration.mdx @@ -1,8 +1,10 @@ -# Configuration Reference - +--- +title: "Configuration Reference" +description: "" +--- Complete reference for enabling and configuring platform authorization: the `auth` section in config, Helm values, environment variables, and the choice between embedded and external OPA. -For quickstart setup, see [Authentication and Authorization](../index.md). For OIDC settings, see [OIDC Setup](../authentication/oidc.md). +For quickstart setup, see [Authentication and Authorization](/platform/authentication-authorization/overview). For OIDC settings, see [OIDC Setup](/platform/authentication-authorization/authentication/oidc-setup). ## Enabling Authorization @@ -27,7 +29,7 @@ When `auth.enabled` is `false` (the default), all API requests are allowed witho ### Bootstrap Admin -When authorization is enabled, a platform administrator can be configured. Setting **`admin_email`** gives that identity the **PlatformAdmin** role at platform start. Use it to create the first workspaces and grant roles to other users. 
After bootstrap, manage access via workspaces and members as described in [Managing Access](../authorization/managing-access.md). +When authorization is enabled, a platform administrator can be configured. Setting **`admin_email`** gives that identity the **PlatformAdmin** role at platform start. Use it to create the first workspaces and grant roles to other users. After bootstrap, manage access via workspaces and members as described in [Managing Access](/platform/authentication-authorization/authorization/managing-access). ```yaml auth: @@ -35,13 +37,13 @@ auth: admin_email: "your-admin@company.com" ``` -For a complete reference of all `auth` fields and their defaults, see the [platform configuration reference](../../set-up/config-reference.md). Auth-related values are found under `platformConfig.auth` in the values file. +For a complete reference of all `auth` fields and their defaults, see the [platform configuration reference](/reference/config-reference). Auth-related values are found under `platformConfig.auth` in the values file. -For OIDC-specific fields (`auth.oidc`), see [OIDC Setup](../authentication/oidc.md). +For OIDC-specific fields (`auth.oidc`), see [OIDC Setup](/platform/authentication-authorization/authentication/oidc-setup). ## Authorization Engine: Embedded vs External OPA -The PDP can run in two modes. For technical details, see [Policy Engine](../authorization/policy-engine.md). +The PDP can run in two modes. For technical details, see [Policy Engine](/platform/authentication-authorization/authorization/policy-engine). ### Embedded (default) @@ -121,8 +123,8 @@ auth: ## Related -- [Authentication and Authorization](../index.md) — Overview, auth methods, and getting started. -- [OIDC Setup](../authentication/oidc.md) — IdP configuration and CLI login. -- [Gateway Integration](gateway.md) — Using a gateway for authorization. -- [Managing Access](../authorization/managing-access.md) — Workspaces and member management. 
-- [Policy Engine](../authorization/policy-engine.md) — PDP internals and configuration. +- [Authentication and Authorization](/platform/authentication-authorization/overview) — Overview, auth methods, and getting started. +- [OIDC Setup](/platform/authentication-authorization/authentication/oidc-setup) — IdP configuration and CLI login. +- [Gateway Integration](/platform/authentication-authorization/deployment/gateway-integration) — Using a gateway for authorization. +- [Managing Access](/platform/authentication-authorization/authorization/managing-access) — Workspaces and member management. +- [Policy Engine](/platform/authentication-authorization/authorization/policy-engine) — PDP internals and configuration. diff --git a/docs/auth/deployment/credential-propagation.mdx b/docs/auth/deployment/credential-propagation.mdx index 00f289a28a..72d2632601 100644 --- a/docs/auth/deployment/credential-propagation.mdx +++ b/docs/auth/deployment/credential-propagation.mdx @@ -1,19 +1,21 @@ -# Credential Propagation - -How credentials flow through the system when {{platform_name}} runs jobs or serves inference. +--- +title: "Credential Propagation" +description: "" +--- +How credentials flow through the system when NeMo Platform runs jobs or serves inference. ## Job Credential Propagation -When a user submits a job (customization, evaluation, data generation), the job runs in a Kubernetes pod that needs to call {{platform_name}} APIs — to download datasets, upload results, and read secrets. The platform propagates the submitting user's identity into the job container so it operates with the user's permissions, not elevated service credentials. +When a user submits a job (customization, evaluation, data generation), the job runs in a Kubernetes pod that needs to call NeMo Platform APIs — to download datasets, upload results, and read secrets. 
The platform propagates the submitting user's identity into the job container so it operates with the user's permissions, not elevated service credentials. The flow: 1. The user submits a job via the API. The platform records the user's principal identity. 2. The platform creates a Kubernetes job with the `NMP_PRINCIPAL` environment variable set to the submitting user's identity. 3. Secrets needed by the job are fetched on behalf of the user (the platform checks that the user has access before injecting them). -4. The job container uses the propagated principal to authenticate API calls back to {{platform_name}}. +4. The job container uses the propagated principal to authenticate API calls back to NeMo Platform. -Job containers need to run inside the trust boundary so that their `X-NMP-Principal-*` headers are accepted by downstream services. Network policies and gateway configuration enforce this boundary. For the full architecture, see [Security Model](../security-model.md#job-credential-propagation). +Job containers need to run inside the trust boundary so that their `X-NMP-Principal-*` headers are accepted by downstream services. Network policies and gateway configuration enforce this boundary. For the full architecture, see [Security Model](/platform/authentication-authorization/security-model#job-credential-propagation). ## Inference Auth Context @@ -21,12 +23,13 @@ When a model is deployed as an inference endpoint, incoming requests are authent ## Trust Implications -The `NMP_PRINCIPAL` environment variable is trusted by {{platform_name}} services. If a user can exec into a job pod and read this variable, they have the submitting user's identity for the duration of the job. +The `NMP_PRINCIPAL` environment variable is trusted by NeMo Platform services. If a user can exec into a job pod and read this variable, they have the submitting user's identity for the duration of the job. -!!! warning - Ensure Kubernetes RBAC prevents unauthorized access to job pods. 
Network policies should restrict which pods can reach {{platform_name}} internal endpoints. + +Ensure Kubernetes RBAC prevents unauthorized access to job pods. Network policies should restrict which pods can reach NeMo Platform internal endpoints. + ## Related -- [Security Model](../security-model.md) — Trust boundaries and job credential propagation. -- [Auth Configuration](configuration.md) — Platform auth configuration. +- [Security Model](/platform/authentication-authorization/security-model) — Trust boundaries and job credential propagation. +- [Auth Configuration](/platform/authentication-authorization/deployment/configuration) — Platform auth configuration. diff --git a/docs/auth/deployment/gateway.mdx b/docs/auth/deployment/gateway.mdx index dc3a2b2ed1..8cb73cb490 100644 --- a/docs/auth/deployment/gateway.mdx +++ b/docs/auth/deployment/gateway.mdx @@ -1,12 +1,14 @@ -# Gateway Integration +--- +title: "Gateway Integration" +description: "" +--- +In production, a gateway (reverse proxy, ingress controller, or service mesh) often sits in front of the NeMo Platform. This page explains how authorization works with and without gateway-level auth, what headers the gateway must set, and which paths skip authorization. -In production, a gateway (reverse proxy, ingress controller, or service mesh) often sits in front of the {{platform_name}}. This page explains how authorization works with and without gateway-level auth, what headers the gateway must set, and which paths skip authorization. - -For the security architecture, see [Security Model](../security-model.md). +For the security architecture, see [Security Model](/platform/authentication-authorization/security-model). ## Overview -{{platform_name}}'s authorization middleware runs **inside each service**. Every request is evaluated there unless auth is disabled or the request matches a bypass path. 
Optionally, the **gateway** can perform the authorization check (e.g., via Envoy `ext_authz`) and forward the request with a special header so that services **trust the gateway's decision** and do not call the PDP again. That reduces latency and centralizes auth at the edge. +NeMo Platform's authorization middleware runs **inside each service**. Every request is evaluated there unless auth is disabled or the request matches a bypass path. Optionally, the **gateway** can perform the authorization check (e.g., via Envoy `ext_authz`) and forward the request with a special header so that services **trust the gateway's decision** and do not call the PDP again. That reduces latency and centralizes auth at the edge. ## Two Authorization Models @@ -23,15 +25,16 @@ For the security architecture, see [Security Model](../security-model.md). - Services see `x-nmp-authorized: true` and the principal headers, and **skip** their own PDP call. - **Benefit**: One auth check per request at the edge; lower latency and fewer PDP calls. -To use gateway-level auth you must configure your gateway to call the {{platform_name}} PDP and set the headers described below on allowed requests. +To use gateway-level auth you must configure your gateway to call the NeMo Platform PDP and set the headers described below on allowed requests. -!!! 
warning - **Security Requirement**: Your ingress/gateway **must** strip the following headers from all incoming external requests before forwarding to {{platform_name}}: + +**Security Requirement**: Your ingress/gateway **must** strip the following headers from all incoming external requests before forwarding to NeMo Platform: - - `X-NMP-Principal-Id`, `X-NMP-Principal-Email`, `X-NMP-Principal-Groups`, `X-NMP-Principal-On-Behalf-Of` - - `X-NMP-Authorized`, `X-NMP-Scopes` +- `X-NMP-Principal-Id`, `X-NMP-Principal-Email`, `X-NMP-Principal-Groups`, `X-NMP-Principal-On-Behalf-Of` +- `X-NMP-Authorized`, `X-NMP-Scopes` - If external clients can set these headers, they can forge any identity or bypass authorization entirely. The gateway should also block external access to `/internal/*` paths (used for service-to-service communication). +If external clients can set these headers, they can forge any identity or bypass authorization entirely. The gateway should also block external access to `/internal/*` paths (used for service-to-service communication). + ## Required Headers (Gateway-Level Auth) @@ -61,73 +64,71 @@ Configure the gateway so these paths are not sent to the PDP (or are always allo ### Envoy `ext_authz` -??? "Envoy ext_authz filter configuration" - :icon: gear - - Replace placeholder values before applying this configuration. 
- - ```yaml - http_filters: - - name: envoy.filters.http.jwt_authn - typed_config: - "@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication - providers: - : - issuer: "" - remote_jwks: - http_uri: - uri: "" - cluster: - timeout: 5s - cache_duration: 600s - claim_to_headers: - - header_name: "X-NMP-Principal-Id" - claim_name: "sub" - - header_name: "X-NMP-Principal-Email" - claim_name: "email" - rules: - - match: - prefix: "/" - requires: - provider_name: "" - - name: envoy.filters.http.ext_authz - typed_config: - "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz - grpc_service: - envoy_grpc: - cluster_name: - transport_api_version: V3 - failure_mode_allow: false - - name: envoy.filters.http.router - typed_config: - "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router - ``` + +Replace placeholder values before applying this configuration. + +```yaml +http_filters: + - name: envoy.filters.http.jwt_authn + typed_config: + "@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication + providers: + : + issuer: "" + remote_jwks: + http_uri: + uri: "" + cluster: + timeout: 5s + cache_duration: 600s + claim_to_headers: + - header_name: "X-NMP-Principal-Id" + claim_name: "sub" + - header_name: "X-NMP-Principal-Email" + claim_name: "email" + rules: + - match: + prefix: "/" + requires: + provider_name: "" + - name: envoy.filters.http.ext_authz + typed_config: + "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz + grpc_service: + envoy_grpc: + cluster_name: + transport_api_version: V3 + failure_mode_allow: false + - name: envoy.filters.http.router + typed_config: + "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router +``` + ### Header Stripping -Configure your gateway to remove {{platform_name}} auth headers from incoming external requests. This prevents clients from forging identities. - -??? 
"Envoy header stripping" - :icon: shield - - Add `request_headers_to_remove` to your route configuration: - - ```yaml - route_config: - virtual_hosts: - - name: nmp_service - domains: ["*"] - request_headers_to_remove: - - "x-nmp-principal-id" - - "x-nmp-principal-email" - - "x-nmp-principal-groups" - - "x-nmp-principal-on-behalf-of" - - "x-nmp-scopes" - - "x-nmp-authorized" - routes: - - match: { prefix: "/" } - route: { cluster: nmp_backend } - ``` +Configure your gateway to remove NeMo Platform auth headers from incoming external requests. This prevents clients from forging identities. + + +Add `request_headers_to_remove` to your route configuration: + +```yaml +route_config: +virtual_hosts: +- name: nmp_service +domains: ["*"] +request_headers_to_remove: +- "x-nmp-principal-id" +- "x-nmp-principal-email" +- "x-nmp-principal-groups" +- "x-nmp-principal-on-behalf-of" +- "x-nmp-scopes" +- "x-nmp-authorized" +routes: +- match: { prefix: "/" } +route: { cluster: nmp_backend } +``` + ## Testing Gateway Auth @@ -138,6 +139,6 @@ After configuring gateway-level auth, verify: ## Related -- [Auth Configuration](configuration.md) — Enabling auth and PDP provider (embedded vs OPA). -- [Security Model](../security-model.md) — Trust boundaries and gateway trust model. -- [Production Hardening](hardening.md) — Security checklist including gateway requirements. +- [Auth Configuration](/platform/authentication-authorization/deployment/configuration) — Enabling auth and PDP provider (embedded vs OPA). +- [Security Model](/platform/authentication-authorization/security-model) — Trust boundaries and gateway trust model. +- [Production Hardening](/platform/authentication-authorization/deployment/production-hardening) — Security checklist including gateway requirements. 
diff --git a/docs/auth/deployment/hardening.mdx b/docs/auth/deployment/hardening.mdx index c6a10dc7e9..c29eb57d7d 100644 --- a/docs/auth/deployment/hardening.mdx +++ b/docs/auth/deployment/hardening.mdx @@ -1,41 +1,43 @@ -# Production Hardening +--- +title: "Production Hardening" +description: "" +--- +Security checklist for deploying NeMo Platform to production. Work through each section and verify your deployment meets these requirements. -Security checklist for deploying {{platform_name}} to production. Work through each section and verify your deployment meets these requirements. - -For the security architecture, see [Security Model](../security-model.md). For configuration details, see [Auth Configuration](configuration.md). +For the security architecture, see [Security Model](/platform/authentication-authorization/security-model). For configuration details, see [Auth Configuration](/platform/authentication-authorization/deployment/configuration). ## Authentication - [ ] **Enable auth**: Set `auth.enabled: true` in platform config. -- [ ] **Configure OIDC**: Connect a production identity provider. See [OIDC Setup](../authentication/oidc.md). -- [ ] **Disable password grant in production**: Password grant bypasses MFA. If your IdP supports it, disable the resource owner password grant for the {{platform_name}} application registration. Restrict it to dedicated service accounts if CI/CD requires it. +- [ ] **Configure OIDC**: Connect a production identity provider. See [OIDC Setup](/platform/authentication-authorization/authentication/oidc-setup). +- [ ] **Disable password grant in production**: Password grant bypasses MFA. If your IdP supports it, disable the resource owner password grant for the NeMo Platform application registration. Restrict it to dedicated service accounts if CI/CD requires it. - [ ] **Set `admin_email` to a real platform admin**: Use a specific person's email, not a shared mailbox. The PlatformAdmin role bypasses all authorization checks. 
- [ ] **Verify token lifetime**: Check your IdP's access token and refresh token lifetimes. Shorter access token lifetimes (1 hour or less) reduce the impact of token theft. -- [ ] **Review additional issuers**: If `additional_issuers` is configured, verify all listed issuers are trusted. Each issuer can produce tokens that {{platform_name}} will accept. +- [ ] **Review additional issuers**: If `additional_issuers` is configured, verify all listed issuers are trusted. Each issuer can produce tokens that NeMo Platform will accept. ## Authorization - [ ] **Review default workspace bindings**: The `default` workspace grants Editor to `*` (all authenticated users). If your deployment requires tighter control, restrict this after bootstrap. - [ ] **Restrict PlatformAdmin**: Only one email should have PlatformAdmin. This role bypasses all authorization — treat it like a root account. -- [ ] **Use scoped tokens for CI/CD**: Request `platform:read` only for pipelines that don't need to modify resources. See [API Scopes](../authorization/api-scopes.md). -- [ ] **Audit workspace access**: Periodically review workspace members (`nemo workspaces members list --workspace `) and remove stale access. +- [ ] **Use scoped tokens for CI/CD**: Request `platform:read` only for pipelines that don't need to modify resources. See [API Scopes](/platform/authentication-authorization/authorization/api-scopes). +- [ ] **Audit workspace access**: Periodically review workspace members (`nemo workspaces members list --workspace <name>`) and remove stale access. - [ ] **Use wildcard bindings carefully**: Only grant `*` (all users) a role when you intentionally want shared access. Prefer Viewer over Editor for public workspaces. 
## Gateway and Network -- [ ] **Strip auth headers from external requests**: Configure your ingress/gateway to remove `X-NMP-Principal-Id`, `X-NMP-Principal-Email`, `X-NMP-Principal-Groups`, `X-NMP-Principal-On-Behalf-Of`, `X-NMP-Authorized`, and `X-NMP-Scopes` from all incoming external traffic. See [Gateway Integration](gateway.md). +- [ ] **Strip auth headers from external requests**: Configure your ingress/gateway to remove `X-NMP-Principal-Id`, `X-NMP-Principal-Email`, `X-NMP-Principal-Groups`, `X-NMP-Principal-On-Behalf-Of`, `X-NMP-Authorized`, and `X-NMP-Scopes` from all incoming external traffic. See [Gateway Integration](/platform/authentication-authorization/deployment/gateway-integration). - [ ] **Enable TLS termination**: Terminate TLS at the ingress or load balancer. Tokens in `Authorization` headers are sent in the clear without TLS. -- [ ] **Consider gateway-level auth**: For reduced latency and centralized authorization, configure Envoy `ext_authz` to call the PDP at the edge. See [Gateway Integration](gateway.md). +- [ ] **Consider gateway-level auth**: For reduced latency and centralized authorization, configure Envoy `ext_authz` to call the PDP at the edge. See [Gateway Integration](/platform/authentication-authorization/deployment/gateway-integration). ## Policy Engine -- [ ] **Choose the right PDP provider**: Use embedded (default) for new deployments. Use external OPA if you already run OPA for other services. See [Policy Engine](../authorization/policy-engine.md). +- [ ] **Choose the right PDP provider**: Use embedded (default) for new deployments. Use external OPA if you already run OPA for other services. See [Policy Engine](/platform/authentication-authorization/authorization/policy-engine). - [ ] **Set appropriate refresh interval**: `policy_data_refresh_interval` (embedded) or `bundle_cache_seconds` (external OPA) controls how quickly role changes take effect. Lower values = faster propagation but more load on the entity store. 
- [ ] **Monitor PDP health**: Ensure the auth service (embedded) or OPA sidecar (external) is healthy. If the PDP is unreachable, the middleware fails closed (returns 503). ## Secrets and Credentials -- [ ] **Verify CLI config file permissions**: Token storage at `~/.config/nmp/config.yaml` should have permissions `0600` (owner read/write only). Avoid storing this file in cloud-synced or shared directories. See [Using Authentication — Config File](../authentication/using-authentication.md#config-file) for full guidance. +- [ ] **Verify CLI config file permissions**: Token storage at `~/.config/nmp/config.yaml` should have permissions `0600` (owner read/write only). Avoid storing this file in cloud-synced or shared directories. See [Using Authentication — Config File](/platform/authentication-authorization/authentication/using-authentication#config-file) for full guidance. - [ ] **Rotate IdP client secrets**: If your OIDC application uses a client secret, rotate it periodically per your organization's policy. - [ ] **Avoid storing tokens in source code or CI configs**: Use environment variables or secret managers for tokens in CI/CD pipelines. @@ -61,7 +63,7 @@ nemo workspaces list ## Related -- [Security Model](../security-model.md) — Architecture, trust boundaries, and authorization layers. -- [Auth Configuration](configuration.md) — Full configuration reference. -- [Gateway Integration](gateway.md) — Gateway auth and header stripping. -- [Roles & Permissions](../authorization/roles-and-permissions.md) — Permission matrix for role auditing. +- [Security Model](/platform/authentication-authorization/security-model) — Architecture, trust boundaries, and authorization layers. +- [Auth Configuration](/platform/authentication-authorization/deployment/configuration) — Full configuration reference. +- [Gateway Integration](/platform/authentication-authorization/deployment/gateway-integration) — Gateway auth and header stripping. 
+- [Roles & Permissions](/platform/authentication-authorization/authorization/roles-and-permissions) — Permission matrix for role auditing. diff --git a/docs/auth/index.mdx b/docs/auth/index.mdx index b1dc51c90b..121293259d 100644 --- a/docs/auth/index.mdx +++ b/docs/auth/index.mdx @@ -1,33 +1,36 @@ -# Authentication and Authorization - -{{platform_name}} includes a built-in security layer that lets you control who can access your platform and what they can do. When multiple teams or users share a {{platform_name}} deployment, authentication and authorization ensure that each user sees only the workspaces and resources they are permitted to access, and can only perform actions appropriate to their role. +--- +title: "Authentication and Authorization" +description: "" +--- +NeMo Platform includes a built-in security layer that lets you control who can access your platform and what they can do. When multiple teams or users share a NeMo Platform deployment, authentication and authorization ensure that each user sees only the workspaces and resources they are permitted to access, and can only perform actions appropriate to their role. Access control has two layers: -- **Authentication** — Prove your identity. {{platform_name}} validates a JWT issued by your OpenID Connect (OIDC) identity provider. +- **Authentication** — Prove your identity. NeMo Platform validates a JWT issued by your OpenID Connect (OIDC) identity provider. - **Authorization** — Control what you can do. Workspace-scoped RBAC with roles (Viewer, Editor, Admin) and optional API scopes on tokens. Both layers are opt-in. When `auth.enabled` is `false` (the default), all requests are allowed without checks. This lets you get started quickly and add security when you are ready for multi-user or production deployments. ## How Authentication Works -{{platform_name}} authenticates every request using a JWT from your OIDC identity provider. 
The token is sent in the `Authorization: Bearer ` header, and {{platform_name}} validates the signature, issuer, audience, and expiry. Refer to [OIDC Setup](authentication/oidc.md) to connect your identity provider. +NeMo Platform authenticates every request using a JWT from your OIDC identity provider. The token is sent in the `Authorization: Bearer <token>` header, and NeMo Platform validates the signature, issuer, audience, and expiry. Refer to [OIDC Setup](/platform/authentication-authorization/authentication/oidc-setup) to connect your identity provider. How you **obtain** the token depends on your context: -- **CLI** — Run `nemo auth login` to authenticate using the browser-based device flow. The CLI stores and auto-refreshes the token. Refer to [Using Authentication](authentication/using-authentication.md). -- **SDK** — After `nemo auth login`, the Python SDK automatically reads stored tokens from the CLI config and refreshes them transparently. Refer to [Using Authentication](authentication/using-authentication.md#python-sdk). -- **HTTP** — For raw HTTP calls, fetch a token from your IdP (or from the CLI using `nemo auth token`) and pass it in the `Authorization: Bearer ` header. +- **CLI** — Run `nemo auth login` to authenticate using the browser-based device flow. The CLI stores and auto-refreshes the token. Refer to [Using Authentication](/platform/authentication-authorization/authentication/using-authentication). +- **SDK** — After `nemo auth login`, the Python SDK automatically reads stored tokens from the CLI config and refreshes them transparently. Refer to [Using Authentication](/platform/authentication-authorization/authentication/using-authentication#python-sdk). +- **HTTP** — For raw HTTP calls, fetch a token from your IdP (or from the CLI using `nemo auth token`) and pass it in the `Authorization: Bearer <token>` header. 
- **Studio** — When auth is enabled, Studio automatically redirects you to your IdP to sign in and uses the resulting token for all API calls. -!!! tip - **Quickstart shortcut** — When running {{platform_name}} quickstart without an OIDC provider, you can use an unsigned JWT: + +**Quickstart shortcut** — When running NeMo Platform quickstart without an OIDC provider, you can use an unsigned JWT: - `nemo auth login --unsigned-token --email ` +`nemo auth login --unsigned-token --email <email>` - Quickstart-generated unsigned tokens expire after 24 hours. +Quickstart-generated unsigned tokens expire after 24 hours. - Unsigned JWT login only works for quickstart and must not be used in production. See [Getting Started](#quickstart-development) below. +Unsigned JWT login only works for quickstart and must not be used in production. See [Getting Started](#quickstart-development) below. + ## Getting Started @@ -44,108 +47,107 @@ $ nemo quickstart configure # Enter admin email: admin@example.com ``` -??? "Full quickstart configure output" - :icon: terminal - - ```text - {{platform_name}} Quickstart Configuration - ... - Step 3 of 3: Save Config - Save configuration? - 1. Save configuration - > 2. Configure advanced options - authentication, ports, registry + +```text +{{platform_name}} Quickstart Configuration +... +Step 3 of 3: Save Config +Save configuration? +1. Save configuration +> 2. Configure advanced options - authentication, ports, registry - • Platform Authorization - Enable auth to require authentication for API requests. - When enabled, you can set an admin email to bootstrap access. +• Platform Authorization +Enable auth to require authentication for API requests. +When enabled, you can set an admin email to bootstrap access. - Enable authentication/authorization? - 1. No - Allow all requests without authentication - > 2. Yes - Require authentication for API access +Enable authentication/authorization? +1. No - Allow all requests without authentication +> 2. 
Yes - Require authentication for API access - ✓ Authorization enabled +✓ Authorization enabled - Admin email (grants PlatformAdmin role): admin@example.com - ✓ Admin: admin@example.com +Admin email (grants PlatformAdmin role): admin@example.com +✓ Admin: admin@example.com - ℹ All CLI requests will be authenticated as admin@example.com. - To use a different identity: nemo auth login --unsigned-token --email - ... - ✓ Configuration saved successfully! - ``` +ℹ All CLI requests will be authenticated as admin@example.com. +To use a different identity: nemo auth login --unsigned-token --email +... +✓ Configuration saved successfully! +``` + The CLI is automatically configured to authenticate as the admin email for all subsequent commands after setup. To switch identity, run: -`nemo auth login --unsigned-token --email `. +`nemo auth login --unsigned-token --email <email>`. #### Step 2: Make Authenticated Calls After authorization is enabled, all API requests must include an identity. The CLI and SDK are already configured after Step 1 — they read the admin email from the CLI config automatically. -=== "CLI" - - ```bash - # CLI is already configured after quickstart configure - # All commands are authenticated as the admin - nemo workspaces list - - # To use a different identity: - nemo auth login --unsigned-token --email other-user@example.com - ``` - -=== "Python SDK" - - ```python - from nemo_platform import NeMoPlatform - - # No arguments needed — the SDK reads base_url, workspace, and credentials - # from the active CLI context (set by `nemo auth login` or `nemo quickstart configure`). - # See: Initializing the CLI and SDK in the quickstart for other init options. 
- client = NeMoPlatform() - - workspaces = client.workspaces.list() - print(f"Found {len(workspaces.data)} workspaces") - ``` + + +```bash +# CLI is already configured after quickstart configure +# All commands are authenticated as the admin +nemo workspaces list +# To use a different identity: +nemo auth login --unsigned-token --email other-user@example.com +``` + + +```python +from nemo_platform import NeMoPlatform + +# No arguments needed — the SDK reads base_url, workspace, and credentials +# from the active CLI context (set by `nemo auth login` or `nemo quickstart configure`). +# See: Initializing the CLI and SDK in the quickstart for other init options. +client = NeMoPlatform() + +workspaces = client.workspaces.list() +print(f"Found {len(workspaces.data)} workspaces") +``` + + ### Production / Helm Deployment -For production or Helm-based deployments, enable auth by setting `platformConfig.auth.enabled: true` in your Helm values and configure the `auth:` section in platform config. Refer to [Auth Configuration](deployment/configuration.md) for the full reference and [OIDC Setup](authentication/oidc.md) to connect your identity provider. +For production or Helm-based deployments, enable auth by setting `platformConfig.auth.enabled: true` in your Helm values and configure the `auth:` section in platform config. Refer to [Auth Configuration](/platform/authentication-authorization/deployment/configuration) for the full reference and [OIDC Setup](/platform/authentication-authorization/authentication/oidc-setup) to connect your identity provider. ## Where to Go Next
-- **[Security Model](security-model.md)** +- **[Security Model](/platform/authentication-authorization/security-model)** --- - Understand how {{platform_name}} authentication and authorization work together — trust boundaries, principal model, and authorization layers. + Understand how NeMo Platform authentication and authorization work together — trust boundaries, principal model, and authorization layers. -- **[Connect an Identity Provider](authentication/oidc.md)** +- **[Connect an Identity Provider](/platform/authentication-authorization/authentication/oidc-setup)** --- - Configure {{platform_name}} to authenticate users using your OIDC identity provider. + Configure NeMo Platform to authenticate users using your OIDC identity provider. -- **[Manage Workspace Access](authorization/managing-access.md)** +- **[Manage Workspace Access](/platform/authentication-authorization/authorization/managing-access)** --- Add users to workspaces, assign roles, and control who can access your resources. -- **[Configuration Reference](deployment/configuration.md)** +- **[Configuration Reference](/platform/authentication-authorization/deployment/configuration)** --- Full configuration reference — enabling auth, PDP provider, OIDC settings, environment variables. -- **[Harden for Production](deployment/hardening.md)** +- **[Harden for Production](/platform/authentication-authorization/deployment/production-hardening)** --- Security checklist for production deployments — OIDC, gateway headers, scoped tokens, TLS. 
-- **[Troubleshooting](troubleshooting.md)** +- **[Troubleshooting](/platform/authentication-authorization/troubleshooting)** --- diff --git a/docs/auth/security-model.mdx b/docs/auth/security-model.mdx index aedb17d658..0ad520daff 100644 --- a/docs/auth/security-model.mdx +++ b/docs/auth/security-model.mdx @@ -1,8 +1,10 @@ -# Security Model +--- +title: "Security Model" +description: "" +--- +This page describes the security architecture of NeMo Platform authentication and authorization — how requests are authenticated, how access decisions are made, and where trust boundaries lie. It is intended for platform operators and security reviewers evaluating NeMo Platform for production deployment. -This page describes the security architecture of {{platform_name}} authentication and authorization — how requests are authenticated, how access decisions are made, and where trust boundaries lie. It is intended for platform operators and security reviewers evaluating {{platform_name}} for production deployment. - -For hands-on setup, see [Auth Configuration](deployment/configuration.md). For the authorization model, see [Concepts](concepts.md). +For hands-on setup, see [Auth Configuration](/platform/authentication-authorization/deployment/configuration). For the authorization model, see [Concepts](/platform/authentication-authorization/concepts). ## Architecture Overview @@ -55,22 +57,23 @@ The request flow: 3. **PDP** (Policy Decision Point) evaluates authorization — checks the principal's role bindings and token scopes against the operation's requirements. 4. If allowed, the service handles the request. In gateway deployments, the gateway forwards the request with trusted `X-NMP-Principal-*` headers so downstream services skip re-validation. -!!! note - In quickstart deployments without an OIDC provider, the `X-NMP-Principal-*` headers are set directly by the client instead of being derived from a validated JWT. See [Getting Started](index.md). 
+ +In quickstart deployments without an OIDC provider, the `X-NMP-Principal-*` headers are set directly by the client instead of being derived from a validated JWT. See [Getting Started](/platform/authentication-authorization/overview). + ## Authentication Modes -{{platform_name}} supports two authentication modes: **service-level** (the service validates the JWT) and **gateway-level** (the gateway validates at the edge). +NeMo Platform supports two authentication modes: **service-level** (the service validates the JWT) and **gateway-level** (the gateway validates at the edge). -In both modes, identity arrives as a **JWT** from the client. The JWT is validated exactly once — either by the first {{platform_name}} service or by the gateway. After validation, the authenticated identity (email, subject, groups) is propagated to downstream services via **trusted `X-NMP-Principal-*` headers**. Downstream services accept these headers without re-validating the JWT. +In both modes, identity arrives as a **JWT** from the client. The JWT is validated exactly once — either by the first NeMo Platform service or by the gateway. After validation, the authenticated identity (email, subject, groups) is propagated to downstream services via **trusted `X-NMP-Principal-*` headers**. Downstream services accept these headers without re-validating the JWT. -This "validate once, propagate via headers" design means that **network perimeter security is critical**: anything inside the trust boundary that receives `X-NMP-Principal-*` headers will trust them unconditionally. The gateway must strip these headers from all incoming external requests to prevent clients from forging an identity. See [Gateway Integration](deployment/gateway.md). +This "validate once, propagate via headers" design means that **network perimeter security is critical**: anything inside the trust boundary that receives `X-NMP-Principal-*` headers will trust them unconditionally. 
The gateway must strip these headers from all incoming external requests to prevent clients from forging an identity. See [Gateway Integration](/platform/authentication-authorization/deployment/gateway-integration). ### Service-Level Authentication -The first {{platform_name}} service that receives the request validates the JWT directly: +The first NeMo Platform service that receives the request validates the JWT directly: -1. Extracts the `Authorization: Bearer ` header +1. Extracts the `Authorization: Bearer <token>` header 2. Validates the JWT signature, issuer, audience, and expiry against the configured OIDC provider 3. Extracts the principal identity (email, subject, groups) from JWT claims 4. Calls the PDP for an authorization decision @@ -94,8 +97,9 @@ A **principal** is an authenticated identity — typically a human user identifi When OIDC is enabled, the principal is resolved from JWT claims: the `sub` claim becomes the principal ID (or `oid` for Azure AD), the `email` claim provides the email (or `upn` for Azure AD), and group memberships come from the `groups` claim. These claim names are configurable. In gateway deployments, the gateway performs this extraction and forwards the result in `X-NMP-Principal-*` headers. -!!! note - **Quickstart shortcut** — When running without OIDC (`email-as-API-key` mode), the principal is the raw value of the `X-NMP-Principal-Id` header, without any token validation. This is intended for quick testing only. + +**Quickstart shortcut** — When running without OIDC (`email-as-API-key` mode), the principal is the raw value of the `X-NMP-Principal-Id` header, without any token validation. This is intended for quick testing only. + ### Trusted Identity Headers @@ -116,12 +120,13 @@ Whichever component performs the initial authentication and authorization — th Not all requests originate from human users. 
Platform services that need cross-workspace access — for example, the jobs controller monitoring jobs across all users, or the evaluator coordinating evaluations — authenticate as **service principals**. -A service principal's ID has the form `service:` (e.g., `service:jobs`, `service:evaluator`). Service principals are auto-authorized without a PDP call and have access to all workspaces and all operations. They are created internally by the platform and are never exposed to external callers. +A service principal's ID has the form `service:<name>` (e.g., `service:jobs`, `service:evaluator`). Service principals are auto-authorized without a PDP call and have access to all workspaces and all operations. They are created internally by the platform and are never exposed to external callers. **Internal endpoints** (`/internal/*`) are also auto-authorized — they bypass PDP checks entirely and are reserved for service-to-service communication. -!!! info - Service principals rely on perimeter security — the gateway strips `X-NMP-Principal-*` headers from incoming requests, so external callers cannot forge a `service:` identity. The gateway must also block external access to `/internal/*` paths. See [Gateway Integration](deployment/gateway.md). + +Service principals rely on perimeter security — the gateway strips `X-NMP-Principal-*` headers from incoming requests, so external callers cannot forge a `service:` identity. The gateway must also block external access to `/internal/*` paths. See [Gateway Integration](/platform/authentication-authorization/deployment/gateway-integration). + #### On-Behalf-Of Delegation @@ -147,40 +152,41 @@ The downstream service constructs a principal from this object and evaluates the #### Job Credential Propagation -Many {{platform_name}} operations — customization, evaluation, synthetic data generation — run as asynchronous jobs. 
When a user submits a job, the platform propagates the submitting user's identity into the job container via the `NMP_PRINCIPAL` environment variable. +Many NeMo Platform operations — customization, evaluation, synthetic data generation — run as asynchronous jobs. When a user submits a job, the platform propagates the submitting user's identity into the job container via the `NMP_PRINCIPAL` environment variable. -!!! info - Job containers need to run inside the trust boundary — they call {{platform_name}} APIs using the propagated identity headers and are subject to the same authorization checks as any other caller. The key distinction is that jobs act as the *user*, not as a privileged service principal, so they can only access resources the submitting user is permitted to reach. + +Job containers need to run inside the trust boundary — they call NeMo Platform APIs using the propagated identity headers and are subject to the same authorization checks as any other caller. The key distinction is that jobs act as the *user*, not as a privileged service principal, so they can only access resources the submitting user is permitted to reach. + ## Authorization: Workspace-Scoped RBAC -{{platform_name}} uses **workspace-scoped Role-Based Access Control (RBAC)**. All resources (models, datasets, jobs, evaluations) belong to exactly one workspace, and access is controlled at the workspace level — not per-resource. +NeMo Platform uses **workspace-scoped Role-Based Access Control (RBAC)**. All resources (models, datasets, jobs, evaluations) belong to exactly one workspace, and access is controlled at the workspace level — not per-resource. 
- Users are granted roles (Viewer, Editor, Admin) per workspace via **role bindings** - The wildcard principal `*` binds a role for all authenticated users at once - Workspaces are private by default; the creator becomes Admin automatically -On top of RBAC, {{platform_name}} supports **API scopes** as a second authorization layer at the token level. Every authorized request passes through two independent checks: +On top of RBAC, NeMo Platform supports **API scopes** as a second authorization layer at the token level. Every authorized request passes through two independent checks: 1. **Scope check** (token level): The JWT must carry at least one of the scopes required by the endpoint (e.g., `platform:read` or `platform:write`). Scopes limit what the *token* can do. 2. **Permission check** (role level): The principal must have the necessary permissions via role bindings in the workspace. Roles limit what the *user* can do. Both must pass. This enables least-privilege token usage — for example, an Editor can create a read-only token (`platform:read` only) for monitoring scripts. -For details, see [Authorization Concepts](concepts.md), [Roles & Permissions](authorization/roles-and-permissions.md), and [API Scopes](authorization/api-scopes.md). +For details, see [Authorization Concepts](/platform/authentication-authorization/concepts), [Roles & Permissions](/platform/authentication-authorization/authorization/roles-and-permissions), and [API Scopes](/platform/authentication-authorization/authorization/api-scopes). -## What {{platform_name}} Does NOT Do +## What NeMo Platform Does NOT Do The following are explicitly out of scope for the current implementation: - **Multi-tenancy with database isolation.** Workspaces provide logical isolation, not separate databases or schemas per tenant. - **Service mesh mTLS.** Service-to-service encryption and mutual authentication are delegated to the customer's infrastructure (e.g., Istio, Linkerd). 
-- **Built-in audit logging.** The embedded PDP does not produce a dedicated audit trail. If you need structured decision logs, use [External OPA](authorization/policy-engine.md#external-opa) — OPA's [decision logging](https://www.openpolicyagent.org/docs/latest/management-decision-logs/) can export every authorization decision to your log aggregator. +- **Built-in audit logging.** The embedded PDP does not produce a dedicated audit trail. If you need structured decision logs, use [External OPA](/platform/authentication-authorization/authorization/policy-engine#external-opa) — OPA's [decision logging](https://www.openpolicyagent.org/docs/latest/management-decision-logs/) can export every authorization decision to your log aggregator. - **Custom RBAC at runtime.** Custom roles can be defined at deployment time via YAML configuration, but cannot be created or modified at runtime through the API. ## Related -- [Authorization Concepts](concepts.md) — Workspaces, roles, bindings, and the RBAC model. -- [Auth Configuration](deployment/configuration.md) — Enabling auth, PDP provider, OIDC settings. -- [Gateway Integration](deployment/gateway.md) — Gateway-level auth and header configuration. -- [Production Hardening](deployment/hardening.md) — Security checklist for production deployments. +- [Authorization Concepts](/platform/authentication-authorization/concepts) — Workspaces, roles, bindings, and the RBAC model. +- [Auth Configuration](/platform/authentication-authorization/deployment/configuration) — Enabling auth, PDP provider, OIDC settings. +- [Gateway Integration](/platform/authentication-authorization/deployment/gateway-integration) — Gateway-level auth and header configuration. +- [Production Hardening](/platform/authentication-authorization/deployment/production-hardening) — Security checklist for production deployments. 
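As a rough illustration of the two independent checks described in the security model above (a scope check at the token level, then a permission check at the role level), the following Python sketch models the decision. It is not the NeMo Platform implementation: the `ROLE_PERMISSIONS` mapping, the permission strings (`read`, `write`, `manage`), and all function and class names are hypothetical. Only the scope names (`platform:read`, `platform:write`), the role names (Viewer, Editor, Admin), and the wildcard principal `*` come from the documentation text.

```python
from dataclasses import dataclass, field

# Hypothetical role-to-permission mapping, for illustration only.
# The real platform defines its own permissions per role.
ROLE_PERMISSIONS = {
    "Viewer": {"read"},
    "Editor": {"read", "write"},
    "Admin": {"read", "write", "manage"},
}


@dataclass
class Workspace:
    name: str
    # Maps a principal ID (or the wildcard "*") to a role name.
    role_bindings: dict[str, str] = field(default_factory=dict)


def scope_check(token_scopes: set[str], required_scopes: set[str]) -> bool:
    """Token level: the token must carry at least one required scope."""
    return bool(token_scopes & required_scopes)


def permission_check(principal_id: str, workspace: Workspace, permission: str) -> bool:
    """Role level: a direct binding or the wildcard binding must grant the permission."""
    for key in (principal_id, "*"):
        role = workspace.role_bindings.get(key)
        if role is not None and permission in ROLE_PERMISSIONS.get(role, set()):
            return True
    return False


def is_authorized(
    principal_id: str,
    token_scopes: set[str],
    required_scopes: set[str],
    workspace: Workspace,
    permission: str,
) -> bool:
    """Both checks must pass; neither one can substitute for the other."""
    return scope_check(token_scopes, required_scopes) and permission_check(
        principal_id, workspace, permission
    )
```

Under these assumptions, an Editor holding a token that carries only `platform:read` would be denied on an endpoint that requires `platform:write`, even though the Editor role itself allows writes. This mirrors the least-privilege monitoring-script example in the text.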
diff --git a/docs/auth/troubleshooting.mdx b/docs/auth/troubleshooting.mdx index bfd0c5de00..243fb3ba0c 100644 --- a/docs/auth/troubleshooting.mdx +++ b/docs/auth/troubleshooting.mdx @@ -1,5 +1,7 @@ -# Troubleshooting - +--- +title: "Troubleshooting" +description: "" +--- When something goes wrong with authentication or authorization, start here. Problems are organized by symptom. ## "I Get 401 Unauthorized on Every Request" @@ -28,7 +30,7 @@ When something goes wrong with authentication or authorization, start here. Prob nemo auth login ``` -3. Verify the token is being sent. For SDK/curl, ensure the `Authorization: Bearer ` header is present. +3. Verify the token is being sent. For SDK/curl, ensure the `Authorization: Bearer <token>` header is present. **Common causes**: @@ -81,7 +83,7 @@ The requested scope is not configured or admin consent has not been granted. ### "Device flow not enabled" or "Public client flows not allowed" -Your IdP doesn't have device flow enabled for the {{platform_name}} application. +Your IdP doesn't have device flow enabled for the NeMo Platform application. **Fix**: @@ -91,7 +93,7 @@ Your IdP doesn't have device flow enabled for the {{platform_name}} application. ### Client ID Mismatch -The `client_id` in {{platform_name}} config doesn't match the application in your IdP. +The `client_id` in NeMo Platform config doesn't match the application in your IdP. **Fix**: Verify `auth.oidc.client_id` matches the client ID in your IdP exactly. @@ -160,7 +162,7 @@ If these steps don't resolve the issue: ## Related -- [Auth Configuration](deployment/configuration.md) — Configuration reference. -- [Security Model](security-model.md) — Architecture and trust boundaries. -- [OIDC Setup](authentication/oidc.md) — IdP configuration. -- [Gateway Integration](deployment/gateway.md) — Gateway auth and headers. +- [Auth Configuration](/platform/authentication-authorization/deployment/configuration) — Configuration reference. 
+- [Security Model](/platform/authentication-authorization/security-model) — Architecture and trust boundaries. +- [OIDC Setup](/platform/authentication-authorization/authentication/oidc-setup) — IdP configuration. +- [Gateway Integration](/platform/authentication-authorization/deployment/gateway-integration) — Gateway auth and headers. diff --git a/docs/cli/configuration.mdx b/docs/cli/configuration.mdx index d33defdccb..eb403c55dc 100644 --- a/docs/cli/configuration.mdx +++ b/docs/cli/configuration.mdx @@ -1,5 +1,7 @@ -# Configuration - +--- +title: "Configuration" +description: "" +--- The NeMo CLI uses a configuration file to store connection settings, credentials, and preferences. This allows you to work with multiple environments and switch between them easily. ## Quick Setup diff --git a/docs/cli/index.mdx b/docs/cli/index.mdx index d303bc6493..4a8341da50 100644 --- a/docs/cli/index.mdx +++ b/docs/cli/index.mdx @@ -1,7 +1,11 @@ +--- +title: "Overview" +description: "" +--- -# {{platform_name}} CLI +# NeMo Platform CLI -The {{platform_name}} CLI (`nemo`) is a command-line tool for interacting with {{platform_name}}. It provides a unified interface for managing models, running jobs, deploying inference endpoints, and working with a local setup. +The NeMo Platform CLI (`nemo`) is a command-line tool for interacting with NeMo Platform. It provides a unified interface for managing models, running jobs, deploying inference endpoints, and working with a local setup. ## Key Capabilities @@ -13,10 +17,13 @@ The {{platform_name}} CLI (`nemo`) is a command-line tool for interacting with { ## Installation -!!! note "This package downloads and installs additional third-party open source software projects. Review the license terms of these open source projects before use." - If you previously installed the `nemo-microservices` package, uninstall it first to avoid conflicts: + +This package downloads and installs additional third-party open source software projects. 
Review the license terms of these open source projects before use. +If you previously installed the `nemo-microservices` package, uninstall it first to avoid conflicts: + +pip uninstall nemo-microservices + - pip uninstall nemo-microservices ### Install in a Virtual Environment ```bash @@ -29,7 +36,9 @@ Or with uv: uv pip install nemo-platform[all] ``` -!!! warning "When installed in a virtual environment, the `nemo` command is only available when the environment is activated." + +When installed in a virtual environment, the `nemo` command is only available when the environment is activated. + ### Verify Installation @@ -37,7 +46,7 @@ uv pip install nemo-platform[all] nemo --help ``` -If you see `Unknown command: nemo`, see the [troubleshooting](troubleshooting.md) page. +If you see `Unknown command: nemo`, see the [troubleshooting](/reference/cli-reference/troubleshooting) page. ## Getting Started @@ -56,7 +65,7 @@ nemo auth login nemo workspaces list ``` -This command outputs a list of available workspaces. If the connection fails, see [troubleshooting](troubleshooting.md) for common issues and solutions. +This command outputs a list of available workspaces. If the connection fails, see [troubleshooting](/reference/cli-reference/troubleshooting) for common issues and solutions. ## Command Structure @@ -66,7 +75,25 @@ The CLI follows a consistent pattern: nemo [GLOBAL OPTIONS] [...] [OPTIONS] ``` ---8<-- "_snippets/cli-summary.md" +**Global options** apply to all commands: + +| Option | Description | +|--------|-------------| +| `--base-url` | Base URL for the NeMo Platform API | +| `--output-format, -f <CHOICE>` | Output format for how results are printed. [possible values: table, json, yaml, markdown, csv, raw, code] | +| `--no-truncate` | Don't truncate long values in table/markdown/csv output | +| `--timestamp-format <CHOICE>` | Timestamp format for table/markdown/csv output [possible values: relative, iso8601] | +| `--verbose, -v` | Enable verbose messaging. 
This only impacts logs that are visible, it doesn't change any data outputs. | +| `--agent-mode, -A` | Enable agent-friendly output mode with extra context for coding agents. | + +**Commands** are organized into categories: + +| Category | Commands | Description | +|----------|----------|-------------| +| Setup | `setup`, `services`, `skills` | Set up and run local platform components | +| CLI functions | `chat`, `docs`, `wait`, `agent`, `plugins` | Interactive, documentation, and agent-oriented workflows | +| Core plugins | `files`, `inference`, `jobs`, `models`, `secrets`, `workspaces` | Core platform resources | +| Functional plugins | `guardrail` | Functional service and plugin commands | ## Common Workflows @@ -123,7 +150,7 @@ nemo models list -f csv --all-pages --no-truncate --output-columns all > models. ``` ## Next Steps -- [configuration](configuration.md) - Contexts, authentication, environment variables, and shell completion -- [working-with-resources](working-with-resources.md) - Output formats, pagination, and input methods -- [troubleshooting](troubleshooting.md) - Common issues and solutions -- [reference](reference.md) - Full command reference +- [configuration](/reference/cli-reference/configuration) - Contexts, authentication, environment variables, and shell completion +- [working-with-resources](/reference/cli-reference/working-with-resources) - Output formats, pagination, and input methods +- [troubleshooting](/reference/cli-reference/troubleshooting) - Common issues and solutions +- [reference](/reference/cli-reference/full-cli-reference) - Full command reference diff --git a/docs/cli/reference.mdx b/docs/cli/reference.mdx index 288a99c5a6..ad0517b804 100644 --- a/docs/cli/reference.mdx +++ b/docs/cli/reference.mdx @@ -1,5 +1,7 @@ -# Full CLI Reference - +--- +title: "Full CLI Reference" +description: "" +--- Command-line interface for NeMo Platform. **Getting started:** @@ -22,9 +24,9 @@ nemo [GLOBAL OPTIONS] COMMAND [ARGS]... 
**Global Options:** * `--base-url`: Base URL for the NeMo Platform API -* `--output-format, -f `: Output format for how results are printed. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for how results are printed. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output -* `--timestamp-format `: Timestamp format for table/markdown/csv output [possible values: relative, iso8601] +* `--timestamp-format <CHOICE>`: Timestamp format for table/markdown/csv output [possible values: relative, iso8601] * `--verbose, -v`: Enable verbose messaging. This only impacts logs that are visible, it doesn't change any data outputs. * `--agent-mode, -A`: Enable agent-friendly output mode with extra context for coding agents. @@ -78,10 +80,10 @@ nemo setup [OPTIONS] * `--start-services, --no-start-services`: Start local platform services * `--install-skills, --no-install-skills`: Install NeMo skills for coding agents * `--skills-agents`: Comma-separated list of agents to install skills for (e.g. 'codex,cursor'). Default: all detected. Only applied when --install-skills is set. -* `--skills-scope `: Install scope for skills: 'project' (this repo) or 'user' (home). Default: project. Only applied when --install-skills is set. [possible values: project, user] +* `--skills-scope <CHOICE>`: Install scope for skills: 'project' (this repo) or 'user' (home). Default: project. Only applied when --install-skills is set. [possible values: project, user] * `--skills-from`: Comma-separated list of skill sources to install from (e.g. 'nemo-platform,nemo-evaluator-plugin'). Use 'nemo-platform' for the built-in set. Default: all sources. Only applied when --install-skills is set. 
* `--deploy-agent, --no-deploy-agent`: Deploy the demo calculator agent -* `--ready-timeout `: Seconds to wait for platform readiness (default: 240) +* `--ready-timeout <INTEGER>`: Seconds to wait for platform readiness (default: 240) **Help:** @@ -130,7 +132,7 @@ nemo services run [OPTIONS] * `--sidecars`: Comma-separated sidecars to run, e.g. adapters,cache. * `--config`: Path to a platform configuration YAML file. * `--host`: Host to bind to. [default: 127.0.0.1] -* `--port `: Port to bind to. [default: 8080] +* `--port <INTEGER>`: Port to bind to. [default: 8080] * `--instance`: Instance name. Defaults to a name derived from the working directory and port. **Help:** @@ -165,7 +167,7 @@ nemo services start [OPTIONS] * `--sidecars`: Comma-separated sidecars to run, e.g. adapters,cache. * `--config`: Path to a platform configuration YAML file. * `--host`: Host to bind to. [default: 127.0.0.1] -* `--port `: Port to bind to. [default: 8080] +* `--port <INTEGER>`: Port to bind to. [default: 8080] * `--instance`: Instance name. Defaults to a name derived from the working directory and port. **Help:** @@ -178,7 +180,7 @@ Stop running platform services. Sends SIGTERM to the running service process and waits for it to exit. Falls back to SIGKILL after a timeout. Foreground instances (started -with ``run``) are protected; use ``--force`` to override. +with `run`) are protected; use `--force` to override. **Examples:** @@ -195,9 +197,9 @@ nemo services stop [OPTIONS] **Options:** -* `--timeout `: Seconds to wait before SIGKILL. [default: 30.0] +* `--timeout <FLOAT>`: Seconds to wait before SIGKILL. [default: 30.0] * `--instance`: Instance name. Defaults to a name derived from the working directory and port. -* `--port `: Port (used for scope computation if --instance not given). [default: 8080] +* `--port <INTEGER>`: Port (used for scope computation if --instance not given). [default: 8080] * `--force`: Stop even if the instance is running in the foreground. 
**Help:** @@ -234,7 +236,7 @@ nemo services restart [OPTIONS] * `--sidecars`: Comma-separated sidecars to run. Overrides previous setting. * `--config`: Path to a platform configuration YAML file. * `--host`: Host to bind to. Defaults to previous value or 127.0.0.1. -* `--port `: Port to bind to. Defaults to previous value or 8080. +* `--port <INTEGER>`: Port to bind to. Defaults to previous value or 8080. * `--instance`: Instance name. Defaults to a name derived from the working directory and port. **Help:** @@ -254,7 +256,7 @@ nemo services status [OPTIONS] **Options:** * `--instance`: Instance name. Defaults to a name derived from the working directory and port. -* `--port `: Port (used for scope computation if --instance not given). [default: 8080] +* `--port <INTEGER>`: Port (used for scope computation if --instance not given). [default: 8080] **Help:** @@ -295,9 +297,9 @@ nemo services logs [OPTIONS] **Options:** * `--path`: Print the log file path instead of tailing. -* `-n, --lines `: Number of lines to show from end of log. [default: 50] +* `-n, --lines <INTEGER RANGE>`: Number of lines to show from end of log. [default: 50] * `--instance`: Instance name. Defaults to a name derived from the working directory and port. -* `--port `: Port (used for scope computation if --instance not given). [default: 8080] +* `--port <INTEGER>`: Port (used for scope computation if --instance not given). [default: 8080] **Help:** @@ -373,7 +375,7 @@ nemo skills list [OPTIONS] **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. 
@@ -400,7 +402,7 @@ nemo skills show [OPTIONS] NAME **Arguments:** -* ``: Skill name to show (use 'nemo skills list' to see available skills) +* `<NAME>`: Skill name to show (use 'nemo skills list' to see available skills) **Options:** @@ -478,8 +480,8 @@ nemo chat [OPTIONS] MODEL [PROMPT] **Arguments:** -* ``: Model entity name (from 'nemo models list') or model ID when using --provider -* ``: Prompt for one-shot mode. Takes precedence over piped stdin. +* `<MODEL>`: Model entity name (from 'nemo models list') or model ID when using --provider +* `<PROMPT>`: Prompt for one-shot mode. Takes precedence over piped stdin. **Options:** @@ -496,13 +498,13 @@ nemo chat [OPTIONS] MODEL [PROMPT] **Model Options:** -* `--temperature `: Sampling temperature (0.0 to 2.0) -* `--max-tokens `: Maximum tokens to generate +* `--temperature <FLOAT>`: Sampling temperature (0.0 to 2.0) +* `--max-tokens <INTEGER>`: Maximum tokens to generate * `--system-message`: System message to set context for the conversation **Output Options:** -* `--output-format, --format, -f `: Output format for one-shot responses. [possible values: text, json, raw] +* `--output-format, --format, -f <CHOICE>`: Output format for one-shot responses. [possible values: text, json, raw] ### nemo docs @@ -524,7 +526,7 @@ nemo docs [OPTIONS] [PATH] **Arguments:** -* ``: Path to a doc topic (e.g., get-started/setup). Omit to see available topics. +* `<PATH>`: Path to a doc topic (e.g., get-started/setup). Omit to see available topics. 
**Options:** @@ -599,15 +601,15 @@ nemo wait inference deployment [OPTIONS] NAME **Arguments:** -* ``: Name of the deployment to wait for +* `<NAME>`: Name of the deployment to wait for **Options:** * `--workspace`: Workspace name -* `--status, -s `: Desired status to wait for [possible values: READY, DELETED, PENDING, ERROR; default: READY] -* `--timeout, -t `: Maximum time to wait in seconds [default: 1200] +* `--status, -s <CHOICE>`: Desired status to wait for [possible values: READY, DELETED, PENDING, ERROR; default: READY] +* `--timeout, -t <INTEGER RANGE>`: Maximum time to wait in seconds [default: 1200] * `--check-gateway, --no-check-gateway`: When waiting for READY, also verify gateway can route to the provider -* `--poll-interval `: Seconds between status checks [default: 3] +* `--poll-interval <INTEGER RANGE>`: Seconds between status checks [default: 3] **Help:** @@ -640,13 +642,13 @@ nemo wait inference provider [OPTIONS] NAME **Arguments:** -* ``: Name of the provider to wait for +* `<NAME>`: Name of the provider to wait for **Options:** * `--workspace`: Workspace name -* `--timeout, -t `: Maximum time to wait in seconds [default: 60] -* `--poll-interval `: Seconds between status checks [default: 1] +* `--timeout, -t <INTEGER RANGE>`: Maximum time to wait in seconds [default: 60] +* `--poll-interval <INTEGER RANGE>`: Seconds between status checks [default: 1] **Help:** @@ -777,7 +779,7 @@ nemo plugins list [OPTIONS] **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. 
@@ -834,8 +836,8 @@ nemo files upload [OPTIONS] LOCAL_PATH [FILESET] **Arguments:** -* ``: Local path to upload -* ``: Name of the fileset to upload to. If not provided, a new fileset is created. +* `<LOCAL_PATH>`: Local path to upload +* `<FILESET>`: Name of the fileset to upload to. If not provided, a new fileset is created. **Options:** @@ -871,13 +873,13 @@ nemo files download [OPTIONS] FILESET **Arguments:** -* ``: Name of the fileset to download from +* `<FILESET>`: Name of the fileset to download from **Options:** * `--workspace` * `--remote-path`: Path within the fileset. Defaults to root. [default: ] -* `--output, -o `: Local path to download to. +* `--output, -o <PATH>`: Local path to download to. **Help:** @@ -907,7 +909,7 @@ nemo files list [OPTIONS] FILESET **Arguments:** -* ``: Name of the fileset to list files from +* `<FILESET>`: Name of the fileset to list files from **Options:** @@ -920,7 +922,7 @@ nemo files list [OPTIONS] FILESET **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. * `--no-truncate`: Don't truncate long values in table/markdown/csv output. @@ -943,7 +945,7 @@ nemo files delete [OPTIONS] FILESET **Arguments:** -* ``: Name of the fileset containing the file +* `<FILESET>`: Name of the fileset containing the file **Options:** @@ -1002,7 +1004,7 @@ nemo files filesets create [OPTIONS] [NAME] **Arguments:** -* ``: The name of the fileset. Allowed characters: letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots. +* `<NAME>`: The name of the fileset. Allowed characters: letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots. 
**Options:** @@ -1014,7 +1016,7 @@ nemo files filesets create [OPTIONS] [NAME] Example: metadata = FilesetMetadata( dataset=DatasetMetadataContent( schema=`{"columns": ["id", "name"]}`, ) ) (JSON string) * `--project`: The name of the project associated with this fileset. -* `--purpose `: The purpose of the fileset. [possible values: dataset, generic, model] +* `--purpose <CHOICE>`: The purpose of the fileset. [possible values: dataset, generic, model] * `--storage`: The storage configuration for the fileset. If not provided, uses default storage. (JSON string) * `--exist-ok`: Do not raise an error if the resource already exists. Returns the existing resource. @@ -1029,7 +1031,7 @@ Example: metadata = FilesetMetadata( dataset=DatasetMetadataContent( schema=`{"c **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo files filesets delete @@ -1049,7 +1051,7 @@ nemo files filesets delete [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -1075,17 +1077,17 @@ nemo files filesets list [OPTIONS] **Options:** * `--workspace` -* `--page `: Page number. -* `--page-size `: Page size. -* `--sort `: The field to sort by. To sort in decreasing order, use `-` in front of the field name. [possible values: created_at, -created_at, name, -name] +* `--page <INTEGER>`: Page number. +* `--page-size <INTEGER>`: Page size. +* `--sort <CHOICE>`: The field to sort by. To sort in decreasing order, use `-` in front of the field name. [possible values: created_at, -created_at, name, -name] * `--all-pages`: Fetch all pages **Filter Options:** * `--filter FILTER_JSON`: Use --filter with JSON for complex/nested queries, or --filter. FIELD options for simple fields. Both can be combined, with field options taking precedence. 
JSON-only fields: - created_at: {gte: str, lte: str} - updated_at: {gte: str, lte: str} + created_at: \{gte: str, lte: str\} + updated_at: \{gte: str, lte: str\} Filter filesets by name, description, purpose, storage_type, created_at, and updated_at. * `--filter.description` @@ -1099,7 +1101,7 @@ Filter filesets by name, description, purpose, storage_type, created_at, and upd **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. @@ -1117,7 +1119,7 @@ nemo files filesets get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -1129,7 +1131,7 @@ nemo files filesets get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo files filesets update @@ -1152,7 +1154,7 @@ nemo files filesets update [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -1163,7 +1165,7 @@ nemo files filesets update [OPTIONS] NAME Example: metadata = FilesetMetadata( dataset=DatasetMetadataContent( schema=`{"columns": ["id", "name"]}`, ) ) (JSON string) * `--project`: The name of the project associated with this fileset. -* `--purpose `: The purpose of the fileset. [possible values: dataset, generic, model] +* `--purpose <CHOICE>`: The purpose of the fileset. 
[possible values: dataset, generic, model] **Help:** @@ -1176,7 +1178,7 @@ Example: metadata = FilesetMetadata( dataset=DatasetMetadataContent( schema=`{"c **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo files otlp @@ -1238,7 +1240,7 @@ nemo files otlp logs create [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -1255,7 +1257,7 @@ nemo files otlp logs create [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ###### nemo files otlp logs query @@ -1272,13 +1274,13 @@ nemo files otlp logs query [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** * `--workspace` * `--filters`: Key-value filters to apply to the query -* `--limit `: Maximum number of results to return +* `--limit <INTEGER>`: Maximum number of results to return * `--page-cursor`: Cursor for pagination **Help:** @@ -1287,7 +1289,7 @@ nemo files otlp logs query [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ### nemo inference @@ -1390,7 +1392,7 @@ nemo inference deployment-configs create [OPTIONS] [NAME] **Arguments:** -* ``: Name of the deployment configuration. Allowed characters: letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots. +* `<NAME>`: Name of the deployment configuration. Allowed characters: letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots. 
**Options:** @@ -1412,7 +1414,7 @@ nemo inference deployment-configs create [OPTIONS] [NAME] **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo inference deployment-configs delete @@ -1430,7 +1432,7 @@ nemo inference deployment-configs delete [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -1456,8 +1458,8 @@ nemo inference deployment-configs list [OPTIONS] **Options:** * `--workspace` -* `--page `: Page number. -* `--page-size `: Page size. +* `--page <INTEGER>`: Page number. +* `--page-size <INTEGER>`: Page size. * `--sort`: The field to sort by. To sort in decreasing order, use `-` in front of the field name. * `--all-pages`: Fetch all pages @@ -1465,8 +1467,8 @@ nemo inference deployment-configs list [OPTIONS] * `--filter FILTER_JSON`: Use --filter with JSON for complex/nested queries, or --filter. FIELD options for simple fields. Both can be combined, with field options taking precedence. JSON-only fields: - created_at: {gte: str, lte: str} - updated_at: {gte: str, lte: str} + created_at: \{gte: str, lte: str\} + updated_at: \{gte: str, lte: str\} Filter deployment configs by workspace, project, model_entity_id, name, description, created_at, and updated_at. * `--filter.description` @@ -1481,7 +1483,7 @@ Filter deployment configs by workspace, project, model_entity_id, name, descript **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. 
@@ -1497,7 +1499,7 @@ nemo inference deployment-configs get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -1509,7 +1511,7 @@ nemo inference deployment-configs get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo inference deployment-configs update @@ -1534,7 +1536,7 @@ nemo inference deployment-configs update [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -1554,7 +1556,7 @@ nemo inference deployment-configs update [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo inference deployment-configs versions @@ -1593,7 +1595,7 @@ nemo inference deployment-configs versions delete [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -1616,7 +1618,7 @@ nemo inference deployment-configs versions list [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -1628,7 +1630,7 @@ nemo inference deployment-configs versions list [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. 
@@ -1644,7 +1646,7 @@ nemo inference deployment-configs versions get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -1657,7 +1659,7 @@ nemo inference deployment-configs versions get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo inference deployments @@ -1707,13 +1709,13 @@ nemo inference deployments create [OPTIONS] [NAME] **Arguments:** -* ``: Name of the deployment. Allowed characters: letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots. +* `<NAME>`: Name of the deployment. Allowed characters: letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots. **Options:** * `--workspace` * `--config`: Reference to the ModelDeploymentConfig name -* `--config-version `: Reference to a specific ModelDeploymentConfig version. If not specified, uses latest. +* `--config-version <INTEGER>`: Reference to a specific ModelDeploymentConfig version. If not specified, uses latest. * `--project`: The URN of the project associated with this deployment * `--exist-ok`: Do not raise an error if the resource already exists. Returns the existing resource. @@ -1728,13 +1730,13 @@ nemo inference deployments create [OPTIONS] [NAME] **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. 
[possible values: json, yaml, raw, code] **Wait Options:** * `--wait`: Wait for the created deployment to reach a terminal state -* `--timeout `: Maximum time to wait in seconds [default: 1200] -* `--poll-interval `: Seconds between status checks [default: 3] +* `--timeout <INTEGER RANGE>`: Maximum time to wait in seconds [default: 1200] +* `--poll-interval <INTEGER RANGE>`: Seconds between status checks [default: 3] ##### nemo inference deployments delete @@ -1764,7 +1766,7 @@ nemo inference deployments delete [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -1790,8 +1792,8 @@ nemo inference deployments list [OPTIONS] * `--workspace` * `--all-versions`: If true, return all versions of each deployment. If false (default), return only the latest version. -* `--page `: Page number. -* `--page-size `: Page size. +* `--page <INTEGER>`: Page number. +* `--page-size <INTEGER>`: Page size. * `--sort`: The field to sort by. To sort in decreasing order, use `-` in front of the field name. * `--all-pages`: Fetch all pages @@ -1799,8 +1801,8 @@ nemo inference deployments list [OPTIONS] * `--filter FILTER_JSON`: Use --filter with JSON for complex/nested queries, or --filter. FIELD options for simple fields. Both can be combined, with field options taking precedence. JSON-only fields: - created_at: {gte: str, lte: str} - updated_at: {gte: str, lte: str} + created_at: \{gte: str, lte: str\} + updated_at: \{gte: str, lte: str\} Filter deployments by workspace, project, status, config, model_provider_id, name, status_message, created_at, and updated_at. * `--filter.config` @@ -1817,7 +1819,7 @@ Filter deployments by workspace, project, status, config, model_provider_id, nam **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. 
[possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. @@ -1839,7 +1841,7 @@ nemo inference deployments list-models [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -1851,7 +1853,7 @@ nemo inference deployments list-models [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. @@ -1867,7 +1869,7 @@ nemo inference deployments get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -1879,7 +1881,7 @@ nemo inference deployments get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo inference deployments update @@ -1904,13 +1906,13 @@ nemo inference deployments update [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** * `--workspace` * `--config`: Reference to the ModelDeploymentConfig name -* `--config-version `: Reference to a specific ModelDeploymentConfig version. If not specified, uses latest. +* `--config-version <INTEGER>`: Reference to a specific ModelDeploymentConfig version. If not specified, uses latest. **Help:** @@ -1923,7 +1925,7 @@ nemo inference deployments update [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. 
[possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo inference deployments update-status @@ -1951,12 +1953,12 @@ nemo inference deployments update-status [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** * `--workspace` -* `--status `: Status enum for ModelDeployment objects. [possible values: UNKNOWN, CREATED, PENDING, READY, ERROR, DELETING, DELETED, LOST] +* `--status <CHOICE>`: Status enum for ModelDeployment objects. [possible values: UNKNOWN, CREATED, PENDING, READY, ERROR, DELETING, DELETED, LOST] * `--version` * `--model-provider-id`: Optional reference to the auto-created ModelProvider workspace/name (format: workspace/name) * `--status-message`: Detailed status message @@ -1972,7 +1974,7 @@ nemo inference deployments update-status [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo inference deployments versions @@ -2022,7 +2024,7 @@ nemo inference deployments versions delete [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -2045,7 +2047,7 @@ nemo inference deployments versions list [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -2057,7 +2059,7 @@ nemo inference deployments versions list [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. 
@@ -2073,7 +2075,7 @@ nemo inference deployments versions get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -2086,7 +2088,7 @@ nemo inference deployments versions get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo inference gateway @@ -2149,7 +2151,7 @@ nemo inference gateway model delete [OPTIONS] TRAILING_URI **Arguments:** -* `` +* `<TRAILING_URI>` **Options:** @@ -2179,7 +2181,7 @@ nemo inference gateway model get [OPTIONS] TRAILING_URI **Arguments:** -* `` +* `<TRAILING_URI>` **Options:** @@ -2192,7 +2194,7 @@ nemo inference gateway model get [OPTIONS] TRAILING_URI **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ###### nemo inference gateway model patch @@ -2224,7 +2226,7 @@ nemo inference gateway model patch [OPTIONS] TRAILING_URI **Arguments:** -* `` +* `<TRAILING_URI>` **Options:** @@ -2243,7 +2245,7 @@ nemo inference gateway model patch [OPTIONS] TRAILING_URI **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ###### nemo inference gateway model post @@ -2275,8 +2277,8 @@ nemo inference gateway model post [OPTIONS] TRAILING_URI [NAME] **Arguments:** -* `` -* ``: (required) +* `<TRAILING_URI>` +* `<NAME>`: (required) **Options:** @@ -2294,7 +2296,7 @@ nemo inference gateway model post [OPTIONS] TRAILING_URI [NAME] **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. 
[possible values: json, yaml, raw, code] ###### nemo inference gateway model put @@ -2326,8 +2328,8 @@ nemo inference gateway model put [OPTIONS] TRAILING_URI [NAME] **Arguments:** -* `` -* ``: (required) +* `<TRAILING_URI>` +* `<NAME>`: (required) **Options:** @@ -2345,7 +2347,7 @@ nemo inference gateway model put [OPTIONS] TRAILING_URI [NAME] **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo inference gateway openai @@ -2418,7 +2420,7 @@ nemo inference gateway openai v1 models get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -2430,7 +2432,7 @@ nemo inference gateway openai v1 models get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] **nemo inference gateway openai v1 models list** @@ -2454,7 +2456,7 @@ nemo inference gateway openai v1 models list [OPTIONS] **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. 
@@ -2493,7 +2495,7 @@ nemo inference gateway provider delete [OPTIONS] TRAILING_URI **Arguments:** -* `` +* `<TRAILING_URI>` **Options:** @@ -2516,7 +2518,7 @@ nemo inference gateway provider get [OPTIONS] TRAILING_URI **Arguments:** -* `` +* `<TRAILING_URI>` **Options:** @@ -2529,7 +2531,7 @@ nemo inference gateway provider get [OPTIONS] TRAILING_URI **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ###### nemo inference gateway provider patch @@ -2554,7 +2556,7 @@ nemo inference gateway provider patch [OPTIONS] TRAILING_URI **Arguments:** -* `` +* `<TRAILING_URI>` **Options:** @@ -2573,7 +2575,7 @@ nemo inference gateway provider patch [OPTIONS] TRAILING_URI **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ###### nemo inference gateway provider post @@ -2598,8 +2600,8 @@ nemo inference gateway provider post [OPTIONS] TRAILING_URI [NAME] **Arguments:** -* `` -* ``: (required) +* `<TRAILING_URI>` +* `<NAME>`: (required) **Options:** @@ -2617,7 +2619,7 @@ nemo inference gateway provider post [OPTIONS] TRAILING_URI [NAME] **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ###### nemo inference gateway provider put @@ -2642,8 +2644,8 @@ nemo inference gateway provider put [OPTIONS] TRAILING_URI [NAME] **Arguments:** -* `` -* ``: (required) +* `<TRAILING_URI>` +* `<NAME>`: (required) **Options:** @@ -2661,7 +2663,7 @@ nemo inference gateway provider put [OPTIONS] TRAILING_URI [NAME] **Output Options:** -* `--output-format, -f `: Output format for an entity. 
[possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ###### nemo inference gateway provider ready @@ -2682,7 +2684,7 @@ nemo inference gateway provider ready [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -2694,7 +2696,7 @@ nemo inference gateway provider ready [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo inference models @@ -2731,7 +2733,7 @@ nemo inference models get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -2743,7 +2745,7 @@ nemo inference models get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo inference models list @@ -2767,7 +2769,7 @@ nemo inference models list [OPTIONS] **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. @@ -2817,7 +2819,7 @@ nemo inference providers create [OPTIONS] [NAME] **Arguments:** -* ``: Name of the model provider. Allowed characters: letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots. +* `<NAME>`: Name of the model provider. Allowed characters: letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots. 
**Options:** @@ -2833,7 +2835,7 @@ nemo inference providers create [OPTIONS] [NAME] * `--project`: The URN of the project associated with this model provider * `--required-extra-body`: Required body parameters for inference requests. Cannot be overridden by user requests. (JSON string) * `--required-extra-headers`: Required headers for inference requests. Cannot be overridden by user requests. (JSON string) -* `--status `: Status enum for ModelProvider objects. [possible values: UNKNOWN, CREATED, PENDING, READY, ERROR, DELETING, DELETED, LOST] +* `--status <CHOICE>`: Status enum for ModelProvider objects. [possible values: UNKNOWN, CREATED, PENDING, READY, ERROR, DELETING, DELETED, LOST] * `--status-message`: Status message * `--exist-ok`: Do not raise an error if the resource already exists. Returns the existing resource. @@ -2848,7 +2850,7 @@ nemo inference providers create [OPTIONS] [NAME] **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo inference providers delete @@ -2862,7 +2864,7 @@ nemo inference providers delete [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -2885,17 +2887,17 @@ nemo inference providers list [OPTIONS] **Options:** * `--workspace` -* `--page `: Page number. -* `--page-size `: Page size. -* `--sort `: The field to sort by. To sort in decreasing order, use `-` in front of the field name. [possible values: name, -name, created_at, -created_at, updated_at, -updated_at, status, -status] +* `--page <INTEGER>`: Page number. +* `--page-size <INTEGER>`: Page size. +* `--sort <CHOICE>`: The field to sort by. To sort in decreasing order, use `-` in front of the field name. 
[possible values: name, -name, created_at, -created_at, updated_at, -updated_at, status, -status] * `--all-pages`: Fetch all pages **Filter Options:** * `--filter FILTER_JSON`: Use --filter with JSON for complex/nested queries, or --filter. FIELD options for simple fields. Both can be combined, with field options taking precedence. JSON-only fields: - created_at: {gte: str, lte: str} - updated_at: {gte: str, lte: str} + created_at: \{gte: str, lte: str\} + updated_at: \{gte: str, lte: str\} Filter model providers by workspace, project, status, model_deployment_id, name, description, host_url, created_at, and updated_at. * `--filter.description` @@ -2912,7 +2914,7 @@ Filter model providers by workspace, project, status, model_deployment_id, name, **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. @@ -2928,7 +2930,7 @@ nemo inference providers get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -2940,7 +2942,7 @@ nemo inference providers get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. 
[possible values: json, yaml, raw, code] ##### nemo inference providers update @@ -2965,7 +2967,7 @@ nemo inference providers update [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -2981,7 +2983,7 @@ nemo inference providers update [OPTIONS] NAME * `--project`: The URN of the project associated with this model provider * `--required-extra-body`: Required body parameters for inference requests. Cannot be overridden by user requests. (JSON string) * `--required-extra-headers`: Required headers for inference requests. Cannot be overridden by user requests. (JSON string) -* `--status `: Status enum for ModelProvider objects. [possible values: UNKNOWN, CREATED, PENDING, READY, ERROR, DELETING, DELETED, LOST] +* `--status <CHOICE>`: Status enum for ModelProvider objects. [possible values: UNKNOWN, CREATED, PENDING, READY, ERROR, DELETING, DELETED, LOST] * `--status-message`: Status message **Help:** @@ -2995,7 +2997,7 @@ nemo inference providers update [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo inference providers update-status @@ -3028,14 +3030,14 @@ nemo inference providers update-status [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** * `--workspace` * `--model-deployment-id`: Reference to the ModelDeployment ID if this provider is associated with a deployment * `--served-models`: List of models served by this provider with routing information for IGW (JSON string) -* `--status `: Status enum for ModelProvider objects. [possible values: UNKNOWN, CREATED, PENDING, READY, ERROR, DELETING, DELETED, LOST] +* `--status <CHOICE>`: Status enum for ModelProvider objects. [possible values: UNKNOWN, CREATED, PENDING, READY, ERROR, DELETING, DELETED, LOST] * `--status-message`: Status message. 
If status is provided without status_message, defaults to empty string. **Help:** @@ -3049,7 +3051,7 @@ nemo inference providers update-status [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo inference virtual-models @@ -3099,7 +3101,7 @@ nemo inference virtual-models create [OPTIONS] [NAME] **Arguments:** -* ``: Name of the virtual model within the workspace. Must be unique per workspace. +* `<NAME>`: Name of the virtual model within the workspace. Must be unique per workspace. **Options:** @@ -3124,7 +3126,7 @@ nemo inference virtual-models create [OPTIONS] [NAME] **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo inference virtual-models delete @@ -3141,7 +3143,7 @@ nemo inference virtual-models delete [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -3166,8 +3168,8 @@ nemo inference virtual-models list [OPTIONS] **Options:** * `--workspace` -* `--page `: Page number (1-indexed). -* `--page-size `: Number of results per page. +* `--page <INTEGER>`: Page number (1-indexed). +* `--page-size <INTEGER>`: Number of results per page. * `--sort`: Sort field. Prefix with `-` for descending order. * `--all-pages`: Fetch all pages @@ -3177,7 +3179,7 @@ nemo inference virtual-models list [OPTIONS] **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. 
* `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. @@ -3205,7 +3207,7 @@ nemo inference virtual-models patch [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -3229,7 +3231,7 @@ nemo inference virtual-models patch [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo inference virtual-models get @@ -3243,7 +3245,7 @@ nemo inference virtual-models get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -3255,7 +3257,7 @@ nemo inference virtual-models get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ### nemo jobs @@ -3300,7 +3302,7 @@ nemo jobs cancel [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -3312,7 +3314,7 @@ nemo jobs cancel [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo jobs create @@ -3337,7 +3339,7 @@ nemo jobs create [OPTIONS] [NAME] **Arguments:** -* `` +* `<NAME>` **Options:** @@ -3361,7 +3363,7 @@ nemo jobs create [OPTIONS] [NAME] **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. 
[possible values: json, yaml, raw, code] #### nemo jobs delete @@ -3375,7 +3377,7 @@ nemo jobs delete [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -3397,13 +3399,13 @@ nemo jobs get-logs [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** * `--workspace` -* `--attempt-id `: Filter logs by job attempt ID -* `--limit `: Maximum number of logs to return +* `--attempt-id <INTEGER>`: Filter logs by job attempt ID +* `--limit <INTEGER>`: Maximum number of logs to return * `--page-cursor`: Page cursor * `--step-id`: Filter logs by step name * `--task-id`: Filter logs by task ID @@ -3415,7 +3417,7 @@ nemo jobs get-logs [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. @@ -3431,7 +3433,7 @@ nemo jobs get-status [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -3443,7 +3445,7 @@ nemo jobs get-status [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo jobs list @@ -3458,17 +3460,17 @@ nemo jobs list [OPTIONS] **Options:** * `--workspace` -* `--page `: Page number. -* `--page-size `: Page size. -* `--sort `: The field to sort by. To sort in decreasing order, use `-` in front of the field name. [possible values: created_at, -created_at, updated_at, -updated_at] +* `--page <INTEGER>`: Page number. +* `--page-size <INTEGER>`: Page size. +* `--sort <CHOICE>`: The field to sort by. 
To sort in decreasing order, use `-` in front of the field name. [possible values: created_at, -created_at, updated_at, -updated_at] * `--all-pages`: Fetch all pages **Filter Options:** * `--filter FILTER_JSON`: Use --filter with JSON for complex/nested queries, or --filter. FIELD options for simple fields. Both can be combined, with field options taking precedence. JSON-only fields: - created_at: {gte: str, lte: str} - updated_at: {gte: str, lte: str} + created_at: \{gte: str, lte: str\} + updated_at: \{gte: str, lte: str\} Filter jobs by workspace, project, name, status, source, created_at, and updated_at. * `--filter.name` @@ -3483,7 +3485,7 @@ Filter jobs by workspace, project, name, status, source, created_at, and updated **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. @@ -3503,7 +3505,7 @@ nemo jobs list-execution-profiles [OPTIONS] **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. 
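Reviewer sketch of the brace escaping seen in the filter hunks above, where `{gte: str, lte: str}` becomes `\{gte: str, lte: str\}` so that MDX does not read the braces as an expression. This is not the conversion tool used for this patch; `escape_braces` is a hypothetical helper, and leaving inline code spans untouched is an assumption about the desired behavior.

```python
import re

# Split on inline code spans so their contents can be passed through as-is.
_CODE_SPAN = re.compile(r"(`[^`]*`)")


def escape_braces(line):
    """Backslash-escape { and } outside inline code spans."""
    out = []
    for part in _CODE_SPAN.split(line):
        if len(part) >= 2 and part.startswith("`") and part.endswith("`"):
            out.append(part)  # inline code: keep unchanged
        else:
            # Skip braces that are already escaped, so reruns are harmless.
            out.append(re.sub(r"(?<!\\)([{}])", r"\\\1", part))
    return "".join(out)


print(escape_braces("  created_at: {gte: str, lte: str}"))
```

Because already-escaped braces are skipped, running the helper twice over the same file produces no further changes.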
@@ -3519,7 +3521,7 @@ nemo jobs pause [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -3531,7 +3533,7 @@ nemo jobs pause [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo jobs resume @@ -3545,7 +3547,7 @@ nemo jobs resume [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -3557,7 +3559,7 @@ nemo jobs resume [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo jobs get @@ -3571,7 +3573,7 @@ nemo jobs get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -3583,7 +3585,7 @@ nemo jobs get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo jobs update-status-details @@ -3608,7 +3610,7 @@ nemo jobs update-status-details [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -3626,7 +3628,7 @@ nemo jobs update-status-details [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. 
[possible values: json, yaml, raw, code] #### nemo jobs results @@ -3672,13 +3674,13 @@ nemo jobs results create [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** * `--workspace` * `--job`: (required) -* `--artifact-storage-type `: (required) [possible values: fileset] +* `--artifact-storage-type <CHOICE>`: (required) [possible values: fileset] * `--artifact-url`: (required) **Help:** @@ -3692,7 +3694,7 @@ nemo jobs results create [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo jobs results download @@ -3706,13 +3708,13 @@ nemo jobs results download [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** * `--workspace` * `--job` -* `--output-file, -o `: Output file path +* `--output-file, -o <PATH>`: Output file path **Help:** @@ -3730,12 +3732,12 @@ nemo jobs results list [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** * `--workspace` -* `--sort `: The field to sort by. [possible values: created_at, -created_at, updated_at, -updated_at] +* `--sort <CHOICE>`: The field to sort by. [possible values: created_at, -created_at, updated_at, -updated_at] **Help:** @@ -3743,7 +3745,7 @@ nemo jobs results list [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. 
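Reviewer sketch for the `--sort <CHOICE>` option of `nemo jobs results list` in the hunk above: checking the value against the documented choices before building the command. The choice list is copied from this hunk. The helper name is hypothetical, and the meaning of the `NAME` argument is not stated here, so it is passed through unchanged. Other list commands in this file document a leading `-` as decreasing order; this hunk does not say so explicitly for results.

```python
# Choices copied from the "--sort <CHOICE>" line for `nemo jobs results list`.
RESULT_SORT_CHOICES = ("created_at", "-created_at", "updated_at", "-updated_at")


def build_results_list_command(name, sort="-created_at"):
    """Return argv for `nemo jobs results list`, rejecting unknown sort keys."""
    if sort not in RESULT_SORT_CHOICES:
        raise ValueError(
            f"unsupported --sort value {sort!r}; "
            f"expected one of {', '.join(RESULT_SORT_CHOICES)}"
        )
    return ["nemo", "jobs", "results", "list", name, "--sort", sort]


print(build_results_list_command("example-name"))
```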
@@ -3759,7 +3761,7 @@ nemo jobs results get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -3772,7 +3774,7 @@ nemo jobs results get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo jobs steps @@ -3806,14 +3808,14 @@ nemo jobs steps list [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** * `--workspace` -* `--page `: Page number. -* `--page-size `: Page size. -* `--sort `: The field to sort by. To sort in decreasing order, use `-` in front of the field name. [possible values: created_at, -created_at, updated_at, -updated_at] +* `--page <INTEGER>`: Page number. +* `--page-size <INTEGER>`: Page size. +* `--sort <CHOICE>`: The field to sort by. To sort in decreasing order, use `-` in front of the field name. [possible values: created_at, -created_at, updated_at, -updated_at] * `--all-pages`: Fetch all pages **Filter Options:** @@ -3832,7 +3834,7 @@ Filter steps by job, status, and source. **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. @@ -3848,7 +3850,7 @@ nemo jobs steps get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -3861,7 +3863,7 @@ nemo jobs steps get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. 
[possible values: json, yaml, raw, code] ##### nemo jobs steps update-status @@ -3886,13 +3888,13 @@ nemo jobs steps update-status [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** * `--workspace` * `--job`: (required) -* `--status `: Enumeration of possible job statuses. This enum represents the various states a job can be in during its lifecycle, from creation to a terminal state. [possible values: created, pending, active, cancelled, cancelling, error, completed, paused, pausing, resuming] +* `--status <CHOICE>`: Enumeration of possible job statuses. This enum represents the various states a job can be in during its lifecycle, from creation to a terminal state. [possible values: created, pending, active, cancelled, cancelling, error, completed, paused, pausing, resuming] * `--error-details`: Optional error details related to the status update. (JSON string) * `--status-details`: Optional status details related to the status update. (JSON string) @@ -3907,7 +3909,7 @@ nemo jobs steps update-status [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo jobs tasks @@ -3952,7 +3954,7 @@ nemo jobs tasks create-or-update [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -3961,7 +3963,7 @@ nemo jobs tasks create-or-update [OPTIONS] NAME * `--step`: (required) * `--error-details`: JSON string * `--error-stack` -* `--status `: Enumeration of possible job statuses. This enum represents the various states a job can be in during its lifecycle, from creation to a terminal state. [possible values: created, pending, active, cancelled, cancelling, error, completed, paused, pausing, resuming] +* `--status <CHOICE>`: Enumeration of possible job statuses. This enum represents the various states a job can be in during its lifecycle, from creation to a terminal state. 
[possible values: created, pending, active, cancelled, cancelling, error, completed, paused, pausing, resuming] * `--status-details`: JSON string **Help:** @@ -3975,7 +3977,7 @@ nemo jobs tasks create-or-update [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo jobs tasks list @@ -3989,7 +3991,7 @@ nemo jobs tasks list [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -4002,7 +4004,7 @@ nemo jobs tasks list [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. @@ -4018,7 +4020,7 @@ nemo jobs tasks get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -4032,7 +4034,7 @@ nemo jobs tasks get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ### nemo models @@ -4083,18 +4085,18 @@ nemo models create [OPTIONS] [NAME] **Arguments:** -* ``: Name of the model entity. Allowed characters: letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots. +* `<NAME>`: Name of the model entity. Allowed characters: letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots. **Options:** * `--workspace` * `--api-endpoint`: Data about an inference endpoint. 
(JSON string) -* `--backend-format `: Inference backend API wire formats understood by IGW and middleware plugins. [possible values: OPENAI_CHAT, ANTHROPIC_MESSAGES] +* `--backend-format <CHOICE>`: Inference backend API wire formats understood by IGW and middleware plugins. [possible values: OPENAI_CHAT, ANTHROPIC_MESSAGES] * `--base-model`: Link to another model which is used as a base for the current model * `--custom-fields`: Custom fields for additional metadata (JSON string) * `--description`: Optional description of the model -* `--fileset`: A set of checkpoint files, configs, and other auxiliary info associated with this model - expected format {workspace}/{fileset_name} -* `--finetuning-type `: Finetuning types. [possible values: lora_merged, all_weights, last_layer, top_layers, gradual_unfreezing, bias_only, attention_only, lora, qlora, adalora, dora, lora_plus, prompt_tuning, prefix_tuning, p_tuning, p_tuning_v2, soft_prompt, ppo, dpo, cdpo, ipo, orpo, kto, rrhf, grpo] +* `--fileset`: A set of checkpoint files, configs, and other auxiliary info associated with this model - expected format \{workspace\}/\{fileset_name\} +* `--finetuning-type <CHOICE>`: Finetuning types. [possible values: lora_merged, all_weights, last_layer, top_layers, gradual_unfreezing, bias_only, attention_only, lora, qlora, adalora, dora, lora_plus, prompt_tuning, prefix_tuning, p_tuning, p_tuning_v2, soft_prompt, ppo, dpo, cdpo, ipo, orpo, kto, rrhf, grpo] * `--model-providers`: List of ModelProvider workspace/name resource names that provide inference for this Model Entity (can be repeated) * `--ownership`: Ownership information for the model (JSON string) * `--project`: The URN of the project associated with this model entity @@ -4114,7 +4116,7 @@ nemo models create [OPTIONS] [NAME] **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. 
[possible values: json, yaml, raw, code] #### nemo models delete @@ -4130,7 +4132,7 @@ nemo models delete [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -4157,9 +4159,9 @@ nemo models list [OPTIONS] **Options:** * `--workspace` -* `--page `: Page number. -* `--page-size `: Page size. -* `--sort `: The field to sort by. To sort in decreasing order, use `-` in front of the field name. [possible values: name, -name, created_at, -created_at, updated_at, -updated_at] +* `--page <INTEGER>`: Page number. +* `--page-size <INTEGER>`: Page size. +* `--sort <CHOICE>`: The field to sort by. To sort in decreasing order, use `-` in front of the field name. [possible values: name, -name, created_at, -created_at, updated_at, -updated_at] * `--verbose`: Whether to include full spec details * `--all-pages`: Fetch all pages @@ -4167,8 +4169,8 @@ nemo models list [OPTIONS] * `--filter FILTER_JSON`: Use --filter with JSON for complex/nested queries, or --filter. FIELD options for simple fields. Both can be combined, with field options taking precedence. JSON-only fields: - created_at: {gte: str, lte: str} - updated_at: {gte: str, lte: str} + created_at: \{gte: str, lte: str\} + updated_at: \{gte: str, lte: str\} Filter models by name, project, workspace, base_model, adapters, finetuning_type, prompt, lora_enabled, description, created_at, and updated_at. * `--filter.adapters` @@ -4187,7 +4189,7 @@ Filter models by name, project, workspace, base_model, adapters, finetuning_type **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. 
@@ -4206,7 +4208,7 @@ nemo models get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -4219,7 +4221,7 @@ nemo models get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo models update @@ -4247,19 +4249,19 @@ nemo models update [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** * `--workspace` * `--verbose`: Whether to include full spec details * `--api-endpoint`: Data about an inference endpoint. (JSON string) -* `--backend-format `: Inference backend API wire formats understood by IGW and middleware plugins. [possible values: OPENAI_CHAT, ANTHROPIC_MESSAGES] +* `--backend-format <CHOICE>`: Inference backend API wire formats understood by IGW and middleware plugins. [possible values: OPENAI_CHAT, ANTHROPIC_MESSAGES] * `--base-model`: Link to another model which is used as a base for the current model * `--custom-fields`: Custom fields for additional metadata (JSON string) * `--description`: Optional description of the model -* `--fileset`: A set of checkpoint files, configs, and other auxiliary info associated with this model - expected format {workspace}/{fileset_name} -* `--finetuning-type `: Finetuning types. [possible values: lora_merged, all_weights, last_layer, top_layers, gradual_unfreezing, bias_only, attention_only, lora, qlora, adalora, dora, lora_plus, prompt_tuning, prefix_tuning, p_tuning, p_tuning_v2, soft_prompt, ppo, dpo, cdpo, ipo, orpo, kto, rrhf, grpo] +* `--fileset`: A set of checkpoint files, configs, and other auxiliary info associated with this model - expected format \{workspace\}/\{fileset_name\} +* `--finetuning-type <CHOICE>`: Finetuning types. 
[possible values: lora_merged, all_weights, last_layer, top_layers, gradual_unfreezing, bias_only, attention_only, lora, qlora, adalora, dora, lora_plus, prompt_tuning, prefix_tuning, p_tuning, p_tuning_v2, soft_prompt, ppo, dpo, cdpo, ipo, orpo, kto, rrhf, grpo] * `--model-providers`: List of ModelProvider workspace/name resource names that provide inference for this Model Entity (can be repeated) * `--ownership`: Ownership information for the model (JSON string) * `--prompt`: Configuration for prompt engineering. (JSON string) @@ -4277,7 +4279,7 @@ nemo models update [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo models adapters @@ -4322,14 +4324,14 @@ nemo models adapters create [OPTIONS] MODEL_NAME [NAME] **Arguments:** -* `` -* ``: Name of the adapter. Name must be unique in the workspace. Allowed characters: letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots. +* `<MODEL_NAME>` +* `<NAME>`: Name of the adapter. Name must be unique in the workspace. Allowed characters: letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots. **Options:** * `--workspace` -* `--fileset`: Location where adapter files are stored - expected format {workspace}/{fileset_name} -* `--finetuning-type `: Finetuning types. [possible values: lora_merged, all_weights, last_layer, top_layers, gradual_unfreezing, bias_only, attention_only, lora, qlora, adalora, dora, lora_plus, prompt_tuning, prefix_tuning, p_tuning, p_tuning_v2, soft_prompt, ppo, dpo, cdpo, ipo, orpo, kto, rrhf, grpo] +* `--fileset`: Location where adapter files are stored - expected format \{workspace\}/\{fileset_name\} +* `--finetuning-type <CHOICE>`: Finetuning types. 
[possible values: lora_merged, all_weights, last_layer, top_layers, gradual_unfreezing, bias_only, attention_only, lora, qlora, adalora, dora, lora_plus, prompt_tuning, prefix_tuning, p_tuning, p_tuning_v2, soft_prompt, ppo, dpo, cdpo, ipo, orpo, kto, rrhf, grpo] * `--description`: Optional description of the adapter * `--enabled`: Whether to make this adapter available for inference post training * `--lora-config`: Lora configuration specifics (JSON string) @@ -4345,7 +4347,7 @@ nemo models adapters create [OPTIONS] MODEL_NAME [NAME] **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo models adapters delete @@ -4362,7 +4364,7 @@ nemo models adapters delete [OPTIONS] ADAPTER **Arguments:** -* `` +* `<ADAPTER>` **Options:** @@ -4396,7 +4398,7 @@ nemo models adapters update [OPTIONS] ADAPTER **Arguments:** -* `` +* `<ADAPTER>` **Options:** @@ -4417,7 +4419,7 @@ nemo models adapters update [OPTIONS] ADAPTER **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ### nemo secrets @@ -4455,7 +4457,7 @@ nemo secrets access [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -4467,7 +4469,7 @@ nemo secrets access [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. 
[possible values: json, yaml, raw, code] #### nemo secrets create @@ -4494,7 +4496,7 @@ nemo secrets create [OPTIONS] [NAME] **Arguments:** -* ``: The name of the secret to create +* `<NAME>`: The name of the secret to create **Options:** @@ -4509,7 +4511,7 @@ nemo secrets create [OPTIONS] [NAME] **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo secrets delete @@ -4523,7 +4525,7 @@ nemo secrets delete [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -4546,8 +4548,8 @@ nemo secrets list [OPTIONS] **Options:** * `--workspace` -* `--page `: Page number. -* `--page-size `: Page size. +* `--page <INTEGER>`: Page number. +* `--page-size <INTEGER>`: Page size. * `--all-pages`: Fetch all pages **Help:** @@ -4556,7 +4558,7 @@ nemo secrets list [OPTIONS] **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. @@ -4572,7 +4574,7 @@ nemo secrets get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -4584,7 +4586,7 @@ nemo secrets get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. 
[possible values: json, yaml, raw, code] #### nemo secrets update @@ -4611,7 +4613,7 @@ nemo secrets update [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -4626,7 +4628,7 @@ nemo secrets update [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo secrets admin @@ -4662,7 +4664,7 @@ nemo secrets admin rotate-encryption-keys [OPTIONS] **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ### nemo workspaces @@ -4722,7 +4724,7 @@ nemo workspaces create [OPTIONS] [NAME] **Arguments:** -* ``: Workspace name (unique identifier). Name must start with a lowercase letter, be 2-63 characters, and contain only lowercase letters, digits, and hyphens (no consecutive hyphens, cannot end with a hyphen). +* `<NAME>`: Workspace name (unique identifier). Name must start with a lowercase letter, be 2-63 characters, and contain only lowercase letters, digits, and hyphens (no consecutive hyphens, cannot end with a hyphen). **Options:** @@ -4741,7 +4743,7 @@ nemo workspaces create [OPTIONS] [NAME] **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo workspaces delete @@ -4768,7 +4770,7 @@ nemo workspaces delete [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Help:** @@ -4803,11 +4805,11 @@ nemo workspaces list [OPTIONS] **Options:** * `--filter`: Query filter expression. 
Supports text and JSON syntaxes: -- Text: name:"value" AND status>500 with operators : ~ > >= < <= IN NOT IN AND OR and negation prefix - +- Text: name:"value" AND status>500 with operators : ~ > >= < <= IN NOT IN AND OR and negation prefix - - Object (JSON): `{"name":{"$like":"value"}}` with operators `$eq`, `$like`, `$lt`, `$lte`, `$gt`, `$gte`, `$in`, `$nin`, `$and`, `$or`, `$not` -* `--page `: Page number -* `--page-size `: Items per page -* `--sort `: Sort field [possible values: created_at, -created_at, name, -name] +* `--page <INTEGER>`: Page number +* `--page-size <INTEGER>`: Items per page +* `--sort <CHOICE>`: Sort field [possible values: created_at, -created_at, name, -name] * `--all-pages`: Fetch all pages **Help:** @@ -4816,7 +4818,7 @@ nemo workspaces list [OPTIONS] **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. @@ -4839,7 +4841,7 @@ nemo workspaces get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Help:** @@ -4847,7 +4849,7 @@ nemo workspaces get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo workspaces update @@ -4878,7 +4880,7 @@ nemo workspaces update [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -4895,7 +4897,7 @@ nemo workspaces update [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. 
[possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo workspaces members @@ -4969,7 +4971,7 @@ nemo workspaces members create [OPTIONS] **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo workspaces members delete @@ -4994,7 +4996,7 @@ nemo workspaces members delete [OPTIONS] PRINCIPAL_ID **Arguments:** -* `` +* `<PRINCIPAL_ID>` **Options:** @@ -5034,7 +5036,7 @@ nemo workspaces members list [OPTIONS] **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. @@ -5073,7 +5075,7 @@ nemo workspaces members update [OPTIONS] PRINCIPAL_ID **Arguments:** -* `` +* `<PRINCIPAL_ID>` **Options:** @@ -5092,7 +5094,7 @@ nemo workspaces members update [OPTIONS] PRINCIPAL_ID **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ## Functional plugins @@ -5141,27 +5143,27 @@ nemo guardrail check [OPTIONS] * `--workspace` * `--messages`: A list of messages comprising the conversation so far (JSON string) * `--model`: The model to use for completion. Must be one of the available models. -* `--frequency-penalty `: Positive values penalize new tokens based on their existing frequency in the text. 
-* `--function-call`: Deprecated in favor of tool_choice. 'none' means the model will not call a function and instead generates a message. 'auto' means the model can pick between generating a message or calling a function. Specifying a particular function via {'name': 'my_function'} forces the model to call that function. (JSON string) +* `--frequency-penalty <FLOAT>`: Positive values penalize new tokens based on their existing frequency in the text. +* `--function-call`: Deprecated in favor of tool_choice. 'none' means the model will not call a function and instead generates a message. 'auto' means the model can pick between generating a message or calling a function. Specifying a particular function via \{'name': 'my_function'\} forces the model to call that function. (JSON string) * `--guardrails`: Guardrails specific options for the request. (JSON string) * `--ignore-eos`: Ignore the eos when running * `--logit-bias`: Modify the likelihood of specified tokens appearing in the completion. Maps token IDs (as strings) to bias values from -100 to 100. (JSON string) * `--logprobs`: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message -* `--max-completion-tokens `: An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. Preferred over max_tokens for reasoning models. -* `--max-tokens `: The maximum number of tokens that can be generated in the chat completion. -* `--n `: How many chat completion choices to generate for each input message. -* `--presence-penalty `: Positive values penalize new tokens based on whether they appear in the text so far. +* `--max-completion-tokens <INTEGER>`: An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. Preferred over max_tokens for reasoning models. 
+* `--max-tokens <INTEGER>`: The maximum number of tokens that can be generated in the chat completion. +* `--n <INTEGER>`: How many chat completion choices to generate for each input message. +* `--presence-penalty <FLOAT>`: Positive values penalize new tokens based on whether they appear in the text so far. * `--reasoning-effort`: Constrains effort on reasoning for reasoning models. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response. -* `--response-format`: Format of the response. Use {'type': 'json_object'} for JSON mode or {'type': 'json_schema', 'json_schema': {...}} for structured outputs. (JSON string) -* `--seed `: If specified, attempts to sample deterministically. +* `--response-format`: Format of the response. Use \{'type': 'json_object'\} for JSON mode or \{'type': 'json_schema', 'json_schema': \{...\}\} for structured outputs. (JSON string) +* `--seed <INTEGER>`: If specified, attempts to sample deterministically. * `--stop`: Up to 4 sequences where the API will stop generating further tokens. (JSON string) * `--stream`: If set, partial message deltas will be sent, like in ChatGPT. * `--stream-options`: Options for streaming response. Only set this when stream=True. Supports include_usage to receive token usage in the final stream chunk. (JSON string) -* `--temperature `: What sampling temperature to use, between 0 and 2. +* `--temperature <FLOAT>`: What sampling temperature to use, between 0 and 2. * `--tool-choice`: Controls which (if any) tool is called by the model. 'none' means no tool is called, 'auto' lets the model decide, 'required' forces a tool call. (JSON string) * `--tools`: A list of tools the model may call. Each tool is an object with a 'type' field and a 'function' definition. (JSON string) -* `--top-logprobs `: The number of most likely tokens to return at each token position. -* `--top-p `: An alternative to sampling with temperature, called nucleus sampling. 
+* `--top-logprobs <INTEGER>`: The number of most likely tokens to return at each token position. +* `--top-p <FLOAT>`: An alternative to sampling with temperature, called nucleus sampling. * `--user`: A unique identifier representing your end-user, used by some providers for abuse monitoring. * `--vision`: Whether this is a vision-capable request with image inputs. @@ -5176,7 +5178,7 @@ nemo guardrail check [OPTIONS] **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] #### nemo guardrail configs @@ -5223,7 +5225,7 @@ nemo guardrail configs create [OPTIONS] [NAME] **Arguments:** -* ``: The name of the guardrail config +* `<NAME>`: The name of the guardrail config **Options:** @@ -5243,7 +5245,7 @@ nemo guardrail configs create [OPTIONS] [NAME] **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo guardrail configs delete @@ -5257,7 +5259,7 @@ nemo guardrail configs delete [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -5282,17 +5284,17 @@ nemo guardrail configs list [OPTIONS] **Options:** * `--workspace` -* `--page `: Page number. -* `--page-size `: Page size. -* `--sort `: The field to sort by. To sort in decreasing order, use `-` in front of the field name. [possible values: created_at, -created_at, name, -name] +* `--page <INTEGER>`: Page number. +* `--page-size <INTEGER>`: Page size. +* `--sort <CHOICE>`: The field to sort by. To sort in decreasing order, use `-` in front of the field name. [possible values: created_at, -created_at, name, -name] * `--all-pages`: Fetch all pages **Filter Options:** * `--filter FILTER_JSON`: Use --filter with JSON for complex/nested queries, or --filter. 
FIELD options for simple fields. Both can be combined, with field options taking precedence. JSON-only fields: - created_at: {gte: str, lte: str} - updated_at: {gte: str, lte: str} + created_at: \{gte: str, lte: str\} + updated_at: \{gte: str, lte: str\} Filter guardrail configs by name, description, project, created_at, and updated_at. * `--filter.description` @@ -5305,7 +5307,7 @@ Filter guardrail configs by name, description, project, created_at, and updated_ **Output Options:** -* `--output-format, -f `: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] +* `--output-format, -f <CHOICE>`: Output format for the list of results. [possible values: table, json, yaml, markdown, csv, raw, code] * `--no-truncate`: Don't truncate long values in table/markdown/csv output. * `--output-columns, -c`: Columns to display: 'default', 'all', or comma-separated names. Only affects table/csv/markdown formats. @@ -5321,7 +5323,7 @@ nemo guardrail configs get [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -5333,7 +5335,7 @@ nemo guardrail configs get [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. [possible values: json, yaml, raw, code] ##### nemo guardrail configs update @@ -5359,7 +5361,7 @@ nemo guardrail configs update [OPTIONS] NAME **Arguments:** -* `` +* `<NAME>` **Options:** @@ -5378,4 +5380,4 @@ nemo guardrail configs update [OPTIONS] NAME **Output Options:** -* `--output-format, -f `: Output format for an entity. [possible values: json, yaml, raw, code] +* `--output-format, -f <CHOICE>`: Output format for an entity. 
[possible values: json, yaml, raw, code] diff --git a/docs/cli/troubleshooting.mdx b/docs/cli/troubleshooting.mdx index 29976d9bb9..5e59590d4d 100644 --- a/docs/cli/troubleshooting.mdx +++ b/docs/cli/troubleshooting.mdx @@ -1,5 +1,7 @@ -# Troubleshooting - +--- +title: "Troubleshooting" +description: "" +--- This page covers common issues and how to resolve them. ## Command Not Found @@ -98,5 +100,5 @@ nemo config view ## Getting Help - Use `--help` on any command for usage information -- See the [reference](reference.md) for complete command documentation -- See [configuration](configuration.md) for detailed configuration options +- See the [reference](/reference/cli-reference/full-cli-reference) for complete command documentation +- See [configuration](/reference/cli-reference/configuration) for detailed configuration options diff --git a/docs/cli/working-with-resources.mdx b/docs/cli/working-with-resources.mdx index 8ca8004f84..e1b1952770 100644 --- a/docs/cli/working-with-resources.mdx +++ b/docs/cli/working-with-resources.mdx @@ -1,5 +1,7 @@ -# Working with Resources - +--- +title: "Working with Resources" +description: "" +--- This page covers how to list, view, and create resources with the CLI. ## Output Formats diff --git a/docs/contributing/skills-spec.mdx b/docs/contributing/skills-spec.mdx index 4de427ddcd..2d4aba87f1 100644 --- a/docs/contributing/skills-spec.mdx +++ b/docs/contributing/skills-spec.mdx @@ -1,5 +1,7 @@ -# NeMo Platform Skills Spec - +--- +title: "NeMo Platform Skills Spec" +description: "" +--- **Status:** Draft. This document defines the conventions for skills shipped with NeMo Platform: how they're structured, what their frontmatter must contain, how they get tested, and where they live in the repo. 
@@ -45,7 +47,7 @@ Entry point: a coding agent (Claude Code, Cursor, Codex, OpenCode) opened inside |---|---|---| | 1 | `nemo-skill-selection` | Router: parses natural-language intent, picks the right downstream skill | | 2 | `nemo-explore` | Design conversation: captures goal, audience, tools, constraints | -| 3 | `nemo-spec` | Writes the design to `agents/.spec.md` | +| 3 | `nemo-spec` | Writes the design to `agents/<name>.spec.md` | | 4 | `nemo-build-agent` | Scaffolds NAT workflow YAML and deploys | | 5 | `nemo-try-agent` | Sends queries to the deployed agent | | 6 | `nemo-status` | Read-only platform health dashboard | @@ -101,7 +103,7 @@ Every skill ships with the following YAML frontmatter at the top of `SKILL.md`. Library-prefix naming (`nemo-*`) is required for user-invocable skills (skills install into shared agent catalogs alongside skills from other plugins; the prefix prevents collisions). It's optional for internal helpers. -**Canonical location:** `packages/nemo_platform_ext/src/nemo_platform_ext/skills//`. Skills there ship with `pip install nemo-platform[all]`. +**Canonical location:** `packages/nemo_platform_ext/src/nemo_platform_ext/skills/<name>/`. Skills there ship with `pip install nemo-platform[all]`. --- @@ -148,14 +150,14 @@ If user-invocable, the `nemo-*` prefix is required. If internal, optional. ### 2. Pick the name - Kebab-case. -- `nemo--` or `nemo-` for user-invocable. Verb is preferred for action skills (`nemo-build-agent`, not `nemo-agent-build`). +- `nemo-<verb>-<noun>` or `nemo-<noun>` for user-invocable. Verb is preferred for action skills (`nemo-build-agent`, not `nemo-agent-build`). - Run `nemo skills list` to confirm no existing name collides. ### 3. Pick the canonical location -- User-invocable: `packages/nemo_platform_ext/src/nemo_platform_ext/skills//` (ships with the platform package). -- Plugin-owned (skill is specific to a plugin): `plugins//src//skills//`. 
-- Internal-only dev skills: `.agents/skills//` (gitignored from skills install by default). +- User-invocable: `packages/nemo_platform_ext/src/nemo_platform_ext/skills/<name>/` (ships with the platform package). +- Plugin-owned (skill is specific to a plugin): `plugins/<plugin-name>/src/<plugin_module>/skills/<name>/`. +- Internal-only dev skills: `.agents/skills/<name>/` (gitignored from skills install by default). ### 4. Write the frontmatter @@ -177,7 +179,7 @@ Optional: - Lead with one-sentence purpose. - Step-by-step instructions in the order the agent should run them. - One verification step after any state-changing action. Skills must not claim success without verification. -- Lean: under 500 lines. Lift detail into `references/.md` if needed. +- Lean: under 500 lines. Lift detail into `references/<topic>.md` if needed. - Use real `nemo` CLI commands; never improvise flags. `scripts/skill-cli-lint.py` enforces this against `nemo --help`. ### 6. Write `tests.json` (four-mode routing tests) @@ -193,7 +195,7 @@ At least 3 examples per mode. `scripts/skill-test.py` runs them and fails if rou ### 7. Add `references/` only if needed -If the skill body needs reference material (template files, troubleshooting tables, deep-dive documentation), put it under `references/.md` and point at it from the body. Don't pre-emptively create `references/` for short skills. +If the skill body needs reference material (template files, troubleshooting tables, deep-dive documentation), put it under `references/<topic>.md` and point at it from the body. Don't pre-emptively create `references/` for short skills. ### 8. 
Local verification before PR diff --git a/docs/customizer/about.mdx b/docs/customizer/about.mdx index 05ebdca962..a659a80df7 100644 --- a/docs/customizer/about.mdx +++ b/docs/customizer/about.mdx @@ -1,7 +1,11 @@ +--- +title: "Customization Concepts" +description: "" +--- # Customization Concepts -This page provides an overview of the customization concepts for the {{platform_name}}. +This page provides an overview of the customization concepts for the NeMo Platform. ## Supervised Fine-Tuning @@ -58,7 +62,7 @@ flowchart TD style M fill:#90EE90 ``` -{{ncm_short_name}} supports supervised fine-tuning (SFT) with the PEFT method of Low-Rank Adaptation (LoRA). +NeMo Customizer supports supervised fine-tuning (SFT) with the PEFT method of Low-Rank Adaptation (LoRA). Fine-tuning with LoRA is the recommended starting point for most use cases. @@ -74,7 +78,7 @@ In comparison to full fine-tuning of a base model, Low-Rank Adaptation (LoRA) ha LoRA high level architecture: ```mermaid -{% raw %} + --- caption: Low Rank Adaptation (LoRA) --- @@ -92,7 +96,7 @@ flowchart LR class Input,API input class FW,RD,CM neural class Add add -{% endraw %} + ``` During LoRA training: @@ -117,7 +121,7 @@ Additional Resources: ## Training with Your Own Data -Use {{ncm_short_name}} to train custom models on your own data. The workflow can be carried out as follows: +Use NeMo Customizer to train custom models on your own data. The workflow can be carried out as follows: - Upload a dataset - Train a custom model - Perform inference with the trained model @@ -126,22 +130,22 @@ Use {{ncm_short_name}} to train custom models on your own data. The workflow can Long samples in the dataset are truncated during training if the total token length exceeds the context supported by the model. -!!! note - Refer to the model's documentation to see the maximum supported sequence length. +Refer to the model's documentation to see the maximum supported sequence length. 
| Dataset Type | Token Counting | Length Management | |--------------|----------------|-------------------| -| Prompt Completion | • Total = prompt + completion tokens | • Truncates prompt tokens to fit limits
• Filters out entries that still exceed maximum length | -| Conversational | • Total = conversation turns + template tokens
• Templates are model-specific | • Truncates tokens beyond maximum limit
• Preserves template formatting | +| Prompt Completion | • Total = prompt + completion tokens | • Truncates prompt tokens to fit limits
• Filters out entries that still exceed maximum length | +| Conversational | • Total = conversation turns + template tokens
• Templates are model-specific | • Truncates tokens beyond maximum limit
• Preserves template formatting | ### Prompt Completion Datasets Below are some examples of how you might format your dataset to perform a handful of different tasks. -!!! note - When testing models trained with prompt/completion datasets, use the `/v1/completions` endpoint instead of `/v1/chat/completions`. + +When testing models trained with prompt/completion datasets, use the `/v1/completions` endpoint instead of `/v1/chat/completions`. - For details, refer to the [Dataset Formatting tutorial](tutorials/format-training-dataset.md#format-a-prompt-completion-dataset). +For details, refer to the [Dataset Formatting tutorial](/fine-tune-models/tutorials/format-training-dataset#format-a-prompt-completion-dataset). + #### Document Classification @@ -190,7 +194,7 @@ completion: "" Most of the models support Instruction Templates for training, the expected dataset conforms with the standard [OpenAI messages format](https://platform.openai.com/docs/guides/fine-tuning#multi-turn-chat-examples). Additionally, some models support tool calling which have additional optional parameters of `tools` at the top level of each entry and `tool_calls` per message. -For more information refer to our [in-depth instructions](tutorials/format-training-dataset.md#format-a-conversation-dataset). +For more information refer to our [in-depth instructions](/fine-tune-models/tutorials/format-training-dataset#format-a-conversation-dataset). ## Hyperparameters @@ -208,22 +212,23 @@ Hyperparameters are configuration settings used to control the training process. ## Parallelism -{{platform_name}} Customizer supports various distributed training parallelization methods, which can be mixed together. +NeMo Platform Customizer supports various distributed training parallelization methods, which can be mixed together. 
### Tensor Parallelism [Tensor Parallelism](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/parallelisms.html#tensor-parallelism) (TP) distributes the parameter tensor of an individual layer across GPUs. In addition to reducing model state memory usage, it also saves activation memory as the per-GPU tensor sizes shrink. The tradeoff is increased CPU overhead. -TP can be configured via `parallelism.tensor_parallel_size` in the [training configuration](manage-customization-jobs/hyperparameters.md). +TP can be configured via `parallelism.tensor_parallel_size` in the [training configuration](/fine-tune-models/manage-jobs/training-configuration). -!!! note - As of release 25.10.0, AutoModel engines including Phi-4, Qwen, and Gemma support tensor parallelism greater than 1 through the multi-GPU LoRA patch. Previous releases only supported `TP=1` for these models. + +As of release 25.10.0, AutoModel engines including Phi-4, Qwen, and Gemma support tensor parallelism greater than 1 through the multi-GPU LoRA patch. Previous releases only supported `TP=1` for these models. + ### Pipeline Parallelism [Pipeline Parallelism](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/parallelisms.html#pipeline-parallelism) (PP) distributes the layers of a neural network across GPUs. The GPUs then process the different layers sequentially. -PP can be configured via `parallelism.pipeline_parallel_size` in the [training configuration](manage-customization-jobs/hyperparameters.md). +PP can be configured via `parallelism.pipeline_parallel_size` in the [training configuration](/fine-tune-models/manage-jobs/training-configuration). 
#### Configuration @@ -241,7 +246,7 @@ PP can be configured via `parallelism.pipeline_parallel_size` in the [training c [Sequence Parallelism](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/parallelisms.html#sequence-parallelism) (SP) extends tensor-level model parallelism by distributing computing load and activation memory across multiple GPUs along the sequence dimension of transformer layers. This method is particularly useful when training on the datasets with longer sequences. It also benefits portions of the layer that have previously not been parallelized, enhancing overall model performance and efficiency. -Sequence Parallelism can be enabled/disabled using `parallelism.sequence_parallel` in the [training configuration](manage-customization-jobs/hyperparameters.md). +Sequence Parallelism can be enabled/disabled using `parallelism.sequence_parallel` in the [training configuration](/fine-tune-models/manage-jobs/training-configuration). ## Sequence Packing @@ -265,8 +270,9 @@ When enabled, the `batch_size` and number of training steps update so that each - Chat prompt templates do not have support for sequence packing. -!!! note - If `training.sequence_packing` is enabled when using a model that does not support sequence packing, the fine-tuning will proceed _without_ sequence packing and a warning will be returned in the API response. + +If `training.sequence_packing` is enabled when using a model that does not support sequence packing, the fine-tuning will proceed _without_ sequence packing and a warning will be returned in the API response. + ### Example of using in the API @@ -291,4 +297,4 @@ job = client.customization.jobs.create( ) ``` -Learn how to create a LoRA customization job with sequence packing by following the [Optimizing for Tokens/GPU](tutorials/optimize-throughput.ipynb) tutorial. +Learn how to create a LoRA customization job with sequence packing by following the Optimizing for Tokens/GPU tutorial. 
diff --git a/docs/customizer/index.mdx b/docs/customizer/index.mdx index c55f165660..fc5663245d 100644 --- a/docs/customizer/index.mdx +++ b/docs/customizer/index.mdx @@ -1,5 +1,7 @@ -# About Fine-Tuning - +--- +title: "About Fine-Tuning" +description: "" +--- Learn how to fine-tune models by making requests to NVIDIA NeMo Customizer through the API. Fine-tuned models you have created can be deployed using NVIDIA NIMs. @@ -8,15 +10,15 @@ Learn how to fine-tune models by making requests to NVIDIA NeMo Customizer throu At a high level, the fine-tuning workflow consists of the following steps: -1. [Create a Model Entity](manage-model-entities/index.md) pointing to your base model checkpoint (stored as a FileSet). -1. Format a compatible [dataset](tutorials/format-training-dataset.md). -1. [Create a customization job](manage-customization-jobs/index.md) referencing the Model Entity. +1. [Create a Model Entity](/fine-tune-models/manage-model-entities/overview) pointing to your base model checkpoint (stored as a FileSet). +1. Format a compatible [dataset](/fine-tune-models/tutorials/format-training-dataset). +1. [Create a customization job](/fine-tune-models/manage-jobs/overview) referencing the Model Entity. 1. Monitor the job until it completes. 1. The customization job automatically creates either: - **LoRA jobs**: An adapter attached to the original Model Entity - **Full fine-tuning jobs**: A new Model Entity with the customized weights -1. [Deploy the model](../run-inference/about.md) using the Deployment Management Service. -1. Move on to [Evaluate the output model](../evaluator/index.md). +1. [Deploy the model](/models-and-inference/about) using the Deployment Management Service. +1. Move on to [Evaluate the output model](/evaluation/about). --- @@ -27,31 +29,31 @@ Explore the model families and sizes supported by NVIDIA NeMo Customizer.
-- **[Llama Models](models/llama.md)** +- **[Llama Models](/fine-tune-models/models/llama)** --- View the available Llama models in the model catalog. -- **[Llama Nemotron Models](models/llama-nemotron.md)** +- **[Llama Nemotron Models](/fine-tune-models/models/llama-nemotron)** --- View the available Llama Nemotron models from NVIDIA, including Nano and Super variants for efficient and advanced instruction tuning. -- **[Phi Models](models/phi.md)** +- **[Phi Models](/fine-tune-models/models/phi)** --- View the available Phi models from Microsoft, designed for strong reasoning capabilities with efficient deployment. -- **[GPT-OSS Models](models/gpt-oss.md)** +- **[GPT-OSS Models](/fine-tune-models/models/gpt-oss)** --- View the available GPT-OSS models supported for Full SFT customization. -- **[Embedding Models](models/embedding.md)** +- **[Embedding Models](/fine-tune-models/models/embedding)** --- @@ -65,19 +67,19 @@ Perform common fine-tuning tasks.
-- **[Manage Customization Jobs](manage-customization-jobs/index.md)** +- **[Manage Customization Jobs](/fine-tune-models/manage-jobs/overview)** --- Create, list, view, and cancel customization jobs. -- **[Manage Model Entities](manage-model-entities/index.md)** +- **[Manage Model Entities](/fine-tune-models/manage-model-entities/overview)** --- Create FileSets and Model Entities to prepare base models for customization. -- **[Manage Datasets](../get-started/concepts/manage-files.md)** +- **[Manage Datasets](/get-started/core-concepts/manage-files)** --- @@ -93,7 +95,7 @@ Follow these tutorials to learn how to accomplish common fine-tuning tasks.
-- **[Format Training Datasets](tutorials/format-training-dataset.md)** +- **[Format Training Datasets](/fine-tune-models/tutorials/format-training-dataset)** --- @@ -101,7 +103,7 @@ Follow these tutorials to learn how to accomplish common fine-tuning tasks. datasets chat-models completion-models -- **[Start a LoRA Customization Job](tutorials/lora-customization-job.ipynb)** +- **Start a LoRA Customization Job** --- @@ -109,7 +111,7 @@ Follow these tutorials to learn how to accomplish common fine-tuning tasks. nemo-customizer -- **[Start a Full SFT Customization Job](tutorials/sft-customization-job.ipynb)** +- **Start a Full SFT Customization Job** --- @@ -117,7 +119,7 @@ Follow these tutorials to learn how to accomplish common fine-tuning tasks. nemo-customizer -- **[Align a Model with DPO](tutorials/dpo-customization-job.ipynb)** +- **Align a Model with DPO** --- @@ -125,7 +127,7 @@ Follow these tutorials to learn how to accomplish common fine-tuning tasks. nemo-customizer dpo -- **[Distill a Model with Knowledge Distillation](tutorials/distillation-customization-job.ipynb)** +- **Distill a Model with Knowledge Distillation** --- @@ -133,7 +135,7 @@ Follow these tutorials to learn how to accomplish common fine-tuning tasks. nemo-customizer knowledge-distillation -- **[Check Customization Job Metrics](tutorials/metrics.md)** +- **[Check Customization Job Metrics](/fine-tune-models/tutorials/job-metrics)** --- @@ -141,7 +143,7 @@ Follow these tutorials to learn how to accomplish common fine-tuning tasks. nemo-customizer mlflow wandb -- **[Optimize Tokens per GPU](tutorials/optimize-throughput.ipynb)** +- **Optimize Tokens per GPU** --- @@ -157,19 +159,19 @@ Follow these tutorials to learn how to accomplish common fine-tuning tasks.
-- **[Hyperparameters](manage-customization-jobs/hyperparameters.md)** +- **[Hyperparameters](/fine-tune-models/manage-jobs/training-configuration)** --- View the available hyperparameters and their valid options that you can set when creating a customization job. -- **[Customizer API](../api/index.md#tag-customizer)** +- **[Customizer API](/reference/api-reference#tag-customizer)** --- View the OpenAPI specification for Customizer. -- **[Troubleshoot Failed Jobs](../troubleshooting/customizer.md)** +- **[Troubleshoot Failed Jobs](/reference/troubleshooting/customizer)** --- diff --git a/docs/customizer/manage-customization-jobs/cancel-job.mdx b/docs/customizer/manage-customization-jobs/cancel-job.mdx index 14d0d403bc..3e6c52fbb9 100644 --- a/docs/customizer/manage-customization-jobs/cancel-job.mdx +++ b/docs/customizer/manage-customization-jobs/cancel-job.mdx @@ -1,12 +1,14 @@ -# Cancel Job - +--- +title: "Cancel Job" +description: "" +--- ## Prerequisites Before you can cancel a customization job, make sure that you have: -- Obtained the base URL of your {{platform_name}}. -- Set the `NMP_BASE_URL` environment variable to your {{platform_name}} endpoint +- Obtained the base URL of your NeMo Platform. +- Set the `NMP_BASE_URL` environment variable to your NeMo Platform endpoint ```bash export NMP_BASE_URL="https://your-nmp-base-url" @@ -16,7 +18,7 @@ export NMP_BASE_URL="https://your-nmp-base-url" ## To Cancel a Customization Job -Running jobs may be cancelled. A cancelled job does not upload checkpoints. You need the job's name and workspace; you can get these from [List Active Jobs](list-active-jobs.md). +Running jobs may be cancelled. A cancelled job does not upload checkpoints. You need the job's name and workspace; you can get these from [List Active Jobs](/fine-tune-models/manage-jobs/list-active-jobs). 
Use the SDK to cancel a customization job: @@ -40,37 +42,35 @@ print(f"Current status: {cancelled_job.status}") print(f"Updated at: {cancelled_job.updated_at}") ``` -??? "Example Response" - :icon: code-square - :open: - - ```json - { - "name": "my-sft-job", - "workspace": "default", - "id": "job-abc123def456", - "status": "cancelled", - "spec": { - "model": "default/llama-3-2-1b", - "dataset": "fileset://default/my-training-dataset", - "training": { - "type": "sft", - "batch_size": 16, - "epochs": 3, - "learning_rate": 1e-05, - "max_seq_length": 4096, - "parallelism": { - "num_gpus_per_node": 2, - "tensor_parallel_size": 2 - } - }, - "output": { - "name": "my-finetuned-llama", - "type": "model", - "fileset": "my-finetuned-llama-a1b2c3d4e5f6" - } - }, - "created_at": "2026-02-09T10:30:00.000Z", - "updated_at": "2026-02-09T10:35:00.000Z" + +```json +{ + "name": "my-sft-job", + "workspace": "default", + "id": "job-abc123def456", + "status": "cancelled", + "spec": { + "model": "default/llama-3-2-1b", + "dataset": "fileset://default/my-training-dataset", + "training": { + "type": "sft", + "batch_size": 16, + "epochs": 3, + "learning_rate": 1e-05, + "max_seq_length": 4096, + "parallelism": { + "num_gpus_per_node": 2, + "tensor_parallel_size": 2 + } + }, + "output": { + "name": "my-finetuned-llama", + "type": "model", + "fileset": "my-finetuned-llama-a1b2c3d4e5f6" } - ``` + }, + "created_at": "2026-02-09T10:30:00.000Z", + "updated_at": "2026-02-09T10:35:00.000Z" +} +``` + \ No newline at end of file diff --git a/docs/customizer/manage-customization-jobs/create-job.mdx b/docs/customizer/manage-customization-jobs/create-job.mdx index d51469aba6..917648819a 100644 --- a/docs/customizer/manage-customization-jobs/create-job.mdx +++ b/docs/customizer/manage-customization-jobs/create-job.mdx @@ -1,16 +1,18 @@ -# Create Job - +--- +title: "Create Job" +description: "" +--- ## Prerequisites Before you can create a customization job, make sure that you have: -- Obtained the base 
URL of your {{platform_name}}. -- Created a [FileSet and Model Entity](../manage-model-entities/index.md) for your base model. -- [Uploaded a dataset](../../get-started/concepts/manage-files.md) as a FileSet. -- Determined the [training configuration](hyperparameters.md) you want to use for the customization job. -- Verified that the platform has sufficient storage for the job. Full SFT jobs require approximately 3× the base model size in free disk space; LoRA jobs require approximately 1.5×. See [ft-tut-understand-models](../tutorials/understand-configurations-and-models.md) for details. If you are also deploying the model from a base checkpoint fileset, plan for ~2.5× model size overall for LoRA. -- Set the `NMP_BASE_URL` environment variable to your {{platform_name}} endpoint. +- Obtained the base URL of your NeMo Platform. +- Created a [FileSet and Model Entity](/fine-tune-models/manage-model-entities/overview) for your base model. +- [Uploaded a dataset](/get-started/core-concepts/manage-files) as a FileSet. +- Determined the [training configuration](/fine-tune-models/manage-jobs/training-configuration) you want to use for the customization job. +- Verified that the platform has sufficient storage for the job. Full SFT jobs require approximately 3× the base model size in free disk space; LoRA jobs require approximately 1.5×. See [ft-tut-understand-models](/fine-tune-models/tutorials/understanding-models-and-training) for details. If you are also deploying the model from a base checkpoint fileset, plan for ~2.5× model size overall for LoRA. +- Set the `NMP_BASE_URL` environment variable to your NeMo Platform endpoint. ```bash export NMP_BASE_URL="https://your-nemo-platform-url" @@ -60,42 +62,40 @@ print(f"Created job: {job.name}") print(f"Job status: {job.status}") ``` -??? 
"Example Response" - :icon: code-square - :open: - - ```json - { - "name": "my-lora-job", - "workspace": "default", - "id": "job-abc123def456", - "status": "created", - "spec": { - "model": "default/llama-3-2-1b", - "dataset": "fileset://default/my-training-dataset", - "training": { - "type": "sft", - "peft": { - "type": "lora", - "rank": 8, - "alpha": 32, - "dropout": 0.0 - }, - "batch_size": 32, - "epochs": 3, - "learning_rate": 0.0001, - "max_seq_length": 2048 - }, - "output": { - "name": "my-custom-model", - "type": "adapter", - "fileset": "my-custom-model-a1b2c3d4e5f6" - } + +```json +{ + "name": "my-lora-job", + "workspace": "default", + "id": "job-abc123def456", + "status": "created", + "spec": { + "model": "default/llama-3-2-1b", + "dataset": "fileset://default/my-training-dataset", + "training": { + "type": "sft", + "peft": { + "type": "lora", + "rank": 8, + "alpha": 32, + "dropout": 0.0 }, - "created_at": "2026-02-09T10:30:00.000Z", - "updated_at": "2026-02-09T10:30:00.000Z" + "batch_size": 32, + "epochs": 3, + "learning_rate": 0.0001, + "max_seq_length": 2048 + }, + "output": { + "name": "my-custom-model", + "type": "adapter", + "fileset": "my-custom-model-a1b2c3d4e5f6" } - ``` + }, + "created_at": "2026-02-09T10:30:00.000Z", + "updated_at": "2026-02-09T10:30:00.000Z" +} +``` + --- @@ -126,14 +126,15 @@ job = client.customization.jobs.create( ) ``` -!!! note - See [Knowledge Distillation constraints](hyperparameters.md) for requirements on model compatibility, tokenizer, and GPU memory. + +See [Knowledge Distillation constraints](/fine-tune-models/manage-jobs/training-configuration) for requirements on model compatibility, tokenizer, and GPU memory. + --- ## Job Configuration Reference -The job spec contains the model, dataset, training, output, deployment, and integration settings for the job. For key fields, the complete API schema, and W&B or MLflow integration options, see [Customization Job Reference](customization-job-reference.md). 
+The job spec contains the model, dataset, training, output, deployment, and integration settings for the job. For key fields, the complete API schema, and W&B or MLflow integration options, see [Customization Job Reference](/fine-tune-models/manage-jobs/customization-job-reference). --- diff --git a/docs/customizer/manage-customization-jobs/customization-job-reference.mdx b/docs/customizer/manage-customization-jobs/customization-job-reference.mdx index 14705aa0be..aef1670f71 100644 --- a/docs/customizer/manage-customization-jobs/customization-job-reference.mdx +++ b/docs/customizer/manage-customization-jobs/customization-job-reference.mdx @@ -1,9 +1,11 @@ -# Customization Job Reference - +--- +title: "Customization Job Reference" +description: "" +--- Use this page when you need field-level details for customization job specifications, the complete API schema, or integration options. -For concepts, see [Customization Job overview](index.md). +For concepts, see [Customization Job overview](/fine-tune-models/manage-jobs/overview). ## Key Fields @@ -15,7 +17,7 @@ All job configuration (model, dataset, training, and output) is specified in the | `workspace` | Yes | Workspace where the job runs. Determines what datasets and models are authorized to be used in the job. | | `spec.model` | Yes | Reference to the Model Entity (`workspace/name` format) | | `spec.dataset` | Yes | Dataset URI (`fileset://workspace/name`) | -| `spec.training` | Yes | Training method and hyperparameters (see [Training Configuration](hyperparameters.md)) | +| `spec.training` | Yes | Training method and hyperparameters (see [Training Configuration](/fine-tune-models/manage-jobs/training-configuration)) | | `spec.training.type` | Yes | Training method: `sft`, `distillation`, or `dpo` | | `spec.training.peft` | No | PEFT adapter configuration (e.g., `{"type": "lora", ...}`). Omit for full-weight training | | `spec.output` | No | Output artifact configuration (`{"name": "..."}`). 
Auto-generated if not provided | @@ -25,7 +27,7 @@ All job configuration (model, dataset, training, and output) is specified in the ## Complete API Reference -For generated REST API details, see the [Customizer API Reference](../../api/index.md#tag-customizer) and +For generated REST API details, see the [Customizer API Reference](/reference/api-reference#tag-customizer) and search for `CustomizationJobInput`. --- @@ -34,51 +36,51 @@ search for `CustomizationJobInput`. To enable W&B integration, add the `integrations` configuration: -=== "Python" - - ```python - job = client.customization.jobs.create( - name="my-job", - workspace="default", - spec={ - "model": "default/llama-3-2-1b", - "dataset": "fileset://default/my-dataset", - "training": {"type": "sft", "peft": {"type": "lora"}, "epochs": 3}, - "integrations": { - "wandb": { - "project": "my-finetuning-project", - "entity": "my-team", - "tags": ["fine-tuning", "llama"], - "api_key_secret": "my-wandb-key", - } - }, - }, - ) - ``` - -=== "CLI" - - ```bash - nemo customization jobs create my-job \ - --workspace default \ - --spec '{ + + +```python +job = client.customization.jobs.create( + name="my-job", + workspace="default", + spec={ "model": "default/llama-3-2-1b", "dataset": "fileset://default/my-dataset", "training": {"type": "sft", "peft": {"type": "lora"}, "epochs": 3}, "integrations": { - "wandb": { - "project": "my-finetuning-project", - "entity": "my-team", - "tags": ["fine-tuning", "llama"], - "api_key_secret": "my-wandb-key" - } - } - }' - ``` - + "wandb": { + "project": "my-finetuning-project", + "entity": "my-team", + "tags": ["fine-tuning", "llama"], + "api_key_secret": "my-wandb-key", + } + }, + }, +) +``` + + +```bash +nemo customization jobs create my-job \ + --workspace default \ + --spec '{ + "model": "default/llama-3-2-1b", + "dataset": "fileset://default/my-dataset", + "training": {"type": "sft", "peft": {"type": "lora"}, "epochs": 3}, + "integrations": { + "wandb": { + "project": 
"my-finetuning-project", + "entity": "my-team", + "tags": ["fine-tuning", "llama"], + "api_key_secret": "my-wandb-key" + } + } + }' +``` + + The `api_key_secret` field references a stored secret containing your `WANDB_API_KEY`. Use the secret name (e.g., `"my-wandb-key"`) to resolve it from the request workspace. -To create the secret, see [Weights & Biases Keys](../../get-started/concepts/manage-secrets.md). +To create the secret, see [Weights & Biases Keys](/get-started/core-concepts/manage-secrets). | Field | Description | |-------|-------------| @@ -90,7 +92,7 @@ To create the secret, see [Weights & Biases Keys](../../get-started/concepts/man | `notes` | Notes or description for the run | | `base_url` | Base URL for self-hosted W&B servers (e.g., `https://wandb.mycompany.com`). Omit to use W&B cloud | -To view your training metrics in W&B after the job starts, see [ft-tut-metrics-wandb](../tutorials/metrics.md). +To view your training metrics in W&B after the job starts, see [ft-tut-metrics-wandb](/fine-tune-models/tutorials/job-metrics). 
--- @@ -98,44 +100,44 @@ To view your training metrics in W&B after the job starts, see [ft-tut-metrics-w To enable MLflow integration: -=== "Python" - - ```python - job = client.customization.jobs.create( - name="my-job", - workspace="default", - spec={ - "model": "default/llama-3-2-1b", - "dataset": "fileset://default/my-dataset", - "training": {"type": "sft", "peft": {"type": "lora"}, "epochs": 3}, - "integrations": { - "mlflow": { - "experiment_name": "llama-finetuning", - "tracking_uri": "http://mlflow.example.com:5000", - } - }, - }, - ) - ``` - -=== "CLI" - - ```bash - nemo customization jobs create my-job \ - --workspace default \ - --spec '{ + + +```python +job = client.customization.jobs.create( + name="my-job", + workspace="default", + spec={ "model": "default/llama-3-2-1b", "dataset": "fileset://default/my-dataset", "training": {"type": "sft", "peft": {"type": "lora"}, "epochs": 3}, "integrations": { - "mlflow": { - "experiment_name": "llama-finetuning", - "tracking_uri": "http://mlflow.example.com:5000" - } - } - }' - ``` - + "mlflow": { + "experiment_name": "llama-finetuning", + "tracking_uri": "http://mlflow.example.com:5000", + } + }, + }, +) +``` + + +```bash +nemo customization jobs create my-job \ + --workspace default \ + --spec '{ + "model": "default/llama-3-2-1b", + "dataset": "fileset://default/my-dataset", + "training": {"type": "sft", "peft": {"type": "lora"}, "epochs": 3}, + "integrations": { + "mlflow": { + "experiment_name": "llama-finetuning", + "tracking_uri": "http://mlflow.example.com:5000" + } + } + }' +``` + + | Field | Description | |-------|-------------| | `experiment_name` | MLflow experiment name. Defaults to `output.name` if not set | @@ -146,7 +148,7 @@ To enable MLflow integration: ## Next Steps -- [Create a customization job](create-job.md): Start a job with a model, dataset, training configuration, and optional integrations. 
-- [Monitor training metrics](../tutorials/metrics.md): View logs and metrics through MLflow or W&B. -- [Manage secrets](../../get-started/concepts/manage-secrets.md): Store credentials such as W&B API keys and provider tokens. -- [Troubleshooting MLflow integrations](../../troubleshooting/customizer.md): Diagnose failed or misconfigured customization jobs. +- [Create a customization job](/fine-tune-models/manage-jobs/create-job): Start a job with a model, dataset, training configuration, and optional integrations. +- [Monitor training metrics](/fine-tune-models/tutorials/job-metrics): View logs and metrics through MLflow or W&B. +- [Manage secrets](/get-started/core-concepts/manage-secrets): Store credentials such as W&B API keys and provider tokens. +- [Troubleshooting MLflow integrations](/reference/troubleshooting/customizer): Diagnose failed or misconfigured customization jobs. diff --git a/docs/customizer/manage-customization-jobs/get-job-status.mdx b/docs/customizer/manage-customization-jobs/get-job-status.mdx index 0744df61af..094e16e888 100644 --- a/docs/customizer/manage-customization-jobs/get-job-status.mdx +++ b/docs/customizer/manage-customization-jobs/get-job-status.mdx @@ -1,23 +1,26 @@ -# Get Job Status - +--- +title: "Get Job Status" +description: "" +--- Get detailed execution status for a customization job, including step-by-step progress and real-time training metrics. -!!! 
tip - This endpoint provides granular execution details including: + +This endpoint provides granular execution details including: - - **Step-level status**: `model-and-dataset-download` → `customization-training-job` → `model-upload` → `model-entity-creation` - - **Training metrics**: `step`, `epoch`, `loss`, `lr` (learning rate), `grad_norm`, `val_loss` - - **Progress tracking**: `downloaded_files`, `uploaded_bytes`, `progress_pct` +- **Step-level status**: `model-and-dataset-download` → `customization-training-job` → `model-upload` → `model-entity-creation` +- **Training metrics**: `step`, `epoch`, `loss`, `lr` (learning rate), `grad_norm`, `val_loss` +- **Progress tracking**: `downloaded_files`, `uploaded_bytes`, `progress_pct` - To list jobs or get job definitions (model entity, hyperparameters, spec), use [List Active Jobs](list-active-jobs.md) instead. +To list jobs or get job definitions (model entity, hyperparameters, spec), use [List Active Jobs](/fine-tune-models/manage-jobs/list-active-jobs) instead. + ## Prerequisites Before you can get the status of a customization job, make sure that you have: -- Obtained the base URL of your {{platform_name}}. -- Set the `NMP_BASE_URL` environment variable to your {{platform_name}} endpoint +- Obtained the base URL of your NeMo Platform. 
+- Set the `NMP_BASE_URL` environment variable to your NeMo Platform endpoint ```bash export NMP_BASE_URL="https://your-nmp-base-url" @@ -58,7 +61,8 @@ for step in status.steps or []: print(f" Progress: {current_step}/{max_steps}") ``` -???+ "Example Response" + + **Active Job (Training in Progress)** diff --git a/docs/customizer/manage-customization-jobs/hyperparameters.mdx b/docs/customizer/manage-customization-jobs/hyperparameters.mdx index 716b121057..11d9118c91 100644 --- a/docs/customizer/manage-customization-jobs/hyperparameters.mdx +++ b/docs/customizer/manage-customization-jobs/hyperparameters.mdx @@ -1,12 +1,15 @@ -# Training Configuration - +--- +title: "Training Configuration" +description: "" +--- -!!! tip - Want to learn about training concepts at a high level? Check out the [Customization concepts](../about.md) page. + +Want to learn about training concepts at a high level? Check out the [Customization concepts](/fine-tune-models/customization-concepts) page. + ## Complete Schema Reference -For generated REST API details, see the [Customizer API Reference](../../api/index.md#tag-customizer) and +For generated REST API details, see the [Customizer API Reference](/reference/api-reference#tag-customizer) and search for `CustomizationJobInput`. Training is configured in the job's `spec.training` object. @@ -22,7 +25,7 @@ The `training` field is a discriminated union on the `type` field. Each training | `training.type` | `sft`, `dpo`, `distillation` | Training method (discriminated union) | | `training.peft` | `{ type: "lora", rank: 8, ... }` or omit | PEFT adapter configuration. If set, trains an adapter; if omitted, performs full-weight training | -For generated SFT schema details, see the [Customizer API Reference](../../api/index.md#tag-customizer) +For generated SFT schema details, see the [Customizer API Reference](/reference/api-reference#tag-customizer) and search for `SFTTrainingInput`. 
### DPO Configuration @@ -37,14 +40,16 @@ When `training.type` is `"dpo"`, additional DPO-specific fields are available: | `preference_loss_weight` | Weight for preference loss | `1.0` | | `sft_loss_weight` | Weight for SFT loss | `0.0` | -For generated DPO schema details, see the [Customizer API Reference](../../api/index.md#tag-customizer) +For generated DPO schema details, see the [Customizer API Reference](/reference/api-reference#tag-customizer) and search for `DPOTrainingInput`. -!!! note - PEFT (LoRA) is not yet supported with DPO training. Use full-weight training by omitting the `peft` field. + +PEFT (LoRA) is not yet supported with DPO training. Use full-weight training by omitting the `peft` field. + -!!! tip - When setting `val_check_interval` for DPO, use a fractional value (e.g., `0.5` for twice per epoch) or omit it entirely (validates once at end of epoch). Avoid integer step counts — they may not divide evenly into the total training steps, which can prevent validation from running on the final step. + +When setting `val_check_interval` for DPO, use a fractional value (e.g., `0.5` for twice per epoch) or omit it entirely (validates once at end of epoch). Avoid integer step counts — they may not divide evenly into the total training steps, which can prevent validation from running on the final step. + ### Parallelism Configuration @@ -60,10 +65,11 @@ Parallelism parameters are grouped inside `training.parallelism`: | `parallelism.expert_parallel_size` | Expert parallelism for MoE models | Must divide number of experts | | `parallelism.sequence_parallel` | Enable sequence parallelism | Memory optimization for long sequences | -!!! note - **GPU Relationship**: `total_gpus = num_gpus_per_node x num_nodes` + +**GPU Relationship**: `total_gpus = num_gpus_per_node x num_nodes` - `data_parallel_size` is automatically derived as `total_gpus / (TP × PP × CP)`. +`data_parallel_size` is automatically derived as `total_gpus / (TP × PP × CP)`. 
+ ### PEFT / LoRA Configuration @@ -104,15 +110,16 @@ When `training.type` is `"distillation"`, additional KD-specific fields are avai | `distillation_temperature` | Softmax temperature for KD. Higher = softer probability distributions | `1.0–5.0` (start with `2.0`) | For generated distillation schema details, see the -[Customizer API Reference](../../api/index.md#tag-customizer) and search for +[Customizer API Reference](/reference/api-reference#tag-customizer) and search for `DistillationTrainingInput`. -!!! note - - Knowledge distillation uses **logit-pair distillation only** — the student learns to match the teacher's output probability distribution. - - Both student and teacher models must be **full-weight Model Entities**. LoRA adapters cannot be used as teacher models. - - Student and teacher must **share the same tokenizer and vocabulary**. Use models from the same family (e.g., Llama 3.2 1B Instruct + Llama 3.2 3B Instruct). - - Both models are loaded during training. Plan GPU memory accordingly. + +- Knowledge distillation uses **logit-pair distillation only** — the student learns to match the teacher's output probability distribution. +- Both student and teacher models must be **full-weight Model Entities**. LoRA adapters cannot be used as teacher models. +- Student and teacher must **share the same tokenizer and vocabulary**. Use models from the same family (e.g., Llama 3.2 1B Instruct + Llama 3.2 3B Instruct). +- Both models are loaded during training. Plan GPU memory accordingly. + --- @@ -172,5 +179,6 @@ Estimated GPU requirements by model size: | 13B | 80GB | 4 × 80GB | | 70B | 2 × 80GB | 8+ × 80GB | -!!! tip - Use LoRA for most fine-tuning tasks. It's significantly more memory-efficient and often achieves comparable results to full fine-tuning. + +Use LoRA for most fine-tuning tasks. It's significantly more memory-efficient and often achieves comparable results to full fine-tuning. 
+ diff --git a/docs/customizer/manage-customization-jobs/index.mdx b/docs/customizer/manage-customization-jobs/index.mdx index 93a2d7946d..aa8888fc60 100644 --- a/docs/customizer/manage-customization-jobs/index.mdx +++ b/docs/customizer/manage-customization-jobs/index.mdx @@ -1,7 +1,9 @@ -# Manage Customization Jobs - +--- +title: "Manage Customization Jobs" +description: "" +--- -Use customization jobs to fine-tune a [model](../models/index.md) using a [dataset](../../get-started/concepts/manage-files.md) and [hyperparameters](hyperparameters.md). +Use customization jobs to fine-tune a [model](/fine-tune-models/models/model-catalog) using a [dataset](/get-started/core-concepts/manage-files) and [hyperparameters](/fine-tune-models/manage-jobs/training-configuration). ## How It Works @@ -14,7 +16,7 @@ This design keeps adapters organized with their parent models and simplifies dep ## Prerequisites -Before you can customize a model using a customization job, make sure that you have `prepared and uploaded a dataset <../tutorials/format-training-dataset>` to the dataset repository. See also [format-training-dataset](../tutorials/format-training-dataset.md) for dataset formatting requirements. +Before you can customize a model using a customization job, make sure that you have `prepared and uploaded a dataset <../tutorials/format-training-dataset>` to the dataset repository. See also [format-training-dataset](/fine-tune-models/tutorials/format-training-dataset) for dataset formatting requirements. --- @@ -22,30 +24,31 @@ Before you can customize a model using a customization job, make sure that you h Perform common customization job tasks. -!!! tip - The value for `NMP_BASE_URL` will depend on your deployment. After the standard [Setup](../../get-started/setup.md) flow, the default local URL is `http://localhost:8080`. Otherwise, consult with your cluster administrator. + +The value for `NMP_BASE_URL` will depend on your deployment. 
After the standard [Setup](/get-started/setup) flow, the default local URL is `http://localhost:8080`. Otherwise, consult with your cluster administrator. +
-- **[Create a Customization Job](create-job.md)** +- **[Create a Customization Job](/fine-tune-models/manage-jobs/create-job)** --- Create a customization job using SFT, DPO, or Knowledge Distillation. -- **[Get Job Status](get-job-status.md)** +- **[Get Job Status](/fine-tune-models/manage-jobs/get-job-status)** --- Check the status of a customization job. -- **[List Active Jobs](list-active-jobs.md)** +- **[List Active Jobs](/fine-tune-models/manage-jobs/list-active-jobs)** --- List all active customization jobs to find a job name for use with Get Status or Cancel. -- **[Cancel a Job](cancel-job.md)** +- **[Cancel a Job](/fine-tune-models/manage-jobs/cancel-job)** --- @@ -59,13 +62,13 @@ Refer to the following pages for more information on customization jobs.
-- **[Hyperparameters](hyperparameters.md)** +- **[Hyperparameters](/fine-tune-models/manage-jobs/training-configuration)** --- Review the hyperparameters that you can use to customize a model. -- **[Troubleshoot Failed Jobs](../../troubleshooting/customizer.md)** +- **[Troubleshoot Failed Jobs](/reference/troubleshooting/customizer)** --- diff --git a/docs/customizer/manage-customization-jobs/list-active-jobs.mdx b/docs/customizer/manage-customization-jobs/list-active-jobs.mdx index a813c8b7a8..30edf8c7a1 100644 --- a/docs/customizer/manage-customization-jobs/list-active-jobs.mdx +++ b/docs/customizer/manage-customization-jobs/list-active-jobs.mdx @@ -1,17 +1,20 @@ -# List Active Jobs - +--- +title: "List Active Jobs" +description: "" +--- List all customization jobs and their high-level status. This returns job definitions including the model, dataset, training configuration, and overall status. -!!! tip - To get **detailed execution progress** (step-by-step status, training metrics like loss/epoch/step), use [Get Job Status](get-job-status.md) instead. + +To get **detailed execution progress** (step-by-step status, training metrics like loss/epoch/step), use [Get Job Status](/fine-tune-models/manage-jobs/get-job-status) instead. + ## Prerequisites Before you can list active customization jobs, make sure that you have: -- Obtained the base URL of your {{platform_name}}. -- Set the `NMP_BASE_URL` environment variable to your {{platform_name}} endpoint +- Obtained the base URL of your NeMo Platform. +- Set the `NMP_BASE_URL` environment variable to your NeMo Platform endpoint ```bash export NMP_BASE_URL="https://your-nmp-base-url" @@ -58,59 +61,57 @@ for job in filtered_jobs.data: print(f"Job {job.name}: {job.status}") ``` -??? 
"Example Response" - :icon: code-square - :open: - - ```json + +```json +{ + "data": [ { - "data": [ - { - "id": "platform-job-QtyhRY5ub4t4tTLPY4sTkz", - "name": "my-sft-job-99da", - "workspace": "default", - "created_at": "2026-02-09T22:12:45", - "updated_at": "2026-02-09T22:12:45", - "status": "active", - "status_details": { - "message": "Job is running" - }, - "spec": { - "model": "default/llama-3-2-1b", - "dataset": "fileset://default/sft-dataset", - "training": { - "type": "sft", - "batch_size": 64, - "epochs": 2, - "learning_rate": 5e-05, - "weight_decay": 0.01, - "max_seq_length": 2048, - "parallelism": { - "num_gpus_per_node": 1, - "num_nodes": 1, - "tensor_parallel_size": 1, - "pipeline_parallel_size": 1, - "context_parallel_size": 1 - } - }, - "output": { - "name": "customization-407790d32cfb", - "type": "model", - "fileset": "customization-407790d32cfb" - }, - "custom_fields": {} - } - } - ], - "pagination": { - "page": 1, - "page_size": 10, - "current_page_size": 1, - "total_pages": 1, - "total_results": 1 + "id": "platform-job-QtyhRY5ub4t4tTLPY4sTkz", + "name": "my-sft-job-99da", + "workspace": "default", + "created_at": "2026-02-09T22:12:45", + "updated_at": "2026-02-09T22:12:45", + "status": "active", + "status_details": { + "message": "Job is running" }, - "sort": "created_at", - "filter": {}, - "search": {} + "spec": { + "model": "default/llama-3-2-1b", + "dataset": "fileset://default/sft-dataset", + "training": { + "type": "sft", + "batch_size": 64, + "epochs": 2, + "learning_rate": 5e-05, + "weight_decay": 0.01, + "max_seq_length": 2048, + "parallelism": { + "num_gpus_per_node": 1, + "num_nodes": 1, + "tensor_parallel_size": 1, + "pipeline_parallel_size": 1, + "context_parallel_size": 1 + } + }, + "output": { + "name": "customization-407790d32cfb", + "type": "model", + "fileset": "customization-407790d32cfb" + }, + "custom_fields": {} + } } - ``` + ], + "pagination": { + "page": 1, + "page_size": 10, + "current_page_size": 1, + "total_pages": 1, 
+ "total_results": 1 + }, + "sort": "created_at", + "filter": {}, + "search": {} +} +``` + \ No newline at end of file diff --git a/docs/customizer/manage-model-entities/create-fileset.mdx b/docs/customizer/manage-model-entities/create-fileset.mdx index c9ffaea239..82f820d6f7 100644 --- a/docs/customizer/manage-model-entities/create-fileset.mdx +++ b/docs/customizer/manage-model-entities/create-fileset.mdx @@ -1,12 +1,14 @@ -# Create a Model FileSet - +--- +title: "Create a Model FileSet" +description: "" +--- Create a FileSet containing your base model checkpoint before creating a Model Entity. ## Prerequisites -- Obtained the base URL of your {{platform_name}}. -- For HuggingFace models: Created a secret with your HF token. Refer to [Manage Secrets](../../get-started/concepts/manage-secrets.md). +- Obtained the base URL of your NeMo Platform. +- For HuggingFace models: Created a secret with your HF token. Refer to [Manage Secrets](/get-started/core-concepts/manage-secrets). - Set the `NMP_BASE_URL` environment variable. 
```bash @@ -19,95 +21,96 @@ export NMP_BASE_URL="https://your-nemo-platform-url" The most common method is downloading directly from HuggingFace: -=== "Python SDK" + + +```python +import os +from nemo_platform import ConflictError, NeMoPlatform +from nemo_platform.types.files import HuggingfaceStorageConfigParam - ```python - import os - from nemo_platform import ConflictError, NeMoPlatform - from nemo_platform.types.files import HuggingfaceStorageConfigParam +client = NeMoPlatform( + base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), + workspace="default", +) - client = NeMoPlatform( - base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), +HF_REPO_ID = "meta-llama/Llama-3.2-1B-Instruct" +MODEL_NAME = "llama-3-2-1b" +HF_SECRET_NAME = "my-hf-token" +HF_TOKEN = os.environ.get("HF_TOKEN") +if not HF_TOKEN: + raise RuntimeError("Set HF_TOKEN before creating the HuggingFace secret.") + +# First, create a secret for your HF token (if not already created) +try: + hf_secret = client.secrets.create( + name=HF_SECRET_NAME, workspace="default", + value=HF_TOKEN, ) + print(f"Created secret: {HF_SECRET_NAME}") +except ConflictError: + print(f"Secret '{HF_SECRET_NAME}' already exists, continuing...") + hf_secret = client.secrets.retrieve(name=HF_SECRET_NAME, workspace="default") + +# Create FileSet from HuggingFace +try: + fileset = client.files.filesets.create( + workspace="default", + name=MODEL_NAME, + description="Llama 3.2 1B Instruct from HuggingFace", + purpose="model", + storage=HuggingfaceStorageConfigParam( + type="huggingface", + repo_id=HF_REPO_ID, + repo_type="model", + token_secret=hf_secret.name, + ), + ) + print(f"Created FileSet: {fileset.name}") +except ConflictError: + print(f"FileSet '{MODEL_NAME}' already exists, retrieving...") + fileset = client.files.filesets.retrieve(workspace="default", name=MODEL_NAME) - HF_REPO_ID = "meta-llama/Llama-3.2-1B-Instruct" - MODEL_NAME = "llama-3-2-1b" - HF_SECRET_NAME = "my-hf-token" - HF_TOKEN = 
os.environ.get("HF_TOKEN") - if not HF_TOKEN: - raise RuntimeError("Set HF_TOKEN before creating the HuggingFace secret.") - - # First, create a secret for your HF token (if not already created) - try: - hf_secret = client.secrets.create( - name=HF_SECRET_NAME, - workspace="default", - value=HF_TOKEN, - ) - print(f"Created secret: {HF_SECRET_NAME}") - except ConflictError: - print(f"Secret '{HF_SECRET_NAME}' already exists, continuing...") - hf_secret = client.secrets.retrieve(name=HF_SECRET_NAME, workspace="default") - - # Create FileSet from HuggingFace - try: - fileset = client.files.filesets.create( - workspace="default", - name=MODEL_NAME, - description="Llama 3.2 1B Instruct from HuggingFace", - purpose="model", - storage=HuggingfaceStorageConfigParam( - type="huggingface", - repo_id=HF_REPO_ID, - repo_type="model", - token_secret=hf_secret.name, - ), - ) - print(f"Created FileSet: {fileset.name}") - except ConflictError: - print(f"FileSet '{MODEL_NAME}' already exists, retrieving...") - fileset = client.files.filesets.retrieve(workspace="default", name=MODEL_NAME) - - print(f"FileSet ready: {fileset.name}") - ``` - -=== "CLI" - - ```bash - export WORKSPACE="default" - export HF_REPO_ID="meta-llama/Llama-3.2-1B-Instruct" - export MODEL_NAME="llama-3-2-1b" - export HF_SECRET_NAME="my-hf-token" - - # Export HF_TOKEN with your HuggingFace token before running this. 
- : "${HF_TOKEN:?Set HF_TOKEN before creating the HuggingFace secret.}" - - nemo secrets get "$HF_SECRET_NAME" --workspace "$WORKSPACE" >/dev/null 2>&1 || \ - printf '%s' "$HF_TOKEN" | nemo secrets create "$HF_SECRET_NAME" \ - --workspace "$WORKSPACE" \ - --from-file - - - nemo files filesets create "$MODEL_NAME" \ - --workspace "$WORKSPACE" \ - --description "Llama 3.2 1B Instruct from HuggingFace" \ - --purpose model \ - --exist-ok \ - --storage '{ - "type": "huggingface", - "repo_id": "'"$HF_REPO_ID"'", - "repo_type": "model", - "token_secret": "'"$HF_SECRET_NAME"'" - }' - - nemo files filesets get "$MODEL_NAME" --workspace "$WORKSPACE" - ``` - -!!! tip - For gated models (like Llama), you need to: - 1. Accept the model license on HuggingFace - 2. Create a HuggingFace token with read access - 3. Store the token as a secret in the platform +print(f"FileSet ready: {fileset.name}") +``` + + +```bash +export WORKSPACE="default" +export HF_REPO_ID="meta-llama/Llama-3.2-1B-Instruct" +export MODEL_NAME="llama-3-2-1b" +export HF_SECRET_NAME="my-hf-token" + +# Export HF_TOKEN with your HuggingFace token before running this. +: "${HF_TOKEN:?Set HF_TOKEN before creating the HuggingFace secret.}" + +nemo secrets get "$HF_SECRET_NAME" --workspace "$WORKSPACE" >/dev/null 2>&1 || \ + printf '%s' "$HF_TOKEN" | nemo secrets create "$HF_SECRET_NAME" \ + --workspace "$WORKSPACE" \ + --from-file - + +nemo files filesets create "$MODEL_NAME" \ + --workspace "$WORKSPACE" \ + --description "Llama 3.2 1B Instruct from HuggingFace" \ + --purpose model \ + --exist-ok \ + --storage '{ + "type": "huggingface", + "repo_id": "'"$HF_REPO_ID"'", + "repo_type": "model", + "token_secret": "'"$HF_SECRET_NAME"'" + }' + +nemo files filesets get "$MODEL_NAME" --workspace "$WORKSPACE" +``` + + + +For gated models (like Llama), you need to: +1. Accept the model license on HuggingFace +2. Create a HuggingFace token with read access +3. 
Store the token as a secret in the platform + --- @@ -115,98 +118,98 @@ The most common method is downloading directly from HuggingFace: For models from NVIDIA NGC: -=== "Python SDK" + + +```python +import os +from nemo_platform import ConflictError, NeMoPlatform +from nemo_platform.types.files import NGCStorageConfigParam - ```python - import os - from nemo_platform import ConflictError, NeMoPlatform - from nemo_platform.types.files import NGCStorageConfigParam +client = NeMoPlatform( + base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), + workspace="default", +) - client = NeMoPlatform( - base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), +MODEL_NAME = "nemotron-mini-4b" +NGC_RESOURCE = "nemotron-mini-4b-instruct" +NGC_ORG = "nvidia" +NGC_TEAM = "nemo" +NGC_VERSION = "1.0" +NGC_API_KEY_SECRET = "my-ngc-key" +NGC_API_KEY = os.environ.get("NGC_API_KEY") +if not NGC_API_KEY: + raise RuntimeError("Set NGC_API_KEY before creating the NGC secret.") + +# First, create a secret for your NGC API key (if not already created) +try: + ngc_secret = client.secrets.create( + name=NGC_API_KEY_SECRET, workspace="default", + value=NGC_API_KEY, ) - - MODEL_NAME = "nemotron-mini-4b" - NGC_RESOURCE = "nemotron-mini-4b-instruct" - NGC_ORG = "nvidia" - NGC_TEAM = "nemo" - NGC_VERSION = "1.0" - NGC_API_KEY_SECRET = "my-ngc-key" - NGC_API_KEY = os.environ.get("NGC_API_KEY") - if not NGC_API_KEY: - raise RuntimeError("Set NGC_API_KEY before creating the NGC secret.") - - # First, create a secret for your NGC API key (if not already created) - try: - ngc_secret = client.secrets.create( - name=NGC_API_KEY_SECRET, - workspace="default", - value=NGC_API_KEY, - ) - print(f"Created secret: {NGC_API_KEY_SECRET}") - except ConflictError: - print(f"Secret '{NGC_API_KEY_SECRET}' already exists, continuing...") - ngc_secret = client.secrets.retrieve(name=NGC_API_KEY_SECRET, workspace="default") - - # Create FileSet from NGC - try: - fileset = 
client.files.filesets.create( - workspace="default", - name=MODEL_NAME, - description="Nemotron Mini 4B from NGC", - purpose="model", - storage=NGCStorageConfigParam( - type="ngc", - org=NGC_ORG, - team=NGC_TEAM, - resource=NGC_RESOURCE, # NGC resource name - version=NGC_VERSION, - api_key_secret=ngc_secret.name, - ), - ) - print(f"Created FileSet: {fileset.name}") - except ConflictError: - print("FileSet already exists, retrieving...") - fileset = client.files.filesets.retrieve(workspace="default", name=MODEL_NAME) - ``` - -=== "CLI" - - ```bash - export WORKSPACE="default" - export MODEL_NAME="nemotron-mini-4b" - export NGC_RESOURCE="nemotron-mini-4b-instruct" - export NGC_ORG="nvidia" - export NGC_TEAM="nemo" - export NGC_VERSION="1.0" - export NGC_API_KEY_SECRET="my-ngc-key" - - # Export NGC_API_KEY with your NGC API key before running this. - : "${NGC_API_KEY:?Set NGC_API_KEY before creating the NGC secret.}" - - nemo secrets get "$NGC_API_KEY_SECRET" --workspace "$WORKSPACE" >/dev/null 2>&1 || \ - printf '%s' "$NGC_API_KEY" | nemo secrets create "$NGC_API_KEY_SECRET" \ - --workspace "$WORKSPACE" \ - --from-file - - - nemo files filesets create "$MODEL_NAME" \ - --workspace "$WORKSPACE" \ - --description "Nemotron Mini 4B from NGC" \ - --purpose model \ - --exist-ok \ - --storage '{ - "type": "ngc", - "org": "'"$NGC_ORG"'", - "team": "'"$NGC_TEAM"'", - "resource": "'"$NGC_RESOURCE"'", - "version": "'"$NGC_VERSION"'", - "api_key_secret": "'"$NGC_API_KEY_SECRET"'" - }' - - nemo files filesets get "$MODEL_NAME" --workspace "$WORKSPACE" - ``` - + print(f"Created secret: {NGC_API_KEY_SECRET}") +except ConflictError: + print(f"Secret '{NGC_API_KEY_SECRET}' already exists, continuing...") + ngc_secret = client.secrets.retrieve(name=NGC_API_KEY_SECRET, workspace="default") + +# Create FileSet from NGC +try: + fileset = client.files.filesets.create( + workspace="default", + name=MODEL_NAME, + description="Nemotron Mini 4B from NGC", + purpose="model", + 
storage=NGCStorageConfigParam( + type="ngc", + org=NGC_ORG, + team=NGC_TEAM, + resource=NGC_RESOURCE, # NGC resource name + version=NGC_VERSION, + api_key_secret=ngc_secret.name, + ), + ) + print(f"Created FileSet: {fileset.name}") +except ConflictError: + print("FileSet already exists, retrieving...") + fileset = client.files.filesets.retrieve(workspace="default", name=MODEL_NAME) +``` + + +```bash +export WORKSPACE="default" +export MODEL_NAME="nemotron-mini-4b" +export NGC_RESOURCE="nemotron-mini-4b-instruct" +export NGC_ORG="nvidia" +export NGC_TEAM="nemo" +export NGC_VERSION="1.0" +export NGC_API_KEY_SECRET="my-ngc-key" + +# Export NGC_API_KEY with your NGC API key before running this. +: "${NGC_API_KEY:?Set NGC_API_KEY before creating the NGC secret.}" + +nemo secrets get "$NGC_API_KEY_SECRET" --workspace "$WORKSPACE" >/dev/null 2>&1 || \ + printf '%s' "$NGC_API_KEY" | nemo secrets create "$NGC_API_KEY_SECRET" \ + --workspace "$WORKSPACE" \ + --from-file - + +nemo files filesets create "$MODEL_NAME" \ + --workspace "$WORKSPACE" \ + --description "Nemotron Mini 4B from NGC" \ + --purpose model \ + --exist-ok \ + --storage '{ + "type": "ngc", + "org": "'"$NGC_ORG"'", + "team": "'"$NGC_TEAM"'", + "resource": "'"$NGC_RESOURCE"'", + "version": "'"$NGC_VERSION"'", + "api_key_secret": "'"$NGC_API_KEY_SECRET"'" + }' + +nemo files filesets get "$MODEL_NAME" --workspace "$WORKSPACE" +``` + + --- ## Check FileSet Status @@ -225,42 +228,41 @@ for f in files[:10]: # Show first 10 print(f" - {f.path} ({f.size:,} bytes)") ``` -??? 
"Example Response" - :icon: code-square - - ```json + +```json +{ + "data": [ + { + "file_ref": "/v2/workspaces/default/filesets/llama-3-2-1b/-/config.json", + "path": "config.json", + "size": 1234, + "cache_status": null + }, { - "data": [ - { - "file_ref": "/v2/workspaces/default/filesets/llama-3-2-1b/-/config.json", - "path": "config.json", - "size": 1234, - "cache_status": null - }, - { - "file_ref": "/v2/workspaces/default/filesets/llama-3-2-1b/-/model.safetensors", - "path": "model.safetensors", - "size": 2400000000, - "cache_status": null - }, - { - "file_ref": "/v2/workspaces/default/filesets/llama-3-2-1b/-/tokenizer.json", - "path": "tokenizer.json", - "size": 9085657, - "cache_status": null - }, - { - "file_ref": "/v2/workspaces/default/filesets/llama-3-2-1b/-/tokenizer_config.json", - "path": "tokenizer_config.json", - "size": 901, - "cache_status": null - } - ] + "file_ref": "/v2/workspaces/default/filesets/llama-3-2-1b/-/model.safetensors", + "path": "model.safetensors", + "size": 2400000000, + "cache_status": null + }, + { + "file_ref": "/v2/workspaces/default/filesets/llama-3-2-1b/-/tokenizer.json", + "path": "tokenizer.json", + "size": 9085657, + "cache_status": null + }, + { + "file_ref": "/v2/workspaces/default/filesets/llama-3-2-1b/-/tokenizer_config.json", + "path": "tokenizer_config.json", + "size": 901, + "cache_status": null } - ``` + ] +} +``` + --- ## Next Steps -Proceed to [create a Model Entity](create-model-entity.md). +Proceed to [create a Model Entity](/fine-tune-models/manage-model-entities/create-a-model-entity). 
diff --git a/docs/customizer/manage-model-entities/create-model-entity.mdx b/docs/customizer/manage-model-entities/create-model-entity.mdx index 6c129a0835..9c549ebb2e 100644 --- a/docs/customizer/manage-model-entities/create-model-entity.mdx +++ b/docs/customizer/manage-model-entities/create-model-entity.mdx @@ -1,11 +1,13 @@ -# Create a Model Entity - +--- +title: "Create a Model Entity" +description: "" +--- Create a Model Entity that references your FileSet to enable customization jobs. ## Prerequisites -- Created a FileSet containing your model checkpoint (refer to [Create a Model FileSet](create-fileset.md)). +- Created a FileSet containing your model checkpoint (refer to [Create a Model FileSet](/fine-tune-models/manage-model-entities/create-a-model-fileset)). - Set the `NMP_BASE_URL` environment variable. ```bash @@ -39,24 +41,23 @@ except ConflictError: model = client.models.retrieve(workspace="default", name="llama-3-2-1b") ``` -??? "Example Response" - :icon: code-square - - ```json - { - "id": "model-abc123def456", - "name": "llama-3-2-1b", - "workspace": "default", - "description": "Llama 3.2 1B base model for customization", - "fileset": "default/llama-3-2-1b", - "spec": null, - "adapters": [], - "created_at": "2026-02-09T10:30:00Z", - "updated_at": "2026-02-09T10:30:00Z" - } - ``` - - Note: `spec` is initially `null` and will be auto-populated by the Models Controller. + +```json +{ + "id": "model-abc123def456", + "name": "llama-3-2-1b", + "workspace": "default", + "description": "Llama 3.2 1B base model for customization", + "fileset": "default/llama-3-2-1b", + "spec": null, + "adapters": [], + "created_at": "2026-02-09T10:30:00Z", + "updated_at": "2026-02-09T10:30:00Z" +} +``` + +Note: `spec` is initially `null` and will be auto-populated by the Models Controller. + --- @@ -84,23 +85,22 @@ print(f" Layers: {model.spec.num_layers}") print(f" Attention Heads: {model.spec.num_attention_heads}") ``` -??? 
"Example Model Spec" - :icon: code-square - - ```json - { - "spec": { - "family": "llama", - "base_num_parameters": 1235814400, - "hidden_size": 2048, - "num_layers": 16, - "num_attention_heads": 32, - "num_key_value_heads": 8, - "vocab_size": 128256, - "max_sequence_length": 131072 - } - } - ``` + +```json +{ + "spec": { + "family": "llama", + "base_num_parameters": 1235814400, + "hidden_size": 2048, + "num_layers": 16, + "num_attention_heads": 32, + "num_key_value_heads": 8, + "vocab_size": 128256, + "max_sequence_length": 131072 + } +} +``` + --- @@ -139,7 +139,7 @@ job = client.customization.jobs.create( ) ``` -Refer to [create-job](../manage-customization-jobs/create-job.md) for complete job creation details. +Refer to [create-job](/fine-tune-models/manage-jobs/create-job) for complete job creation details. --- @@ -167,5 +167,5 @@ else: ## Next Steps -- [Create a customization job](../manage-customization-jobs/create-job.md) -- [Understand hyperparameters](../manage-customization-jobs/hyperparameters.md) +- [Create a customization job](/fine-tune-models/manage-jobs/create-job) +- [Understand hyperparameters](/fine-tune-models/manage-jobs/training-configuration) diff --git a/docs/customizer/manage-model-entities/index.mdx b/docs/customizer/manage-model-entities/index.mdx index f499bbf44c..9244e8fc20 100644 --- a/docs/customizer/manage-model-entities/index.mdx +++ b/docs/customizer/manage-model-entities/index.mdx @@ -1,5 +1,7 @@ -# Manage Model Entities for Customization - +--- +title: "Manage Model Entities for Customization" +description: "" +--- Before running a customization job, you need to set up a **Model Entity** that points to your base model checkpoint. This section covers creating the required FileSet and Model Entity. @@ -7,13 +9,13 @@ Before running a customization job, you need to set up a **Model Entity** that p
-- **[Create a Model FileSet](create-fileset.md)** +- **[Create a Model FileSet](/fine-tune-models/manage-model-entities/create-a-model-fileset)** --- Create a FileSet containing your base model checkpoint from HuggingFace, NGC, or local storage. -- **[Create a Model Entity](create-model-entity.md)** +- **[Create a Model Entity](/fine-tune-models/manage-model-entities/create-a-model-entity)** --- @@ -53,8 +55,9 @@ A **Model Entity** is the platform's representation of a model. It contains: Complete example of setting up a model for customization: -!!! tip - **HuggingFace Token**: If downloading from a gated HuggingFace repository (like Llama models), you will need to create a secret containing your HuggingFace API token first. Refer to [Manage Secrets](../../get-started/concepts/manage-secrets.md) for instructions. + +**HuggingFace Token**: If downloading from a gated HuggingFace repository (like Llama models), you will need to create a secret containing your HuggingFace API token first. Refer to [Manage Secrets](/get-started/core-concepts/manage-secrets) for instructions. + ```python import os diff --git a/docs/customizer/models/data-format.mdx b/docs/customizer/models/data-format.mdx index 0a276b8508..3a43d4c3e3 100644 --- a/docs/customizer/models/data-format.mdx +++ b/docs/customizer/models/data-format.mdx @@ -1,3 +1,7 @@ +--- +title: "Dataset Format Requirements" +description: "" +--- # Dataset Format Requirements @@ -8,7 +12,7 @@ Use the following guidelines to prepare your training dataset for the supported - **File Format**: Save your training data as `.jsonl` files (one JSON object per line). - **Validation**: Each record is automatically validated against the appropriate schema when training begins. The required format depends on the `training_type` (SFT, DPO) specified in your job configuration. -For dataset creation tutorials, refer to [Format Training Dataset](../tutorials/format-training-dataset.md). 
+For dataset creation tutorials, refer to [Format Training Dataset](/fine-tune-models/tutorials/format-training-dataset). ## Dataset Formats diff --git a/docs/customizer/models/embedding.mdx b/docs/customizer/models/embedding.mdx index 7705bc1bd2..b5b68b47b4 100644 --- a/docs/customizer/models/embedding.mdx +++ b/docs/customizer/models/embedding.mdx @@ -1,7 +1,11 @@ +--- +title: "Embedding Models" +description: "" +--- # Embedding Models -This page provides detailed technical specifications for the embedding model family supported by {{ncm_short_name}}. For information about supported features and capabilities, refer to [Tested Models](index.md). +This page provides detailed technical specifications for the embedding model family supported by NeMo Customizer. For information about supported features and capabilities, refer to [Tested Models](/fine-tune-models/models/model-catalog). ## Llama Nemotron Embedding 1B v2 @@ -37,8 +41,9 @@ Create a Model Entity for this embedding model: - **LoRA (merged)**: 1x 80GB GPU, tensor parallel size 1 - **Full SFT**: 1x 80GB GPU, tensor parallel size 1 -!!! note - Embedding models only support **merged LoRA** (`peft` with `merge=True`). Unmerged LoRA adapters are not supported because the embedding NIM requires ONNX format, which cannot represent standalone adapters. + +Embedding models only support **merged LoRA** (`peft` with `merge=True`). Unmerged LoRA adapters are not supported because the embedding NIM requires ONNX format, which cannot represent standalone adapters. + ### Resource Requirements @@ -71,7 +76,7 @@ NVIDIA recommends evaluating fine-tuned embedding models against the baseline to This model supports inference deployment through NVIDIA Inference Microservices (NIM). After customization, access your model through the **Inference Gateway**: -1. **Deploy the model**: Create a ModelDeploymentConfig and ModelDeployment to deploy your fine-tuned model. See [about](../../run-inference/about.md) for details. +1. 
**Deploy the model**: Create a ModelDeploymentConfig and ModelDeployment to deploy your fine-tuned model. See [about](/models-and-inference/about) for details. 2. **Access through Inference Gateway**: The Inference Gateway provides unified access to all deployed models via three routing patterns: - **Model Entity routing**: `/v2/workspaces/{workspace}/inference/gateway/model/{name}/-/v1/embeddings` @@ -90,8 +95,9 @@ client = NeMoPlatform( oai_client = client.models.get_openai_client() ``` -!!! note - The embedding model requires NIM container images that support embedding inference. When the deployment reaches `READY` state, a ModelProvider is automatically created for routing inference requests. + +The embedding model requires NIM container images that support embedding inference. When the deployment reaches `READY` state, a ModelProvider is automatically created for routing inference requests. + ### Example Usage @@ -125,6 +131,6 @@ embedding = response["data"][0]["embedding"] print(f"Embedding dimension: {len(embedding)}") ``` -For detailed fine-tuning instructions, refer to the [Embedding Customization tutorial](../tutorials/embedding-customization-job.ipynb). +For detailed fine-tuning instructions, refer to the Embedding Customization tutorial. -For more information about formatting training datasets for the embedding model, refer to [Dataset Format Requirements](data-format.md). +For more information about formatting training datasets for the embedding model, refer to [Dataset Format Requirements](/fine-tune-models/models/dataset-format). diff --git a/docs/customizer/models/gpt-oss.mdx b/docs/customizer/models/gpt-oss.mdx index cff3cabc6e..3b5630da01 100644 --- a/docs/customizer/models/gpt-oss.mdx +++ b/docs/customizer/models/gpt-oss.mdx @@ -1,11 +1,15 @@ +--- +title: "GPT-OSS Models" +description: "" +--- # GPT-OSS Models -This page provides detailed technical specifications for the OpenAI GPT-OSS model family supported by {{ncm_short_name}}. 
For supported features and capabilities, refer to [Tested Models](index.md). +This page provides detailed technical specifications for the OpenAI GPT-OSS model family supported by NeMo Customizer. For supported features and capabilities, refer to [Tested Models](/fine-tune-models/models/model-catalog). ## Before You Start -These models require a HuggingFace token to download. Create a secret with your HuggingFace API key, then create a FileSet and Model Entity referencing the model. See [index](../manage-model-entities/index.md) for setup instructions. +These models require a HuggingFace token to download. Create a secret with your HuggingFace API key, then create a FileSet and Model Entity referencing the model. See [index](/fine-tune-models/manage-model-entities/overview) for setup instructions. --- @@ -61,10 +65,11 @@ Example: Set reasoning level using "Reasoning: high" in the system prompt. GPT-OSS models use a Mixture of Experts (MoE) architecture and benefit from specialized parallelization across expert layers for optimal performance. -!!! note - **MoE Parallelism Constraints** + +**MoE Parallelism Constraints** - MoE models only support expert parallelism for distributing experts across GPUs. When `expert_parallel_size > 1`, `tensor_parallel_size` must be set to 1. Additionally, `expert_parallel_size` must evenly divide the number of GPUs. These constraints apply to training parallelism only and NIM deployment may use different GPU counts optimized for inference. +MoE models only support expert parallelism for distributing experts across GPUs. When `expert_parallel_size > 1`, `tensor_parallel_size` must be set to 1. Additionally, `expert_parallel_size` must evenly divide the number of GPUs. These constraints apply to training parallelism only and NIM deployment may use different GPU counts optimized for inference. 
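The constraints in the note above can be sketched as a small pre-flight check run before submitting a job. This is a hypothetical helper, not part of the NeMo Platform SDK: the function name `check_moe_parallelism` and the `num_gpus` argument are assumptions made for illustration, and only `expert_parallel_size` and `tensor_parallel_size` come from the text.

```python
def check_moe_parallelism(
    num_gpus: int,
    expert_parallel_size: int,
    tensor_parallel_size: int = 1,
) -> list[str]:
    """Return the violated MoE training constraints (empty list if none).

    Hypothetical helper: it only encodes the two rules stated in the note
    and does not call any platform API.
    """
    if num_gpus < 1 or expert_parallel_size < 1 or tensor_parallel_size < 1:
        return ["GPU count and parallel sizes must be positive integers"]

    problems = []
    # Rule 1: expert parallelism cannot be combined with tensor parallelism.
    if expert_parallel_size > 1 and tensor_parallel_size != 1:
        problems.append(
            "tensor_parallel_size must be 1 when expert_parallel_size > 1"
        )
    # Rule 2: experts must be spread evenly across the available GPUs.
    if num_gpus % expert_parallel_size != 0:
        problems.append(
            "expert_parallel_size must evenly divide the number of GPUs"
        )
    return problems


if __name__ == "__main__":
    # 8 GPUs, expert parallel size 8, tensor parallel size 1: no violations.
    print(check_moe_parallelism(num_gpus=8, expert_parallel_size=8))
    # 8 GPUs, expert parallel size 3, tensor parallel size 2: both rules fail.
    print(check_moe_parallelism(8, 3, 2))
```

Returning a list of messages instead of raising on the first failure lets a caller report every problem at once. The check covers training parallelism only, since the note states that NIM deployment may use different GPU counts.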
+ ### Model Selection Guidelines @@ -75,5 +80,4 @@ GPT-OSS models use a Mixture of Experts (MoE) architecture and benefit from spec Both models use the harmony response format and require this format for proper functionality. -!!! note - Sequence packing is not supported for GPT-OSS models in {{ncm_short_name}}. +Sequence packing is not supported for GPT-OSS models in NeMo Customizer. diff --git a/docs/customizer/models/index.mdx b/docs/customizer/models/index.mdx index 6e13051327..d5025aab78 100644 --- a/docs/customizer/models/index.mdx +++ b/docs/customizer/models/index.mdx @@ -1,15 +1,20 @@ +--- +title: "Model Catalog" +description: "" +--- # Model Catalog -Explore the model families and sizes supported by {{ncm_long_name}}. +Explore the model families and sizes supported by NVIDIA NeMo Customizer. -!!! note - For information on setting up model entities for customization, see the [Manage Model Entities](../manage-model-entities/index.md) guide. - For fine-tuning and deployment tutorials, see the [Tutorials](../tutorials/index.md) guide. + +For information on setting up model entities for customization, see the [Manage Model Entities](/fine-tune-models/manage-model-entities/overview) guide. +For fine-tuning and deployment tutorials, see the [Tutorials](/fine-tune-models/tutorials/overview) guide. + ## Before You Start -If downloading models hosted on Hugging Face, create a secret with your HuggingFace API key, then create a FileSet and Model Entity referencing the model. See [index](../manage-model-entities/index.md) for setup instructions. +If downloading models hosted on Hugging Face, create a secret with your HuggingFace API key, then create a FileSet and Model Entity referencing the model. See [index](/fine-tune-models/manage-model-entities/overview) for setup instructions. --- @@ -18,43 +23,43 @@ If downloading models hosted on Hugging Face, create a secret with your HuggingF
-- **[Llama Models](llama.md)** +- **[Llama Models](/fine-tune-models/models/llama)** --- View the available Llama models from Meta, ranging from 8 billion to 70 billion parameters. -- **[Llama Nemotron Models](llama-nemotron.md)** +- **[Llama Nemotron Models](/fine-tune-models/models/llama-nemotron)** --- View the available Llama Nemotron models from NVIDIA, including Nano and Super variants for efficient and advanced instruction tuning. -- **[Phi Models](phi.md)** +- **[Phi Models](/fine-tune-models/models/phi)** --- View the available Phi models from Microsoft, designed for strong reasoning capabilities with efficient deployment. -- **[Embedding Models](embedding.md)** +- **[Embedding Models](/fine-tune-models/models/embedding)** --- View the available embedding models optimized for retrieval and question-answering tasks. -- **[GPT-OSS Models](gpt-oss.md)** +- **[GPT-OSS Models](/fine-tune-models/models/gpt-oss)** --- View the available GPT-OSS models supported for customization. -- **[Qwen Models](qwen.md)** +- **[Qwen Models](/fine-tune-models/models/qwen)** --- View the available Qwen models from Alibaba Cloud, including compact variants for efficient customization. -- **[Mistral Models](mistral.md)** +- **[Mistral Models](/fine-tune-models/models/mistral)** --- @@ -64,7 +69,7 @@ If downloading models hosted on Hugging Face, create a secret with your HuggingF ## Tested Models -The following table lists models that NVIDIA tested and their available features. While {{ncm_short_name}} works with all LLM NIM microservices, the table lists the models that NVIDIA tested. Models available for fine-tuning with {{ncm_short_name}} are not limited to those listed. +The following table lists models that NVIDIA tested and their available features. While NeMo Customizer works with all LLM NIM microservices, the table lists the models that NVIDIA tested. Models available for fine-tuning with NeMo Customizer are not limited to those listed. 
For detailed technical specifications of each model such as architecture, parameters, and token limits, refer to the [model family](#model-families) pages. @@ -97,4 +102,4 @@ The following models support both chat and completion model training. |--|--|--| | [nvidia/llama-nemotron-embed-1b-v2](https://huggingface.co/nvidia/llama-nemotron-embed-1b-v2) | Full SFT, LoRA (merged) | Supported | -For detailed technical specifications and configuration information for embedding models, see the [Embedding Models](embedding.md) page. +For detailed technical specifications and configuration information for embedding models, see the [Embedding Models](/fine-tune-models/models/embedding) page. diff --git a/docs/customizer/models/llama-nemotron.mdx b/docs/customizer/models/llama-nemotron.mdx index 3ed124791e..e3a38839d3 100644 --- a/docs/customizer/models/llama-nemotron.mdx +++ b/docs/customizer/models/llama-nemotron.mdx @@ -1,7 +1,11 @@ +--- +title: "Llama Nemotron Models" +description: "" +--- # Llama Nemotron Models -This page provides detailed technical specifications for the Nemotron model family supported by {{ncm_short_name}}. For information about supported features and capabilities, refer to [Tested Models](index.md). +This page provides detailed technical specifications for the Nemotron model family supported by NeMo Customizer. For information about supported features and capabilities, refer to [Tested Models](/fine-tune-models/models/model-catalog). ## Llama 3.1 Nemotron Nano 8B v1 @@ -84,10 +88,11 @@ This page provides detailed technical specifications for the Nemotron model fami - **LoRA**: 2x 80GB GPU, tensor parallel size 1, expert parallel size 2, pipeline parallel size 1 - **Full SFT**: 8x 80GB GPU, tensor parallel size 1, expert parallel size 8, pipeline parallel size 1 -!!! note - **MoE Parallelism Constraints** + +**MoE Parallelism Constraints** - MoE models only support expert parallelism for distributing experts across GPUs. 
When `expert_parallel_size > 1`, `tensor_parallel_size` must be set to 1. Additionally, `expert_parallel_size` must evenly divide the number of GPUs. These constraints apply to training parallelism only and NIM deployment may use different GPU counts optimized for inference. +MoE models only support expert parallelism for distributing experts across GPUs. When `expert_parallel_size > 1`, `tensor_parallel_size` must be set to 1. Additionally, `expert_parallel_size` must evenly divide the number of GPUs. These constraints apply to training parallelism only and NIM deployment may use different GPU counts optimized for inference. + ### Deployment Configuration @@ -95,8 +100,7 @@ This page provides detailed technical specifications for the Nemotron model fami - NIM Image: `nvcr.io/nim/nvidia/nemotron-3-nano:1.7.0-variant` - GPU Count: 2x 80GB -!!! note - Deployment for LoRA using NIM is not supported for this model. +Deployment for LoRA using NIM is not supported for this model. ## NVIDIA Nemotron 3 Super 120B A12B @@ -114,10 +118,11 @@ This page provides detailed technical specifications for the Nemotron model fami - **LoRA**: 8x 80GB GPU, tensor parallel size 1, expert parallel size 8, pipeline parallel size 1 -!!! note - **MoE Parallelism Constraints** + +**MoE Parallelism Constraints** - MoE models only support expert parallelism for distributing experts across GPUs. When `expert_parallel_size > 1`, `tensor_parallel_size` must be set to 1. Additionally, `expert_parallel_size` must evenly divide the number of GPUs. These constraints apply to training parallelism only and NIM deployment may use different GPU counts optimized for inference. +MoE models only support expert parallelism for distributing experts across GPUs. When `expert_parallel_size > 1`, `tensor_parallel_size` must be set to 1. Additionally, `expert_parallel_size` must evenly divide the number of GPUs. 
These constraints apply to training parallelism only and NIM deployment may use different GPU counts optimized for inference. + ### Deployment Configuration diff --git a/docs/customizer/models/llama.mdx b/docs/customizer/models/llama.mdx index e226b227d5..845072f22d 100644 --- a/docs/customizer/models/llama.mdx +++ b/docs/customizer/models/llama.mdx @@ -1,7 +1,9 @@ -# Llama Models - +--- +title: "Llama Models" +description: "" +--- -This page provides detailed technical specifications for the Llama model family supported by {{ncm_short_name}}. For information about supported features and capabilities, refer to [Tested Models](index.md). +This page provides detailed technical specifications for the Llama model family supported by NeMo Customizer. For information about supported features and capabilities, refer to [Tested Models](/fine-tune-models/models/model-catalog). ## Llama-3.2-3B Instruct diff --git a/docs/customizer/models/mistral.mdx b/docs/customizer/models/mistral.mdx index dd0104b4f1..766363f012 100644 --- a/docs/customizer/models/mistral.mdx +++ b/docs/customizer/models/mistral.mdx @@ -1,7 +1,11 @@ +--- +title: "Mistral Models" +description: "" +--- # Mistral Models -This page provides detailed technical specifications for the Mistral model family supported by {{ncm_short_name}}. For information about supported features and capabilities, refer to [Tested Models](index.md). +This page provides detailed technical specifications for the Mistral model family supported by NeMo Customizer. For information about supported features and capabilities, refer to [Tested Models](/fine-tune-models/models/model-catalog). ## Mistral-7B-Instruct-v0.3 @@ -50,8 +54,7 @@ This page provides detailed technical specifications for the Mistral model famil - **LoRA**: 1x 80GB GPU, tensor parallel size 1 - **Full SFT**: 2x 80GB GPU, tensor parallel size 1 -!!! note - Deployment using NIM is not supported for this model. +Deployment using NIM is not supported for this model. 
## Ministral-3-3B-Reasoning-2512 @@ -71,5 +74,4 @@ This page provides detailed technical specifications for the Mistral model famil - **LoRA**: 1x 80GB GPU, tensor parallel size 1 - **Full SFT**: 2x 80GB GPU, tensor parallel size 1 -!!! note - Deployment using NIM is not supported for this model. +Deployment using NIM is not supported for this model. diff --git a/docs/customizer/models/phi.mdx b/docs/customizer/models/phi.mdx index 11968d4262..c8d9ef5a13 100644 --- a/docs/customizer/models/phi.mdx +++ b/docs/customizer/models/phi.mdx @@ -1,7 +1,9 @@ -# Phi Models - +--- +title: "Phi Models" +description: "" +--- -This page provides detailed technical specifications for the Phi model family supported by {{ncm_short_name}}. For information about supported features and capabilities, refer to [Tested Models](index.md). +This page provides detailed technical specifications for the Phi model family supported by NeMo Customizer. For information about supported features and capabilities, refer to [Tested Models](/fine-tune-models/models/model-catalog). ## Microsoft Phi-4 diff --git a/docs/customizer/models/qwen.mdx b/docs/customizer/models/qwen.mdx index 1fc506bc06..62b6efde06 100644 --- a/docs/customizer/models/qwen.mdx +++ b/docs/customizer/models/qwen.mdx @@ -1,7 +1,11 @@ +--- +title: "Qwen Models" +description: "" +--- # Qwen Models -This page provides detailed technical specifications for the Qwen model family supported by {{ncm_short_name}}. For information about supported features and capabilities, refer to [Tested Models](index.md). +This page provides detailed technical specifications for the Qwen model family supported by NeMo Customizer. For information about supported features and capabilities, refer to [Tested Models](/fine-tune-models/models/model-catalog). 
## Qwen2.5-1.5B-Instruct diff --git a/docs/customizer/tutorials/_snippets/customizer-prereqs.mdx b/docs/customizer/tutorials/_snippets/customizer-prereqs.mdx index 61e3c3a3f7..d1d139563a 100644 --- a/docs/customizer/tutorials/_snippets/customizer-prereqs.mdx +++ b/docs/customizer/tutorials/_snippets/customizer-prereqs.mdx @@ -1,32 +1,33 @@ -# NeMo Customizer Prerequisites - -??? "Platform Setup Requirements and Environment Variables" - :icon: gear - - Before starting, make sure you have: - - - {{platform_name}} installed and deployed (see [Setup](../../get-started/setup.md)) - - The `nemo-platform` Python SDK installed (`pip install nemo-platform[all]`) - - (Optional) Weights & Biases account and API key for enhanced visualization - - **Set up environment variables:** - - ```bash - # Set the base URL for {{platform_name}} - export NMP_BASE_URL="http://localhost:8080" # Or your deployed platform URL - - # Optional: Weights & Biases for experiment tracking - export WANDB_API_KEY="" - ``` - - **Initialize the SDK:** - - ```python - import os - from nemo_platform import NeMoPlatform - - client = NeMoPlatform( - base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), - workspace="default", - ) - ``` +--- +title: "NeMo Customizer Prerequisites" +description: "" +--- + +Before starting, make sure you have: + +- NeMo Platform installed and deployed (see [Setup](/customizer/get-started/setup)) +- The `nemo-platform` Python SDK installed (`pip install nemo-platform[all]`) +- (Optional) Weights & Biases account and API key for enhanced visualization + +**Set up environment variables:** + +```bash +# Set the base URL for {{platform_name}} +export NMP_BASE_URL="http://localhost:8080" # Or your deployed platform URL + +# Optional: Weights & Biases for experiment tracking +export WANDB_API_KEY="" +``` + +**Initialize the SDK:** + +```python +import os +from nemo_platform import NeMoPlatform + +client = NeMoPlatform( + base_url=os.environ.get("NMP_BASE_URL", 
"http://localhost:8080"), + workspace="default", +) +``` + \ No newline at end of file diff --git a/docs/customizer/tutorials/format-training-dataset.mdx b/docs/customizer/tutorials/format-training-dataset.mdx index a5e6badd28..75eb1f1819 100644 --- a/docs/customizer/tutorials/format-training-dataset.mdx +++ b/docs/customizer/tutorials/format-training-dataset.mdx @@ -1,3 +1,7 @@ +--- +title: "Format Training Dataset" +description: "" +--- # Format Training Dataset @@ -10,29 +14,70 @@ Customizer expects _all_ datasets to use JSONL format, where each line in the da ## Prerequisites ---8<-- "_snippets/tutorials/prereqs.md" +# Platform Prerequisites + + +All platform resources—models, datasets, and more—must belong to a **workspace**. Workspaces provide organizational and authorization boundaries for your work. Within a workspace, you can optionally use **projects** to group related resources. + +**If you're new to the platform**, start with the **[Setup guide](/get-started/setup)** to learn how to deploy and evaluate models, and optimize agents using the platform end-to-end. + +**If you're already familiar** with workspaces and how to upload datasets to the platform, you can proceed directly with this tutorial. + +For more information, see [Workspaces](/get-started/core-concepts/workspaces) and [Projects](/get-started/core-concepts/projects). 
+ + +# NeMo Customizer Prerequisites + + +Before starting, make sure you have: + +- NeMo Platform installed and deployed (see [Setup](/get-started/setup)) +- The `nemo-platform` Python SDK installed (`pip install nemo-platform[all]`) +- (Optional) Weights & Biases account and API key for enhanced visualization + +**Set up environment variables:** ---8<-- "_snippets/customizer-prereqs.md" +```bash +# Set the base URL for {{platform_name}} +export NMP_BASE_URL="http://localhost:8080" # Or your deployed platform URL + +# Optional: Weights & Biases for experiment tracking +export WANDB_API_KEY="" +``` + +**Initialize the SDK:** + +```python +import os +from nemo_platform import NeMoPlatform + +client = NeMoPlatform( + base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), + workspace="default", +) +``` + ## Dataset Best Practices Before formatting your dataset, follow these principles to ensure high-quality training: -!!! tip - **Quality Principles:** + +**Quality Principles:** - - **Quality > Quantity:** 100 high-quality examples beat 1,000 poor ones - - **Diversity:** Cover different scenarios, edge cases, and variations - - **Consistency:** Maintain uniform format, tone, and style across examples - - **Balance:** Include both common and rare cases relevant to your use case - - **Validation split:** Reserve 10-20% for validation to detect overfitting +- **Quality > Quantity:** 100 high-quality examples beat 1,000 poor ones +- **Diversity:** Cover different scenarios, edge cases, and variations +- **Consistency:** Maintain uniform format, tone, and style across examples +- **Balance:** Include both common and rare cases relevant to your use case +- **Validation split:** Reserve 10-20% for validation to detect overfitting - **Recommended dataset sizes:** +**Recommended dataset sizes:** - - **Minimum:** 50-100 examples for simple tasks - - **Typical:** 500-2,000 examples for most use cases - - **Large scale:** 10,000+ for complex domains with high variation 
+- **Minimum:** 50-100 examples for simple tasks +- **Typical:** 500-2,000 examples for most use cases +- **Large scale:** 10,000+ for complex domains with high variation + ### Data Quality Checklist @@ -84,41 +129,71 @@ A conversational dataset contains a sequence of `messages` that represent intera - A `role` field to categorize the message text. Options include `system`, `user`, and `assistant`. - A `content` field for the actual body of information communicated by that role. -!!! tip - For best training results, the `assistant` role should be the last message in each training example. + +For best training results, the `assistant` role should be the last message in each training example. + Example entry formatted for JSONL dataset file: ---8<-- "_snippets/output/chat-basic-format-example.jsonl" - -??? "Expanded JSON example" - - For illustrative purposes only, we show an example entry as multi-line JSON. +```json +{"messages": [{"role": "system","content": ""}, {"role": "user","content": ""}, {"role": "assistant","content": ""}]} +``` - --8<-- "_snippets/output/chat-expanded-format-example.json" + +For illustrative purposes only, we show an example entry as multi-line JSON. + +```json +{ + "messages": [ + { + "role": "system", + "content": "" + }, { + "role": "user", + "content": "" + }, { + "role": "assistant", + "content": "" + } + ] +} +``` + ### Reasoning Considerations -Some models (such as [Llama Nemotron](../models/llama-nemotron.md)) support a **detailed thinking** mode, which you can toggle in the system message. This setting controls whether the model is encouraged to show step-by-step reasoning in its responses. +Some models (such as [Llama Nemotron](/fine-tune-models/models/llama-nemotron)) support a **detailed thinking** mode, which you can toggle in the system message. This setting controls whether the model is encouraged to show step-by-step reasoning in its responses. 
- **Training data without reasoning:** Use `detailed thinking off` in the system message. - **Training data with reasoning:** Use `detailed thinking on` in the system message. -!!! note - If you have an existing system message that must be preserved, prepend `detailed thinking on` or `detailed thinking off` to the beginning of your system message. - - For example, if your original system message is `You are a helpful assistant.`, you should use `detailed thinking on\nYou are a helpful assistant.` or `detailed thinking off\nYou are a helpful assistant.` - - -=== "Detailed Thinking Off" + +If you have an existing system message that must be preserved, prepend `detailed thinking on` or `detailed thinking off` to the beginning of your system message. - --8<-- "_snippets/output/chat-thinking-off-example.jsonl" - -=== "Detailed Thinking On" - - --8<-- "_snippets/output/chat-thinking-on-example.jsonl" +For example, if your original system message is `You are a helpful assistant.`, you should use `detailed thinking on\nYou are a helpful assistant.` or `detailed thinking off\nYou are a helpful assistant.` + + + +```json +{"messages": [ + {"role": "system", "content": "detailed thinking off"}, + {"role": "user", "content": "What is 2 + 2?"}, + {"role": "assistant", "content": "4"} +]} +``` + + +```json +{"messages": [ + {"role": "system", "content": "detailed thinking on"}, + {"role": "user", "content": "What is 2 + 2?"}, + {"role": "assistant", "content": "To solve 2 + 2, add 2 and 2 together. The answer is 4."} +]} +``` + + You can adjust the system message for each training example to match the style of your data and the behavior you want the model to learn. #### Schema with Tool Calling @@ -136,13 +211,51 @@ To train a model with tool calling capabilities, use the conversational dataset Every sample must be a single line as in this example below. ---8<-- "_snippets/output/chat-tool-calling-basic-example.jsonl" - -??? 
"Expanded JSON example" +```json +{"messages": [{"role": "user","content": ""},{"role": "assistant","content": "","tool_calls": [{"type": "function","function": {"name": "fibonacci","arguments": {"n": 20}}}]}],"tools": [{"type": "function","function": {"name": "fibonacci","description": "Calculates the nth Fibonacci number.","parameters": {"type": "object","properties": {"n": {"description": "The position of the Fibonacci number.","type": "integer"}}}}}]} +``` - For illustrative purposes only, we show an example entry as multi-line JSON. + +For illustrative purposes only, we show an example entry as multi-line JSON. - --8<-- "_snippets/output/chat-tool-calling-expanded-example.json" +```json +{ + "messages": [ + { + "role": "user", + "content": "" + }, + { + "role": "assistant", + "content": "", + "tool_calls": [{ + "type": "function", + "function": { + "name": "fibonacci", + "arguments": {"n": 20} + } + }] + } + ], + "tools": [{ + "type": "function", + "function": { + "name": "fibonacci", + "description": "Calculates the nth Fibonacci number.", + "parameters": { + "type": "object", + "properties": { + "n": { + "description": "The position of the Fibonacci number.", + "type": "integer" + } + } + } + } + }] +} +``` + ##### Shared Tools @@ -225,8 +338,9 @@ if model.spec: To run inference with a chat model, you first need to deploy the model, then use the Inference Gateway to send requests. -!!! note - Before running inference, ensure your model is deployed. See [about](../../run-inference/about.md) for details on creating a ModelDeploymentConfig and ModelDeployment. + +Before running inference, ensure your model is deployed. See [about](/models-and-inference/about) for details on creating a ModelDeploymentConfig and ModelDeployment. 
+ ```python import os @@ -259,9 +373,9 @@ print(f"Response: {response.choices[0].message.content}") Now that you know how to format your training datasets, you can proceed with creating customization jobs: -- [Start a LoRA Model Customization Job](./lora-customization-job.ipynb) - For parameter-efficient fine-tuning -- [Start a Full SFT Customization Job](./sft-customization-job.ipynb) - For full model fine-tuning -- [Start a DPO Customization Job](./dpo-customization-job.ipynb) - For preference-based alignment +- Start a LoRA Model Customization Job - For parameter-efficient fine-tuning +- Start a Full SFT Customization Job - For full model fine-tuning +- Start a DPO Customization Job - For preference-based alignment --- @@ -276,14 +390,17 @@ Prompt completion datasets have a simple schema. Each datum has: - A `prompt` field for the body of information provided by the user. - A `completion` field for output of the model. ---8<-- "_snippets/output/completion-format-example.jsonl" +```json +{"prompt": "Hello", "completion": " world."} +``` ### Prompt the Model To run inference with a completion model, use the Inference Gateway to send requests to the `/completions` endpoint. -!!! note - Before running inference, ensure your model is deployed. See [about](../../run-inference/about.md) for details on creating a ModelDeploymentConfig and ModelDeployment. + +Before running inference, ensure your model is deployed. See [about](/models-and-inference/about) for details on creating a ModelDeploymentConfig and ModelDeployment. 
+ ```python import os diff --git a/docs/customizer/tutorials/import-hf-model.mdx b/docs/customizer/tutorials/import-hf-model.mdx index 60d90b5aa6..3e346cab40 100644 --- a/docs/customizer/tutorials/import-hf-model.mdx +++ b/docs/customizer/tutorials/import-hf-model.mdx @@ -1,4 +1,7 @@ - +--- +title: "Import and Fine-Tune Private HuggingFace Models" +description: "" +--- # Import and Fine-Tune Private HuggingFace Models @@ -6,9 +9,49 @@ Use this tutorial to learn how to import a private HuggingFace model into NeMo C ## Prerequisites ---8<-- "_snippets/tutorials/prereqs.md" +# Platform Prerequisites + + +All platform resources—models, datasets, and more—must belong to a **workspace**. Workspaces provide organizational and authorization boundaries for your work. Within a workspace, you can optionally use **projects** to group related resources. + +**If you're new to the platform**, start with the **[Setup guide](/get-started/setup)** to learn how to deploy and evaluate models, and optimize agents using the platform end-to-end. + +**If you're already familiar** with workspaces and how to upload datasets to the platform, you can proceed directly with this tutorial. + +For more information, see [Workspaces](/get-started/core-concepts/workspaces) and [Projects](/get-started/core-concepts/projects). 
+ + +# NeMo Customizer Prerequisites + + +Before starting, make sure you have: + +- NeMo Platform installed and deployed (see [Setup](/get-started/setup)) +- The `nemo-platform` Python SDK installed (`pip install nemo-platform[all]`) +- (Optional) Weights & Biases account and API key for enhanced visualization ---8<-- "_snippets/customizer-prereqs.md" +**Set up environment variables:** + +```bash +# Set the base URL for {{platform_name}} +export NMP_BASE_URL="http://localhost:8080" # Or your deployed platform URL + +# Optional: Weights & Biases for experiment tracking +export WANDB_API_KEY="" +``` + +**Initialize the SDK:** + +```python +import os +from nemo_platform import NeMoPlatform + +client = NeMoPlatform( + base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), + workspace="default", +) +``` + ### Tutorial-Specific Prerequisites @@ -19,33 +62,35 @@ Use this tutorial to learn how to import a private HuggingFace model into NeMo C - Sufficient storage space for the model files (typically 5-50GB depending on model size) - At least 8GB GPU memory for smaller models, more for larger models -!!! note - Verify that all required services are running and accessible before proceeding. You can check service health using the health endpoints documented in each service's API specification. + +Verify that all required services are running and accessible before proceeding. You can check service health using the health endpoints documented in each service's API specification. + ### Known Issues -!!! warning - **Conv1D Model Architecture Limitation**: Models that use `Conv1D` layers are not compatible with NeMo Customizer AutoModel LoRA. + +**Conv1D Model Architecture Limitation**: Models that use `Conv1D` layers are not compatible with NeMo Customizer AutoModel LoRA. 
- **Error signature**: `AttributeError: 'Conv1D' object has no attribute 'config'` +**Error signature**: `AttributeError: 'Conv1D' object has no attribute 'config'` - **Affected models include:** +**Affected models include:** - - `microsoft/DialoGPT-*` series - - `openai-gpt` models - - Some older `gpt2` variants - - Other models with Conv1D-based architectures +- `microsoft/DialoGPT-*` series +- `openai-gpt` models +- Some older `gpt2` variants +- Other models with Conv1D-based architectures - **Root cause**: These models use Conv1D layers that lack the linear layers expected by NeMo's LoRA transformation utilities. +**Root cause**: These models use Conv1D layers that lack the linear layers expected by NeMo's LoRA transformation utilities. - **Solution**: Use modern transformer architectures instead: +**Solution**: Use modern transformer architectures instead: - - ✅ **Recommended**: Llama models (3.1, 3.2, 3.3 series) - - ✅ **Recommended**: Nemotron models - - ✅ **Recommended**: Phi models - - ✅ **Alternative**: Gemma models (used in this tutorial) +- ✅ **Recommended**: Llama models (3.1, 3.2, 3.3 series) +- ✅ **Recommended**: Nemotron models +- ✅ **Recommended**: Phi models +- ✅ **Alternative**: Gemma models (used in this tutorial) - For a complete list of tested models, see the [Model Catalog](../models/index.md). +For a complete list of tested models, see the [Model Catalog](/fine-tune-models/models/model-catalog). + --- @@ -54,7 +99,7 @@ Use this tutorial to learn how to import a private HuggingFace model into NeMo C 1. Authenticate to HuggingFace using `hf auth login`. 2. Download the model. 
- --8<-- "_snippets/input/common.sh" + `--8<-- "_snippets/input/common.sh"` --- @@ -64,234 +109,234 @@ Next, create a model repository in the NeMo Data Store and upload the downloaded ### Create Namespace and Model Repository -=== "Python SDK" - - --8<-- "_snippets/input/common.py" - - --8<-- "_snippets/input/common.py" - -=== "cURL" - - ```bash - # Set environment variables - Update these to match your deployment - export NAMESPACE="my-org" - export MODEL_NAME="gemma-2-2b-it" - export MODEL_VERSION="$(date +%Y%m%d-%H%M%S)" # Unique version for this run - export NEMO_BASE_URL="http://nemo.test" - export DATASTORE_URL="http://data-store.test" - export REPO_ID="${NAMESPACE}/${MODEL_NAME}" - export DATASET_NAME="${MODEL_NAME}-training-data" - - # Create namespace - curl -X POST "${DATASTORE_URL}/v1/datastore/namespaces" \ - -H 'Content-Type: application/json' \ - -d '{"namespace": "'${NAMESPACE}'"}' - - # Create model repository in datastore - curl -X POST "${DATASTORE_URL}/v1/hf/api/repos/create" \ - -H 'Content-Type: application/json' \ - -d '{ - "organization": "'${NAMESPACE}'", - "name": "'${MODEL_NAME}'", - "type": "model" - }' - ``` - + + +`--8<-- "_snippets/input/common.py"` + +`--8<-- "_snippets/input/common.py"` + + +```bash +# Set environment variables - Update these to match your deployment +export NAMESPACE="my-org" +export MODEL_NAME="gemma-2-2b-it" +export MODEL_VERSION="$(date +%Y%m%d-%H%M%S)" # Unique version for this run +export NEMO_BASE_URL="http://nemo.test" +export DATASTORE_URL="http://data-store.test" +export REPO_ID="${NAMESPACE}/${MODEL_NAME}" +export DATASET_NAME="${MODEL_NAME}-training-data" + +# Create namespace +curl -X POST "${DATASTORE_URL}/v1/datastore/namespaces" \ +-H 'Content-Type: application/json' \ +-d '{"namespace": "'${NAMESPACE}'"}' + +# Create model repository in datastore +curl -X POST "${DATASTORE_URL}/v1/hf/api/repos/create" \ +-H 'Content-Type: application/json' \ +-d '{ +"organization": "'${NAMESPACE}'", +"name": 
"'${MODEL_NAME}'", +"type": "model" +}' +``` + + ### Upload Model Files to Data Store Upload the downloaded model files to the Data Store repository: ---8<-- "_snippets/input/common.py" +`--8<-- "_snippets/input/common.py"` ## Create Model Entity in Entity Store After uploading the model files to the Data Store, create a model entity in the Entity Store to register the model with its metadata and specifications for use in customization jobs. -=== "Python SDK" - - --8<-- "_snippets/input/common.py" - -=== "cURL" - - ```bash - # Create model entity in Entity Store - curl -X POST "${NEMO_BASE_URL}/v1/models" \ - -H 'Content-Type: application/json' \ - -d '{ - "name": "'${MODEL_NAME}'", - "namespace": "'${NAMESPACE}'", - "description": "Private '${MODEL_NAME}' model imported for customization", - "artifact": { - "files_url": "hf://models/'${REPO_ID}'", - "backend_engine": "hugging_face", - "status": "upload_completed" - }, - "spec": { - "num_parameters": 200000000, - "context_size": 1024, - "is_chat": true, - "num_virtual_tokens": -1 - }, - "peft": { - "finetuning_type": "all_weights" - } - }' | jq . - ``` - + + +`--8<-- "_snippets/input/common.py"` + + +```bash +# Create model entity in Entity Store +curl -X POST "${NEMO_BASE_URL}/v1/models" \ +-H 'Content-Type: application/json' \ +-d '{ +"name": "'${MODEL_NAME}'", +"namespace": "'${NAMESPACE}'", +"description": "Private '${MODEL_NAME}' model imported for customization", +"artifact": { +"files_url": "hf://models/'${REPO_ID}'", +"backend_engine": "hugging_face", +"status": "upload_completed" +}, +"spec": { +"num_parameters": 200000000, +"context_size": 1024, +"is_chat": true, +"num_virtual_tokens": -1 +}, +"peft": { +"finetuning_type": "all_weights" +} +}' | jq . +``` + + ## Deploy the Base Model Deploy the base model for inference with LoRA adapter support enabled, allowing it to load fine-tuned adapters from customization jobs. 
-=== "Python SDK" - - --8<-- "_snippets/input/common.py" - -=== "cURL" - - ```bash - curl "${NEMO_BASE_URL}/v1/deployment/configs" \ - -X POST \ - -H 'Content-Type: application/json' \ - --data-binary '{ - "model": "'${MODEL_NAME}'", - "name": "'${MODEL_NAME}'-deployment-config", - "namespace": "'${NAMESPACE}'", - "nim_deployment": { - "additional_envs": { - "NIM_FT_MODEL": "", - "NIM_GUIDED_DECODING_BACKEND": "outlines", - "NIM_JSONL_LOGGING": "0", - "NIM_MODEL_NAME": "/model-store", - "NIM_PEFT_REFRESH_INTERVAL": "30", - "NIM_PEFT_SOURCE": "http://nemo-entity-store:8000", - "UVICORN_LOG_LEVEL": "DEBUG", - "VLLM_NVEXT_LOG_LEVEL": "DEBUG" - }, - "gpu": 1, - "image_name": "nvcr.io/nim/nvidia/llm-nim", - "image_tag": "1.13.1", - "disable_lora_support": false - } - }' | jq . - - - curl "${NEMO_BASE_URL}/v1/deployment/model-deployments" \ - -X POST \ - -H 'Content-Type: application/json' \ - -d '{ - "name": "'${MODEL_NAME}'-deployment", - "namespace": "'${NAMESPACE}'", - "config": "'${NAMESPACE}'/'${MODEL_NAME}'-deployment-config" - }' | jq . - - ``` + + +`--8<-- "_snippets/input/common.py"` + + +```bash +curl "${NEMO_BASE_URL}/v1/deployment/configs" \ +-X POST \ +-H 'Content-Type: application/json' \ +--data-binary '{ +"model": "'${MODEL_NAME}'", +"name": "'${MODEL_NAME}'-deployment-config", +"namespace": "'${NAMESPACE}'", +"nim_deployment": { +"additional_envs": { +"NIM_FT_MODEL": "", +"NIM_GUIDED_DECODING_BACKEND": "outlines", +"NIM_JSONL_LOGGING": "0", +"NIM_MODEL_NAME": "/model-store", +"NIM_PEFT_REFRESH_INTERVAL": "30", +"NIM_PEFT_SOURCE": "http://nemo-entity-store:8000", +"UVICORN_LOG_LEVEL": "DEBUG", +"VLLM_NVEXT_LOG_LEVEL": "DEBUG" +}, +"gpu": 1, +"image_name": "nvcr.io/nim/nvidia/llm-nim", +"image_tag": "1.13.1", +"disable_lora_support": false +} +}' | jq . 
+ + +curl "${NEMO_BASE_URL}/v1/deployment/model-deployments" \ +-X POST \ +-H 'Content-Type: application/json' \ +-d '{ +"name": "'${MODEL_NAME}'-deployment", +"namespace": "'${NAMESPACE}'", +"config": "'${NAMESPACE}'/'${MODEL_NAME}'-deployment-config" +}' | jq . +``` + + ## Create Customization Target Create a customization target that references the uploaded model in the Data Store. -=== "Python SDK" - - --8<-- "_snippets/input/common.py" - -=== "cURL" - - ```bash - - # Create customization target - curl -X POST \ - "${NEMO_BASE_URL}/v1/customization/targets" \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json' \ - -d '{ - "name": "'${MODEL_NAME}'@v'${MODEL_VERSION}'", - "namespace": "'${NAMESPACE}'", - "description": "Customization target for '${MODEL_NAME}'", - "enabled": true, - "model_uri": "hf://'${NAMESPACE}'/'${MODEL_NAME}'", - "num_parameters": 200000000, - "precision": "bf16-mixed" - }' | jq . - ``` - + + +`--8<-- "_snippets/input/common.py"` + + +```bash + +# Create customization target +curl -X POST \ +"${NEMO_BASE_URL}/v1/customization/targets" \ +-H 'accept: application/json' \ +-H 'Content-Type: application/json' \ +-d '{ +"name": "'${MODEL_NAME}'@v'${MODEL_VERSION}'", +"namespace": "'${NAMESPACE}'", +"description": "Customization target for '${MODEL_NAME}'", +"enabled": true, +"model_uri": "hf://'${NAMESPACE}'/'${MODEL_NAME}'", +"num_parameters": 200000000, +"precision": "bf16-mixed" +}' | jq . +``` + + Wait for the model to be downloaded and ready: -=== "Python SDK" - - --8<-- "_snippets/input/common.py" - -=== "CLI" - - ```bash - # Check target status with comprehensive handling - while true; do - RESPONSE=$(curl -s -X GET \ - "${NEMO_BASE_URL}/v1/customization/targets/${NAMESPACE}/${MODEL_NAME}@v${MODEL_VERSION}" \ - -H 'accept: application/json') - - STATUS=$(echo "$RESPONSE" | jq -r '.status') - echo "Target status: $STATUS" - - if [ "$STATUS" = "ready" ]; then - echo "Model is ready for customization!" 
- break - elif [ "$STATUS" = "failed" ] || [ "$STATUS" = "cancelled" ] || [ "$STATUS" = "unknown" ] || [ "$STATUS" = "delete_failed" ]; then - echo "Model download failed with status: $STATUS" - echo "Contact your administrator for assistance." - break - elif [ "$STATUS" = "created" ] || [ "$STATUS" = "pending" ] || [ "$STATUS" = "downloading" ]; then - echo "Model is still being prepared, waiting..." - elif [ "$STATUS" = "deleted" ] || [ "$STATUS" = "deleting" ]; then - echo "Model is being deleted (status: $STATUS)" - echo "This target cannot be used for customization." - break - else - echo "Unknown status: $STATUS" - fi - - sleep 30 - done - - ``` + + +`--8<-- "_snippets/input/common.py"` + + +```bash +# Check target status with comprehensive handling +while true; do +RESPONSE=$(curl -s -X GET \ +"${NEMO_BASE_URL}/v1/customization/targets/${NAMESPACE}/${MODEL_NAME}@v${MODEL_VERSION}" \ +-H 'accept: application/json') + +STATUS=$(echo "$RESPONSE" | jq -r '.status') +echo "Target status: $STATUS" + +if [ "$STATUS" = "ready" ]; then +echo "Model is ready for customization!" +break +elif [ "$STATUS" = "failed" ] || [ "$STATUS" = "cancelled" ] || [ "$STATUS" = "unknown" ] || [ "$STATUS" = "delete_failed" ]; then +echo "Model download failed with status: $STATUS" +echo "Contact your administrator for assistance." +break +elif [ "$STATUS" = "created" ] || [ "$STATUS" = "pending" ] || [ "$STATUS" = "downloading" ]; then +echo "Model is still being prepared, waiting..." +elif [ "$STATUS" = "deleted" ] || [ "$STATUS" = "deleting" ]; then +echo "Model is being deleted (status: $STATUS)" +echo "This target cannot be used for customization." 
+break +else +echo "Unknown status: $STATUS" +fi + +sleep 30 +done +``` + + ## Create Customization Configuration Create a configuration for LoRA fine-tuning: -=== "Python SDK" - - --8<-- "_snippets/input/common.py" - -=== "cURL" - - ```bash - # Create customization configuration - curl -X POST \ - "${NEMO_BASE_URL}/v1/customization/configs" \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json' \ - -d '{ - "name": "'${MODEL_NAME}'-lora-config@v'${MODEL_VERSION}'", - "namespace": "'${NAMESPACE}'", - "target": "'${NAMESPACE}'/'${MODEL_NAME}'@v'${MODEL_VERSION}'", - "description": "LoRA configuration for '${MODEL_NAME}'", - "training_options": [ - { - "training_type": "sft", - "finetuning_type": "lora", - "num_gpus": 1, - "num_nodes": 1, - "tensor_parallel_size": 1, - "pipeline_parallel_size": 1, - "micro_batch_size": 1 - } - ], - "training_precision": "bf16-mixed", - "max_seq_length": 1024, - "prompt_template": "{prompt} {completion}" - }' | jq . - ``` - + + +`--8<-- "_snippets/input/common.py"` + + +```bash +# Create customization configuration +curl -X POST \ +"${NEMO_BASE_URL}/v1/customization/configs" \ +-H 'accept: application/json' \ +-H 'Content-Type: application/json' \ +-d '{ +"name": "'${MODEL_NAME}'-lora-config@v'${MODEL_VERSION}'", +"namespace": "'${NAMESPACE}'", +"target": "'${NAMESPACE}'/'${MODEL_NAME}'@v'${MODEL_VERSION}'", +"description": "LoRA configuration for '${MODEL_NAME}'", +"training_options": [ +{ +"training_type": "sft", +"finetuning_type": "lora", +"num_gpus": 1, +"num_nodes": 1, +"tensor_parallel_size": 1, +"pipeline_parallel_size": 1, +"micro_batch_size": 1 +} +], +"training_precision": "bf16-mixed", +"max_seq_length": 1024, +"prompt_template": "{prompt} {completion}" +}' | jq . +``` + + --- ## Prepare Training and Validation Datasets @@ -510,44 +555,44 @@ print(f"Created dataset entity: {dataset.namespace}/{dataset.name}") Start the LoRA fine-tuning job. 
The job will create an output artifact with the name specified in `output`, which you'll use later to access your fine-tuned model for inference. -=== "Python SDK" - - --8<-- "_snippets/input/common.py" - -=== "cURL" - - ```bash - # Create job and capture job ID - RESPONSE=$(curl -s -X POST \ - "${NEMO_BASE_URL}/v1/customization/jobs" \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json' \ - -d '{ - "name": "'${MODEL_NAME}'-lora-job", - "config": "'${NAMESPACE}'/'${MODEL_NAME}'-lora-config@v'${MODEL_VERSION}'", - "dataset": "'${NAMESPACE}'/'${DATASET_NAME}'", - "output": {"name": "'${NAMESPACE}'/'${MODEL_NAME}'-lora@v'${MODEL_VERSION}'"}, - "description": "LoRA fine-tuning job for '${MODEL_NAME}'", - "training": { - "type": "sft", - "peft": { - "type": "lora", - "rank": 16, - "alpha": 32, - "dropout": 0.01 - }, - "epochs": 3, - "batch_size": 8, - "learning_rate": 5e-5 - } - }') - - JOB_ID=$(echo "$RESPONSE" | jq -r '.id') - OUTPUT_NAME=$(echo "$RESPONSE" | jq -r '.spec.output.name') - echo "Started job with ID: $JOB_ID" - echo "Output name: $OUTPUT_NAME" - ``` - + + +`--8<-- "_snippets/input/common.py"` + + +```bash +# Create job and capture job ID +RESPONSE=$(curl -s -X POST \ +"${NEMO_BASE_URL}/v1/customization/jobs" \ +-H 'accept: application/json' \ +-H 'Content-Type: application/json' \ +-d '{ +"name": "'${MODEL_NAME}'-lora-job", +"config": "'${NAMESPACE}'/'${MODEL_NAME}'-lora-config@v'${MODEL_VERSION}'", +"dataset": "'${NAMESPACE}'/'${DATASET_NAME}'", +"output": {"name": "'${NAMESPACE}'/'${MODEL_NAME}'-lora@v'${MODEL_VERSION}'"}, +"description": "LoRA fine-tuning job for '${MODEL_NAME}'", +"training": { +"type": "sft", +"peft": { +"type": "lora", +"rank": 16, +"alpha": 32, +"dropout": 0.01 +}, +"epochs": 3, +"batch_size": 8, +"learning_rate": 5e-5 +} +}') + +JOB_ID=$(echo "$RESPONSE" | jq -r '.id') +OUTPUT_NAME=$(echo "$RESPONSE" | jq -r '.spec.output.name') +echo "Started job with ID: $JOB_ID" +echo "Output name: $OUTPUT_NAME" +``` + + 
Copy the following values from the response: - `id` (Job ID) @@ -557,173 +602,175 @@ We'll need them later to monitor the job's status and access the fine-tuned mode Check job progress: -=== "Python SDK" - - --8<-- "_snippets/input/job-monitoring.py" - -=== "cURL" - - ```bash - # Monitor job status with comprehensive handling - while true; do - RESPONSE=$(curl -s -X GET \ - "${NEMO_BASE_URL}/v1/customization/jobs/${JOB_ID}" \ - -H 'accept: application/json') - - STATUS=$(echo "$RESPONSE" | jq -r '.status') - echo "Job status: $STATUS" - - if [ "$STATUS" = "completed" ]; then - echo "Training completed successfully!" - break - elif [ "$STATUS" = "failed" ] || [ "$STATUS" = "cancelled" ]; then - echo "Training finished with status: $STATUS" - if [ "$STATUS" = "failed" ]; then - echo "Check the job logs for error details." - fi - break - elif [ "$STATUS" = "created" ] || [ "$STATUS" = "pending" ]; then - echo "Job is queued and waiting to start..." - elif [ "$STATUS" = "running" ]; then - echo "Training is in progress..." - # Optionally show progress if available - PROGRESS=$(echo "$RESPONSE" | jq -r '.status_details.percentage_done // "N/A"') - if [ "$PROGRESS" != "N/A" ] && [ "$PROGRESS" != "null" ]; then - echo "Progress: ${PROGRESS}%" - fi - elif [ "$STATUS" = "cancelling" ]; then - echo "Job is being cancelled..." - elif [ "$STATUS" = "ready" ] || [ "$STATUS" = "unknown" ]; then - echo "Job finished with status: $STATUS" - break - else - echo "Unknown status: $STATUS" - fi - - sleep 60 # Wait 1 minute before checking again - done - ``` - + + +`--8<-- "_snippets/input/job-monitoring.py"` + + +```bash +# Monitor job status with comprehensive handling +while true; do +RESPONSE=$(curl -s -X GET \ +"${NEMO_BASE_URL}/v1/customization/jobs/${JOB_ID}" \ +-H 'accept: application/json') + +STATUS=$(echo "$RESPONSE" | jq -r '.status') +echo "Job status: $STATUS" + +if [ "$STATUS" = "completed" ]; then +echo "Training completed successfully!" 
+break +elif [ "$STATUS" = "failed" ] || [ "$STATUS" = "cancelled" ]; then +echo "Training finished with status: $STATUS" +if [ "$STATUS" = "failed" ]; then +echo "Check the job logs for error details." +fi +break +elif [ "$STATUS" = "created" ] || [ "$STATUS" = "pending" ]; then +echo "Job is queued and waiting to start..." +elif [ "$STATUS" = "running" ]; then +echo "Training is in progress..." +# Optionally show progress if available +PROGRESS=$(echo "$RESPONSE" | jq -r '.status_details.percentage_done // "N/A"') +if [ "$PROGRESS" != "N/A" ] && [ "$PROGRESS" != "null" ]; then +echo "Progress: ${PROGRESS}%" +fi +elif [ "$STATUS" = "cancelling" ]; then +echo "Job is being cancelled..." +elif [ "$STATUS" = "ready" ] || [ "$STATUS" = "unknown" ]; then +echo "Job finished with status: $STATUS" +break +else +echo "Unknown status: $STATUS" +fi + +sleep 60 # Wait 1 minute before checking again +done +``` + + --- ## Test the Deployed Model After the customization job has been completed, you can use the `output.name` to access the fine-tuned model and evaluate its performance. The base model NIM deployment you created earlier will automatically load the LoRA adapter when you specify the LoRA model ID in your inference requests. -!!! note - The inference endpoints use the `inference_base_url` configured during client initialization (typically the NIM proxy URL). The base model deployment must be running before you can test inference with LoRA adapters. - -!!! tip - If you included a WandB API key, you can view your training results at [wandb.ai](https://wandb.ai/home) under the `nvidia-nemo-customizer` project. - -=== "Python SDK" + +The inference endpoints use the `inference_base_url` configured during client initialization (typically the NIM proxy URL). The base model deployment must be running before you can test inference with LoRA adapters. 
+ - ```python - base_model_id = f"{NAMESPACE}/{MODEL_NAME}" + +If you included a WandB API key, you can view your training results at [wandb.ai](https://wandb.ai/home) under the `nvidia-nemo-customizer` project. + - # Option 1: If you still have the job object from creation - # lora_model_id = job.spec.output.name - - # Option 2: Construct from job parameters (use this if running in a new session) - lora_model_id = f"{NAMESPACE}/{MODEL_NAME}-lora@v{MODEL_VERSION}" + + +```python +base_model_id = f"{NAMESPACE}/{MODEL_NAME}" + +# Option 1: If you still have the job object from creation +# lora_model_id = job.spec.output.name + +# Option 2: Construct from job parameters (use this if running in a new session) +lora_model_id = f"{NAMESPACE}/{MODEL_NAME}-lora@v{MODEL_VERSION}" + +# First, check if the models are available for inference +# Note: The base model deployment must be running and registered with the NIM proxy +try: +# List models from the entity store (shows all registered models) +models_response = client.models.list() +available_models = models_response.data + +print("Registered models:") +for model in available_models: +print(f" - {model.id}") + +# Test base model +print(f"\nTesting base model: {base_model_id}") +base_response = client.chat.completions.create( +model=base_model_id, +messages=[ +{ +"role": "user", +"content": "Hello, can you help me?" 
+} +], +max_tokens=100, +temperature=0.7 +) - # First, check if the models are available for inference - # Note: The base model deployment must be running and registered with the NIM proxy - try: - # List models from the entity store (shows all registered models) - models_response = client.models.list() - available_models = models_response.data +print("Base model response:") +print(base_response.choices[0].message.content) + +# Test LoRA-adapted model (if available) +print(f"\nTesting LoRA-adapted model: {lora_model_id}") +lora_response = client.chat.completions.create( +model=lora_model_id, +messages=[ +{ +"role": "user", +"content": "Hello, can you help me?" +} +], +max_tokens=100, +temperature=0.7 +) - print("Registered models:") - for model in available_models: - print(f" - {model.id}") - - # Test base model - print(f"\nTesting base model: {base_model_id}") - base_response = client.chat.completions.create( - model=base_model_id, - messages=[ - { - "role": "user", - "content": "Hello, can you help me?" - } - ], - max_tokens=100, - temperature=0.7 - ) - - print("Base model response:") - print(base_response.choices[0].message.content) - - # Test LoRA-adapted model (if available) - print(f"\nTesting LoRA-adapted model: {lora_model_id}") - lora_response = client.chat.completions.create( - model=lora_model_id, - messages=[ - { - "role": "user", - "content": "Hello, can you help me?" 
- } - ], - max_tokens=100, - temperature=0.7 - ) - - print("LoRA-adapted model response:") - print(lora_response.choices[0].message.content) - - except Exception as e: - print(f"Error testing model: {e}") - ``` - -=== "cURL" - - ```bash - # Set the NIM proxy URL - export NIM_PROXY_URL="http://nim.test" - - # Option 1: If you captured OUTPUT_MODEL from job creation (from earlier in tutorial) - # export OUTPUT_MODEL="" - - # Option 2: Construct from environment variables (use this if running in a new session) - export LORA_MODEL_ID="${NAMESPACE}/${MODEL_NAME}-lora@v${MODEL_VERSION}" - - # Test model availability - curl -X GET "${NIM_PROXY_URL}/models" | jq - - # Test inference against the base model - curl -X POST "${NIM_PROXY_URL}/v1/chat/completions" \ - -H 'Content-Type: application/json' \ - -d '{ - "model": "'${NAMESPACE}'/'${MODEL_NAME}'", - "messages": [ - {"role": "user", "content": "Hello! How are you?"}, - {"role": "assistant", "content": "Hi! I am quite well, how can I help you today?"}, - {"role": "user", "content": "Can you write me a song?"} - ], - "top_p": 1, - "n": 1, - "max_tokens": 50, - "frequency_penalty": 1.0 - }' - - # Test inference against a LoRA-adapted model (after customization completes) - curl -X POST "${NIM_PROXY_URL}/v1/chat/completions" \ - -H 'Content-Type: application/json' \ - -d '{ - "model": "'${LORA_MODEL_ID}'", - "messages": [ - {"role": "user", "content": "Hello! How are you?"}, - {"role": "assistant", "content": "Hi! 
I am quite well, how can I help you today?"}, - {"role": "user", "content": "Can you write me a song?"} - ], - "top_p": 1, - "n": 1, - "max_tokens": 50, - "frequency_penalty": 1.0 - }' - ``` +print("LoRA-adapted model response:") +print(lora_response.choices[0].message.content) +except Exception as e: +print(f"Error testing model: {e}") +``` + + +```bash +# Set the NIM proxy URL +export NIM_PROXY_URL="http://nim.test" + +# Option 1: If you captured OUTPUT_MODEL from job creation (from earlier in tutorial) +# export OUTPUT_MODEL="" + +# Option 2: Construct from environment variables (use this if running in a new session) +export LORA_MODEL_ID="${NAMESPACE}/${MODEL_NAME}-lora@v${MODEL_VERSION}" + +# Test model availability +curl -X GET "${NIM_PROXY_URL}/models" | jq + +# Test inference against the base model +curl -X POST "${NIM_PROXY_URL}/v1/chat/completions" \ +-H 'Content-Type: application/json' \ +-d '{ +"model": "'${NAMESPACE}'/'${MODEL_NAME}'", +"messages": [ +{"role": "user", "content": "Hello! How are you?"}, +{"role": "assistant", "content": "Hi! I am quite well, how can I help you today?"}, +{"role": "user", "content": "Can you write me a song?"} +], +"top_p": 1, +"n": 1, +"max_tokens": 50, +"frequency_penalty": 1.0 +}' + +# Test inference against a LoRA-adapted model (after customization completes) +curl -X POST "${NIM_PROXY_URL}/v1/chat/completions" \ +-H 'Content-Type: application/json' \ +-d '{ +"model": "'${LORA_MODEL_ID}'", +"messages": [ +{"role": "user", "content": "Hello! How are you?"}, +{"role": "assistant", "content": "Hi! I am quite well, how can I help you today?"}, +{"role": "user", "content": "Can you write me a song?"} +], +"top_p": 1, +"n": 1, +"max_tokens": 50, +"frequency_penalty": 1.0 +}' +``` + + ## Next Steps -Learn how to [check customization job metrics](metrics.md) to monitor the training progress and performance of your fine-tuned model. 
+Learn how to [check customization job metrics](/fine-tune-models/tutorials/job-metrics) to monitor the training progress and performance of your fine-tuned model. diff --git a/docs/customizer/tutorials/index.mdx b/docs/customizer/tutorials/index.mdx index c6fe716138..8dce69d9cd 100644 --- a/docs/customizer/tutorials/index.mdx +++ b/docs/customizer/tutorials/index.mdx @@ -1,8 +1,12 @@ -# Fine-Tuning Tutorials - +--- +title: "Fine-Tuning Tutorials" +description: "" +--- Use the tutorials in this section to gain a deeper understanding of how the NVIDIA NeMo Customizer microservice enables fine-tuning tasks. -!!! tip "Tutorials are organized by complexity and typically build on one another. The tutorials reference `NMP_BASE_URL`, which is the base URL of your {{platform_name}} deployment. Refer to the [Setup guide](../../get-started/setup.md) for installation, setup, and platform URL guidance." + +Tutorials are organized by complexity and typically build on one another. The tutorials reference `NMP_BASE_URL`, which is the base URL of your NeMo Platform deployment. Refer to the [Setup guide](/get-started/setup) for installation, setup, and platform URL guidance. + --- @@ -10,7 +14,7 @@ Use the tutorials in this section to gain a deeper understanding of how the NVID
-- **[Understanding Model Entities and Adapters](understand-configurations-and-models.md)** +- **[Understanding Model Entities and Adapters](/fine-tune-models/tutorials/understanding-models-and-training)** --- @@ -24,7 +28,7 @@ Use the tutorials in this section to gain a deeper understanding of how the NVID
-- **[Format Training Datasets](format-training-dataset.md)** +- **[Format Training Datasets](/fine-tune-models/tutorials/format-training-dataset)** --- @@ -38,7 +42,7 @@ Use the tutorials in this section to gain a deeper understanding of how the NVID
-- **[Fine-Tune a Model with Custom Data Using LoRA](lora-customization-job.ipynb)** +- **Fine-Tune a Model with Custom Data Using LoRA** --- @@ -46,7 +50,7 @@ Use the tutorials in this section to gain a deeper understanding of how the NVID nemo-customizer -- **[Fine-Tune a Model with Custom Data Processing All Weights](sft-customization-job.ipynb)** +- **Fine-Tune a Model with Custom Data Processing All Weights** --- @@ -54,7 +58,7 @@ Use the tutorials in this section to gain a deeper understanding of how the NVID nemo-customizer -- **[Align a Model with DPO and Preference Data](dpo-customization-job.ipynb)** +- **Align a Model with DPO and Preference Data** --- @@ -62,7 +66,7 @@ Use the tutorials in this section to gain a deeper understanding of how the NVID nemo-customizer dpo -- **[Distill a Large Model into a Smaller One with Knowledge Distillation](distillation-customization-job.ipynb)** +- **Distill a Large Model into a Smaller One with Knowledge Distillation** --- @@ -70,7 +74,7 @@ Use the tutorials in this section to gain a deeper understanding of how the NVID nemo-customizer knowledge-distillation -- **[Fine-Tune an Embedding Model With Positive and Negative Samples Using LoRA](embedding-customization-job.ipynb)** +- **Fine-Tune an Embedding Model With Positive and Negative Samples Using LoRA** --- @@ -84,7 +88,7 @@ Use the tutorials in this section to gain a deeper understanding of how the NVID
-- **[Check Customization Job Metrics](metrics.md)** +- **[Check Customization Job Metrics](/fine-tune-models/tutorials/job-metrics)** --- @@ -92,7 +96,7 @@ Use the tutorials in this section to gain a deeper understanding of how the NVID nemo-customizer mlflow wandb -- **[Optimize Tokens per GPU](optimize-throughput.ipynb)** +- **Optimize Tokens per GPU** --- diff --git a/docs/customizer/tutorials/metrics.mdx b/docs/customizer/tutorials/metrics.mdx index 2c3a594a3e..739ac0b57f 100644 --- a/docs/customizer/tutorials/metrics.mdx +++ b/docs/customizer/tutorials/metrics.mdx @@ -1,3 +1,7 @@ +--- +title: "Checking Your Customization Job Metrics" +description: "" +--- # Checking Your Customization Job Metrics @@ -7,13 +11,53 @@ After completing a customization job, you can monitor its performance through tr 2. Through MLflow (optional) 3. Using Weights & Biases (optional) -!!! note "The time to complete this tutorial is approximately 10 minutes." +The time to complete this tutorial is approximately 10 minutes. ## Prerequisites ---8<-- "_snippets/tutorials/prereqs.md" +# Platform Prerequisites ---8<-- "customizer/tutorials/_snippets/customizer-prereqs.md" + +All platform resources—models, datasets, and more—must belong to a **workspace**. Workspaces provide organizational and authorization boundaries for your work. Within a workspace, you can optionally use **projects** to group related resources. + +**If you're new to the platform**, start with the **[Setup guide](/get-started/setup)** to learn how to deploy and evaluate models, and optimize agents using the platform end-to-end. + +**If you're already familiar** with workspaces and how to upload datasets to the platform, you can proceed directly with this tutorial. + +For more information, see [Workspaces](/get-started/core-concepts/workspaces) and [Projects](/get-started/core-concepts/projects). 
+ + +# NeMo Customizer Prerequisites + + +Before starting, make sure you have: + +- NeMo Platform installed and deployed (see [Setup](/get-started/setup)) +- The `nemo-platform` Python SDK installed (`pip install nemo-platform[all]`) +- (Optional) Weights & Biases account and API key for enhanced visualization + +**Set up environment variables:** + +```bash +# Set the base URL for NeMo Platform +export NMP_BASE_URL="http://localhost:8080" # Or your deployed platform URL + +# Optional: Weights & Biases for experiment tracking +export WANDB_API_KEY="" +``` + +**Initialize the SDK:** + +```python +import os +from nemo_platform import NeMoPlatform + +client = NeMoPlatform( + base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), + workspace="default", +) +``` + ### Tutorial-Specific Prerequisites @@ -74,12 +118,14 @@ If your deployment has MLflow tracking enabled: 3. Find the run using your customization job ID 4. View detailed metrics, including training and validation loss curves, under the "Metrics" tab -!!! note "MLflow integration is configured at the cluster level. Contact your administrator if you need access to the MLflow UI or if MLflow tracking is not enabled for your deployment." + +MLflow integration is configured at the cluster level. Contact your administrator if you need access to the MLflow UI or if MLflow tracking is not enabled for your deployment. + ### Using Weights & Biases -If your customization job was created with W&B integration enabled (see [Weights & Biases Integration](../manage-customization-jobs/create-job.md)): +If your customization job was created with W&B integration enabled (see [Weights & Biases Integration](/fine-tune-models/manage-jobs/create-job)): 1. Go to [wandb.ai](https://wandb.ai/home) and navigate to your project 2. Find the run corresponding to your customization job @@ -123,10 +169,11 @@ print(f"Status: {job.status}") The `api_key_secret` field references a stored secret containing your `WANDB_API_KEY`.
Use the secret name (e.g., `"my-wandb-key"`) to resolve it from the request workspace. -To create the secret, see [Weights & Biases Keys](../../get-started/concepts/manage-secrets.md). +To create the secret, see [Weights & Biases Keys](/get-started/core-concepts/manage-secrets). Then view your results at [wandb.ai](https://wandb.ai/home) under your project. -![W&B charts example](../_images/wandb_charts_example.png) +![W&B charts example](/customizer/_images/wandb_charts_example.png) -!!! note - The W&B integration is optional and must be configured when [creating the customization job](../manage-customization-jobs/create-job.md). When enabled, training metrics are sent to W&B using your API key. While we encrypt your API key and don't log it internally, please review W&B's terms of service before use. + +The W&B integration is optional and must be configured when [creating the customization job](/fine-tune-models/manage-jobs/create-job). When enabled, training metrics are sent to W&B using your API key. While we encrypt your API key and don't log it internally, please review W&B's terms of service before use. + diff --git a/docs/customizer/tutorials/understand-configurations-and-models.mdx b/docs/customizer/tutorials/understand-configurations-and-models.mdx index daac191bd4..41a5982ff5 100644 --- a/docs/customizer/tutorials/understand-configurations-and-models.mdx +++ b/docs/customizer/tutorials/understand-configurations-and-models.mdx @@ -1,17 +1,64 @@ +--- +title: "Understanding NeMo Customizer: Models, Training, and Resources" +description: "" +--- # Understanding NeMo Customizer: Models, Training, and Resources Learn the fundamentals of how NeMo Customizer works to make informed decisions about your fine-tuning projects. This tutorial covers how models are organized, how adapters attach to base models, training types and GPU requirements, and how to choose the right approach for your use case. 
-Understanding these basics will help you navigate the fine-tuning process more effectively and avoid common issues. If you're ready to start fine-tuning immediately, you can jump to [SFT Customization Job](sft-customization-job.ipynb) after completing this tutorial. +Understanding these basics will help you navigate the fine-tuning process more effectively and avoid common issues. If you're ready to start fine-tuning immediately, you can jump to SFT Customization Job after completing this tutorial. + + +The time to complete this tutorial is approximately 15 minutes. +This tutorial focuses on understanding and discovery—no actual training jobs are created. + -!!! note "The time to complete this tutorial is approximately 15 minutes." - This tutorial focuses on understanding and discovery—no actual training jobs are created. ## Prerequisites ---8<-- "_snippets/tutorials/prereqs.md" +# Platform Prerequisites + + +All platform resources—models, datasets, and more—must belong to a **workspace**. Workspaces provide organizational and authorization boundaries for your work. Within a workspace, you can optionally use **projects** to group related resources. + +**If you're new to the platform**, start with the **[Setup guide](/get-started/setup)** to learn how to deploy and evaluate models, and optimize agents using the platform end-to-end. + +**If you're already familiar** with workspaces and how to upload datasets to the platform, you can proceed directly with this tutorial. + +For more information, see [Workspaces](/get-started/core-concepts/workspaces) and [Projects](/get-started/core-concepts/projects). 
+ + +# NeMo Customizer Prerequisites + + +Before starting, make sure you have: + +- NeMo Platform installed and deployed (see [Setup](/get-started/setup)) +- The `nemo-platform` Python SDK installed (`pip install nemo-platform[all]`) +- (Optional) Weights & Biases account and API key for enhanced visualization + +**Set up environment variables:** + +```bash +# Set the base URL for {{platform_name}} +export NMP_BASE_URL="http://localhost:8080" # Or your deployed platform URL + +# Optional: Weights & Biases for experiment tracking +export WANDB_API_KEY="" +``` + +**Initialize the SDK:** ---8<-- "customizer/tutorials/_snippets/customizer-prereqs.md" +```python +import os +from nemo_platform import NeMoPlatform + +client = NeMoPlatform( + base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), + workspace="default", +) +``` + --- @@ -19,7 +66,7 @@ Understanding these basics will help you navigate the fine-tuning process more e ### What is a Model Entity? -A **Model Entity** represents a model registered in the {{platform_name}}. It contains: +A **Model Entity** represents a model registered in the NeMo Platform. It contains: 1. **FileSet Reference**: Points to the model checkpoint files (weights, config, tokenizer) 2. **Model Spec**: Auto-populated metadata about the model architecture (layers, parameters, etc.) @@ -50,7 +97,7 @@ A **FileSet** is a collection of files stored in the platform's file service. Fo ## The Customization Workflow ```mermaid -{% raw %} + flowchart LR A[1. Create FileSet
with model files] --> B[2. Create Model Entity
pointing to FileSet] B --> C[3. Create Customization Job
referencing Model Entity] @@ -58,7 +105,7 @@ flowchart LR D -->{{ LoRA }} E[Adapter created and
attached to Model Entity] D -->|Full SFT| F[New Model Entity
with customized weights] E --> G[Auto-deploy to NIM
if enabled] -{% endraw %} + ``` ### Step-by-Step Breakdown @@ -240,15 +287,16 @@ Customization jobs consume disk space on the platform's shared persistent volume | LoRA | ~1.5× base model size | Stores base model + small adapter weights | | Full SFT | ~3× base model size | Stores base model + full checkpoint + output model | -!!! info - These estimates cover model weights only and do not include training dataset size. - If the platform disk fills during a job, the job fails with an I/O - error and the job service may return a ``500`` status when you retrieve logs. + +These estimates cover model weights only and do not include training dataset size. +If the platform disk fills during a job, the job fails with an I/O +error and the job service may return a `500` status when you retrieve logs. - Ensure your platform's shared persistent volume has at least **3× the base model size** - of free space before starting a full SFT job, or **1.5×** for LoRA jobs. +Ensure your platform's shared persistent volume has at least **3× the base model size** +of free space before starting a full SFT job, or **1.5×** for LoRA jobs. + -For troubleshooting disk-related failures, see [customizer](../../troubleshooting/customizer.md). +For troubleshooting disk-related failures, see [customizer](/reference/troubleshooting/customizer). ### Parallelism Parameters Explained @@ -264,29 +312,31 @@ Parallelism is configured via `training.parallelism`. These parameters control h `data_parallel_size` is automatically derived as `total_gpus / (TP × PP × CP)` and is not set directly. -!!! info - **Recommended parallelism for Experts (MoE) Models**: + +**Recommended parallelism for Experts (MoE) Models**: - The `expert_parallel_size` parameter is used to parallelize a Mixture of Experts (MoE) model's experts across GPUs. For non-MoE models, this parameter is ignored. A model's model card will indicate if it is a Mixture of Experts model and specifies its number of experts. 
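The parallelism arithmetic described in this section can be sketched in plain Python. This is only an illustration of the stated rules, not NeMo Customizer's actual validation code: the function names `derive_data_parallel_size` and `valid_expert_parallel_sizes` are hypothetical, and the sketch assumes the GPU count must divide evenly by `TP × PP × CP`.

```python
def derive_data_parallel_size(total_gpus, tp=1, pp=1, cp=1):
    """Derive data_parallel_size as total_gpus / (TP x PP x CP).

    Hypothetical helper: raises ValueError when total_gpus is not a
    multiple of the model-parallel product, mirroring the allocation rule.
    """
    model_parallel = tp * pp * cp
    if total_gpus % model_parallel != 0:
        raise ValueError(
            f"total_gpus={total_gpus} is not a multiple of "
            f"TP x PP x CP = {model_parallel}"
        )
    return total_gpus // model_parallel


def valid_expert_parallel_sizes(data_parallel_size, num_experts):
    """List expert_parallel_size values that evenly divide both the
    derived data_parallel_size and the model's number of experts."""
    return [
        ep
        for ep in range(1, data_parallel_size + 1)
        if data_parallel_size % ep == 0 and num_experts % ep == 0
    ]


# 8 GPUs with tensor_parallel_size=2 and pipeline_parallel_size=1
dp = derive_data_parallel_size(8, tp=2, pp=1, cp=1)
candidates = valid_expert_parallel_sizes(dp, num_experts=8)
print(dp, candidates)
```

Running a check like this before submitting a job may catch a mismatched GPU count early, although the platform's own validation remains authoritative.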
+The `expert_parallel_size` parameter is used to parallelize a Mixture of Experts (MoE) model's experts across GPUs. For non-MoE models, this parameter is ignored. A model's model card will indicate if it is a Mixture of Experts model and specifies its number of experts. - The number of experts in the model must be divisible by `expert_parallel_size`. For example, if a model has 8 experts, setting `expert_parallel_size=4` results in each GPU processing 2 experts. +The number of experts in the model must be divisible by `expert_parallel_size`. For example, if a model has 8 experts, setting `expert_parallel_size=4` results in each GPU processing 2 experts. - Also, the value of `expert_parallel_size` must evenly divide the derived `data_parallel_size`, which is automatically calculated as `data_parallel_size = total GPUs / (tensor_parallel_size × pipeline_parallel_size × context_parallel_size)`. +Also, the value of `expert_parallel_size` must evenly divide the derived `data_parallel_size`, which is automatically calculated as `data_parallel_size = total GPUs / (tensor_parallel_size × pipeline_parallel_size × context_parallel_size)`. - For example, with 8 total GPUs, `tensor_parallel_size=2`, and `pipeline_parallel_size=1`: - - Derived `data_parallel_size = 8 / (2 × 1 × 1) = 4` - - Valid `expert_parallel_size` values: `1`, `2`, or `4` (must evenly divide 4) - - Invalid `expert_parallel_size` value: `3` (does not evenly divide 4) +For example, with 8 total GPUs, `tensor_parallel_size=2`, and `pipeline_parallel_size=1`: +- Derived `data_parallel_size = 8 / (2 × 1 × 1) = 4` +- Valid `expert_parallel_size` values: `1`, `2`, or `4` (must evenly divide 4) +- Invalid `expert_parallel_size` value: `3` (does not evenly divide 4) + ### Resource Allocation Rules Training configurations must satisfy mathematical constraints to work properly: -!!! 
info - **GPU Allocation Rule**: The total number of GPUs (`num_gpus_per_node x num_nodes`) must be a multiple of: - `tensor_parallel_size × pipeline_parallel_size × context_parallel_size` + +**GPU Allocation Rule**: The total number of GPUs (`num_gpus_per_node x num_nodes`) must be a multiple of: +`tensor_parallel_size × pipeline_parallel_size × context_parallel_size` - If this constraint isn't met, your training job will fail with a validation error. +If this constraint isn't met, your training job will fail with a validation error. + **Example Calculations**: @@ -300,7 +350,7 @@ Training configurations must satisfy mathematical constraints to work properly: ### Decision Framework ```mermaid -{% raw %} + flowchart TD A[What's your goal?] --> B{Need maximum
performance?} B -->{{ Yes }} C{Have 4+ GPUs?} @@ -316,7 +366,7 @@ flowchart TD E --> I[New Model Entity
with full weights] G --> J[Multiple adapters
on same Model Entity] H --> J -{% endraw %} + ``` ### When to Use LoRA @@ -380,7 +430,7 @@ model = client.models.create( ) ``` -For detailed guidance, see [Import HuggingFace Model](import-hf-model.md). +For detailed guidance, see [Import HuggingFace Model](/fine-tune-models/tutorials/import-huggingface-models). --- @@ -390,25 +440,25 @@ Now that you understand how Model Entities and Adapters work, you're ready to pr
-- **[Format Training Dataset](format-training-dataset.md)** +- **[Format Training Dataset](/fine-tune-models/tutorials/format-training-dataset)** --- Learn how to prepare your data for fine-tuning. -- **[Start a LoRA Job](lora-customization-job.ipynb)** +- **Start a LoRA Job** --- Create a parameter-efficient LoRA adapter. -- **[Start a Full SFT Job](sft-customization-job.ipynb)** +- **Start a Full SFT Job** --- Use full supervised fine-tuning for maximum performance. -- **[Import Custom Models](import-hf-model.md)** +- **[Import Custom Models](/fine-tune-models/tutorials/import-huggingface-models)** --- diff --git a/docs/data-designer/_snippets/job-results.mdx b/docs/data-designer/_snippets/job-results.mdx index 0294001b9d..e01ccbcd85 100644 --- a/docs/data-designer/_snippets/job-results.mdx +++ b/docs/data-designer/_snippets/job-results.mdx @@ -1,8 +1,13 @@ -??? "More about job results" - The Data Designer library writes several artifacts to disk when running a full generation job, including the final dataset as parquet. - When a Data Designer job runs through NeMo Services, the entire working directory of artifacts produced by the library is saved as a job result. - The `download_artifacts` method downloads this artifacts directory (stored as a `.tar.gz` archive), - unarchives it, and returns a `DataDesignerJobResults` object that can be used to load results into memory as DataFrames or other objects for programmatic inspection. +--- +title: "Untitled" +description: "" +--- + +The Data Designer library writes several artifacts to disk when running a full generation job, including the final dataset as parquet. +When a Data Designer job runs through NeMo Services, the entire working directory of artifacts produced by the library is saved as a job result. 
+The `download_artifacts` method downloads this artifacts directory (stored as a `.tar.gz` archive), +unarchives it, and returns a `DataDesignerJobResults` object that can be used to load results into memory as DataFrames or other objects for programmatic inspection. - By default, `download_artifacts` saves the artifacts to a relative local directory named after the job. - An alternative path can be passed to `download_artifacts`. +By default, `download_artifacts` saves the artifacts to a relative local directory named after the job. +An alternative path can be passed to `download_artifacts`. + \ No newline at end of file diff --git a/docs/data-designer/_snippets/preview-results.mdx b/docs/data-designer/_snippets/preview-results.mdx index 3656b1594d..15f82c397b 100644 --- a/docs/data-designer/_snippets/preview-results.mdx +++ b/docs/data-designer/_snippets/preview-results.mdx @@ -1,4 +1,9 @@ -??? "More about preview results" - The `PreviewResults` object returned by `client.data_designer.preview` stores all its fields in memory; nothing is persisted to disk by default. - Use standard Python methods to save any preview data you want to keep around longer term. - For example, the `dataset` is a regular Pandas DataFrame and can be saved to disk via methods like `to_csv` or `to_parquet`. +--- +title: "Untitled" +description: "" +--- + +The `PreviewResults` object returned by `client.data_designer.preview` stores all its fields in memory; nothing is persisted to disk by default. +Use standard Python methods to save any preview data you want to keep around longer term. +For example, the `dataset` is a regular Pandas DataFrame and can be saved to disk via methods like `to_csv` or `to_parquet`. 
+ \ No newline at end of file diff --git a/docs/data-designer/cli.mdx b/docs/data-designer/cli.mdx index 57772ba1a4..42d26e145f 100644 --- a/docs/data-designer/cli.mdx +++ b/docs/data-designer/cli.mdx @@ -1,3 +1,7 @@ +--- +title: "CLI" +description: "" +--- # Data Designer CLI @@ -25,7 +29,7 @@ def load_config_builder() -> dd.DataDesignerConfigBuilder: return config_builder ``` -The same configuration source can usually be used with `run` or `submit`. Resource choices determine whether it is compatible with NeMo Services execution; see [Execution Modes](execution-modes.md). +The same configuration source can usually be used with `run` or `submit`. Resource choices determine whether it is compatible with NeMo Services execution; see [Execution Modes](/design-synthetic-data/execution-modes). ## Run Versus Submit diff --git a/docs/data-designer/execution-modes.mdx b/docs/data-designer/execution-modes.mdx index e967937066..903add1856 100644 --- a/docs/data-designer/execution-modes.mdx +++ b/docs/data-designer/execution-modes.mdx @@ -1,3 +1,7 @@ +--- +title: "Execution Modes" +description: "" +--- # Execution Modes @@ -12,8 +16,9 @@ The important distinction is not simply "local" versus "remote". There are two s `run` versus `submit` primarily controls where the plugin workload execution happens. It does not necessarily determine whether the workload uses NeMo Services APIs. -!!! note - `nemo data-designer ... run` can be fully local, but it is not an offline-only mode. A local run can still use the Files API, Secrets API, and Inference Gateway API from a running NeMo Services cluster when the configuration references the corresponding resources. + +`nemo data-designer ... run` can be fully local, but it is not an offline-only mode. A local run can still use the Files API, Secrets API, and Inference Gateway API from a running NeMo Services cluster when the configuration references the corresponding resources. 
+ ## Terms diff --git a/docs/data-designer/index.mdx b/docs/data-designer/index.mdx index 2aa81d1517..66d31e9995 100644 --- a/docs/data-designer/index.mdx +++ b/docs/data-designer/index.mdx @@ -1,7 +1,11 @@ +--- +title: "About" +description: "" +--- # Data Designer -Data Designer on {{platform_name}} enables high-quality synthetic data generation through the NeMo Data Designer plugin. You can execute workloads locally from the CLI, submit them to a running NeMo Services cluster, or call the Data Designer API from the SDK. +Data Designer on NeMo Platform enables high-quality synthetic data generation through the NeMo Data Designer plugin. You can execute workloads locally from the CLI, submit them to a running NeMo Services cluster, or call the Data Designer API from the SDK. ## Overview @@ -13,9 +17,10 @@ The plugin is built on the open-source [NVIDIA NeMo Data Designer library](https Data Designer separates **configuration** from **execution**. -!!! note - The code snippets below are for conceptual demonstration purposes only. - For runnable examples, see the [tutorials](tutorials/index.md). + +The code snippets below are for conceptual demonstration purposes only. +For runnable examples, see the [tutorials](/design-synthetic-data/tutorials/overview). + ### 1. Build Configurations @@ -55,7 +60,7 @@ The same configuration can run through different plugin surfaces: `run` versus `submit` primarily controls where the plugin workload execution happens. A local `run` can be fully local, but it is not an offline-only mode: it can still use the Files API, Secrets API, and Inference Gateway API from a running NeMo Services cluster when the configuration references the corresponding resources. -See [Execution Modes](execution-modes.md) for the full model. +See [Execution Modes](/design-synthetic-data/execution-modes) for the full model. ## NeMo Services Integration @@ -74,25 +79,25 @@ These integrations are required for `submit` and SDK execution. They are optiona
-- **[Execution Modes](execution-modes.md)** +- **[Execution Modes](/design-synthetic-data/execution-modes)** --- Understand local execution, NeMo Services execution, and NeMo resources. -- **[CLI](cli.md)** +- **[CLI](/design-synthetic-data/cli)** --- Run previews and create datasets with `nemo data-designer`. -- **[Tutorials](tutorials/index.md)** +- **[Tutorials](/design-synthetic-data/tutorials/overview)** --- Learn through examples: basics, seeding, and more. -- **[Migration Guide](migration.md)** +- **[Migration Guide](/design-synthetic-data/migrating-from-standalone-library)** --- diff --git a/docs/data-designer/migration.mdx b/docs/data-designer/migration.mdx index 848b2f2302..2936da432b 100644 --- a/docs/data-designer/migration.mdx +++ b/docs/data-designer/migration.mdx @@ -1,9 +1,13 @@ +--- +title: "Migrating from Standalone Library" +description: "" +--- # Moving Between Execution Modes Data Designer configurations are mostly portable across plugin execution modes. The main migration question is not "library versus service"; it is whether the workload executes locally in the CLI process or through NeMo Services, and which resources the configuration references. -See [Execution Modes](execution-modes.md) for the full two-axis model. +See [Execution Modes](/design-synthetic-data/execution-modes) for the full two-axis model. ## What Usually Stays the Same @@ -77,7 +81,7 @@ Before switching execution modes, verify: ## Getting Help -- **Execution Modes:** See [Execution Modes](execution-modes.md) for the conceptual model. -- **CLI:** See [Data Designer CLI](cli.md) for `run`, `submit`, and persona commands. -- **Tutorials:** Follow the [tutorials](tutorials/index.md) for hands-on examples. +- **Execution Modes:** See [Execution Modes](/design-synthetic-data/execution-modes) for the conceptual model. +- **CLI:** See [Data Designer CLI](/design-synthetic-data/cli) for `run`, `submit`, and persona commands. 
+- **Tutorials:** Follow the [tutorials](/design-synthetic-data/tutorials/overview) for hands-on examples. - **Library Docs:** Refer to the [open-source library documentation](https://docs.nvidia.com/nemo/datadesigner/v0.6.0/getting-started/welcome) for configuration details. diff --git a/docs/data-designer/sdk-resources.mdx b/docs/data-designer/sdk-resources.mdx index 7160797ee0..1b6b850c6a 100644 --- a/docs/data-designer/sdk-resources.mdx +++ b/docs/data-designer/sdk-resources.mdx @@ -1,3 +1,7 @@ +--- +title: "SDK Resources" +description: "" +--- # Data Designer SDK Resources @@ -6,9 +10,9 @@ The `data_designer.config` module provides a consistent, context-agnostic experi Once you are ready to execute that config through NeMo Services APIs, you use objects from the `nemo_platform` SDK. This page explains the SDK objects used for Data Designer API execution. -!!! note - The SDK currently executes Data Designer workloads through the Data Designer API. Local SDK execution is planned, but not available yet. Use `nemo data-designer ... run` for local in-process execution today. - + +The SDK currently executes Data Designer workloads through the Data Designer API. Local SDK execution is planned, but not available yet. Use `nemo data-designer ... run` for local in-process execution today. + ## DataDesignerResource diff --git a/docs/data-designer/tutorials/basics.mdx b/docs/data-designer/tutorials/basics.mdx index fea3537291..cadc1a25b0 100644 --- a/docs/data-designer/tutorials/basics.mdx +++ b/docs/data-designer/tutorials/basics.mdx @@ -1,5 +1,9 @@ - - +--- +title: "The Basics" +description: "" +--- +{/* @nemo-nb: process */} +{/* @nemo-nb: download */} # The Basics @@ -9,14 +13,15 @@ For more detail about column behavior, see the [open-source library's version](h ## Prerequisites -Ensure you have completed the [tutorials prerequisites](index.md#prerequisites). 
This tutorial uses an Inference Gateway provider, so local CLI `run` and NeMo Services execution both need access to the Inference Gateway API in a running NeMo Services cluster. +Ensure you have completed the [tutorials prerequisites](/design-synthetic-data/tutorials/overview#prerequisites). This tutorial uses an Inference Gateway provider, so local CLI `run` and NeMo Services execution both need access to the Inference Gateway API in a running NeMo Services cluster. ## Part 1: Build the Configuration Use the `data_designer.config` package to define your dataset schema. This configuration code is the same across the plugin execution modes. -!!! tip - Build the configuration once, then choose whether to execute with CLI `run`, CLI `submit`, or the SDK. + +Build the configuration once, then choose whether to execute with CLI `run`, CLI `submit`, or the SDK. + ### Define Models @@ -46,7 +51,7 @@ config_builder = dd.DataDesignerConfigBuilder(model_configs) Define the columns for your dataset. The [library documentation](https://docs.nvidia.com/nemo/datadesigner/v0.6.0/tutorials/the-basics) explains these column types in detail. -{% raw %} + ```python # Product category sampler config_builder.add_column( @@ -184,7 +189,7 @@ config_builder.add_column( ) ) ``` -{% endraw %} + ## Part 2: Execute @@ -255,7 +260,11 @@ print(df.head()) preview.analysis.to_report() ``` ---8<-- "data-designer/_snippets/preview-results.md" + +The `PreviewResults` object returned by `client.data_designer.preview` stores all its fields in memory; nothing is persisted to disk by default. +Use standard Python methods to save any preview data you want to keep around longer term. +For example, the `dataset` is a regular Pandas DataFrame and can be saved to disk via methods like `to_csv` or `to_parquet`. + **Iterate:** Adjust column configurations, prompts, or parameters in your `config_builder`, then run `preview` again until you're satisfied with the results. 
@@ -282,7 +291,15 @@ analysis = results.load_analysis() analysis.to_report() ``` ---8<-- "data-designer/_snippets/job-results.md" + +The Data Designer library writes several artifacts to disk when running a full generation job, including the final dataset as parquet. +When a Data Designer job runs through NeMo Services, the entire working directory of artifacts produced by the library is saved as a job result. +The `download_artifacts` method downloads this artifacts directory (stored as a `.tar.gz` archive), +unarchives it, and returns a `DataDesignerJobResults` object that can be used to load results into memory as DataFrames or other objects for programmatic inspection. + +By default, `download_artifacts` saves the artifacts to a relative local directory named after the job. +An alternative path can be passed to `download_artifacts`. + ## What Happens Under the Hood @@ -302,7 +319,7 @@ When you use CLI `submit` or the SDK today: ## Next Steps -- **Seed data:** Learn how to use external datasets in the [seeding tutorial](seeding.md) -- **Execution modes:** Learn more about local and NeMo Services execution in [Execution Modes](../execution-modes.md) +- **Seed data:** Learn how to use external datasets in the [seeding tutorial](/design-synthetic-data/tutorials/seeding-with-external-datasets) +- **Execution modes:** Learn more about local and NeMo Services execution in [Execution Modes](/design-synthetic-data/execution-modes) - **Column types:** Explore all available column types in the [library documentation](https://docs.nvidia.com/nemo/datadesigner/v0.6.0/concepts/columns) - **Advanced features:** Learn about [processors](https://docs.nvidia.com/nemo/datadesigner/v0.6.0/concepts/processors) and [validation](https://docs.nvidia.com/nemo/datadesigner/v0.6.0/concepts/validators) diff --git a/docs/data-designer/tutorials/index.mdx b/docs/data-designer/tutorials/index.mdx index b83687f70c..afbadbe00e 100644 --- a/docs/data-designer/tutorials/index.mdx +++ 
b/docs/data-designer/tutorials/index.mdx @@ -1,11 +1,16 @@ +--- +title: "Overview" +description: "" +--- # Tutorials These tutorials demonstrate how to build Data Designer configurations and execute them through the NeMo Data Designer plugin. -!!! note - The code snippets on this page are for conceptual demonstration purposes only. - For runnable examples, jump ahead to the [Basics](basics.md) or [Seeding](seeding.md) tutorial. + +The code snippets on this page are for conceptual demonstration purposes only. +For runnable examples, jump ahead to the [Basics](/design-synthetic-data/tutorials/the-basics) or [Seeding](/design-synthetic-data/tutorials/seeding-with-external-datasets) tutorial. + ## Configuration and Execution @@ -47,8 +52,9 @@ preview = data_designer.preview(config_builder) job = data_designer.create(config_builder, num_records=1000) ``` -!!! tip - `run` versus `submit` primarily controls where the workload executes. Local `run` can still use the Files API, Secrets API, and Inference Gateway API from a running NeMo Services cluster when the configuration references the corresponding resources. See [Execution Modes](../execution-modes.md) for details. + +`run` versus `submit` primarily controls where the workload executes. Local `run` can still use the Files API, Secrets API, and Inference Gateway API from a running NeMo Services cluster when the configuration references the corresponding resources. See [Execution Modes](/design-synthetic-data/execution-modes) for details. + ## Execution-Specific Considerations @@ -63,15 +69,15 @@ When running through the plugin, supported resources depend on the execution mod ## Prerequisites -These tutorials use an [Inference Gateway](../../run-inference/about.md) provider for model calls, so a NeMo Services cluster must be running before you preview or create data — including with local CLI `run` (see [Execution Modes](../execution-modes.md#local-nemo-services-execution) for more about this distinction). 
-Complete [Setup](../../get-started/setup.md) to ensure you have the NeMo Services running locally and an inference provider available. +These tutorials use an [Inference Gateway](/models-and-inference/about) provider for model calls, so a NeMo Services cluster must be running before you preview or create data — including with local CLI `run` (see [Execution Modes](/design-synthetic-data/execution-modes#local-nemo-services-execution) for more about this distinction). +Complete [Setup](/get-started/setup) to ensure you have the NeMo Services running locally and an inference provider available. These tutorials reference the default NVIDIA Build model provider, which is created as `default/nvidia-build` during setup. ## Tutorials
-- **[The Basics](basics.md)** +- **[The Basics](/design-synthetic-data/tutorials/the-basics)** --- @@ -79,7 +85,7 @@ These tutorials reference the default NVIDIA Build model provider, which is crea beginner data-designer -- **[Seeding](seeding.md)** +- **[Seeding](/design-synthetic-data/tutorials/seeding-with-external-datasets)** --- diff --git a/docs/data-designer/tutorials/seeding.mdx b/docs/data-designer/tutorials/seeding.mdx index d8107e6fe6..eab7d8ebc3 100644 --- a/docs/data-designer/tutorials/seeding.mdx +++ b/docs/data-designer/tutorials/seeding.mdx @@ -1,5 +1,9 @@ - - +--- +title: "Seeding with External Datasets" +description: "" +--- +{/* @nemo-nb: process */} +{/* @nemo-nb: download */} # Seeding with External Datasets @@ -17,8 +21,9 @@ Seed source support depends on where the workload executes: | **HuggingFace** | Supported | Supported | Publicly available datasets or private HuggingFace datasets. | | **Files API Filesets** | Supported when NeMo Services access is configured | Supported | Shared seed data stored through the Files API. | -!!! note - `run` versus `submit` controls where the workload executes. A local `run` can still read Files API Filesets if the configuration references them and NeMo Services access is configured. + +`run` versus `submit` controls where the workload executes. A local `run` can still read Files API Filesets if the configuration references them and NeMo Services access is configured. + ### HuggingFace Datasets @@ -55,7 +60,7 @@ FilesetFileSeedSource( ## Prerequisites -Ensure you have completed the [tutorials prerequisites](index.md#prerequisites). This tutorial uses an Inference Gateway provider, so local CLI `run` and NeMo Services execution both need access to the Inference Gateway API in a running NeMo Services cluster. +Ensure you have completed the [tutorials prerequisites](/design-synthetic-data/tutorials/overview#prerequisites). 
This tutorial uses an Inference Gateway provider, so local CLI `run` and NeMo Services execution both need access to the Inference Gateway API in a running NeMo Services cluster. ## Example: Medical Notes from Symptom Data @@ -135,7 +140,7 @@ config_builder.with_seed_dataset( Add columns that reference and extend the seed data: -{% raw %} + ```python # Patient details config_builder.add_column( @@ -241,7 +246,7 @@ Respond with only the notes, no other text. ) ) ``` -{% endraw %} + **Note:** The `diagnosis` and `patient_summary` variables come from the seed dataset columns. @@ -302,7 +307,11 @@ print(df.head()) preview.analysis.to_report() ``` ---8<-- "data-designer/_snippets/preview-results.md" + +The `PreviewResults` object returned by `client.data_designer.preview` stores all its fields in memory; nothing is persisted to disk by default. +Use standard Python methods to save any preview data you want to keep around longer term. +For example, the `dataset` is a regular Pandas DataFrame and can be saved to disk via methods like `to_csv` or `to_parquet`. + ### Generating the Full Dataset @@ -327,7 +336,15 @@ analysis = results.load_analysis() analysis.to_report() ``` ---8<-- "data-designer/_snippets/job-results.md" + +The Data Designer library writes several artifacts to disk when running a full generation job, including the final dataset as parquet. +When a Data Designer job runs through NeMo Services, the entire working directory of artifacts produced by the library is saved as a job result. +The `download_artifacts` method downloads this artifacts directory (stored as a `.tar.gz` archive), +unarchives it, and returns a `DataDesignerJobResults` object that can be used to load results into memory as DataFrames or other objects for programmatic inspection. + +By default, `download_artifacts` saves the artifacts to a relative local directory named after the job. +An alternative path can be passed to `download_artifacts`. 
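For intuition, the unarchive step described above can be approximated with the standard library. This is a sketch under stated assumptions, not the SDK's implementation: `unpack_artifacts` is a hypothetical helper, and defaulting the target directory to the archive's base name is an assumption standing in for "a directory named after the job".

```python
import tarfile
from pathlib import Path


def unpack_artifacts(archive_path, dest_dir=None):
    """Extract a .tar.gz artifacts archive and return the target directory.

    Hypothetical helper: when dest_dir is omitted, a relative directory
    named after the archive (minus the .tar.gz suffix) is used.
    """
    archive_path = Path(archive_path)
    if dest_dir is None:
        name = archive_path.name
        if name.endswith(".tar.gz"):
            name = name[: -len(".tar.gz")]
        dest_dir = Path(name)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Only extract archives from a source you trust.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
    return dest
```

In the tutorial itself, `download_artifacts` performs this step for you; the sketch only illustrates what "stored as a `.tar.gz` archive" and "unarchives it" amount to.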
+ ## How Seeding Works @@ -342,6 +359,6 @@ When you configure a seed dataset: ## Next Steps -- **Execution modes:** Learn more about local and NeMo Services execution in [Execution Modes](../execution-modes.md) +- **Execution modes:** Learn more about local and NeMo Services execution in [Execution Modes](/design-synthetic-data/execution-modes) - **Column types:** Explore all available column types in the [library documentation](https://docs.nvidia.com/nemo/datadesigner/v0.6.0/concepts/columns) - **Processors:** Transform your data with processors in the [library documentation](https://docs.nvidia.com/nemo/datadesigner/v0.6.0/concepts/processors) diff --git a/docs/eula.mdx b/docs/eula.mdx index 52fabb5845..f61b1b12ef 100644 --- a/docs/eula.mdx +++ b/docs/eula.mdx @@ -1,5 +1,7 @@ -# Governing Terms - +--- +title: "EULA" +description: "" +--- The source code in the NVIDIA NeMo Platform repository is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). Some NVIDIA-distributed artifacts used with or alongside NeMo Platform, including NIM containers, model weights, hosted services, or other separately distributed binaries and materials, may be governed by additional or different terms. Review the license files and terms that accompany each artifact before use. These terms may include the [NVIDIA Software License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/) and the [Product-Specific Terms for NVIDIA AI Products](https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/). 
diff --git a/docs/evaluator/benchmarks/agentic.mdx b/docs/evaluator/benchmarks/agentic.mdx index 04c64616c2..886fbc8905 100644 --- a/docs/evaluator/benchmarks/agentic.mdx +++ b/docs/evaluator/benchmarks/agentic.mdx @@ -1,3 +1,7 @@ +--- +title: "Agentic Benchmarks" +description: "" +--- # Agentic Benchmarks @@ -54,58 +58,60 @@ benchmark_params = { } ``` -=== "Job" - - ```python - from nemo_platform.types.evaluation import SystemBenchmarkOnlineJobParam - - job = client.evaluation.benchmark_jobs.create( - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/bfclv3-live-simple", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - benchmark_params={}, - ) + + +```python +from nemo_platform.types.evaluation import SystemBenchmarkOnlineJobParam + +job = client.evaluation.benchmark_jobs.create( + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/bfclv3-live-simple", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + benchmark_params={}, ) - ``` -=== "Result" - - Accuracy of tool call predictions +) +``` + + +Accuracy of tool call predictions - - Score name: tool-calling-accuracy - - Value range: 0.0–1.0 +- Score name: tool-calling-accuracy +- Value range: 0.0–1.0 - ```json +```json +{ + "scores": [ { - "scores": [ - { - "name": "tool-calling-accuracy", - "score_type": "range", - "count": 1, - "nan_count": 0, - "sum": 1.0, - "mean": 1.0, - "min": 1.0, - "max": 1.0, - "std_dev": 0.0, - "variance": 0.0 - } - ] + "name": "tool-calling-accuracy", + "score_type": "range", + "count": 1, + "nan_count": 0, + "sum": 1.0, + "mean": 1.0, + "min": 1.0, + "max": 1.0, + "std_dev": 0.0, + "variance": 0.0 } - ``` + ] +} +``` + + --- ## Job Management -After creating a job, navigate to [Benchmark Job Management](job-management.md) to oversee its execution and monitor progress. 
+After creating a job, navigate to [Benchmark Job Management](/evaluation/benchmarks/job-management) to oversee its execution and monitor progress. --- -!!! info - - - [Agent Configuration](../metrics/agent-configuration.md) - Use agents (generic or NAT) as targets in online evaluation and benchmark jobs - - [Agentic Evaluation Metrics](../metrics/agentic.md) - Detailed metric documentation for evaluating agentic workflows - - [Managing Secrets](../../get-started/concepts/manage-secrets.md) - Store API keys for external APIs - - [Evaluation Results](results.md) - Understanding and downloading results + +- [Agent Configuration](/evaluation/metrics/agent-configuration) - Use agents (generic or NAT) as targets in online evaluation and benchmark jobs +- [Agentic Evaluation Metrics](/evaluation/metrics/agentic-metrics) - Detailed metric documentation for evaluating agentic workflows +- [Managing Secrets](/get-started/core-concepts/manage-secrets) - Store API keys for external APIs +- [Evaluation Results](/evaluation/benchmarks/benchmark-results) - Understanding and downloading results + diff --git a/docs/evaluator/benchmarks/custom.mdx b/docs/evaluator/benchmarks/custom.mdx index 1e8505ce44..d3ef9da950 100644 --- a/docs/evaluator/benchmarks/custom.mdx +++ b/docs/evaluator/benchmarks/custom.mdx @@ -1,18 +1,23 @@ +--- +title: "Custom Benchmarks" +description: "" +--- # Custom Benchmarks -Custom benchmarks allow you to create reusable evaluation suites tailored to your specific use case. A benchmark combines one or more [metrics](../metrics/index.md) with a dataset, enabling consistent evaluation across multiple models or pipeline versions. +Custom benchmarks allow you to create reusable evaluation suites tailored to your specific use case. A benchmark combines one or more [metrics](/evaluation/metrics/overview) with a dataset, enabling consistent evaluation across multiple models or pipeline versions. -!!! 
note - Custom benchmarks can only include custom metrics that you create in your workspace. System metrics (in the `system` workspace) cannot be included in custom benchmarks at this time. To use system metrics, refer to [Industry Benchmarks](industry.md). + +Custom benchmarks can only include custom metrics that you create in your workspace. System metrics (in the `system` workspace) cannot be included in custom benchmarks at this time. To use system metrics, refer to [Industry Benchmarks](/evaluation/benchmarks/industry-benchmarks). + ## Prerequisites Before creating a custom benchmark, ensure you have: -- A [workspace](../../get-started/concepts/workspaces.md) created for your project -- One or more [custom metrics](../metrics/index.md) defined in your workspace -- A dataset uploaded as a [fileset](../../get-started/concepts/manage-files.md) +- A [workspace](/get-started/core-concepts/workspaces) created for your project +- One or more [custom metrics](/evaluation/metrics/overview) defined in your workspace +- A dataset uploaded as a [fileset](/get-started/core-concepts/manage-files) ```python import os @@ -24,8 +29,9 @@ client = NeMoPlatform( ) ``` -!!! tip - Set `NMP_BASE_URL` to your {{nem_short_name}} deployment endpoint. See [CLI and SDK initialization](../../get-started/setup.md#setup-init) for the full convention. + +Set `NMP_BASE_URL` to your NeMo Evaluator deployment endpoint. See [CLI and SDK initialization](/get-started/setup#setup-init) for the full convention. + ## Dataset Requirements @@ -63,14 +69,16 @@ For **online evaluation**, your dataset contains inputs that will be sent to the ] ``` -!!! note - Upload your dataset to a fileset in any [supported format](../metrics/similarity.md). You can select specific files using a `#` fragment pattern (for example, `my-workspace/my-fileset#data.csv`). If no pattern is specified, all parsable files in the fileset are loaded. 
+ +Upload your dataset to a fileset in any [supported format](/evaluation/metrics/similarity-metrics). You can select specific files using a `#` fragment pattern (for example, `my-workspace/my-fileset#data.csv`). If no pattern is specified, all parsable files in the fileset are loaded. + -!!! info - Ensure your dataset columns match both: + +Ensure your dataset columns match both: - 1. The input templates defined in your metrics (for example, `{%raw%}{{output}}{%endraw%}`, `{%raw%}{{reference}}{%endraw%}`) - 2. The `prompt_template` used in online evaluation jobs (for example, `{%raw%}{{question}}{%endraw%}`) +1. The input templates defined in your metrics (for example, `{%raw%}{{output}}{%endraw%}`, `{%raw%}{{reference}}{%endraw%}`) +2. The `prompt_template` used in online evaluation jobs (for example, `{%raw%}{{question}}{%endraw%}`) + For schema-aware validation, prefer canonical evaluator fields in metric prompts and online job prompts: @@ -80,7 +88,7 @@ For schema-aware validation, prefer canonical evaluator fields in metric prompts - add fileset metadata such as `dataset.schema` and `dataset.schemas_by_path` when you want benchmark creation and job submission to validate dataset compatibility before a run starts - remember that benchmark-level `field_mapping` applies to every metric in the benchmark -Legacy raw dataset variables such as `{% raw %}{{question}}{% endraw %}`, `{% raw %}{{item.question}}{% endraw %}`, or `{% raw %}{{item.response}}{% endraw %}` still work. In that case, prompt variable names are matched directly against dataset columns unless `field_mapping.custom` remaps them. +Legacy raw dataset variables such as `{{question}}`, `{{item.question}}`, or `{{item.response}}` still work. In that case, prompt variable names are matched directly against dataset columns unless `field_mapping.custom` remaps them. 
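
The matching rule above can be sketched in a few lines. This is an illustration only, not the evaluator's implementation: `template_variables`, `missing_columns`, the treatment of the `item.` prefix as a reference to the dataset row, and the shape of `custom_mapping` (prompt variable name to dataset column name) are all assumptions made for the example.

```python
import re

# Matches {{name}} or {{ item.name }} style placeholders.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def template_variables(template: str) -> list[str]:
    """Return the unique placeholder names in a template, in order of appearance."""
    names: list[str] = []
    for match in PLACEHOLDER.finditer(template):
        name = match.group(1)
        # Assumption: an "item." prefix refers to the dataset row itself.
        if name.startswith("item."):
            name = name[len("item."):]
        if name not in names:
            names.append(name)
    return names


def missing_columns(template, columns, custom_mapping=None):
    """List template variables that resolve to no dataset column.

    custom_mapping is a hypothetical {variable: column} remapping; variables
    without an entry are matched directly against the column names.
    """
    mapping = custom_mapping or {}
    return [
        name
        for name in template_variables(template)
        if mapping.get(name, name) not in columns
    ]


if __name__ == "__main__":
    prompt = "Answer the following customer question:\n\n{{question}}"
    print(missing_columns(prompt, {"prompt", "reference"}))
    print(missing_columns(prompt, {"prompt", "reference"}, {"question": "prompt"}))
```

A pre-flight check of this kind can catch a mismatched column name before a job is submitted, which is the same problem the schema-aware validation described above is meant to address on the platform side.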
## Create a Custom Benchmark @@ -117,7 +125,7 @@ benchmark = client.evaluation.benchmarks.create( print(f"Created benchmark: {benchmark.name}") ``` -Refer to [Manage Benchmarks](manage-benchmarks.md) for listing and managing custom benchmarks. +Refer to [Manage Benchmarks](/evaluation/benchmarks/manage-benchmarks) for listing and managing custom benchmarks. ## Run Benchmark Evaluation Jobs @@ -180,12 +188,12 @@ job = client.evaluation.benchmark_jobs.create( Online evaluation generates model responses at runtime, then evaluates them against your metrics. Use this to evaluate a model's live performance. -The `model` field accepts either an inline model definition or a model reference. Refer to [Model Configuration](../metrics/model-configuration.md) for details on both formats. +The `model` field accepts either an inline model definition or a model reference. Refer to [Model Configuration](/evaluation/metrics/model-configuration) for details on both formats. #### With Inline Model ```python -{% raw %} + from nemo_platform.types.evaluation import ( BenchmarkOnlineJobParam, ModelParam, @@ -202,13 +210,13 @@ job = client.evaluation.benchmark_jobs.create( prompt_template="Answer the following customer question:\n\n{{input}}", ), ) -{% endraw %} + ``` #### With Model Reference ```python -{% raw %} + from nemo_platform.types.evaluation import BenchmarkOnlineJobParam job = client.evaluation.benchmark_jobs.create( @@ -218,23 +226,24 @@ job = client.evaluation.benchmark_jobs.create( prompt_template="Answer the following customer question:\n\n{{input}}", ), ) -{% endraw %} + ``` ## Job Management -After successfully creating a job, navigate to [Benchmark Job Management](job-management.md) to oversee its execution and monitor progress. +After successfully creating a job, navigate to [Benchmark Job Management](/evaluation/benchmarks/job-management) to oversee its execution and monitor progress. ## Retrieve Results -After the job completes, retrieve and analyze results. 
Refer to [Benchmark Results](results.md) for detailed examples of downloading aggregate scores, row-level scores, and analyzing results with Pandas. +After the job completes, retrieve and analyze results. Refer to [Benchmark Results](/evaluation/benchmarks/benchmark-results) for detailed examples of downloading aggregate scores, row-level scores, and analyzing results with Pandas. ## Complete Example Here is a complete workflow for creating a benchmark and running an evaluation. -!!! note - This example assumes you have already created the metrics (`exact-match`, `f1-score`) and uploaded your dataset (`qa-test-data`). Refer to [Evaluation Metrics](../metrics/index.md) for how to create custom metrics. + +This example assumes you have already created the metrics (`exact-match`, `f1-score`) and uploaded your dataset (`qa-test-data`). Refer to [Evaluation Metrics](/evaluation/metrics/overview) for how to create custom metrics. + ```python import json @@ -330,4 +339,4 @@ if status.status == "completed": 3. **Validate dataset**: Ensure your dataset files are valid and columns/keys are consistent across rows 4. **Test metrics first**: Run individual metric evaluations before combining into a benchmark -For additional help, refer to [Troubleshooting](../../troubleshooting/evaluator.md). +For additional help, refer to [Troubleshooting](/reference/troubleshooting/evaluator). diff --git a/docs/evaluator/benchmarks/discover-industry-benchmarks.mdx b/docs/evaluator/benchmarks/discover-industry-benchmarks.mdx index 1154f54c40..4b6a028f35 100644 --- a/docs/evaluator/benchmarks/discover-industry-benchmarks.mdx +++ b/docs/evaluator/benchmarks/discover-industry-benchmarks.mdx @@ -1,12 +1,18 @@ +--- +title: "List benchmarks with pagination:" +description: "" +--- ### Discover Industry Benchmarks Discover industry benchmarks available to use for your evaluation job within the `system` workspace. List all industry benchmarks or filter by label category. -!!! 
note - The `system` workspace is a reserved workspace for {{platform_name}} that contains ready-to-use benchmarks representing industry benchmarks with published datasets and metrics. + +The `system` workspace is a reserved workspace for NeMo Platform that contains ready-to-use benchmarks representing industry benchmarks with published datasets and metrics. + -!!! note - **Initialization:** This example uses `NeMoPlatform()` with no arguments so the SDK reads your active CLI context (set by `nemo auth login`). The `workspace="system"` is passed per-call to access the reserved system workspace. For the standard local initialization pattern, see [CLI and SDK initialization](../../get-started/setup.md#setup-init). + +**Initialization:** This example uses `NeMoPlatform()` with no arguments so the SDK reads your active CLI context (set by `nemo auth login`). The `workspace="system"` is passed per-call to access the reserved system workspace. For the standard local initialization pattern, see [CLI and SDK initialization](/get-started/setup#setup-init). + ```python from nemo_platform import NeMoPlatform diff --git a/docs/evaluator/benchmarks/hf-secret.mdx b/docs/evaluator/benchmarks/hf-secret.mdx index 303fd97c9a..8af2c0b3da 100644 --- a/docs/evaluator/benchmarks/hf-secret.mdx +++ b/docs/evaluator/benchmarks/hf-secret.mdx @@ -1,3 +1,7 @@ +--- +title: "Untitled" +description: "" +--- Most benchmarks require a Hugging Face token (`hf_token`) to access gated datasets. 
Create this secret before running evaluations: ```python diff --git a/docs/evaluator/benchmarks/index.mdx b/docs/evaluator/benchmarks/index.mdx index 6d2fec9fe5..33bc8bcea6 100644 --- a/docs/evaluator/benchmarks/index.mdx +++ b/docs/evaluator/benchmarks/index.mdx @@ -1,3 +1,7 @@ +--- +title: "Evaluation Benchmarks" +description: "" +--- # Evaluation Benchmarks @@ -10,7 +14,7 @@ Use benchmarks when you want to: - Compare multiple model versions using the same scoring criteria and dataset - Package validated metrics with domain-specific test data for repeatable evaluation -{{platform_name}} provides two types of benchmarks: +NeMo Platform provides two types of benchmarks: - **Industry Benchmarks**: Industry-standard academic benchmarks such as MMLU, HumanEval, and GSM8K for comparing model capabilities against published baselines - **Custom Benchmarks**: User-defined evaluation suites that combine your choice of metrics with domain-specific datasets @@ -21,14 +25,63 @@ Custom benchmarks are valuable for domain-specific evaluation where standard ben | Type | Use Case | Dataset | Metrics | |------|----------|---------|---------| -| [**Industry Benchmarks**](industry.md) | Compare against published baselines, regression testing, model selection | Canonical datasets (fixed) | Standardized metrics | -| [**Custom Benchmarks**](custom.md) | Domain-specific evaluation, production monitoring, task-specific assessment | Your evaluation data | Your choice of metrics | +| [**Industry Benchmarks**](/evaluation/benchmarks/industry-benchmarks) | Compare against published baselines, regression testing, model selection | Canonical datasets (fixed) | Standardized metrics | +| [**Custom Benchmarks**](/evaluation/benchmarks/custom-benchmarks) | Domain-specific evaluation, production monitoring, task-specific assessment | Your evaluation data | Your choice of metrics | ---8<-- "evaluator/benchmarks/discover-industry-benchmarks.md" +### Discover Industry Benchmarks + +Discover industry 
benchmarks available to use for your evaluation job within the `system` workspace. List all industry benchmarks or filter by label category. + + +The `system` workspace is a reserved workspace for NeMo Platform that contains ready-to-use benchmarks representing industry benchmarks with published datasets and metrics. + + + +**Initialization:** This example uses `NeMoPlatform()` with no arguments so the SDK reads your active CLI context (set by `nemo auth login`). The `workspace="system"` is passed per-call to access the reserved system workspace. For the standard local initialization pattern, see [CLI and SDK initialization](/get-started/setup#setup-init). + + +```python +from nemo_platform import NeMoPlatform + +client = NeMoPlatform() + +benchmarks = client.evaluation.benchmarks.list(workspace="system") + +print(f"{benchmarks.pagination.total_results} benchmarks") +for benchmark in benchmarks: + print(f"{benchmark.name}: {benchmark.description}") + +# List benchmarks with pagination: +benchmarks = client.evaluation.benchmarks.list( + workspace="system", page=2, page_size=100 +) +for benchmark in benchmarks: + print(f"{benchmark.name}: {benchmark.description}") + +# Filter by evaluation category label +filtered_benchmarks = client.evaluation.benchmarks.list( + workspace="system", + extra_query={"filter[data.labels.eval_category]": "advanced_reasoning"}, +) +print(filtered_benchmarks) +``` + +| Category Label | Description | +|----------------|-------------| +| `agentic` | Evaluate the performance of agent-based or multi-step reasoning models, especially in scenarios requiring planning, tool use, and iterative reasoning. | +| `advanced_reasoning` | Evaluate reasoning capabilities of large language models through complex tasks. | +| `code` | Evaluate code generation capabilities using functional correctness benchmarks that test synthesis of working programs. 
| +| `content_safety` | Evaluate model safety risks including vulnerability to generate harmful, biased, or misleading content. | +| `instruction_following` | Evaluate the ability to follow explicit formatting and structural instructions | +| `language_understanding` | Evaluate knowledge and reasoning across diverse subjects in different languages. | +| `math` | Evaluate mathematical reasoning abilities. | +| `question_answering` | Evaluate the ability to generate answers to questions. | +| `rag` | Evaluate the quality of RAG pipelines by measuring both retrieval and answer generation performance. | +| `retrieval` | Evaluate the quality of document retriever pipelines. | ## Create Custom Benchmarks -Create a custom benchmark by combining metrics with your dataset. Before creating a benchmark, you will need to [create the metrics](../metrics/index.md) that define how to score your model's outputs. +Create a custom benchmark by combining metrics with your dataset. Before creating a benchmark, you will need to [create the metrics](/evaluation/metrics/overview) that define how to score your model's outputs. ```python benchmark = client.evaluation.benchmarks.create( @@ -43,7 +96,7 @@ benchmark = client.evaluation.benchmarks.create( ) ``` -Refer to [Manage Benchmarks](manage-benchmarks.md) for listing and managing custom benchmarks. +Refer to [Manage Benchmarks](/evaluation/benchmarks/manage-benchmarks) for listing and managing custom benchmarks. ## Run Benchmark Jobs @@ -91,17 +144,17 @@ print(f"Job created: {job.name}") List, retrieve, and delete evaluation benchmarks using the Python SDK. You can discover industry benchmarks in the `system` workspace, list custom benchmarks in your workspace, retrieve detailed benchmark configurations, and delete custom benchmarks when no longer needed. -Refer to [Manage Benchmarks](manage-benchmarks.md) for complete SDK examples including pagination, sorting, filtering, and extended response options. 
+Refer to [Manage Benchmarks](/evaluation/benchmarks/manage-benchmarks) for complete SDK examples including pagination, sorting, filtering, and extended response options. ## Job Management -After successfully creating a job, refer to [Benchmark Job Management](job-management.md) to oversee its execution and monitor progress. +After successfully creating a job, refer to [Benchmark Job Management](/evaluation/benchmarks/job-management) to oversee its execution and monitor progress. ## Benchmark Categories
-- **[Custom Benchmarks](custom.md)** +- **[Custom Benchmarks](/evaluation/benchmarks/custom-benchmarks)** --- @@ -109,7 +162,7 @@ After successfully creating a job, refer to [Benchmark Job Management](job-manag RAGAS BFCL -- **[Agentic Benchmarks](agentic.md)** +- **[Agentic Benchmarks](/evaluation/benchmarks/agentic-benchmarks)** --- @@ -117,7 +170,7 @@ After successfully creating a job, refer to [Benchmark Job Management](job-manag RAGAS BFCL -- **[Industry Benchmarks](industry.md)** +- **[Industry Benchmarks](/evaluation/benchmarks/industry-benchmarks)** --- diff --git a/docs/evaluator/benchmarks/industry.mdx b/docs/evaluator/benchmarks/industry.mdx index 7451ae8ef1..949a5ee902 100644 --- a/docs/evaluator/benchmarks/industry.mdx +++ b/docs/evaluator/benchmarks/industry.mdx @@ -1,19 +1,72 @@ +--- +title: "Industry Benchmarks" +description: "" +--- # Industry Benchmarks ## Evaluate with Published Datasets -{{platform_name}} provides a streamlined API to evaluate large language models with publicly available datasets, offering over 130 industry benchmarks to run with evaluation jobs. +NeMo Platform provides a streamlined API to evaluate large language models with publicly available datasets, offering over 130 industry benchmarks to run with evaluation jobs. Benchmarks provide standardized methods for comparing model performance across different capabilities. These benchmarks are widely used in the research community and provide reliable, reproducible metrics for model assessment. -Refer to the [Run an LLM Judge Evaluation](../tutorials/run-llm-judge-evaluation.md) tutorial for details on using evaluator jobs and result handling. +Refer to the [Run an LLM Judge Evaluation](/evaluation/tutorials/run-llm-as-a-judge-evaluation) tutorial for details on using evaluator jobs and result handling. - **Standard Datasets**: Most benchmarks include predefined datasets widely used in research. - **Reproducible Metrics**: Use established methodologies to calculate metrics. 
- **Community Standards**: You can compare results across different models and research groups. ---8<-- "evaluator/benchmarks/discover-industry-benchmarks.md" +### Discover Industry Benchmarks + +Discover industry benchmarks available to use for your evaluation job within the `system` workspace. List all industry benchmarks or filter by label category. + + +The `system` workspace is a reserved workspace for NeMo Platform that contains ready-to-use benchmarks representing industry benchmarks with published datasets and metrics. + + + +**Initialization:** This example uses `NeMoPlatform()` with no arguments so the SDK reads your active CLI context (set by `nemo auth login`). The `workspace="system"` is passed per-call to access the reserved system workspace. For the standard local initialization pattern, see [CLI and SDK initialization](/get-started/setup#setup-init). + + +```python +from nemo_platform import NeMoPlatform + +client = NeMoPlatform() + +benchmarks = client.evaluation.benchmarks.list(workspace="system") + +print(f"{benchmarks.pagination.total_results} benchmarks") +for benchmark in benchmarks: + print(f"{benchmark.name}: {benchmark.description}") + +# List benchmarks with pagination: +benchmarks = client.evaluation.benchmarks.list( + workspace="system", page=2, page_size=100 +) +for benchmark in benchmarks: + print(f"{benchmark.name}: {benchmark.description}") + +# Filter by evaluation category label +filtered_benchmarks = client.evaluation.benchmarks.list( + workspace="system", + extra_query={"filter[data.labels.eval_category]": "advanced_reasoning"}, +) +print(filtered_benchmarks) +``` + +| Category Label | Description | +|----------------|-------------| +| `agentic` | Evaluate the performance of agent-based or multi-step reasoning models, especially in scenarios requiring planning, tool use, and iterative reasoning. | +| `advanced_reasoning` | Evaluate reasoning capabilities of large language models through complex tasks. 
| +| `code` | Evaluate code generation capabilities using functional correctness benchmarks that test synthesis of working programs. | +| `content_safety` | Evaluate model safety risks including vulnerability to generate harmful, biased, or misleading content. | +| `instruction_following` | Evaluate the ability to follow explicit formatting and structural instructions | +| `language_understanding` | Evaluate knowledge and reasoning across diverse subjects in different languages. | +| `math` | Evaluate mathematical reasoning abilities. | +| `question_answering` | Evaluate the ability to generate answers to questions. | +| `rag` | Evaluate the quality of RAG pipelines by measuring both retrieval and answer generation performance. | +| `retrieval` | Evaluate the quality of document retriever pipelines. | ### Choosing a Benchmark Variant @@ -35,8 +88,9 @@ Many benchmarks offer multiple variants optimized for different model types: | `hf_token` | Reference to a secret containing your Hugging Face token for accessing gated datasets. | | `tokenizer` | Hugging Face tokenizer ID, required for completions-based benchmarks. | -!!! tip - The `model` field in all benchmark examples below accepts either an inline model definition or a model reference string (for example, `"my-workspace/my-model"`). Refer to [Model Configuration](../metrics/model-configuration.md) for details. + +The `model` field in all benchmark examples below accepts either an inline model definition or a model reference string (for example, `"my-workspace/my-model"`). Refer to [Model Configuration](/evaluation/metrics/model-configuration) for details. 
+ ## Advanced Reasoning @@ -78,177 +132,180 @@ client = NeMoPlatform( ) ``` -=== "GPQA Diamond" + + +```python +job = client.evaluation.benchmark_jobs.create( + description="GPQA Diamond evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/gpqa-diamond", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + benchmark_params={"hf_token": "hf_token"}, + ), +) +``` + + +```python +job = client.evaluation.benchmark_jobs.create( + description="GPQA Extended evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/gpqa-extended", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + benchmark_params={"hf_token": "hf_token"}, + ), +) +``` + + +```python +job = client.evaluation.benchmark_jobs.create( + description="GPQA Main evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/gpqa-main", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + benchmark_params={"hf_token": "hf_token"}, + ), +) +``` + + +```python +job = client.evaluation.benchmark_jobs.create( + description="GPQA Diamond evaluation with NeMo template", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/gpqa-diamond-nemo", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + benchmark_params={"hf_token": "hf_token"}, + ), +) +``` + + +```python +job = client.evaluation.benchmark_jobs.create( + description="GPQA Diamond chain-of-thought evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/gpqa-diamond-cot", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + benchmark_params={"hf_token": "hf_token"}, + ), +) +``` + + +Requires a `/v1/completions` endpoint. 
- ```python - job = client.evaluation.benchmark_jobs.create( - description="GPQA Diamond evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/gpqa-diamond", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - benchmark_params={"hf_token": "hf_token"}, - ), - ) - ``` - -=== "GPQA Extended" - - ```python - job = client.evaluation.benchmark_jobs.create( - description="GPQA Extended evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/gpqa-extended", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - benchmark_params={"hf_token": "hf_token"}, - ), - ) - ``` - -=== "GPQA Main" - - ```python - job = client.evaluation.benchmark_jobs.create( - description="GPQA Main evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/gpqa-main", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - benchmark_params={"hf_token": "hf_token"}, - ), - ) - ``` - -=== "GPQA Diamond (NeMo)" - - ```python - job = client.evaluation.benchmark_jobs.create( - description="GPQA Diamond evaluation with NeMo template", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/gpqa-diamond-nemo", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - benchmark_params={"hf_token": "hf_token"}, - ), - ) - ``` - -=== "GPQA Diamond CoT" - - ```python - job = client.evaluation.benchmark_jobs.create( - description="GPQA Diamond chain-of-thought evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/gpqa-diamond-cot", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - benchmark_params={"hf_token": "hf_token"}, - ), - ) - ``` - -=== "GPQA (Completions)" - - Requires a `/v1/completions` endpoint. 
- - ```python - job = client.evaluation.benchmark_jobs.create( - description="GPQA few-shot evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/gpqa", - model={ - "url": "/v1/completions", - "name": "", - }, - params=RunConfig(parallelism=16), - benchmark_params={ - "hf_token": "hf_token", - "tokenizer": "", - }, - ), - ) - ``` - -=== "BBH Instruct" - - ```python - job = client.evaluation.benchmark_jobs.create( - description="BIG-Bench Hard evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/bbh-instruct", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - benchmark_params={"hf_token": "hf_token"}, - ), - ) - ``` - -=== "BBH (Completions)" - - Requires a `/v1/completions` endpoint. - - ```python - job = client.evaluation.benchmark_jobs.create( - description="BIG-Bench Hard evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/bbh", - model={ - "url": "/v1/completions", - "name": "", - }, - params=RunConfig(parallelism=16), - benchmark_params={ - "hf_token": "hf_token", - "tokenizer": "", - }, - ), - ) - ``` - -=== "MuSR (Completions)" - - Requires a `/v1/completions` endpoint. 
- - ```python - job = client.evaluation.benchmark_jobs.create( - description="Multistep Soft Reasoning evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/musr", - model={ - "url": "/v1/completions", - "name": "", - }, - params=RunConfig(parallelism=16), - benchmark_params={ - "hf_token": "hf_token", - "tokenizer": "", - }, - ), - ) - ``` +```python +job = client.evaluation.benchmark_jobs.create( + description="GPQA few-shot evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/gpqa", + model={ + "url": "/v1/completions", + "name": "", + }, + params=RunConfig(parallelism=16), + benchmark_params={ + "hf_token": "hf_token", + "tokenizer": "", + }, + ), +) +``` + + +```python +job = client.evaluation.benchmark_jobs.create( + description="BIG-Bench Hard evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/bbh-instruct", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + benchmark_params={"hf_token": "hf_token"}, + ), +) +``` + + +Requires a `/v1/completions` endpoint. -!!! note +```python +job = client.evaluation.benchmark_jobs.create( + description="BIG-Bench Hard evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/bbh", + model={ + "url": "/v1/completions", + "name": "", + }, + params=RunConfig(parallelism=16), + benchmark_params={ + "hf_token": "hf_token", + "tokenizer": "", + }, + ), +) +``` + + +Requires a `/v1/completions` endpoint. - --8<-- "evaluator/benchmarks/hf-secret.md" +```python +job = client.evaluation.benchmark_jobs.create( + description="Multistep Soft Reasoning evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/musr", + model={ + "url": "/v1/completions", + "name": "", + }, + params=RunConfig(parallelism=16), + benchmark_params={ + "hf_token": "hf_token", + "tokenizer": "", + }, + ), +) +``` + + + +Most benchmarks require a Hugging Face token (`hf_token`) to access gated datasets. 
Create this secret before running evaluations: + +```python +import os + +client.secrets.create( + workspace=workspace, + name="hf_token", + value=os.getenv("HF_TOKEN", ""), +) +``` + ### Results @@ -269,7 +326,7 @@ for score in aggregate.scores: print(f"{score.name}: {score.mean:.1%}") ``` -For detailed results analysis, refer to [eval-benchmark-results](results.md). +For detailed results analysis, refer to [eval-benchmark-results](/evaluation/benchmarks/benchmark-results). ## Instruction Following @@ -327,7 +384,7 @@ for score in aggregate.scores: print(f"{score.name}: {score.mean:.1%}") ``` -For detailed results analysis, refer to [eval-benchmark-results](results.md). +For detailed results analysis, refer to [eval-benchmark-results](/evaluation/benchmarks/benchmark-results). ## Language Understanding @@ -368,135 +425,130 @@ client = NeMoPlatform( ) ``` -=== "MMLU Instruct" + + +```python +job = client.evaluation.benchmark_jobs.create( + description="MMLU zero-shot evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/mmlu-instruct", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + benchmark_params={"hf_token": "hf_token"}, + ), +) +``` + + +Requires a `/v1/completions` endpoint. - ```python - job = client.evaluation.benchmark_jobs.create( - description="MMLU zero-shot evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/mmlu-instruct", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - benchmark_params={"hf_token": "hf_token"}, - ), - ) - ``` - -=== "MMLU (Completions)" - - Requires a `/v1/completions` endpoint. 
- - ```python - job = client.evaluation.benchmark_jobs.create( - description="MMLU few-shot evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/mmlu", - model={ - "url": "/v1/completions", - "name": "", - }, - params=RunConfig(parallelism=16), - benchmark_params={ - "hf_token": "hf_token", - "tokenizer": "", - }, - ), - ) - ``` - -=== "MMLU-Pro Instruct" - - ```python - job = client.evaluation.benchmark_jobs.create( - description="MMLU-Pro zero-shot evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/mmlu-pro-instruct", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - benchmark_params={"hf_token": "hf_token"}, - ), - ) - ``` - -=== "MMLU-Pro (Completions)" - - Requires a `/v1/completions` endpoint. - - ```python - job = client.evaluation.benchmark_jobs.create( - description="MMLU-Pro few-shot evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/mmlu-pro", - model={ - "url": "/v1/completions", - "name": "", - }, - params=RunConfig(parallelism=16), - benchmark_params={ - "hf_token": "hf_token", - "tokenizer": "", - }, - ), - ) - ``` - -=== "MMLU-Redux Instruct" - - ```python - job = client.evaluation.benchmark_jobs.create( - description="MMLU-Redux zero-shot evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/mmlu-redux-instruct", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - benchmark_params={"hf_token": "hf_token"}, - ), - ) - ``` - -=== "MMLU Spanish" - - ```python - job = client.evaluation.benchmark_jobs.create( - description="Global-MMLU Spanish evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/mmlu-es", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - benchmark_params={"hf_token": "hf_token"}, - ), - ) - ``` - -=== "WikiLingua" - - ```python - job = 
client.evaluation.benchmark_jobs.create( - description="WikiLingua cross-lingual summarization", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/wikilingua", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - benchmark_params={"hf_token": "hf_token"}, - ), - ) - ``` +```python +job = client.evaluation.benchmark_jobs.create( + description="MMLU few-shot evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/mmlu", + model={ + "url": "/v1/completions", + "name": "", + }, + params=RunConfig(parallelism=16), + benchmark_params={ + "hf_token": "hf_token", + "tokenizer": "", + }, + ), +) +``` + + +```python +job = client.evaluation.benchmark_jobs.create( + description="MMLU-Pro zero-shot evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/mmlu-pro-instruct", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + benchmark_params={"hf_token": "hf_token"}, + ), +) +``` + + +Requires a `/v1/completions` endpoint. 
+```python +job = client.evaluation.benchmark_jobs.create( + description="MMLU-Pro few-shot evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/mmlu-pro", + model={ + "url": "/v1/completions", + "name": "", + }, + params=RunConfig(parallelism=16), + benchmark_params={ + "hf_token": "hf_token", + "tokenizer": "", + }, + ), +) +``` + + +```python +job = client.evaluation.benchmark_jobs.create( + description="MMLU-Redux zero-shot evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/mmlu-redux-instruct", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + benchmark_params={"hf_token": "hf_token"}, + ), +) +``` + + +```python +job = client.evaluation.benchmark_jobs.create( + description="Global-MMLU Spanish evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/mmlu-es", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + benchmark_params={"hf_token": "hf_token"}, + ), +) +``` + + +```python +job = client.evaluation.benchmark_jobs.create( + description="WikiLingua cross-lingual summarization", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/wikilingua", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + benchmark_params={"hf_token": "hf_token"}, + ), +) +``` + + ### Results Language understanding benchmarks produce accuracy scores: @@ -512,7 +564,7 @@ for score in aggregate.scores: print(f"{score.name}: {score.mean:.1%}") ``` -For detailed results analysis, refer to [eval-benchmark-results](results.md). +For detailed results analysis, refer to [eval-benchmark-results](/evaluation/benchmarks/benchmark-results). 
## Math & Reasoning @@ -541,8 +593,9 @@ Evaluate mathematical reasoning abilities from grade school arithmetic to compet **²** Judge required: Requires a judge model to evaluate free-form math responses. -!!! info - For math benchmarks requiring a judge, use a model with strong instruction-following capabilities (70B+ parameters recommended). Smaller models may produce malformed judge outputs. + +For math benchmarks requiring a judge, use a model with strong instruction-following capabilities (70B+ parameters recommended). Smaller models may produce malformed judge outputs. + ### Examples @@ -560,124 +613,120 @@ client = NeMoPlatform( ) ``` -=== "GSM8K CoT Instruct" + + +```python +job = client.evaluation.benchmark_jobs.create( + description="GSM8K chain-of-thought evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/gsm8k-cot-instruct", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + benchmark_params={"hf_token": "hf_token"}, + ), +) +``` + + +Requires a `/v1/completions` endpoint. - ```python - job = client.evaluation.benchmark_jobs.create( - description="GSM8K chain-of-thought evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/gsm8k-cot-instruct", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - benchmark_params={"hf_token": "hf_token"}, - ), - ) - ``` - -=== "GSM8K (Completions)" - - Requires a `/v1/completions` endpoint. 
- - ```python - job = client.evaluation.benchmark_jobs.create( - description="GSM8K few-shot evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/gsm8k", - model={ - "url": "/v1/completions", - "name": "", - }, - params=RunConfig(parallelism=16), - benchmark_params={ - "hf_token": "hf_token", - "tokenizer": "", - }, - ), - ) - ``` - -=== "MGSM CoT" - - ```python - job = client.evaluation.benchmark_jobs.create( - description="MGSM multilingual math evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/mgsm-cot", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - benchmark_params={"hf_token": "hf_token"}, - ), - ) - ``` - -=== "AIME 2025 (NeMo)" - - No judge required with NeMo template. - - ```python - job = client.evaluation.benchmark_jobs.create( - description="AIME 2025 competition math", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/aime-2025-nemo", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - ), - ) - ``` - -=== "AIME 2025 (Judge)" - - Requires a judge model. 
- - ```python - job = client.evaluation.benchmark_jobs.create( - description="AIME 2025 competition math with judge", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/aime-2025", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - benchmark_params={ - "judge": { - "model": { - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - } +```python +job = client.evaluation.benchmark_jobs.create( + description="GSM8K few-shot evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/gsm8k", + model={ + "url": "/v1/completions", + "name": "", + }, + params=RunConfig(parallelism=16), + benchmark_params={ + "hf_token": "hf_token", + "tokenizer": "", + }, + ), +) +``` + + +```python +job = client.evaluation.benchmark_jobs.create( + description="MGSM multilingual math evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/mgsm-cot", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + benchmark_params={"hf_token": "hf_token"}, + ), +) +``` + + +No judge required with NeMo template. + +```python +job = client.evaluation.benchmark_jobs.create( + description="AIME 2025 competition math", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/aime-2025-nemo", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + ), +) +``` + + +Requires a judge model. 
+ +```python +job = client.evaluation.benchmark_jobs.create( + description="AIME 2025 competition math with judge", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/aime-2025", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + benchmark_params={ + "judge": { + "model": { + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", } - }, - ), - ) - ``` - -=== "MATH Test 500 (NeMo)" - - No judge required with NeMo template. - - ```python - job = client.evaluation.benchmark_jobs.create( - description="MATH test set evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/math-test-500-nemo", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - ), - ) - ``` + } + }, + ), +) +``` + + +No judge required with NeMo template. +```python +job = client.evaluation.benchmark_jobs.create( + description="MATH test set evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/math-test-500-nemo", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + ), +) +``` + + ### Results Math benchmarks produce accuracy scores: @@ -693,7 +742,7 @@ for score in aggregate.scores: print(f"{score.name}: {score.mean:.1%}") ``` -For detailed results analysis, refer to [eval-benchmark-results](results.md). +For detailed results analysis, refer to [eval-benchmark-results](/evaluation/benchmarks/benchmark-results). ## Content Safety @@ -709,14 +758,14 @@ Evaluate model safety risks including vulnerability to generate harmful, biased, | `system/aegis-v2` | [AEGIS 2.0](https://aclanthology.org/2025.naacl-long.306/)—12 hazard categories using Nemotron Safety Guard | `hf_token`, `judge` | | `system/wildguard` | [WildGuard](https://arxiv.org/abs/2406.18495)—privacy, misinformation, harmful language | `hf_token`, `judge` | -!!! 
info - Safety benchmarks require specific judge models deployed with a `/v1/completions` endpoint (not chat/completions): - - | Benchmark | Required Judge Model | Model ID | - |-----------|---------------------|----------| - | `system/aegis-v2` | [Llama Nemotron Safety Guard V2](https://huggingface.co/nvidia/llama-3.1-nemoguard-8b-content-safety) | `nvidia/llama-3.1-nemoguard-8b-content-safety` | - | `system/wildguard` | [WildGuard](https://huggingface.co/allenai/wildguard) | `allenai/wildguard` | + +Safety benchmarks require specific judge models deployed with a `/v1/completions` endpoint (not chat/completions): +| Benchmark | Required Judge Model | Model ID | +|-----------|---------------------|----------| +| `system/aegis-v2` | [Llama Nemotron Safety Guard V2](https://huggingface.co/nvidia/llama-3.1-nemoguard-8b-content-safety) | `nvidia/llama-3.1-nemoguard-8b-content-safety` | +| `system/wildguard` | [WildGuard](https://huggingface.co/allenai/wildguard) | `allenai/wildguard` | + The benchmarks can take up to 1-3 hours. @@ -736,60 +785,60 @@ client = NeMoPlatform( ) ``` -=== "AEGIS-v2" + + +Requires Llama Nemotron Safety Guard V2 judge. - Requires Llama Nemotron Safety Guard V2 judge. - - ```python - job = client.evaluation.benchmark_jobs.create( - description="AEGIS-v2 content safety evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/aegis-v2", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - benchmark_params={ - "hf_token": "hf_token", - "judge": { - "model": { - "url": "/v1/completions", - "name": "nvidia/llama-3.1-nemoguard-8b-content-safety", - } - }, - }, - ), - ) - ``` - -=== "WildGuard" - - Requires WildGuard judge. 
- - ```python - job = client.evaluation.benchmark_jobs.create( - description="WildGuard content safety evaluation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/wildguard", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - benchmark_params={ - "hf_token": "hf_token", - "judge": { - "model": { - "url": "/v1/completions", - "name": "allenai/wildguard", - } - }, +```python +job = client.evaluation.benchmark_jobs.create( + description="AEGIS-v2 content safety evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/aegis-v2", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + benchmark_params={ + "hf_token": "hf_token", + "judge": { + "model": { + "url": "/v1/completions", + "name": "nvidia/llama-3.1-nemoguard-8b-content-safety", + } }, - ), - ) - ``` + }, + ), +) +``` + + +Requires WildGuard judge. +```python +job = client.evaluation.benchmark_jobs.create( + description="WildGuard content safety evaluation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/wildguard", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + benchmark_params={ + "hf_token": "hf_token", + "judge": { + "model": { + "url": "/v1/completions", + "name": "allenai/wildguard", + } + }, + }, + ), +) +``` + + ### Results Safety benchmarks produce category-level safety rates: @@ -805,11 +854,11 @@ for score in aggregate.scores: print(f"{score.name}: {score.mean:.1%}") ``` -For detailed results analysis, refer to [eval-benchmark-results](results.md). +For detailed results analysis, refer to [eval-benchmark-results](/evaluation/benchmarks/benchmark-results). ### Troubleshooting Content Safety Benchmarks -Refer to [Evaluator Troubleshooting](../../troubleshooting/evaluator.md) for general troubleshooting steps for failed evaluation jobs. 
+Refer to [Evaluator Troubleshooting](/reference/troubleshooting/evaluator) for general troubleshooting steps for failed evaluation jobs.
This section covers common issues for the safety harness. @@ -865,7 +914,7 @@ Safety evaluations do not support reasoning traces and may result in the job err ERROR There are at least 2 MUT (model under test) responses that start with . Reasoning traces should not be evaluated. Exiting. ``` -If the model or judge outputs reasoning traces like `reasoning contextanswer`, configure the job to only include the the final answer after the reasoning token (e.g. ``) with `reasoning.end_token`. Consider specifying `inference.max_tokens` to a reasonable limit for the model's chain of thought to conclude with the expected reasoning end token in order for the reasoning context to be properly omitted for evaluation. +If the model or judge outputs reasoning traces like `<think>reasoning context</think>answer`, configure the job to only include the final answer after the reasoning token (e.g. `</think>`) with `reasoning.end_token`. Consider specifying `inference.max_tokens` to a reasonable limit for the model's chain of thought to conclude with the expected reasoning end token in order for the reasoning context to be properly omitted for evaluation. Additionally, if you are encountering this error, it could be caused by the model exceeding its token limit resulting in the full response being consumed by the model thinking. These results can be dropped by setting the `reasoning.include_if_not_finished` parameter. @@ -927,132 +976,127 @@ client = NeMoPlatform( ) ``` -=== "HumanEval Instruct" + + +```python +job = client.evaluation.benchmark_jobs.create( + description="HumanEval code generation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/humaneval-instruct", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + ), +) +``` + + +Requires a `/v1/completions` endpoint. 
+ +```python +job = client.evaluation.benchmark_jobs.create( + description="HumanEval code generation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/humaneval", + model={ + "url": "/v1/completions", + "name": "", + }, + params=RunConfig(parallelism=16), + ), +) +``` + + +Extended test suite with 80x more test cases. - ```python - job = client.evaluation.benchmark_jobs.create( - description="HumanEval code generation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/humaneval-instruct", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - ), - ) - ``` - -=== "HumanEval (Completions)" - - Requires a `/v1/completions` endpoint. - - ```python - job = client.evaluation.benchmark_jobs.create( - description="HumanEval code generation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/humaneval", - model={ - "url": "/v1/completions", - "name": "", - }, - params=RunConfig(parallelism=16), - ), - ) - ``` - -=== "HumanEval+ (Completions)" - - Extended test suite with 80x more test cases. 
- - ```python - job = client.evaluation.benchmark_jobs.create( - description="HumanEval+ code generation", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/humanevalplus", - model={ - "url": "/v1/completions", - "name": "", - }, - params=RunConfig(parallelism=16), - ), - ) - ``` - -=== "MBPP+ (NeMo)" - - ```python - job = client.evaluation.benchmark_jobs.create( - description="MBPP+ Python programming", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/mbppplus-nemo", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - ), - ) - ``` - -=== "MBPP+" - - ```python - job = client.evaluation.benchmark_jobs.create( - description="MBPP+ Python programming", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/mbppplus", - model={ - "url": "/v1", - "name": "nvidia/llama-3.3-nemotron-super-49b-v1", - }, - params=RunConfig(parallelism=16), - ), - ) - ``` - -=== "MultiPL-E JavaScript" - - Requires a `/v1/completions` endpoint. - - ```python - job = client.evaluation.benchmark_jobs.create( - description="MultiPL-E JavaScript", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/multiple-js", - model={ - "url": "/v1/completions", - "name": "", - }, - params=RunConfig(parallelism=16), - ), - ) - ``` - -=== "MultiPL-E Rust" - - Requires a `/v1/completions` endpoint. 
- - ```python - job = client.evaluation.benchmark_jobs.create( - description="MultiPL-E Rust", - spec=SystemBenchmarkOnlineJobParam( - benchmark="system/multiple-rs", - model={ - "url": "/v1/completions", - "name": "", - }, - params=RunConfig(parallelism=16), - ), - ) - ``` +```python +job = client.evaluation.benchmark_jobs.create( + description="HumanEval+ code generation", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/humanevalplus", + model={ + "url": "/v1/completions", + "name": "", + }, + params=RunConfig(parallelism=16), + ), +) +``` + + +```python +job = client.evaluation.benchmark_jobs.create( + description="MBPP+ Python programming", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/mbppplus-nemo", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + ), +) +``` + + +```python +job = client.evaluation.benchmark_jobs.create( + description="MBPP+ Python programming", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/mbppplus", + model={ + "url": "/v1", + "name": "nvidia/llama-3.3-nemotron-super-49b-v1", + }, + params=RunConfig(parallelism=16), + ), +) +``` + + +Requires a `/v1/completions` endpoint. +```python +job = client.evaluation.benchmark_jobs.create( + description="MultiPL-E JavaScript", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/multiple-js", + model={ + "url": "/v1/completions", + "name": "", + }, + params=RunConfig(parallelism=16), + ), +) +``` + + +Requires a `/v1/completions` endpoint. 
+ +```python +job = client.evaluation.benchmark_jobs.create( + description="MultiPL-E Rust", + spec=SystemBenchmarkOnlineJobParam( + benchmark="system/multiple-rs", + model={ + "url": "/v1/completions", + "name": "", + }, + params=RunConfig(parallelism=16), + ), +) +``` + + ### Results Code benchmarks produce pass@k metrics measuring functional correctness: - **pass@1**: Percentage of problems solved with one attempt -- **pass@10**: Percentage of problems solved within 10 attempts (when `n_samples` > 1) +- **pass@10**: Percentage of problems solved within 10 attempts (when `n_samples` > 1) ```python aggregate = client.evaluation.benchmark_jobs.results.aggregate_scores.download( @@ -1062,10 +1106,11 @@ for score in aggregate.scores: print(f"{score.name}: {score.mean:.1%}") ``` -For detailed results analysis, refer to [eval-benchmark-results](results.md). +For detailed results analysis, refer to [eval-benchmark-results](/evaluation/benchmarks/benchmark-results). -!!! tip - **Want to experiment first?** You can try these benchmarks using the [open-source NeMo Evaluator SDK](https://github.com/NVIDIA-NeMo/evaluator) before deploying the platform. The SDK provides a lightweight way to test evaluation workflows locally. + +**Want to experiment first?** You can try these benchmarks using the [open-source NeMo Evaluator SDK](https://github.com/NVIDIA-NeMo/evaluator) before deploying the platform. The SDK provides a lightweight way to test evaluation workflows locally. + ## Run Benchmark Job @@ -1079,10 +1124,21 @@ client.workspaces.create(name=workspace) Create an evaluation job with a benchmark that satisfy the required and optional parameters. -!!! note - For benchmarks that require a Hugging Face token or other API keys for external services, create the secret to be referenced by the job. + +For benchmarks that require a Hugging Face token or other API keys for external services, create the secret to be referenced by the job. 
- --8<-- "evaluator/benchmarks/hf-secret.md" +Most benchmarks require a Hugging Face token (`hf_token`) to access gated datasets. Create this secret before running evaluations: + +```python +import os + +client.secrets.create( + workspace=workspace, + name="hf_token", + value=os.getenv("HF_TOKEN", ""), +) +``` + ```python from nemo_platform.types.evaluation import ( @@ -1106,4 +1162,4 @@ job = client.evaluation.benchmark_jobs.create( ## Job Management -After creating a job, navigate to [Benchmark Job Management](job-management.md) to oversee its execution and monitor progress. +After creating a job, navigate to [Benchmark Job Management](/evaluation/benchmarks/job-management) to oversee its execution and monitor progress. diff --git a/docs/evaluator/benchmarks/job-management.mdx b/docs/evaluator/benchmarks/job-management.mdx index 3c8a18905a..fc2af751e2 100644 --- a/docs/evaluator/benchmarks/job-management.mdx +++ b/docs/evaluator/benchmarks/job-management.mdx @@ -1,10 +1,15 @@ +--- +title: "Benchmark Job Management" +description: "" +--- # Benchmark Job Management Manage your evaluation job. -!!! note - **Performance Tuning**: You can improve evaluation performance by setting `job.params.parallelism` to control the number of concurrent requests. A typical default value is 16, but you might need to adjust based on your model capacity and rate limits. + +**Performance Tuning**: You can improve evaluation performance by setting `job.params.parallelism` to control the number of concurrent requests. A typical default value is 16, but you might need to adjust based on your model capacity and rate limits. + ## Monitor Job @@ -19,7 +24,7 @@ while job_status.status in ("active", "pending", "created"): print(job_status) ``` -Refer to [Evaluator Troubleshooting](../../troubleshooting/evaluator.md) for help troubleshooting job failures. +Refer to [Evaluator Troubleshooting](/reference/troubleshooting/evaluator) for help troubleshooting job failures. 
## Fetch Job Logs @@ -42,7 +47,7 @@ while logs_response.next_page: ## View Evaluation Results -Evaluation results are available after the evaluation job completes. Refer to [Benchmark Results](results.md) for details on fetching evaluation results to analyze the job output. +Evaluation results are available after the evaluation job completes. Refer to [Benchmark Results](/evaluation/benchmarks/benchmark-results) for details on fetching evaluation results to analyze the job output. ### Aggregate Scores diff --git a/docs/evaluator/benchmarks/manage-benchmarks.mdx b/docs/evaluator/benchmarks/manage-benchmarks.mdx index 6a884839d7..d099a2ff1f 100644 --- a/docs/evaluator/benchmarks/manage-benchmarks.mdx +++ b/docs/evaluator/benchmarks/manage-benchmarks.mdx @@ -1,7 +1,11 @@ +--- +title: "Manage Benchmarks" +description: "" +--- # Manage Benchmarks -List, retrieve, and delete evaluation benchmarks using the {{platform_name}} Python SDK. +List, retrieve, and delete evaluation benchmarks using the NeMo Platform Python SDK. ```python import os @@ -130,8 +134,9 @@ benchmarks = client.evaluation.benchmarks.list( Delete a custom evaluation benchmark. Industry benchmarks in the `system` workspace cannot be deleted. -!!! warning - Deleting a benchmark is permanent and cannot be undone. Ensure the benchmark is not being used by any active evaluations before deletion. + +Deleting a benchmark is permanent and cannot be undone. Ensure the benchmark is not being used by any active evaluations before deletion. 
+ ```python client.evaluation.benchmarks.delete(name="my-custom-benchmark") @@ -140,6 +145,6 @@ client.evaluation.benchmarks.delete(name="my-custom-benchmark") ## Related Topics -- [Custom Benchmarks](custom.md) - Create custom benchmarks -- [Benchmark Job Management](job-management.md) - Manage benchmark evaluation jobs -- [Benchmark Results](results.md) - View and download benchmark evaluation results +- [Custom Benchmarks](/evaluation/benchmarks/custom-benchmarks) - Create custom benchmarks +- [Benchmark Job Management](/evaluation/benchmarks/job-management) - Manage benchmark evaluation jobs +- [Benchmark Results](/evaluation/benchmarks/benchmark-results) - View and download benchmark evaluation results diff --git a/docs/evaluator/benchmarks/results.mdx b/docs/evaluator/benchmarks/results.mdx index 2344afdca6..a79e5171a6 100644 --- a/docs/evaluator/benchmarks/results.mdx +++ b/docs/evaluator/benchmarks/results.mdx @@ -1,3 +1,7 @@ +--- +title: "Benchmark Results" +description: "" +--- # Benchmark Results diff --git a/docs/evaluator/index.mdx b/docs/evaluator/index.mdx index e30c9c9575..12adfaf7d7 100644 --- a/docs/evaluator/index.mdx +++ b/docs/evaluator/index.mdx @@ -1,12 +1,16 @@ +--- +title: "About" +description: "" +--- # About Evaluating -Evaluation is powered by {{platform_name}}, a cloud-native platform for evaluating large language models (LLMs), RAG pipelines, and AI agents at enterprise scale. The evaluation API provides automated workflows for over 100 industry benchmarks, LLM-as-a-judge scoring, and specialized metrics for RAG and agent systems. +Evaluation is powered by NeMo Platform, a cloud-native platform for evaluating large language models (LLMs), RAG pipelines, and AI agents at enterprise scale. The evaluation API provides automated workflows for over 100 industry benchmarks, LLM-as-a-judge scoring, and specialized metrics for RAG and agent systems. 
-{{platform_name}} enables real-time evaluations of your LLM application through APIs, guiding you in refining and optimizing LLMs for enhanced performance and real-world applicability. The {{nem_short_name}} APIs can be seamlessly automated within development pipelines, enabling faster iterations without the need for live data. It is cost-effective and suitable for pre-deployment checks and regression testing. +NeMo Platform enables real-time evaluations of your LLM application through APIs, guiding you in refining and optimizing LLMs for enhanced performance and real-world applicability. The NeMo Evaluator APIs can be seamlessly automated within development pipelines, enabling faster iterations without the need for live data. It is cost-effective and suitable for pre-deployment checks and regression testing. -[**Tutorials**](tutorials/index.md){ .md-button } -[**Open Source SDK**](https://github.com/NVIDIA-NeMo/evaluator){ .md-button } +[**Tutorials**](/evaluation/tutorials/overview) +[**Open Source SDK**](https://github.com/NVIDIA-NeMo/evaluator) --- @@ -14,15 +18,16 @@ Evaluation is powered by {{platform_name}}, a cloud-native platform for evaluati Evaluator separates **evaluation definition** from **execution**. -!!! note - The code snippets below are for conceptual demonstration purposes only. - For runnable examples see the [tutorials](tutorials/index.md) and [SDK resources](sdk-resources.md). + +The code snippets below are for conceptual demonstration purposes only. +For runnable examples see the [tutorials](/evaluation/tutorials/overview) and [SDK resources](/evaluation/sdk-resources). + ### 1. 
Build RunConfig with the Library Use the `nemo_evaluator_sdk` package to define your metric, dataset rows, runtime configuration, and optional model or agent target: -{% raw %} + ```python from nemo_evaluator_sdk import RunConfig, ExactMatchMetric @@ -42,14 +47,14 @@ dataset = [ config = RunConfig(limit_samples=100, parallelism=8) ``` -{% endraw %} + **The library handles:** Metric definitions, dataset row schemas, prompt templates, model and agent targets, runtime parameters, retries, aggregation, and typed result objects. ### 2. Execute on the Platform -Submit your evaluation to the Evaluator service using the {{platform_name}} SDK: +Submit your evaluation to the Evaluator service using the NeMo Platform SDK: ```python from nemo_evaluator.sdk import Evaluator @@ -68,25 +73,25 @@ job.wait_until_done() result = job.get_result() ``` -**The platform handles:** Job orchestration, inference routing through {{platform_name}}'s Inference Gateway, Fileset-based datasets, distributed execution, artifact storage, status monitoring, and result download. +**The platform handles:** Job orchestration, inference routing through NeMo Platform's Inference Gateway, Fileset-based datasets, distributed execution, artifact storage, status monitoring, and result download. 
## Key Differences from Standalone Library -When using Evaluator as a {{platform_name}} plugin : +When using Evaluator as a NeMo Platform plugin: -| Feature | Standalone Library | {{platform_name}} Plugin | +| Feature | Standalone Library | NeMo Platform Plugin | |---------|-------------------|-------------| | **Execution** | Local Python process | Local plugin runs for local experimentation and durable platform jobs for production | -| **Inference** | Direct model or agent endpoint calls | The same as standalone and can also route through {{platform_name}} Inference Gateway and platform-managed endpoints | -| **Datasets** | Inline rows and local files | Inline rows, local paths resolved at submission time, and {{platform_name}} [Filesets](../get-started/concepts/manage-files.md) | -| **Results Artifacts** | Results stored in memory | {{platform_name}} artifact storage with typed result download | -| **Authentication** | Local environment variables | Local environment variables for local runs and {{platform_name}} Secrets service for remote jobs | +| **Inference** | Direct model or agent endpoint calls | The same as standalone and can also route through NeMo Platform Inference Gateway and platform-managed endpoints | +| **Datasets** | Inline rows and local files | Inline rows, local paths resolved at submission time, and NeMo Platform [Filesets](/get-started/core-concepts/manage-files) | +| **Results Artifacts** | Results stored in memory | NeMo Platform artifact storage with typed result download | +| **Authentication** | Local environment variables | Local environment variables for local runs and NeMo Platform Secrets service for remote jobs | --- ## Evaluation Concepts -{{platform_name}} supports two core evaluation primitives: +NeMo Platform supports two core evaluation primitives: - **Metrics**: Scoring logic that evaluates model outputs. Use metrics when you need flexible, reusable scoring for your own datasets and task-specific criteria. 
@@ -98,17 +103,17 @@ There are two execution modes and two evaluation patterns: - **Offline evaluation**: Score existing dataset rows (for example, model outputs already generated). - **Online evaluation**: Generate outputs from a model as part of evaluation, then score them. -For deeper details, see [Evaluation Metrics](metrics/index.md). +For deeper details, see [Evaluation Metrics](/evaluation/metrics/overview). --- ## Tutorials -After [setting up a local instance of the platform](../get-started/setup.md), use the following tutorials to learn how to accomplish common evaluation tasks. These step-by-step guides help you evaluate models using different benchmarks and metrics. +After [setting up a local instance of the platform](/get-started/setup), use the following tutorials to learn how to accomplish common evaluation tasks. These step-by-step guides help you evaluate models using different benchmarks and metrics.
-- **[Run an LLM Judge Eval](tutorials/run-llm-judge-evaluation.md)** +- **[Run an LLM Judge Eval](/evaluation/tutorials/run-llm-as-a-judge-evaluation)** --- @@ -125,12 +130,12 @@ After [setting up a local instance of the platform](../get-started/setup.md), us Most teams get the best results by starting metric-first, then moving to benchmarks: 1. **Develop and validate your metrics first** - - Start with [Metrics](metrics/index.md) to define how quality should be scored for your use case. + - Start with [Metrics](/evaluation/metrics/overview) to define how quality should be scored for your use case. - Use live evaluation (`POST /v2/workspaces/{workspace}/evaluation/metric-evaluate`) with small `DatasetRows` payloads to iterate quickly. 1. **Scale metric evaluation to jobs** - When metrics are validated, run async metric jobs (`/evaluation/metric-jobs`) on larger datasets. - - Use filesets for production-scale inputs. See [Manage Files](../get-started/concepts/manage-files.md). + - Use filesets for production-scale inputs. See [Manage Files](/get-started/core-concepts/manage-files). 1. **Monitor and analyze results** - Track job status and progress with job management APIs. @@ -140,8 +145,8 @@ Most teams get the best results by starting metric-first, then moving to benchma ## Where to Go Next -- For metric workflows, see [Metric Jobs](metrics/index.md) and [Metric Results](metrics/results.md). -- For full endpoint details, see the [Evaluator API Reference](../api/index.md#tag-evaluator). +- For metric workflows, see [Metric Jobs](/evaluation/metrics/overview) and [Metric Results](/evaluation/metrics/metric-results). +- For full endpoint details, see the [Evaluator API Reference](/reference/api-reference#tag-evaluator). 
--- diff --git a/docs/evaluator/metrics/agent-configuration.mdx b/docs/evaluator/metrics/agent-configuration.mdx index 0c00f565e4..b94c33e0f8 100644 --- a/docs/evaluator/metrics/agent-configuration.mdx +++ b/docs/evaluator/metrics/agent-configuration.mdx @@ -1,3 +1,7 @@ +--- +title: "Agent Configuration" +description: "" +--- # Agent Configuration @@ -34,7 +38,7 @@ evaluator: Evaluator = client.evaluator # this object is an Evaluator resource If your agent endpoint requires authentication, configure `api_key_secret` on the `Agent`. -For local `evaluator.run(...)` calls, `api_key_secret` must name an environment variable available to the local Python process. For remote `evaluator.submit(...)` jobs, it must name a NeMo platform secret in the target workspace. See [Model API Authentication](model-configuration.md#model-api-authentication) for the local-versus-remote behavior. +For local `evaluator.run(...)` calls, `api_key_secret` must name an environment variable available to the local Python process. For remote `evaluator.submit(...)` jobs, it must name a NeMo platform secret in the target workspace. See [Model API Authentication](/evaluation/metrics/model-configuration#model-api-authentication) for the local-versus-remote behavior. For remote `evaluator.submit(...)` jobs, create the secret in the platform workspace before submitting the job: @@ -63,14 +67,14 @@ You control the request shape with `body` and extract values from the response w | `url` | Yes | string | Base URL of the agent endpoint. | | `name` | Yes | string | Agent name or identifier. | | `format` | No | string | `generic` (default) or `nemo_agent_toolkit`. | -| `api_key_secret` | No | string | API key reference. See [Model API Authentication](model-configuration.md#model-api-authentication). | +| `api_key_secret` | No | string | API key reference. See [Model API Authentication](/evaluation/metrics/model-configuration#model-api-authentication). 
| | `body` | Yes | dict | Jinja template for the request payload. Use `{%raw%}{{ prompt }}{%endraw%}`, `{%raw%}{{ messages }}{%endraw%}`, or fields from the rendered prompt context. | | `response_path` | Yes | string | [JSONPath](https://datatracker.ietf.org/doc/html/rfc9535) expression to extract the response text. | | `trajectory_path` | No | string | JSONPath expression to extract the trajectory. | ### Run a Generic Agent Evaluation -{% raw %} + ```python from nemo_evaluator_sdk import Agent, RunConfigOnline @@ -99,7 +103,7 @@ for score in result.aggregate_scores.scores: print(f"{score.name}: mean={score.mean}") ``` -{% endraw %} + Use `evaluator.submit(...)` with the same argument shape when you want a durable remote job, but set `api_key_secret` to a platform secret name for the target workspace. @@ -138,7 +142,7 @@ async def invoke(request: AgentRequest) -> AgentResponse: Use the `nemo_agent_toolkit` format when evaluating agents built with the [NeMo Agent Toolkit](https://docs.nvidia.com/nemo/agent-toolkit/latest/index.html). This format uses the NAT streaming protocol: -1. Sends a POST to `{url}/generate/full?filter_steps=none` with `{"input_message": ""}`. +1. Sends a POST to `{url}/generate/full?filter_steps=none` with `{"input_message": "<text>"}`. 2. Reads the SSE (Server-Sent Events) stream. 3. Extracts the final `value` from the last SSE `data:` chunk. 4. Returns it as the agent response. @@ -150,11 +154,11 @@ Use the `nemo_agent_toolkit` format when evaluating agents built with the [NeMo | `url` | Yes | string | Base URL of the agent endpoint. | | `name` | Yes | string | Agent name or identifier. | | `format` | Yes | string | Set to `nemo_agent_toolkit`. | -| `api_key_secret` | No | string | API key reference. See [Model API Authentication](model-configuration.md#model-api-authentication). | +| `api_key_secret` | No | string | API key reference. See [Model API Authentication](/evaluation/metrics/model-configuration#model-api-authentication). 
| ### Run a NAT Agent Evaluation -{% raw %} + ```python from nemo_evaluator_sdk import Agent, RunConfigOnline @@ -186,7 +190,7 @@ job.wait_until_done() result = job.get_result() ``` -{% endraw %} + ## Model vs Agent: When to Use Which @@ -198,12 +202,11 @@ result = job.get_result() | Evaluate a custom HTTP endpoint with non-standard response format | | x | | Use a standard chat completions API | x | | -!!! info - Online evaluations accept **either** a model **or** an agent as the request target, never both. +Online evaluations accept **either** a model **or** an agent as the request target, never both. ## Related -- [Model Configuration](model-configuration.md) - Inline model targets for LLM endpoints. -- [Agentic Evaluation Metrics](agentic.md) - Metrics for evaluating agent tool calling, goal accuracy, and trajectory. -- [LLM-as-a-Judge](llm-as-a-judge.md) - Custom judge-based evaluation with flexible scoring criteria. -- [Bring Your Own Metric](remote.md) - Integrate custom evaluation endpoints. +- [Model Configuration](/evaluation/metrics/model-configuration) - Inline model targets for LLM endpoints. +- [Agentic Evaluation Metrics](/evaluation/metrics/agentic-metrics) - Metrics for evaluating agent tool calling, goal accuracy, and trajectory. +- [LLM-as-a-Judge](/evaluation/metrics/llm-as-a-judge) - Custom judge-based evaluation with flexible scoring criteria. +- [Bring Your Own Metric](/evaluation/metrics/bring-your-own-metric) - Integrate custom evaluation endpoints. 
diff --git a/docs/evaluator/metrics/agentic.mdx b/docs/evaluator/metrics/agentic.mdx index 0f6a433303..ca63248522 100644 --- a/docs/evaluator/metrics/agentic.mdx +++ b/docs/evaluator/metrics/agentic.mdx @@ -1,3 +1,7 @@ +--- +title: "Agentic Metrics" +description: "" +--- # Agentic Evaluation Metrics @@ -7,7 +11,7 @@ Evaluate agent-based and multi-step reasoning models using metrics powered by [R Key stages of agent workflow evaluation: -![Agent Evaluation Framework](../images/agent_eval_framework.png) +![Agent Evaluation Framework](/evaluator/images/agent_eval_framework.png) **1. Intermediate Steps Evaluation** Assesses the correctness of intermediate steps during agent execution: @@ -20,7 +24,7 @@ Evaluates the quality of the agent's final output using: - **Agent Goal Accuracy**: Measures whether the agent successfully completed the requested task. Refer to [Agent Goal Accuracy](#agent-goal-accuracy). - **Topic Adherence**: Assesses how well the agent maintained focus on the assigned topic throughout the conversation. Refer to [Topic Adherence](#topic-adherence). - **Answer Accuracy**: Evaluates the factual correctness of agent answers. Refer to [Answer Accuracy](#answer-accuracy). -- **Custom Metrics**: For domain-specific or custom evaluation criteria, use [LLM-as-a-Judge](llm-as-a-judge.md) with the `data` task type. +- **Custom Metrics**: For domain-specific or custom evaluation criteria, use [LLM-as-a-Judge](/evaluation/metrics/llm-as-a-judge) with the `data` task type. **3. Trajectory Evaluation** Evaluates the agent's decision-making process by analyzing the entire sequence of actions taken to accomplish a goal. This includes assessing whether the agent chose appropriate tools in the correct order. Refer to [Trajectory Evaluation](#trajectory-evaluation) for the expected data format and current plugin SDK support. 
@@ -48,8 +52,9 @@ Online target generation first calls a model or agent target, then evaluates the - Use `RunConfigOnlineModel` for model targets and `RunConfigOnline` for agent targets. - Include a `prompt_template` when the dataset row must be transformed into a model or agent request. -!!! note - **Response Usage**: In online target generation, the generated response is used as the metric response. A dataset `response` column is optional and is superseded by the generated response for that run. + +**Response Usage**: In online target generation, the generated response is used as the metric response. A dataset `response` column is optional and is superseded by the generated response for that run. + --- @@ -66,8 +71,9 @@ Agentic metrics evaluate different aspects of agent behavior: | [**Answer Accuracy**](#answer-accuracy) | Checks factual correctness | Yes | `run` + `submit` | | [**Trajectory Evaluation**](#trajectory-evaluation) | Evaluates decision-making across action sequence | Yes | Not exposed as a typed plugin SDK metric | -!!! note - Use `evaluator.run(...)` for local in-process evaluation and `evaluator.submit(...)` for durable remote platform jobs. The examples below use inline dataset rows through `dataset=[...]`, but you can use a file Path or a FilesetRef instead. + +Use `evaluator.run(...)` for local in-process evaluation and `evaluator.submit(...)` for durable remote platform jobs. The examples below use inline dataset rows through `dataset=[...]`, but you can use a file Path or a FilesetRef instead. + ## Prerequisites @@ -75,7 +81,7 @@ Before running agentic evaluations: 1. **Workspace**: Have a workspace created. All remote resources, including secrets and jobs, are scoped to a workspace. 2. **Judge LLM endpoint** *(for most metrics)*: Have access to an LLM that will serve as your judge. -3. 
**API key secret** *(if judge requires auth)*: If your judge endpoint requires authentication, [create a secret](../../get-started/concepts/manage-secrets.md) to store the API key. For local `run` versus remote `submit` behavior, see [Model API Authentication](model-configuration.md#model-api-authentication). +3. **API key secret** *(if judge requires auth)*: If your judge endpoint requires authentication, [create a secret](/get-started/core-concepts/manage-secrets) to store the API key. For local `run` versus remote `submit` behavior, see [Model API Authentication](/evaluation/metrics/model-configuration#model-api-authentication). 4. **Initialize the SDK**: ```python @@ -138,8 +144,9 @@ Use `dataset=[...]` for inline rows. For offline scoring options, use `config=Ru Evaluates whether the agent invoked the correct tools with the correct arguments. This metric **does not require a judge LLM**. -!!! note - **Online/offline support**: Tool Call Accuracy supports scoring existing tool calls directly. It can also score tool calls produced during online target generation when the target response includes the required tool-call fields. + +**Online/offline support**: Tool Call Accuracy supports scoring existing tool calls directly. It can also score tool calls produced during online target generation when the target response includes the required tool-call fields. 
+ #### Data Format @@ -182,100 +189,100 @@ Evaluates whether the agent invoked the correct tools with the correct arguments } ``` -=== "Run Locally" - - ```python - from nemo_evaluator_sdk.metrics.ragas import ToolCallAccuracyMetric - metric = ToolCallAccuracyMetric() - - result = evaluator.run( - metric=metric, - dataset=[ - { - "user_input": [ - {"content": "What's the weather in Paris?", "type": "human"}, - { - "content": "Let me check.", - "type": "ai", - "tool_calls": [{"name": "weather_api", "args": {"city": "Paris"}}], - }, - {"content": "Sunny, 22°C", "type": "tool"}, - {"content": "It's sunny and 22°C in Paris.", "type": "ai"}, - ], - "reference_tool_calls": [ - {"name": "weather_api", "args": {"city": "Paris"}} - ], - } - ], - ) - print(result.aggregate_scores) - ``` - -=== "Submit Job" - - ```python - from nemo_evaluator_sdk import RunConfig - from nemo_evaluator_sdk.metrics.ragas import ToolCallAccuracyMetric - metric = ToolCallAccuracyMetric() - - job = evaluator.submit( - metric=metric, - dataset=[ - { - "user_input": [ - {"content": "What's the weather in Paris?", "type": "human"}, - { - "content": "Let me check.", - "type": "ai", - "tool_calls": [{"name": "weather_api", "args": {"city": "Paris"}}], - }, - {"content": "Sunny, 22°C", "type": "tool"}, - {"content": "It's sunny and 22°C in Paris.", "type": "ai"}, - ], - "reference_tool_calls": [ - {"name": "weather_api", "args": {"city": "Paris"}} - ], - } - ], - config=RunConfig(parallelism=4), - ) - job.wait_until_done() - result = job.get_result() - print(result.aggregate_scores) - ``` - -=== "Result" + + +```python +from nemo_evaluator_sdk.metrics.ragas import ToolCallAccuracyMetric +metric = ToolCallAccuracyMetric() - ```json - { - "aggregate_scores": [ +result = evaluator.run( + metric=metric, + dataset=[ { - "name": "tool_call_accuracy", - "count": 1, - "mean": 1.0, - "min": 1.0, - "max": 1.0 + "user_input": [ + {"content": "What's the weather in Paris?", "type": "human"}, + { + "content": "Let 
me check.", + "type": "ai", + "tool_calls": [{"name": "weather_api", "args": {"city": "Paris"}}], + }, + {"content": "Sunny, 22°C", "type": "tool"}, + {"content": "It's sunny and 22°C in Paris.", "type": "ai"}, + ], + "reference_tool_calls": [ + {"name": "weather_api", "args": {"city": "Paris"}} + ], } - ], - "row_scores": [ + ], +) +print(result.aggregate_scores) +``` + + +```python +from nemo_evaluator_sdk import RunConfig +from nemo_evaluator_sdk.metrics.ragas import ToolCallAccuracyMetric +metric = ToolCallAccuracyMetric() + +job = evaluator.submit( + metric=metric, + dataset=[ { - "index": 0, - "scores": { - "tool_call_accuracy": 1.0 - } + "user_input": [ + {"content": "What's the weather in Paris?", "type": "human"}, + { + "content": "Let me check.", + "type": "ai", + "tool_calls": [{"name": "weather_api", "args": {"city": "Paris"}}], + }, + {"content": "Sunny, 22°C", "type": "tool"}, + {"content": "It's sunny and 22°C in Paris.", "type": "ai"}, + ], + "reference_tool_calls": [ + {"name": "weather_api", "args": {"city": "Paris"}} + ], } - ] + ], + config=RunConfig(parallelism=4), +) +job.wait_until_done() +result = job.get_result() +print(result.aggregate_scores) +``` + + +```json +{ + "aggregate_scores": [ + { + "name": "tool_call_accuracy", + "count": 1, + "mean": 1.0, + "min": 1.0, + "max": 1.0 } - ``` - + ], + "row_scores": [ + { + "index": 0, + "scores": { + "tool_call_accuracy": 1.0 + } + } + ] +} +``` + + --- ### Tool Calling (Template) A template-based metric for evaluating tool/function call accuracy. Unlike the RAGAS Tool Call Accuracy metric, this metric uses configurable templates and produces multiple scores. -!!! note - **Online/offline support**: Tool Calling supports scoring existing tool-call outputs directly. Use online target generation when you want the model or agent target to produce the response first. + +**Online/offline support**: Tool Calling supports scoring existing tool-call outputs directly. 
Use online target generation when you want the model or agent target to produce the response first. + #### Scores Produced @@ -339,139 +346,139 @@ Data must use OpenAI-compliant tool calling format: } ``` -!!! note - - Function names with dots (`.`) must be replaced with underscores (`_`). - - Comparison is case-sensitive. - - Order of tool calls is ignored, which supports parallel tool calling. + +- Function names with dots (`.`) must be replaced with underscores (`_`). +- Comparison is case-sensitive. +- Order of tool calls is ignored, which supports parallel tool calling. + -=== "Run Locally" + + - {% raw %} - ```python +```python - from nemo_evaluator_sdk import ToolCallingMetric - metric = ToolCallingMetric(reference="{{item.tool_calls}}") +from nemo_evaluator_sdk import ToolCallingMetric +metric = ToolCallingMetric(reference="{{item.tool_calls}}") - result = evaluator.run( - metric=metric, - dataset=[ - { - "messages": [ - {"role": "user", "content": "Book a table for 2 at 7pm."}, - { - "role": "assistant", - "content": "Booking...", - "tool_calls": [ - { - "function": { - "name": "book_table", - "arguments": {"people": 2, "time": "7pm"}, - } +result = evaluator.run( + metric=metric, + dataset=[ + { + "messages": [ + {"role": "user", "content": "Book a table for 2 at 7pm."}, + { + "role": "assistant", + "content": "Booking...", + "tool_calls": [ + { + "function": { + "name": "book_table", + "arguments": {"people": 2, "time": "7pm"}, } - ], - }, - ], - "tool_calls": [ - { - "function": { - "name": "book_table", - "arguments": {"people": 2, "time": "7pm"}, } + ], + }, + ], + "tool_calls": [ + { + "function": { + "name": "book_table", + "arguments": {"people": 2, "time": "7pm"}, } - ], - "response": { - "choices": [ - { - "message": { - "tool_calls": [ - { - "function": { - "name": "book_table", - "arguments": '{"people": 2, "time": "7pm"}', - } + } + ], + "response": { + "choices": [ + { + "message": { + "tool_calls": [ + { + "function": { + "name": 
"book_table", + "arguments": '{"people": 2, "time": "7pm"}', } - ] - } + } + ] } - ] - }, - } - ], - ) - print(result.aggregate_scores) - ``` - {% endraw %} + } + ] + }, + } + ], +) +print(result.aggregate_scores) +``` -=== "Submit Job" + + - {% raw %} - ```python - from nemo_evaluator_sdk import RunConfig, ToolCallingMetric +```python +from nemo_evaluator_sdk import RunConfig, ToolCallingMetric - metric = ToolCallingMetric(reference="{{item.tool_calls}}") +metric = ToolCallingMetric(reference="{{item.tool_calls}}") - job = evaluator.submit( - metric=metric, - dataset=[ - { - "tool_calls": [ - { - "function": { - "name": "book_table", - "arguments": {"people": 2, "time": "7pm"}, - } +job = evaluator.submit( + metric=metric, + dataset=[ + { + "tool_calls": [ + { + "function": { + "name": "book_table", + "arguments": {"people": 2, "time": "7pm"}, } - ], - "response": { - "choices": [ - { - "message": { - "tool_calls": [ - { - "function": { - "name": "book_table", - "arguments": '{"people": 2, "time": "7pm"}', - } + } + ], + "response": { + "choices": [ + { + "message": { + "tool_calls": [ + { + "function": { + "name": "book_table", + "arguments": '{"people": 2, "time": "7pm"}', } - ] - } + } + ] } - ] - }, - } - ], - config=RunConfig(parallelism=4), - ) - job.wait_until_done() - result = job.get_result() - print(result.aggregate_scores) - ``` - {% endraw %} - -=== "Result" - - ```json - { - "aggregate_scores": [ - { - "name": "function_name_accuracy", - "count": 1, - "mean": 1.0, - "min": 1.0, - "max": 1.0 - }, - { - "name": "function_name_and_args_accuracy", - "count": 1, - "mean": 1.0, - "min": 1.0, - "max": 1.0 + } + ] + }, } - ] - } - ``` + ], + config=RunConfig(parallelism=4), +) +job.wait_until_done() +result = job.get_result() +print(result.aggregate_scores) +``` + + +```json +{ + "aggregate_scores": [ + { + "name": "function_name_accuracy", + "count": 1, + "mean": 1.0, + "min": 1.0, + "max": 1.0 + }, + { + "name": "function_name_and_args_accuracy", + "count": 
1, + "mean": 1.0, + "min": 1.0, + "max": 1.0 + } + ] +} +``` + + --- ### Topic Adherence @@ -506,140 +513,138 @@ Measures how well the agent maintained focus on assigned topics throughout a con |-----------|------|---------|-------------| | `metric_mode` | string | `"f1"` | Scoring mode: `"f1"`, `"precision"`, or `"recall"` | -=== "Run Locally" - - ```python - from nemo_evaluator_sdk import Model - from nemo_evaluator_sdk.metrics.ragas import TopicAdherenceMetric + + +```python +from nemo_evaluator_sdk import Model +from nemo_evaluator_sdk.metrics.ragas import TopicAdherenceMetric - judge_model = Model( - url="https://integrate.api.nvidia.com/v1/chat/completions", - name="meta/llama-3.1-70b-instruct", - api_key_secret="nvidia-api-key", - ) - metric = TopicAdherenceMetric(metric_mode="f1", judge_model=judge_model) +judge_model = Model( + url="https://integrate.api.nvidia.com/v1/chat/completions", + name="meta/llama-3.1-70b-instruct", + api_key_secret="nvidia-api-key", +) +metric = TopicAdherenceMetric(metric_mode="f1", judge_model=judge_model) - result = evaluator.run( - metric=metric, - dataset=[ - { - "user_input": [ - {"content": "Tell me about healthy eating", "type": "human"}, - { - "content": "Eating fruits and vegetables is essential for good health.", - "type": "ai", - }, - ], - "reference_topics": ["health", "nutrition", "diet"], - } - ], - ) - print(result.aggregate_scores) - ``` - -=== "Submit Job" - - ```python - from nemo_evaluator_sdk import RunConfig, Model - from nemo_evaluator_sdk.metrics.ragas import TopicAdherenceMetric - - judge_model = Model( - url="https://integrate.api.nvidia.com/v1/chat/completions", - name="meta/llama-3.1-70b-instruct", - api_key_secret="nvidia-api-key", - ) - metric = TopicAdherenceMetric(metric_mode="f1", judge_model=judge_model) - - job = evaluator.submit( - metric=metric, - dataset=[ - { - "user_input": [ - {"content": "Tell me about healthy eating", "type": "human"}, - { - "content": "Eating fruits and vegetables is 
essential for good health.", - "type": "ai", - }, - ], - "reference_topics": ["health", "nutrition", "diet"], - } - ], - config=RunConfig(parallelism=4), - ) - job.wait_until_done() - result = job.get_result() - print(result.aggregate_scores) - ``` - -=== "Online Target Generation" - - {% raw %} - ```python - from nemo_evaluator_sdk import RunConfigOnlineModel, InferenceParams, Model - from nemo_evaluator_sdk.metrics.ragas import TopicAdherenceMetric - - judge_model = Model( - url="https://integrate.api.nvidia.com/v1/chat/completions", - name="meta/llama-3.1-70b-instruct", - api_key_secret="nvidia-api-key", - ) - target_model = Model( - url="https://integrate.api.nvidia.com/v1/chat/completions", - name="nvidia/llama-3.3-nemotron-super-49b-v1", - api_key_secret="nvidia-api-key", - ) - metric = TopicAdherenceMetric(metric_mode="f1", judge_model=judge_model) - - result = evaluator.run( - metric=metric, - dataset=[ - { - "user_input": "Tell me about healthy eating", - "reference_topics": ["health", "nutrition", "diet"], - } - ], - config=RunConfigOnlineModel( - parallelism=4, - inference=InferenceParams(temperature=0.7, max_tokens=1024), - ), - target=target_model, - prompt_template={ - "messages": [ +result = evaluator.run( + metric=metric, + dataset=[ + { + "user_input": [ + {"content": "Tell me about healthy eating", "type": "human"}, { - "role": "user", - "content": "{{item.user_input}}", - } - ] - }, - ) - print(result.aggregate_scores) - ``` - {% endraw %} + "content": "Eating fruits and vegetables is essential for good health.", + "type": "ai", + }, + ], + "reference_topics": ["health", "nutrition", "diet"], + } + ], +) +print(result.aggregate_scores) +``` + + +```python +from nemo_evaluator_sdk import RunConfig, Model +from nemo_evaluator_sdk.metrics.ragas import TopicAdherenceMetric -=== "Result" +judge_model = Model( + url="https://integrate.api.nvidia.com/v1/chat/completions", + name="meta/llama-3.1-70b-instruct", + api_key_secret="nvidia-api-key", +) 
+metric = TopicAdherenceMetric(metric_mode="f1", judge_model=judge_model) - ```json - { - "aggregate_scores": [ +job = evaluator.submit( + metric=metric, + dataset=[ { - "name": "topic_adherence(mode=f1)", - "count": 1, - "mean": 0.85, - "min": 0.85, - "max": 0.85 + "user_input": [ + {"content": "Tell me about healthy eating", "type": "human"}, + { + "content": "Eating fruits and vegetables is essential for good health.", + "type": "ai", + }, + ], + "reference_topics": ["health", "nutrition", "diet"], } - ], - "row_scores": [ + ], + config=RunConfig(parallelism=4), +) +job.wait_until_done() +result = job.get_result() +print(result.aggregate_scores) +``` + + + +```python +from nemo_evaluator_sdk import RunConfigOnlineModel, InferenceParams, Model +from nemo_evaluator_sdk.metrics.ragas import TopicAdherenceMetric + +judge_model = Model( + url="https://integrate.api.nvidia.com/v1/chat/completions", + name="meta/llama-3.1-70b-instruct", + api_key_secret="nvidia-api-key", +) +target_model = Model( + url="https://integrate.api.nvidia.com/v1/chat/completions", + name="nvidia/llama-3.3-nemotron-super-49b-v1", + api_key_secret="nvidia-api-key", +) +metric = TopicAdherenceMetric(metric_mode="f1", judge_model=judge_model) + +result = evaluator.run( + metric=metric, + dataset=[ { - "index": 0, - "scores": { - "topic_adherence(mode=f1)": 0.85 - } + "user_input": "Tell me about healthy eating", + "reference_topics": ["health", "nutrition", "diet"], } - ] - } - ``` + ], + config=RunConfigOnlineModel( + parallelism=4, + inference=InferenceParams(temperature=0.7, max_tokens=1024), + ), + target=target_model, + prompt_template={ + "messages": [ + { + "role": "user", + "content": "{{item.user_input}}", + } + ] + }, +) +print(result.aggregate_scores) +``` + + +```json +{ + "aggregate_scores": [ + { + "name": "topic_adherence(mode=f1)", + "count": 1, + "mean": 0.85, + "min": 0.85, + "max": 0.85 + } + ], + "row_scores": [ + { + "index": 0, + "scores": { + "topic_adherence(mode=f1)": 
0.85 + } + } + ] +} +``` + + --- ### Agent Goal Accuracy @@ -688,107 +693,106 @@ Compare the agent's outcome against a known reference: |-----------|------|---------|-------------| | `use_reference` | boolean | `True` | Whether to compare against a reference outcome | -=== "Run Locally" - - ```python - from nemo_evaluator_sdk import Model - from nemo_evaluator_sdk.metrics.ragas import AgentGoalAccuracyMetric - - judge_model = Model( - url="https://integrate.api.nvidia.com/v1/chat/completions", - name="meta/llama-3.1-70b-instruct", - api_key_secret="nvidia-api-key", - ) - metric = AgentGoalAccuracyMetric(use_reference=True, judge_model=judge_model) - - result = evaluator.run( - metric=metric, - dataset=[ - { - "user_input": [ - {"content": "Book a table at a restaurant for 8pm", "type": "human"}, - { - "content": "I'll search for restaurants.", - "type": "ai", - "tool_calls": [{"name": "restaurant_search", "args": {}}], - }, - {"content": "Found: Italian Place", "type": "tool"}, - { - "content": "Your table at Italian Place is booked for 8pm.", - "type": "ai", - }, - ], - "reference": "Successfully booked a table at a restaurant for 8pm", - } - ], - ) - print(result.aggregate_scores) - ``` - -=== "Submit Job" - - ```python - from nemo_evaluator_sdk import RunConfig, Model - from nemo_evaluator_sdk.metrics.ragas import AgentGoalAccuracyMetric - - judge_model = Model( - url="https://integrate.api.nvidia.com/v1/chat/completions", - name="meta/llama-3.1-70b-instruct", - api_key_secret="nvidia-api-key", - ) - metric = AgentGoalAccuracyMetric(use_reference=True, judge_model=judge_model) - - job = evaluator.submit( - metric=metric, - dataset=[ - { - "user_input": [ - {"content": "Book a table at a restaurant for 8pm", "type": "human"}, - { - "content": "I'll search for restaurants.", - "type": "ai", - "tool_calls": [{"name": "restaurant_search", "args": {}}], - }, - {"content": "Found: Italian Place", "type": "tool"}, - { - "content": "Your table at Italian Place is booked 
for 8pm.", - "type": "ai", - }, - ], - "reference": "Successfully booked a table at a restaurant for 8pm", - } - ], - config=RunConfig(parallelism=4), - ) - job.wait_until_done() - result = job.get_result() - print(result.aggregate_scores) - ``` + + +```python +from nemo_evaluator_sdk import Model +from nemo_evaluator_sdk.metrics.ragas import AgentGoalAccuracyMetric -=== "Result" +judge_model = Model( + url="https://integrate.api.nvidia.com/v1/chat/completions", + name="meta/llama-3.1-70b-instruct", + api_key_secret="nvidia-api-key", +) +metric = AgentGoalAccuracyMetric(use_reference=True, judge_model=judge_model) - ```json - { - "aggregate_scores": [ +result = evaluator.run( + metric=metric, + dataset=[ { - "name": "agent_goal_accuracy", - "count": 1, - "mean": 1.0, - "min": 1.0, - "max": 1.0 + "user_input": [ + {"content": "Book a table at a restaurant for 8pm", "type": "human"}, + { + "content": "I'll search for restaurants.", + "type": "ai", + "tool_calls": [{"name": "restaurant_search", "args": {}}], + }, + {"content": "Found: Italian Place", "type": "tool"}, + { + "content": "Your table at Italian Place is booked for 8pm.", + "type": "ai", + }, + ], + "reference": "Successfully booked a table at a restaurant for 8pm", } - ], - "row_scores": [ + ], +) +print(result.aggregate_scores) +``` + + +```python +from nemo_evaluator_sdk import RunConfig, Model +from nemo_evaluator_sdk.metrics.ragas import AgentGoalAccuracyMetric + +judge_model = Model( + url="https://integrate.api.nvidia.com/v1/chat/completions", + name="meta/llama-3.1-70b-instruct", + api_key_secret="nvidia-api-key", +) +metric = AgentGoalAccuracyMetric(use_reference=True, judge_model=judge_model) + +job = evaluator.submit( + metric=metric, + dataset=[ { - "index": 0, - "scores": { - "agent_goal_accuracy": 1.0 - } + "user_input": [ + {"content": "Book a table at a restaurant for 8pm", "type": "human"}, + { + "content": "I'll search for restaurants.", + "type": "ai", + "tool_calls": [{"name": 
"restaurant_search", "args": {}}], + }, + {"content": "Found: Italian Place", "type": "tool"}, + { + "content": "Your table at Italian Place is booked for 8pm.", + "type": "ai", + }, + ], + "reference": "Successfully booked a table at a restaurant for 8pm", } - ] + ], + config=RunConfig(parallelism=4), +) +job.wait_until_done() +result = job.get_result() +print(result.aggregate_scores) +``` + + +```json +{ + "aggregate_scores": [ + { + "name": "agent_goal_accuracy", + "count": 1, + "mean": 1.0, + "min": 1.0, + "max": 1.0 } - ``` - + ], + "row_scores": [ + { + "index": 0, + "scores": { + "agent_goal_accuracy": 1.0 + } + } + ] +} +``` + + #### Without Reference The judge LLM infers the goal from the conversation context: @@ -828,99 +832,99 @@ The judge LLM infers the goal from the conversation context: } ``` -=== "Run Locally" + + +```python +from nemo_evaluator_sdk import Model +from nemo_evaluator_sdk.metrics.ragas import AgentGoalAccuracyMetric - ```python - from nemo_evaluator_sdk import Model - from nemo_evaluator_sdk.metrics.ragas import AgentGoalAccuracyMetric +judge_model = Model( + url="https://integrate.api.nvidia.com/v1/chat/completions", + name="meta/llama-3.1-70b-instruct", + api_key_secret="nvidia-api-key", +) +metric = AgentGoalAccuracyMetric(use_reference=False, judge_model=judge_model) - judge_model = Model( - url="https://integrate.api.nvidia.com/v1/chat/completions", - name="meta/llama-3.1-70b-instruct", - api_key_secret="nvidia-api-key", - ) - metric = AgentGoalAccuracyMetric(use_reference=False, judge_model=judge_model) +result = evaluator.run( + metric=metric, + dataset=[ + { + "user_input": [ + { + "content": "Set a reminder for my dentist appointment tomorrow at 2pm", + "type": "human", + }, + { + "content": "I'll set that reminder for you.", + "type": "ai", + "tool_calls": [ + { + "name": "set_reminder", + "args": { + "title": "Dentist appointment", + "date": "tomorrow", + "time": "2pm", + }, + } + ], + }, + {"content": "Reminder set 
successfully.", "type": "tool"}, + {"content": "Your reminder has been set.", "type": "ai"}, + ], + } + ], +) +print(result.aggregate_scores) +``` + + +```python +from nemo_evaluator_sdk import RunConfig, Model +from nemo_evaluator_sdk.metrics.ragas import AgentGoalAccuracyMetric - result = evaluator.run( - metric=metric, - dataset=[ - { - "user_input": [ - { - "content": "Set a reminder for my dentist appointment tomorrow at 2pm", - "type": "human", - }, - { - "content": "I'll set that reminder for you.", - "type": "ai", - "tool_calls": [ - { - "name": "set_reminder", - "args": { - "title": "Dentist appointment", - "date": "tomorrow", - "time": "2pm", - }, - } - ], - }, - {"content": "Reminder set successfully.", "type": "tool"}, - {"content": "Your reminder has been set.", "type": "ai"}, - ], - } - ], - ) - print(result.aggregate_scores) - ``` - -=== "Submit Job" - - ```python - from nemo_evaluator_sdk import RunConfig, Model - from nemo_evaluator_sdk.metrics.ragas import AgentGoalAccuracyMetric - - judge_model = Model( - url="https://integrate.api.nvidia.com/v1/chat/completions", - name="meta/llama-3.1-70b-instruct", - api_key_secret="nvidia-api-key", - ) - metric = AgentGoalAccuracyMetric(use_reference=False, judge_model=judge_model) - - job = evaluator.submit( - metric=metric, - dataset=[ - { - "user_input": [ - { - "content": "Set a reminder for my dentist appointment tomorrow at 2pm", - "type": "human", - }, - { - "content": "I'll set that reminder for you.", - "type": "ai", - "tool_calls": [ - { - "name": "set_reminder", - "args": { - "title": "Dentist appointment", - "date": "tomorrow", - "time": "2pm", - }, - } - ], - }, - {"content": "Reminder set successfully.", "type": "tool"}, - {"content": "Your reminder has been set.", "type": "ai"}, - ], - } - ], - config=RunConfig(parallelism=4), - ) - job.wait_until_done() - result = job.get_result() - print(result.aggregate_scores) - ``` +judge_model = Model( + 
url="https://integrate.api.nvidia.com/v1/chat/completions", + name="meta/llama-3.1-70b-instruct", + api_key_secret="nvidia-api-key", +) +metric = AgentGoalAccuracyMetric(use_reference=False, judge_model=judge_model) +job = evaluator.submit( + metric=metric, + dataset=[ + { + "user_input": [ + { + "content": "Set a reminder for my dentist appointment tomorrow at 2pm", + "type": "human", + }, + { + "content": "I'll set that reminder for you.", + "type": "ai", + "tool_calls": [ + { + "name": "set_reminder", + "args": { + "title": "Dentist appointment", + "date": "tomorrow", + "time": "2pm", + }, + } + ], + }, + {"content": "Reminder set successfully.", "type": "tool"}, + {"content": "Your reminder has been set.", "type": "ai"}, + ], + } + ], + config=RunConfig(parallelism=4), +) +job.wait_until_done() +result = job.get_result() +print(result.aggregate_scores) +``` + + --- ### Answer Accuracy @@ -937,143 +941,142 @@ Evaluates the factual correctness of an agent's answer by comparing it against a } ``` -=== "Run Locally" - - ```python - from nemo_evaluator_sdk import Model - from nemo_evaluator_sdk.metrics.ragas import AnswerAccuracyMetric - - judge_model = Model( - url="https://integrate.api.nvidia.com/v1/chat/completions", - name="meta/llama-3.1-70b-instruct", - api_key_secret="nvidia-api-key", - ) - metric = AnswerAccuracyMetric(judge_model=judge_model) + + +```python +from nemo_evaluator_sdk import Model +from nemo_evaluator_sdk.metrics.ragas import AnswerAccuracyMetric - result = evaluator.run( - metric=metric, - dataset=[ - { - "user_input": "What is the capital of France?", - "response": "The capital of France is Paris.", - "reference": "Paris", - } - ], - ) - print(result.aggregate_scores) - ``` - -=== "Submit Job" - - ```python - from nemo_evaluator_sdk import RunConfig, Model - from nemo_evaluator_sdk.metrics.ragas import AnswerAccuracyMetric - - judge_model = Model( - url="https://integrate.api.nvidia.com/v1/chat/completions", - 
name="meta/llama-3.1-70b-instruct", - api_key_secret="nvidia-api-key", - ) - metric = AnswerAccuracyMetric(judge_model=judge_model) - - job = evaluator.submit( - metric=metric, - dataset=[ - { - "user_input": "What is the capital of France?", - "response": "The capital of France is Paris.", - "reference": "Paris", - } - ], - config=RunConfig(parallelism=4), - ) - job.wait_until_done() - result = job.get_result() - print(result.aggregate_scores) - ``` - -=== "Online Target Generation" - - {% raw %} - ```python - from nemo_evaluator_sdk import RunConfigOnlineModel, InferenceParams, Model - from nemo_evaluator_sdk.metrics.ragas import AnswerAccuracyMetric - - judge_model = Model( - url="https://integrate.api.nvidia.com/v1/chat/completions", - name="meta/llama-3.1-70b-instruct", - api_key_secret="nvidia-api-key", - ) - target_model = Model( - url="https://integrate.api.nvidia.com/v1/chat/completions", - name="nvidia/llama-3.3-nemotron-super-49b-v1", - api_key_secret="nvidia-api-key", - ) - metric = AnswerAccuracyMetric(judge_model=judge_model) - - job = evaluator.submit( - metric=metric, - dataset=[ - { - "user_input": "What is the capital of France?", - "reference": "Paris", - } - ], - config=RunConfigOnlineModel( - parallelism=4, - inference=InferenceParams(temperature=0.7, max_tokens=1024), - ), - target=target_model, - prompt_template={ - "messages": [ - { - "role": "user", - "content": "{{item.user_input}}", - } - ] - }, - ) +judge_model = Model( + url="https://integrate.api.nvidia.com/v1/chat/completions", + name="meta/llama-3.1-70b-instruct", + api_key_secret="nvidia-api-key", +) +metric = AnswerAccuracyMetric(judge_model=judge_model) - job.wait_until_done() - result = job.get_result() - print(result.aggregate_scores) - ``` - {% endraw %} +result = evaluator.run( + metric=metric, + dataset=[ + { + "user_input": "What is the capital of France?", + "response": "The capital of France is Paris.", + "reference": "Paris", + } + ], +) +print(result.aggregate_scores) 
+``` + + +```python +from nemo_evaluator_sdk import RunConfig, Model +from nemo_evaluator_sdk.metrics.ragas import AnswerAccuracyMetric -=== "Result" +judge_model = Model( + url="https://integrate.api.nvidia.com/v1/chat/completions", + name="meta/llama-3.1-70b-instruct", + api_key_secret="nvidia-api-key", +) +metric = AnswerAccuracyMetric(judge_model=judge_model) - ```json - { - "aggregate_scores": [ +job = evaluator.submit( + metric=metric, + dataset=[ { - "name": "nv_accuracy", - "count": 1, - "mean": 1.0, - "min": 1.0, - "max": 1.0 + "user_input": "What is the capital of France?", + "response": "The capital of France is Paris.", + "reference": "Paris", } - ], - "row_scores": [ + ], + config=RunConfig(parallelism=4), +) +job.wait_until_done() +result = job.get_result() +print(result.aggregate_scores) +``` + + + +```python +from nemo_evaluator_sdk import RunConfigOnlineModel, InferenceParams, Model +from nemo_evaluator_sdk.metrics.ragas import AnswerAccuracyMetric + +judge_model = Model( + url="https://integrate.api.nvidia.com/v1/chat/completions", + name="meta/llama-3.1-70b-instruct", + api_key_secret="nvidia-api-key", +) +target_model = Model( + url="https://integrate.api.nvidia.com/v1/chat/completions", + name="nvidia/llama-3.3-nemotron-super-49b-v1", + api_key_secret="nvidia-api-key", +) +metric = AnswerAccuracyMetric(judge_model=judge_model) + +job = evaluator.submit( + metric=metric, + dataset=[ { - "index": 0, - "scores": { - "nv_accuracy": 1.0 - } + "user_input": "What is the capital of France?", + "reference": "Paris", } - ] - } - ``` + ], + config=RunConfigOnlineModel( + parallelism=4, + inference=InferenceParams(temperature=0.7, max_tokens=1024), + ), + target=target_model, + prompt_template={ + "messages": [ + { + "role": "user", + "content": "{{item.user_input}}", + } + ] + }, +) +job.wait_until_done() +result = job.get_result() +print(result.aggregate_scores) +``` + + + +```json +{ + "aggregate_scores": [ + { + "name": "nv_accuracy", + "count": 1, + 
"mean": 1.0, + "min": 1.0, + "max": 1.0 + } + ], + "row_scores": [ + { + "index": 0, + "scores": { + "nv_accuracy": 1.0 + } + } + ] +} +``` + + --- ### Trajectory Evaluation Evaluates the agent's decision-making process by analyzing the entire sequence of actions (trajectory) taken to accomplish a goal. This **system metric** assesses whether the agent chose appropriate tools in the correct order. -!!! info - **NAT Format Requirement**: This metric supports the NVIDIA Agent Toolkit format with `intermediate_steps` containing detailed event traces. + +**NAT Format Requirement**: This metric supports the NVIDIA Agent Toolkit format with `intermediate_steps` containing detailed event traces. - **Current plugin SDK support**: The current plugin SDK does not expose a typed trajectory-evaluation metric class. Use the data-format details below when preparing datasets for environments where the system metric is enabled, but do not use the old generated evaluator job APIs for plugin SDK execution. +**Current plugin SDK support**: The current plugin SDK does not expose a typed trajectory-evaluation metric class. Use the data-format details below when preparing datasets for environments where the system metric is enabled, but do not use the old generated evaluator job APIs for plugin SDK execution. + #### Data Format @@ -1144,8 +1147,9 @@ judge_model = Model( ) ``` -!!! info - **Recommended model size**: Use a 70B+ parameter model as the judge for reliable results. Smaller models may fail to follow the required output schema, causing parsing errors. + +**Recommended model size**: Use a 70B+ parameter model as the judge for reliable results. Smaller models may fail to follow the required output schema, causing parsing errors. + ### Using Reasoning Models @@ -1184,9 +1188,9 @@ judge_model = Model( ) ``` -For more details on secret management, refer to [Managing Secrets](../../get-started/concepts/manage-secrets.md). 
+For more details on secret management, refer to [Managing Secrets](/get-started/core-concepts/manage-secrets). -For local `run` versus remote `submit` behavior of `api_key_secret`, see [Model API Authentication](model-configuration.md#model-api-authentication). +For local `run` versus remote `submit` behavior of `api_key_secret`, see [Model API Authentication](/evaluation/metrics/model-configuration#model-api-authentication). --- @@ -1200,7 +1204,7 @@ result = job.get_result() print(result.aggregate_scores) ``` -Navigate to [Metrics Job Management](job-management.md) for more job lifecycle details. +Navigate to [Metrics Job Management](/evaluation/metrics/job-management) for more job lifecycle details. --- @@ -1225,11 +1229,12 @@ Use `RunConfig(limit_samples=...)` when you want to test a small slice of a larg 4. **RAGAS Dependency**: These metrics are powered by RAGAS and may have version-specific behavior. -5. **NaN troubleshooting for judge-based metrics**: If you see `nan_count > 0` with `mean = null`, check judge model authentication first (API key secret, endpoint access, and model permissions). See [Model API Authentication](model-configuration.md#model-api-authentication) for `api_key_secret` behavior. Some RAGAS metrics are known to convert auth failures into `NaN` scores instead of raising a hard error. +5. **NaN troubleshooting for judge-based metrics**: If you see `nan_count > 0` with `mean = null`, check judge model authentication first (API key secret, endpoint access, and model permissions). See [Model API Authentication](/evaluation/metrics/model-configuration#model-api-authentication) for `api_key_secret` behavior. Some RAGAS metrics are known to convert auth failures into `NaN` scores instead of raising a hard error. -!!! 
info - - [Agent Configuration](agent-configuration.md) - Use agents (generic or NAT) as targets in online evaluation jobs - - [Agentic Benchmarks](../benchmarks/agentic.md) - BFCL benchmark for tool-calling evaluation - - [LLM-as-a-Judge](llm-as-a-judge.md) - Custom judge-based evaluation - - [Evaluation Results](results.md) - Understanding results - - [RAG Metrics](rag.md) - RAGAS metrics for RAG pipelines + +- [Agent Configuration](/evaluation/metrics/agent-configuration) - Use agents (generic or NAT) as targets in online evaluation jobs +- [Agentic Benchmarks](/evaluation/benchmarks/agentic-benchmarks) - BFCL benchmark for tool-calling evaluation +- [LLM-as-a-Judge](/evaluation/metrics/llm-as-a-judge) - Custom judge-based evaluation +- [Evaluation Results](/evaluation/metrics/metric-results) - Understanding results +- [RAG Metrics](/evaluation/metrics/rag-metrics) - RAGAS metrics for RAG pipelines + diff --git a/docs/evaluator/metrics/index.mdx b/docs/evaluator/metrics/index.mdx index 85b5ad529e..a71aada147 100644 --- a/docs/evaluator/metrics/index.mdx +++ b/docs/evaluator/metrics/index.mdx @@ -1,3 +1,7 @@ +--- +title: "Overview" +description: "" +--- # Evaluation Metrics @@ -11,10 +15,13 @@ A metric is a scoring definition that evaluates model or agent outputs. In the E - **Outputs**: Row-level scores and aggregate statistics. - **Execution**: Metric objects run with `dataset`, optional runtime configuration, and an optional model or agent target. -!!! note "Terminology on this page:" - - **Metric definition**: The reusable scoring configuration. - - **Metric type**: The metric family (for example exact-match, BLEU, LLM-as-a-judge). - - **Metric score**: The numeric or rubric output produced at evaluation time. + +Terminology on this page: +- **Metric definition**: The reusable scoring configuration. +- **Metric type**: The metric family (for example exact-match, BLEU, LLM-as-a-judge). 
+- **Metric score**: The numeric or rubric output produced at evaluation time. + + ## The Evaluation Workflow ```text @@ -34,7 +41,7 @@ A metric is a scoring definition that evaluates model or agent outputs. In the E Minimal sync evaluation with a built-in metric: -{% raw %} + ```python import os @@ -60,7 +67,7 @@ result = evaluator.run( print(result.aggregate_scores) ``` -{% endraw %} + ## Execution Modes @@ -80,15 +87,15 @@ Online evaluation jobs can target either a **model** (an OpenAI-compatible chat | **Model** | Standalone LLM endpoints using a standard chat completions API. | | **Agent** | Agentic systems, NeMo Agent Toolkit workflows, or custom HTTP endpoints with non-standard response formats. | -See [Model Configuration](model-configuration.md) and [Agent Configuration](agent-configuration.md) for setup details. +See [Model Configuration](/evaluation/metrics/model-configuration) and [Agent Configuration](/evaluation/metrics/agent-configuration) for setup details. ## Built-in vs. Custom Metrics -- **Built-in metrics**: Ready-to-use metrics provided by {{platform_name}} (for example `exact-match`, `bleu`, `rouge`). +- **Built-in metrics**: Ready-to-use metrics provided by NeMo Platform (for example `exact-match`, `bleu`, `rouge`). - **Custom metrics**: Metrics you define for domain-specific evaluation needs. -To configure inline metric objects, see [Manage Metrics](manage-metrics.md). -For custom metric creation guides, start with [Similarity Metrics](similarity.md), [LLM-as-a-Judge](llm-as-a-judge.md), or [Bring Your Own Metric](remote.md). +To configure inline metric objects, see [Manage Metrics](/evaluation/metrics/manage-metrics). +For custom metric creation guides, start with [Similarity Metrics](/evaluation/metrics/similarity-metrics), [LLM-as-a-Judge](/evaluation/metrics/llm-as-a-judge), or [Bring Your Own Metric](/evaluation/metrics/bring-your-own-metric). ## Datasets @@ -97,7 +104,7 @@ Evaluation jobs need dataset input. 
You can provide data in two ways: | Dataset Source | Description | Best For | |------|-------------|----------| | **DatasetRows** | Inline rows sent directly in the request | Quick testing and live evaluation | -| **FilesetRef** | Reference to a persisted [fileset](../../get-started/concepts/manage-files.md) (`workspace/fileset-name`) | Production jobs and reusable datasets | +| **FilesetRef** | Reference to a persisted [fileset](/get-started/core-concepts/manage-files) (`workspace/fileset-name`) | Production jobs and reusable datasets | Example of providing a `FilesetRef` to reference specific files or globs: @@ -124,7 +131,7 @@ Use the metric-type pages below to create and configure custom metrics.
-- **[LLM-as-a-Judge](llm-as-a-judge.md)** +- **[LLM-as-a-Judge](/evaluation/metrics/llm-as-a-judge)** --- @@ -132,7 +139,7 @@ Use the metric-type pages below to create and configure custom metrics. custom-scoring rubrics -- **[Agentic Metrics](agentic.md)** +- **[Agentic Metrics](/evaluation/metrics/agentic-metrics)** --- @@ -140,7 +147,7 @@ Use the metric-type pages below to create and configure custom metrics. RAGAS tool-calling -- **[RAG Metrics](rag.md)** +- **[RAG Metrics](/evaluation/metrics/rag-metrics)** --- @@ -148,7 +155,7 @@ Use the metric-type pages below to create and configure custom metrics. faithfulness relevancy -- **[Similarity Metrics](similarity.md)** +- **[Similarity Metrics](/evaluation/metrics/similarity-metrics)** --- @@ -156,7 +163,7 @@ Use the metric-type pages below to create and configure custom metrics. F1 ROUGE BLEU -- **[Bring Your Own Metric](remote.md)** +- **[Bring Your Own Metric](/evaluation/metrics/bring-your-own-metric)** --- @@ -164,7 +171,7 @@ Use the metric-type pages below to create and configure custom metrics. remote custom -- **[Agent Configuration](agent-configuration.md)** +- **[Agent Configuration](/evaluation/metrics/agent-configuration)** --- @@ -185,4 +192,4 @@ Scores are the metric outputs produced during evaluation: ## Manage Metric Definitions -Create inline metric objects that can be reused from Python helpers or modules. See [Manage Metrics](manage-metrics.md) for SDK patterns. +Create inline metric objects that can be reused from Python helpers or modules. See [Manage Metrics](/evaluation/metrics/manage-metrics) for SDK patterns. diff --git a/docs/evaluator/metrics/job-management.mdx b/docs/evaluator/metrics/job-management.mdx index bf263b242b..9937c411fc 100644 --- a/docs/evaluator/metrics/job-management.mdx +++ b/docs/evaluator/metrics/job-management.mdx @@ -1,14 +1,19 @@ +--- +title: "Metric Job Management" +description: "" +--- # Metric Job Management -!!! 
note - **Performance Tuning**: You can improve evaluation speed by passing `RunConfig(parallelism=...)` in `config=...`. Adjust parallelism based on your model capacity and rate limits. + +**Performance Tuning**: You can improve evaluation speed by passing `RunConfig(parallelism=...)` in `config=...`. Adjust parallelism based on your model capacity and rate limits. + ## Submit a Job Submit a metric evaluation with the Evaluator plugin SDK. The `submit` method returns an `EvaluatorJobResource` that you can use to monitor the job and download results. -{% raw %} + ```python import os @@ -38,7 +43,7 @@ print("Submitted job:", job.name) job.wait_until_done() result = job.get_result() ``` -{% endraw %} + ## Monitor Job @@ -54,7 +59,7 @@ if not job.check_if_complete(): `wait_until_done()` raises if the job reaches a terminal failure status or times out. -Refer to [Evaluator Troubleshooting](../../troubleshooting/evaluator.md) for help troubleshooting job failures. +Refer to [Evaluator Troubleshooting](/reference/troubleshooting/evaluator) for help troubleshooting job failures. ## Reconnect to a Job @@ -79,7 +84,7 @@ for row in result.row_scores: print(row.item, row.metrics) ``` -See [Evaluation Results](results.md) for details on aggregate and row-level result fields. +See [Evaluation Results](/evaluation/metrics/metric-results) for details on aggregate and row-level result fields. ## Download Job Artifacts diff --git a/docs/evaluator/metrics/llm-as-a-judge.mdx b/docs/evaluator/metrics/llm-as-a-judge.mdx index 8ed2b74a0a..1318a0bfd3 100644 --- a/docs/evaluator/metrics/llm-as-a-judge.mdx +++ b/docs/evaluator/metrics/llm-as-a-judge.mdx @@ -1,3 +1,7 @@ +--- +title: "LLM-as-a-Judge" +description: "" +--- # Evaluate with LLM-as-a-Judge @@ -11,7 +15,7 @@ LLM-as-a-Judge evaluation sends each dataset row to a judge LLM and parses the j - **Pre-generated data**: Score existing question-answer pairs or conversations. 
- **Custom criteria**: Define range scores, rubric scores, prompt templates, and parser behavior. -{{nem_short_name}} supports two execution modes through the Evaluator plugin SDK: +NeMo Evaluator supports two execution modes through the Evaluator plugin SDK: | Mode | Use Case | SDK Call | |------|----------|----------| @@ -45,8 +49,9 @@ evaluator: Evaluator = client.evaluator # this object is an Evaluator resource ## Local Execution -!!! tip - The `model` field accepts both inline model definitions and model references (for example, `"my-workspace/my-model"`). Refer to [Model Configuration](model-configuration.md) for details. + +The `model` field accepts both inline model definitions and model references (for example, `"my-workspace/my-model"`). Refer to [Model Configuration](/evaluation/metrics/model-configuration) for details. + Live evaluation is designed for rapid iteration when developing and refining your evaluation metrics. Use it to quickly test different judge prompts, scoring criteria, and data formats before committing to a full evaluation job. Results return immediately, making it easy to experiment and debug. @@ -54,7 +59,7 @@ Live evaluation is designed for rapid iteration when developing and refining you Evaluate responses using numerical range scores, such as a 1-5 scale: -{% raw %} + ```python from nemo_evaluator_sdk import ( InferenceParams, @@ -128,41 +133,40 @@ for score in result.aggregate_scores.scores: for row in result.row_scores: print(row.row_index, row.item, row.metrics) ``` -{% endraw %} -The result includes aggregate scores and row scores. Row scores are useful when you are debugging the prompt or parser because they show how each individual row was scored. -??? "Example Response" - :icon: code-square +The result includes aggregate scores and row scores. Row scores are useful when you are debugging the prompt or parser because they show how each individual row was scored. 
- ```python - { - "aggregate_scores": { - "scores": [ - {"name": "helpfulness", "count": 2, "mean": 4.5, "min": 4.0, "max": 5.0}, - {"name": "accuracy", "count": 2, "mean": 4.0, "min": 3.0, "max": 5.0}, - ] - }, - "row_scores": [ - { - "row_index": 0, - "item": { - "input": "What is the capital of France?", - "output": "The capital of France is Paris.", - }, - "metrics": { - "llm-judge": {"scores": [{"name": "helpfulness", "value": 5.0}]} - }, - } - ], - } - ``` + +```python +{ + "aggregate_scores": { + "scores": [ + {"name": "helpfulness", "count": 2, "mean": 4.5, "min": 4.0, "max": 5.0}, + {"name": "accuracy", "count": 2, "mean": 4.0, "min": 3.0, "max": 5.0}, + ] + }, + "row_scores": [ + { + "row_index": 0, + "item": { + "input": "What is the capital of France?", + "output": "The capital of France is Paris.", + }, + "metrics": { + "llm-judge": {"scores": [{"name": "helpfulness", "value": 5.0}]} + }, + } + ], +} +``` + ### Example with Rubric Scores Use rubric scores when you want categorical labels with explicit descriptions: -{% raw %} + ```python from nemo_evaluator_sdk import JSONScoreParser, Model, RubricScore, LLMJudgeMetric @@ -256,13 +260,13 @@ result = evaluator.run( print(result.aggregate_scores.model_dump(exclude_none=True)) ``` -{% endraw %} + ### Custom Aggregate Fields By default, aggregate scores include `count`, `mean`, `min`, and `max`. Request additional statistics with `aggregate_fields`: -{% raw %} + ```python result = evaluator.run( @@ -282,7 +286,7 @@ for score in result.aggregate_scores.scores: print(f" p50: {score.percentiles.p50:.3f}") print(f" p90: {score.percentiles.p90:.3f}") ``` -{% endraw %} + --- @@ -290,7 +294,7 @@ for score in result.aggregate_scores.scores: For production workloads, submit the same metric and dataset as a durable platform job. The SDK returns a job resource that can wait for completion and download the final `EvaluationResult`. 
-{% raw %} + ```python from nemo_evaluator_sdk import RunConfig, JSONScoreParser, Model, RubricScore, LLMJudgeMetric @@ -350,7 +354,7 @@ result = job.get_result() for score in result.aggregate_scores.scores: print(f"{score.name}: mean={score.mean}, count={score.count}") ``` -{% endraw %} + --- @@ -411,8 +415,9 @@ Use rubric scores for categorical evaluations with explicit criteria: } ``` -!!! tip - Rubric scores use [structured outputs](https://docs.nvidia.com/nim/large-language-models/latest/structured-generation.html) by default, which constrains the judge model to output valid JSON. This significantly reduces parsing errors. + +Rubric scores use [structured outputs](https://docs.nvidia.com/nim/large-language-models/latest/structured-generation.html) by default, which constrains the judge model to output valid JSON. This significantly reduces parsing errors. + ### Score Parsers @@ -436,19 +441,20 @@ By default, the JSON parser is used for range and rubric scores, with the score "parser": {"type": "regex", "pattern": "SCORE: (\\d+)", "method": "search"} ``` -!!! tip - **Regex method options:** + +**Regex method options:** - - `match` (default): Matches the pattern only at the **beginning** of the text. Use when your prompt instructs the judge to output the score first. - - `search`: Finds the pattern **anywhere** in the text. Uses the first match of the regex found in the judge output. +- `match` (default): Matches the pattern only at the **beginning** of the text. Use when your prompt instructs the judge to output the score first. +- `search`: Finds the pattern **anywhere** in the text. Uses the first match of the regex found in the judge output. - For example, with `method: "search"` and pattern `SCORE: (\d+)`, the parser can extract the score from: +For example, with `method: "search"` and pattern `SCORE: (\d+)`, the parser can extract the score from: - ```text - The response is accurate and well-written. 
SCORE: 5 - ``` +```text +The response is accurate and well-written. SCORE: 5 +``` - This would fail with the default `match` method since "SCORE:" is not at the beginning. If multiple matches exist, `search` returns the first occurrence. +This would fail with the default `match` method since "SCORE:" is not at the beginning. If multiple matches exist, `search` returns the first occurrence. + --- @@ -463,18 +469,18 @@ Customize the judge prompt to match your evaluation criteria. Use Jinja2 templat | `{%raw%}{{input}}{%endraw%}` | Input field from dataset row | | `{%raw%}{{output}}{%endraw%}` | Output field from dataset row | | `{%raw%}{{context}}{%endraw%}`, `{%raw%}{{reference}}{%endraw%}`, `{%raw%}{{messages}}{%endraw%}`, `{%raw%}{{tool_calls}}{%endraw%}`, `{%raw%}{{tools}}{%endraw%}` | Other canonical evaluator fields | -| `item.` | Any field from the dataset row | +| `item.<field>` | Any field from the dataset row | | `sample.output_text` | Model-generated response (when evaluating a model) | -| `scores` | Dictionary of score definitions (typically used in expressions/loops, for example `{% raw %}{{ scores.keys() | join(", ") }}{% endraw %}`) | +| `scores` | Dictionary of score definitions (typically used in expressions/loops, for example `{{scores.keys() | join(", ")}}`) | ### Canonical vs Legacy Prompt Variables LLM judge prompt variables define the fields required from the evaluation context: -{% raw %} + - Prefer canonical evaluator variables such as `{{input}}`, `{{output}}`, `{{context}}`, and `{{reference}}` for reusable metrics. -- Raw dataset variables such as `{{item.question}}`, `{{item.response}}`, `{{question}}`, or `{{sample.output_text}}` continue to work for backward compatibility. -{% endraw %} +- Raw dataset variables such as `{{item.question}}`, `{{item.response}}`, `{{question}}`, or `{{sample.output_text}}` continue to work for backward compatibility. 
+ When your dataset uses different field names, keep the metric prompt stable and map dataset columns at job or benchmark submission time with `field_mapping`: @@ -505,14 +511,14 @@ With the mapping above, a dataset row like this: renders the prompt template variables as: -{% raw %} -- `{{input}}` -> `question` -> `"What is the capital of France?"` -- `{{output}}` -> `response` -> `"Paris"` + +- `{{input}}` -> `question` -> `"What is the capital of France?"` +- `{{output}}` -> `response` -> `"Paris"` Custom prompt variables are also allowed. For example, `{{input}} {{output}} {{custom_value}}` produces a required schema with all three fields, and `field_mapping.custom.custom_value` can bind that prompt variable to a dataset column when needed. When no `field_mapping` is provided, prompt variable names are matched directly against dataset columns. That means a prompt using `{{question}}` and `{{response}}` expects dataset rows with `question` and `response` fields unless you remap them explicitly. -{% endraw %} + If a prompt field should be available when present but not required in every row, add it to `optional_fields` on the metric. This is useful for prompts that can use `reference` when available but should still validate against datasets that only provide `input` and `output`. @@ -537,7 +543,7 @@ metric = { ### Schema-Aware Validation -{{nem_short_name}} derives the required prompt fields directly from the prompt variables used by the metric and validates them against dataset metadata during benchmark and job creation. +NeMo Evaluator derives the required prompt fields directly from the prompt variables used by the metric and validates them against dataset metadata during benchmark and job creation. - Add fileset metadata `dataset.schema` for a default row schema. - Add `dataset.schemas_by_path` when different files in the same fileset have different row shapes. @@ -551,7 +557,7 @@ Benchmark-level `field_mapping` is shared by every metric in that benchmark. 
If ### Example: Custom Judge Template ```python -{% raw %} + JUDGE_TEMPLATE = """You are an expert evaluator assessing AI assistant responses. Evaluate the response on these criteria: @@ -591,7 +597,7 @@ metric = { ] } } -{% endraw %} + ``` --- @@ -600,7 +606,7 @@ metric = { If your judge model endpoint requires an API key, store it as a secret. The secret is automatically resolved from the same workspace as your evaluation. -For local `run` versus remote `submit` behavior of `api_key_secret`, see [Model API Authentication](model-configuration.md#model-api-authentication). +For local `run` versus remote `submit` behavior of `api_key_secret`, see [Model API Authentication](/evaluation/metrics/model-configuration#model-api-authentication). ### Create a Secret @@ -631,7 +637,7 @@ metric = { Control judge model behavior with inference parameters: ```python -{% raw %} + "prompt_template": { "messages": [...], "temperature": 0.1, # Lower for more consistent scoring @@ -639,10 +645,12 @@ Control judge model behavior with inference parameters: "timeout": 30, # Request timeout in seconds "stop": ["<{{ end_of_text }}>"] # Stop sequences } -{% endraw %} + ``` -!!! note "The default value for `max_tokens` for judge models is set to `1024`. It is highly recommended to set an appropriate value for your judge model based on the expected outputs (for example, `structured_output` is used by default to format model output, ensure your `max_tokens` is set to accommodate the full JSON output). Incomplete JSON outputs will cause parsing errors and result in NaN score values." + +The default value for `max_tokens` for judge models is set to `1024`. It is highly recommended to set an appropriate value for your judge model based on the expected outputs (for example, `structured_output` is used by default to format model output, ensure your `max_tokens` is set to accommodate the full JSON output). Incomplete JSON outputs will cause parsing errors and result in NaN score values. 
+ ### Reasoning Model Configuration @@ -679,9 +687,9 @@ metric = { 4. **Live Evaluation Limits**: Live evaluations are limited to 10 rows. Use job-based evaluation for larger datasets. -!!! info - - - [Model Configuration](model-configuration.md) - Inline models vs model references - - [Evaluation Results](results.md) - Understanding and downloading results - - [Agentic Evaluation](agentic.md) - Evaluate agent workflows - - [RAG Evaluation](rag.md) - Evaluate retrieval-augmented generation + +- [Model Configuration](/evaluation/metrics/model-configuration) - Inline models vs model references +- [Evaluation Results](/evaluation/metrics/metric-results) - Understanding and downloading results +- [Agentic Evaluation](/evaluation/metrics/agentic-metrics) - Evaluate agent workflows +- [RAG Evaluation](/evaluation/metrics/rag-metrics) - Evaluate retrieval-augmented generation + diff --git a/docs/evaluator/metrics/manage-metrics.mdx b/docs/evaluator/metrics/manage-metrics.mdx index a0c88e7e88..bc9e1de744 100644 --- a/docs/evaluator/metrics/manage-metrics.mdx +++ b/docs/evaluator/metrics/manage-metrics.mdx @@ -1,3 +1,7 @@ +--- +title: "Manage Metrics" +description: "" +--- # Manage Metrics @@ -23,7 +27,7 @@ evaluator: Evaluator = client.evaluator # this object is an Evaluator resource Metric objects are normal Python objects from `nemo_evaluator_sdk.metrics.*`. Keep them close to the evaluation code so the definition, dataset fields, and execution request stay in sync. -{% raw %} + ```python from nemo_evaluator_sdk import ExactMatchMetric @@ -43,7 +47,7 @@ result = evaluator.run( for score in result.aggregate_scores.scores: print(f"{score.name}: mean={score.mean}") ``` -{% endraw %} + Use `run` for fast local execution while developing a metric. Use `submit` for durable remote execution through the platform job service. @@ -51,7 +55,7 @@ Use `run` for fast local execution while developing a metric. 
Use `submit` for d Because metrics are inline objects, reuse is usually just a Python helper function or module-level factory. -{% raw %} + ```python from nemo_evaluator_sdk import F1Metric @@ -65,7 +69,7 @@ def answer_f1_metric() -> F1Metric: metric = answer_f1_metric() ``` -{% endraw %} + ## Choose Metric Classes @@ -88,11 +92,11 @@ from nemo_evaluator_sdk import RunConfig config = RunConfig(parallelism=4, limit_samples=100) ``` -For online evaluations, provide a model or agent target and use the online parameter classes described in [Model Configuration](model-configuration.md) and [Agent Configuration](agent-configuration.md). +For online evaluations, provide a model or agent target and use the online parameter classes described in [Model Configuration](/evaluation/metrics/model-configuration) and [Agent Configuration](/evaluation/metrics/agent-configuration). ## Submit a Durable Job -{% raw %} + ```python from nemo_evaluator_sdk import RunConfig, ExactMatchMetric @@ -110,10 +114,10 @@ job = evaluator.submit( job.wait_until_done() result = job.get_result() ``` -{% endraw %} + ## Related Topics -- [Metric Results](results.md) - Work with `EvaluationResult`, aggregate scores, and row scores -- [Manage Metric Jobs](job-management.md) - Submit, monitor, reconnect to, and download job results -- [Similarity Metrics](similarity.md) - Configure exact match, F1, BLEU, ROUGE, and string/number checks +- [Metric Results](/evaluation/metrics/metric-results) - Work with `EvaluationResult`, aggregate scores, and row scores +- [Manage Metric Jobs](/evaluation/metrics/job-management) - Submit, monitor, reconnect to, and download job results +- [Similarity Metrics](/evaluation/metrics/similarity-metrics) - Configure exact match, F1, BLEU, ROUGE, and string/number checks diff --git a/docs/evaluator/metrics/model-configuration.mdx b/docs/evaluator/metrics/model-configuration.mdx index efb1444d1c..be228b7b38 100644 --- a/docs/evaluator/metrics/model-configuration.mdx +++ 
b/docs/evaluator/metrics/model-configuration.mdx @@ -1,3 +1,7 @@ +--- +title: "Model Configuration" +description: "" +--- # Model Configuration @@ -65,7 +69,7 @@ client.secrets.create( Use `target=model` when the evaluator should call the model to generate the sample output before scoring. -{% raw %} + ```python from nemo_evaluator_sdk import ( @@ -99,13 +103,13 @@ result = evaluator.run( ) ``` -{% endraw %} + ## Model on a Judge Metric Use a model field on the metric when the metric itself calls an LLM to score existing outputs. -{% raw %} + ```python from nemo_evaluator_sdk import Model, RangeScore, LLMJudgeMetric @@ -152,7 +156,7 @@ result = evaluator.run( ) ``` -{% endraw %} + ## Runtime Parameters @@ -182,5 +186,6 @@ Use plain `RunConfig` for offline evaluations where the dataset already contains The plugin SDK examples on this page use inline `Model` objects. If your deployment resolves platform model entities into model endpoint details, perform that lookup before constructing the `Model`, then pass the resulting inline model to the metric or request. -!!! info - For evaluating agentic systems, use an `Agent` request target instead of a `Model`. See [Agent Configuration](agent-configuration.md). + +For evaluating agentic systems, use an `Agent` request target instead of a `Model`. See [Agent Configuration](/evaluation/metrics/agent-configuration). + diff --git a/docs/evaluator/metrics/rag.mdx b/docs/evaluator/metrics/rag.mdx index 1691c911e7..c336018ddb 100644 --- a/docs/evaluator/metrics/rag.mdx +++ b/docs/evaluator/metrics/rag.mdx @@ -1,3 +1,7 @@ +--- +title: "RAG Metrics" +description: "" +--- # RAG Evaluation Metrics @@ -75,7 +79,7 @@ client.secrets.create( ) ``` -Reference secrets by name in your model configuration. For local `run` versus remote `submit` behavior, see [Model API Authentication](model-configuration.md#model-api-authentication). +Reference secrets by name in your model configuration. 
For local `run` versus remote `submit` behavior, see [Model API Authentication](/evaluation/metrics/model-configuration#model-api-authentication). ```python judge_model = Model( @@ -85,8 +89,10 @@ judge_model = Model( ) ``` -!!! tip "RAGAS metrics accept inline model definitions for `judge_model` and, where required, `embeddings_model`." - See [Model Configuration](model-configuration.md) for details. + +RAGAS metrics accept inline model definitions for `judge_model` and, where required, `embeddings_model`. +See [Model Configuration](/evaluation/metrics/model-configuration) for details. + --- @@ -113,7 +119,7 @@ judge_model = Model( The metric examples below use these inline values: -For local `run` versus remote `submit` behavior of `api_key_secret`, see [Model API Authentication](model-configuration.md#model-api-authentication). +For local `run` versus remote `submit` behavior of `api_key_secret`, see [Model API Authentication](/evaluation/metrics/model-configuration#model-api-authentication). 
```python judge_model = Model( @@ -150,7 +156,7 @@ offline_rows = [ Use online arguments when the evaluator should generate the response first: -{% raw %} + ```python online_dataset = [ @@ -173,7 +179,7 @@ online_config = RunConfigOnlineModel( inference=InferenceParams(temperature=0.2, max_tokens=1024), ) ``` -{% endraw %} + --- @@ -196,50 +202,49 @@ Measures the fraction of relevant content retrieved compared to the total releva } ``` -=== "Local Evaluation" - - ```python - metric = ContextRecallMetric(judge_model=judge_model) - - result = evaluator.run(metric=metric, dataset=offline_rows, config=RunConfig(parallelism=8)) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - -=== "Remote Job" - - ```python - metric = ContextRecallMetric(judge_model=judge_model) + + +```python +metric = ContextRecallMetric(judge_model=judge_model) - job = evaluator.submit(metric=metric, dataset=offline_rows, config=RunConfig(parallelism=8)) - job.wait_until_done() - result = job.get_result() - ``` +result = evaluator.run(metric=metric, dataset=offline_rows, config=RunConfig(parallelism=8)) -=== "Result" +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + +```python +metric = ContextRecallMetric(judge_model=judge_model) - ```json +job = evaluator.submit(metric=metric, dataset=offline_rows, config=RunConfig(parallelism=8)) +job.wait_until_done() +result = job.get_result() +``` + + +```json +{ + "scores": [ { - "scores": [ - { - "count": 1, - "histogram": {}, - "name": "context_recall", - "nan_count": 0, - "max": 1.0, - "mean": 1.0, - "min": 1.0, - "percentiles": {}, - "score_type": "range", - "std_dev": 0.0, - "sum": 1.0, - "variance": 0.0 - } - ] + "count": 1, + "histogram": {}, + "name": "context_recall", + "nan_count": 0, + "max": 1.0, + "mean": 1.0, + "min": 1.0, + "percentiles": {}, + "score_type": "range", + "std_dev": 0.0, + "sum": 1.0, + "variance": 0.0 } - ``` - + ] +} +``` + + --- 
## Context Precision @@ -261,45 +266,44 @@ Measures the proportion of relevant chunks in the retrieved contexts (precision@ } ``` -=== "Local Evaluation" - - ```python - metric = ContextPrecisionMetric(judge_model=judge_model) - - result = evaluator.run(metric=metric, dataset=offline_rows, config=RunConfig(parallelism=8)) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - -=== "Remote Job" - - ```python - metric = ContextPrecisionMetric(judge_model=judge_model) + + +```python +metric = ContextPrecisionMetric(judge_model=judge_model) - job = evaluator.submit(metric=metric, dataset=offline_rows, config=RunConfig(parallelism=8)) - job.wait_until_done() - result = job.get_result() - ``` +result = evaluator.run(metric=metric, dataset=offline_rows, config=RunConfig(parallelism=8)) -=== "Result" +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + +```python +metric = ContextPrecisionMetric(judge_model=judge_model) - ```json +job = evaluator.submit(metric=metric, dataset=offline_rows, config=RunConfig(parallelism=8)) +job.wait_until_done() +result = job.get_result() +``` + + +```json +{ + "scores": [ { - "scores": [ - { - "count": 1, - "histogram": {}, - "name": "context_precision", - "nan_count": 0, - "max": 1.0, - "mean": 1.0, - "min": 1.0 - } - ] + "count": 1, + "histogram": {}, + "name": "context_precision", + "nan_count": 0, + "max": 1.0, + "mean": 1.0, + "min": 1.0 } - ``` - + ] +} +``` + + --- ## Context Relevance @@ -320,45 +324,45 @@ Measures how relevant the retrieved contexts are to the user input. 
} ``` -=== "Local Evaluation" - - ```python - metric = ContextRelevanceMetric(judge_model=judge_model) - - result = evaluator.run( - metric=metric, - dataset=[ - { - "user_input": "What is the capital of France?", - "retrieved_contexts": ["Paris is the capital and largest city of France."], - } - ], - config=RunConfig(parallelism=8), - ) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - -=== "Remote Job" - - ```python - metric = ContextRelevanceMetric(judge_model=judge_model) - - job = evaluator.submit( - metric=metric, - dataset=[ - { - "user_input": "What is the capital of France?", - "retrieved_contexts": ["Paris is the capital and largest city of France."], - } - ], - config=RunConfig(parallelism=8), - ) - job.wait_until_done() - result = job.get_result() - ``` + + +```python +metric = ContextRelevanceMetric(judge_model=judge_model) +result = evaluator.run( + metric=metric, + dataset=[ + { + "user_input": "What is the capital of France?", + "retrieved_contexts": ["Paris is the capital and largest city of France."], + } + ], + config=RunConfig(parallelism=8), +) + +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + +```python +metric = ContextRelevanceMetric(judge_model=judge_model) + +job = evaluator.submit( + metric=metric, + dataset=[ + { + "user_input": "What is the capital of France?", + "retrieved_contexts": ["Paris is the capital and largest city of France."], + } + ], + config=RunConfig(parallelism=8), +) +job.wait_until_done() +result = job.get_result() +``` + + --- ## Context Entity Recall @@ -379,45 +383,45 @@ Measures how many important entities from the reference are present in the retri } ``` -=== "Local Evaluation" - - ```python - metric = ContextEntityRecallMetric(judge_model=judge_model) - - result = evaluator.run( - metric=metric, - dataset=[ - { - "retrieved_contexts": ["Paris is the capital and largest city of France."], - "reference": "Paris is 
the capital of France.", - } - ], - config=RunConfig(parallelism=8), - ) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - -=== "Remote Job" - - ```python - metric = ContextEntityRecallMetric(judge_model=judge_model) - - job = evaluator.submit( - metric=metric, - dataset=[ - { - "retrieved_contexts": ["Paris is the capital and largest city of France."], - "reference": "Paris is the capital of France.", - } - ], - config=RunConfig(parallelism=8), - ) - job.wait_until_done() - result = job.get_result() - ``` + + +```python +metric = ContextEntityRecallMetric(judge_model=judge_model) + +result = evaluator.run( + metric=metric, + dataset=[ + { + "retrieved_contexts": ["Paris is the capital and largest city of France."], + "reference": "Paris is the capital of France.", + } + ], + config=RunConfig(parallelism=8), +) + +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + +```python +metric = ContextEntityRecallMetric(judge_model=judge_model) +job = evaluator.submit( + metric=metric, + dataset=[ + { + "retrieved_contexts": ["Paris is the capital and largest city of France."], + "reference": "Paris is the capital of France.", + } + ], + config=RunConfig(parallelism=8), +) +job.wait_until_done() +result = job.get_result() +``` + + --- ## Faithfulness @@ -439,50 +443,49 @@ Measures factual consistency of the response with the retrieved context. 
} ``` -=== "Local Evaluation" - - ```python - metric = FaithfulnessMetric(judge_model=judge_model) - - result = evaluator.run(metric=metric, dataset=offline_rows, config=RunConfig(parallelism=8)) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - -=== "Online Evaluation" - - ```python - metric = FaithfulnessMetric(judge_model=judge_model) - - result = evaluator.run( - metric=metric, - dataset=online_dataset, - config=online_config, - target=generation_model, - prompt_template=online_prompt_template, - ) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - -=== "Remote Job" + + +```python +metric = FaithfulnessMetric(judge_model=judge_model) - ```python - metric = FaithfulnessMetric(judge_model=judge_model) +result = evaluator.run(metric=metric, dataset=offline_rows, config=RunConfig(parallelism=8)) - job = evaluator.submit( - metric=metric, - dataset=online_dataset, - config=online_config, - target=generation_model, - prompt_template=online_prompt_template, - ) - job.wait_until_done() - result = job.get_result() - ``` +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + +```python +metric = FaithfulnessMetric(judge_model=judge_model) + +result = evaluator.run( + metric=metric, + dataset=online_dataset, + config=online_config, + target=generation_model, + prompt_template=online_prompt_template, +) +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + +```python +metric = FaithfulnessMetric(judge_model=judge_model) + +job = evaluator.submit( + metric=metric, + dataset=online_dataset, + config=online_config, + target=generation_model, + prompt_template=online_prompt_template, +) +job.wait_until_done() +result = job.get_result() +``` + + --- ## Response Groundedness @@ -503,50 +506,49 @@ Evaluates whether the response is grounded in the retrieved context without hall } ``` -=== 
"Local Evaluation" - - ```python - metric = ResponseGroundednessMetric(judge_model=judge_model) - - result = evaluator.run(metric=metric, dataset=offline_rows, config=RunConfig(parallelism=8)) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - -=== "Online Evaluation" - - ```python - metric = ResponseGroundednessMetric(judge_model=judge_model) - - result = evaluator.run( - metric=metric, - dataset=online_dataset, - config=online_config, - target=generation_model, - prompt_template=online_prompt_template, - ) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - -=== "Remote Job" + + +```python +metric = ResponseGroundednessMetric(judge_model=judge_model) - ```python - metric = ResponseGroundednessMetric(judge_model=judge_model) +result = evaluator.run(metric=metric, dataset=offline_rows, config=RunConfig(parallelism=8)) - job = evaluator.submit( - metric=metric, - dataset=online_dataset, - config=online_config, - target=generation_model, - prompt_template=online_prompt_template, - ) - job.wait_until_done() - result = job.get_result() - ``` +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + +```python +metric = ResponseGroundednessMetric(judge_model=judge_model) + +result = evaluator.run( + metric=metric, + dataset=online_dataset, + config=online_config, + target=generation_model, + prompt_template=online_prompt_template, +) +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + +```python +metric = ResponseGroundednessMetric(judge_model=judge_model) + +job = evaluator.submit( + metric=metric, + dataset=online_dataset, + config=online_config, + target=generation_model, + prompt_template=online_prompt_template, +) +job.wait_until_done() +result = job.get_result() +``` + + --- ## Noise Sensitivity @@ -570,55 +572,55 @@ Measures robustness when retrieved contexts contain noisy or 
irrelevant informat } ``` -=== "Local Evaluation" - - ```python - metric = NoiseSensitivityMetric(judge_model=judge_model) - - result = evaluator.run( - metric=metric, - dataset=[ - { - "user_input": "What is the capital of France?", - "retrieved_contexts": [ - "Paris is the capital and largest city of France.", - "Berlin is the capital of Germany.", - ], - "response": "The capital of France is Paris.", - "reference": "Paris is the capital of France.", - } - ], - config=RunConfig(parallelism=8), - ) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - -=== "Remote Job" - - ```python - metric = NoiseSensitivityMetric(judge_model=judge_model) - - job = evaluator.submit( - metric=metric, - dataset=[ - { - "user_input": "What is the capital of France?", - "retrieved_contexts": [ - "Paris is the capital and largest city of France.", - "Berlin is the capital of Germany.", - ], - "response": "The capital of France is Paris.", - "reference": "Paris is the capital of France.", - } - ], - config=RunConfig(parallelism=8), - ) - job.wait_until_done() - result = job.get_result() - ``` + + +```python +metric = NoiseSensitivityMetric(judge_model=judge_model) + +result = evaluator.run( + metric=metric, + dataset=[ + { + "user_input": "What is the capital of France?", + "retrieved_contexts": [ + "Paris is the capital and largest city of France.", + "Berlin is the capital of Germany.", + ], + "response": "The capital of France is Paris.", + "reference": "Paris is the capital of France.", + } + ], + config=RunConfig(parallelism=8), +) +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + +```python +metric = NoiseSensitivityMetric(judge_model=judge_model) + +job = evaluator.submit( + metric=metric, + dataset=[ + { + "user_input": "What is the capital of France?", + "retrieved_contexts": [ + "Paris is the capital and largest city of France.", + "Berlin is the capital of Germany.", + ], + 
"response": "The capital of France is Paris.", + "reference": "Paris is the capital of France.", + } + ], + config=RunConfig(parallelism=8), +) +job.wait_until_done() +result = job.get_result() +``` + + --- ## Response Relevancy @@ -646,62 +648,61 @@ Measures how relevant a response is to the user input using generated questions |-----------|------|---------|-------------| | `strictness` | int | `1` | Number of parallel questions generated. NIM supports `1`. | -=== "Local Evaluation" - - ```python - metric = ResponseRelevancyMetric( - judge_model=judge_model, - embeddings_model=embeddings_model, - strictness=1, - ) - - result = evaluator.run(metric=metric, dataset=offline_rows, config=RunConfig(parallelism=8)) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - -=== "Online Evaluation" - - ```python - metric = ResponseRelevancyMetric( - judge_model=judge_model, - embeddings_model=embeddings_model, - strictness=1, - ) - - result = evaluator.run( - metric=metric, - dataset=online_dataset, - config=online_config, - target=generation_model, - prompt_template=online_prompt_template, - ) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - -=== "Remote Job" - - ```python - metric = ResponseRelevancyMetric( - judge_model=judge_model, - embeddings_model=embeddings_model, - strictness=1, - ) - - job = evaluator.submit( - metric=metric, - dataset=online_dataset, - config=online_config, - target=generation_model, - prompt_template=online_prompt_template, - ) - job.wait_until_done() - result = job.get_result() - ``` + + +```python +metric = ResponseRelevancyMetric( + judge_model=judge_model, + embeddings_model=embeddings_model, + strictness=1, +) +result = evaluator.run(metric=metric, dataset=offline_rows, config=RunConfig(parallelism=8)) + +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + +```python +metric = 
ResponseRelevancyMetric( + judge_model=judge_model, + embeddings_model=embeddings_model, + strictness=1, +) + +result = evaluator.run( + metric=metric, + dataset=online_dataset, + config=online_config, + target=generation_model, + prompt_template=online_prompt_template, +) + +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + +```python +metric = ResponseRelevancyMetric( + judge_model=judge_model, + embeddings_model=embeddings_model, + strictness=1, +) + +job = evaluator.submit( + metric=metric, + dataset=online_dataset, + config=online_config, + target=generation_model, + prompt_template=online_prompt_template, +) +job.wait_until_done() +result = job.get_result() +``` + + --- ## Dataset Format @@ -715,8 +716,9 @@ RAGAS metrics use specific column names: | `response` | string | Some metrics | Generated answer. Required for offline response-quality metrics; generated as `sample.output_text` for online model requests. | | `reference` | string | Some metrics | Reference answer or ground truth | -!!! note - Different metrics require different columns. Check the metric documentation for specific requirements. + +Different metrics require different columns. Check the metric documentation for specific requirements. + ### Example Dataset @@ -798,7 +800,7 @@ client.secrets.create(name="judge-api-key", value="") client.secrets.create(name="embedding-api-key", value="") ``` -Reference secrets by name in your metric configuration. For local `run` versus remote `submit` behavior, see [Model API Authentication](model-configuration.md#model-api-authentication). +Reference secrets by name in your metric configuration. For local `run` versus remote `submit` behavior, see [Model API Authentication](/evaluation/metrics/model-configuration#model-api-authentication). 
```python judge_model = Model( @@ -831,13 +833,15 @@ print(f"Saved artifacts under {artifacts_dir}") |-------|-------|----------| | `judge_model` is required | Missing judge LLM config for metric | Add `judge_model` to metric configuration | | `embeddings_model` is required | Using `response_relevancy` without embeddings | Add `embeddings_model` to metric configuration | -| Job stuck in "pending" | Model endpoint not accessible | Verify endpoint URLs and API key secrets. See [Model API Authentication](model-configuration.md#model-api-authentication) | -| Authentication failed | Invalid or missing API key | Check `api_key_secret` for the execution mode. See [Model API Authentication](model-configuration.md#model-api-authentication) | -| `nan_count > 0` and `mean = null` | Judge/model call failures, such as auth, endpoint, quota, or timeout. Some RAGAS metrics are known to return `NaN` instead of raising on these failures. | Inspect row-level `error`; verify API key, endpoint, and model access | +| Job stuck in "pending" | Model endpoint not accessible | Verify endpoint URLs and API key secrets. See [Model API Authentication](/evaluation/metrics/model-configuration#model-api-authentication) | +| Authentication failed | Invalid or missing API key | Check `api_key_secret` for the execution mode. See [Model API Authentication](/evaluation/metrics/model-configuration#model-api-authentication) | +| `nan_count > 0` and `mean = null` | Judge/model call failures, such as auth, endpoint, quota, or timeout. Some RAGAS metrics are known to return `NaN` instead of raising on these failures. | Inspect row-level `error`; verify API key, endpoint, and model access | | Low faithfulness scores | Context doesn't support the response | Improve retrieval or response generation | -!!! warning "If you see `nan_count > 0` with `mean = null`, first validate judge model authentication." 
- For some RAGAS metrics, auth failures can be converted to `NaN` scores instead of surfacing as a hard error. + +If you see `nan_count > 0` with `mean = null`, first validate judge model authentication. +For some RAGAS metrics, auth failures can be converted to `NaN` scores instead of surfacing as a hard error. + ### Tips for Better Results @@ -851,7 +855,7 @@ print(f"Saved artifacts under {artifacts_dir}") ## Important Notes -1. **Secret Management**: API keys should be referenced through `api_key_secret`, with different local `run` and remote `submit` behavior. See [Model API Authentication](model-configuration.md#model-api-authentication). Never pass API keys directly in the request. +1. **Secret Management**: API keys should be referenced through `api_key_secret`, with different local `run` and remote `submit` behavior. See [Model API Authentication](/evaluation/metrics/model-configuration#model-api-authentication). Never pass API keys directly in the request. 2. **Column Names**: RAGAS metrics use specific column names: - `user_input` (not `question`) - `response` (not `answer`) @@ -865,7 +869,7 @@ print(f"Saved artifacts under {artifacts_dir}") 2. **Dataset Format**: RAGAS metrics use specific column names (`user_input`, `retrieved_contexts`, `response`, `reference`). Ensure your data matches this structure. -!!! 
info - - - [LLM-as-a-Judge](llm-as-a-judge.md) - Custom judge-based evaluation - - [Agentic Metrics](agentic.md) - Evaluate agent workflows + +- [LLM-as-a-Judge](/evaluation/metrics/llm-as-a-judge) - Custom judge-based evaluation +- [Agentic Metrics](/evaluation/metrics/agentic-metrics) - Evaluate agent workflows + diff --git a/docs/evaluator/metrics/remote.mdx b/docs/evaluator/metrics/remote.mdx index bac8007710..af4fa161d0 100644 --- a/docs/evaluator/metrics/remote.mdx +++ b/docs/evaluator/metrics/remote.mdx @@ -1,7 +1,11 @@ +--- +title: "Bring Your Own Metric" +description: "" +--- # Bring Your Own Metric -{{platform_name}} offers [built-in metrics](index.md) that can be configured to evaluate on your custom data. Remote metrics let you bring your own metric logic into the {{platform_name}} evaluation workflow by serving that logic from a REST API. +NeMo Platform offers [built-in metrics](/evaluation/metrics/overview) that can be configured to evaluate on your custom data. Remote metrics let you bring your own metric logic into the NeMo Platform evaluation workflow by serving that logic from a REST API. A remote metric gives you control over the evaluation logic, request payload, and reported scores while the Evaluator plugin SDK handles dataset iteration, result aggregation, retries, and job execution. 
@@ -14,7 +18,7 @@ Remote metrics support two types: | **Generic Remote** (`remote`) | Custom endpoints with configurable request body and score extraction | User-defined Jinja template | | **NeMo Agent Toolkit Remote** (`nemo-agent-toolkit-remote`) | NeMo Agent Toolkit evaluator endpoints | Fixed: `{evaluator_name, item}` | -{{nem_short_name}} supports two execution modes through the Evaluator plugin SDK: +NeMo Evaluator supports two execution modes through the Evaluator plugin SDK: | Mode | Use Case | SDK Call | |------|----------|----------| @@ -58,7 +62,7 @@ Local execution provides immediate results for rapid iteration when developing a Use a generic remote metric when you need full control over the request payload and score extraction: -{% raw %} + ```python from nemo_evaluator_sdk import JSONScoreParser, RemoteScore, RemoteMetric @@ -93,10 +97,10 @@ result = evaluator.run( for score in result.aggregate_scores.scores: print(f"{score.name}: mean={score.mean}, count={score.count}") ``` -{% endraw %} + **Key configuration:** -- `body`: Jinja template for the request payload. Use `{% raw %}{{ item. }}{% endraw %}` to access dataset columns. +- `body`: Jinja template for the request payload. Use `{{item.<column>}}` to access dataset columns. - `scores`: List of score definitions with a `parser` object containing [JSONPath](https://datatracker.ietf.org/doc/html/rfc9535) expression for extracting values from the response. ### NeMo Agent Toolkit Remote Metric @@ -135,7 +139,7 @@ for score in result.aggregate_scores.scores: The NAT metric automatically: -- Sends payload: `{"evaluator_name": "", "item": }`. +- Sends payload: `{"evaluator_name": "<name>", "item": <row_data>}`. - Extracts the score from: `$.result.score`. --- @@ -144,84 +148,84 @@ The NAT metric automatically: For production workloads, submit the same metric and dataset as a durable platform job. The returned job resource can wait for completion and download the final `EvaluationResult`. 
- - -=== "Generic Remote Metric" - - {% raw %} - ```python - from nemo_evaluator_sdk import RunConfig, JSONScoreParser, RemoteScore, RemoteMetric - - metric = RemoteMetric( - url="https://my-evaluation-server.test/evaluate", - body={ - "reference": "{{item.reference}}", - "response": "{{item.output}}", - }, - scores=[ - RemoteScore( - name="accuracy", - parser=JSONScoreParser(json_path="$.result.accuracy"), - minimum=0.0, - maximum=1.0, - ) - ], - timeout_seconds=30.0, - max_retries=3, - ) - - - job = evaluator.submit( - metric=metric, - dataset=[ - {"reference": "Paris", "output": "Paris"}, - {"reference": "2", "output": "2"}, - ], - config=RunConfig(parallelism=8), - ) - print("Submitted job:", job.name) - - job.wait_until_done() - result = job.get_result() - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}, count={score.count}") - ``` - {% endraw %} - -=== "NAT Remote Metric" - - ```python - from nemo_evaluator_sdk import RunConfig, NemoAgentToolkitRemoteMetric - - metric = NemoAgentToolkitRemoteMetric( - url="http://localhost:8001/evaluate_item", - evaluator_name="similarity_eval", - timeout_seconds=30.0, - max_retries=3, - ) - - - job = evaluator.submit( - metric=metric, - dataset=[ - { - "id": "item_1", - "input_obj": "What is the capital of France?", - "expected_output_obj": "The capital of France is Paris.", - "output_obj": "Paris is the capital.", - "trajectory": [], - "expected_trajectory": [], - "full_dataset_entry": {}, - } - ], - config=RunConfig(parallelism=4), - ) - job.wait_until_done() - result = job.get_result() - ``` - - +{/* markdownlint-disable MD046 */} + + + + +```python +from nemo_evaluator_sdk import RunConfig, JSONScoreParser, RemoteScore, RemoteMetric + +metric = RemoteMetric( + url="https://my-evaluation-server.test/evaluate", + body={ + "reference": "{{item.reference}}", + "response": "{{item.output}}", + }, + scores=[ + RemoteScore( + name="accuracy", + 
parser=JSONScoreParser(json_path="$.result.accuracy"), + minimum=0.0, + maximum=1.0, + ) + ], + timeout_seconds=30.0, + max_retries=3, +) + + +job = evaluator.submit( + metric=metric, + dataset=[ + {"reference": "Paris", "output": "Paris"}, + {"reference": "2", "output": "2"}, + ], + config=RunConfig(parallelism=8), +) +print("Submitted job:", job.name) + +job.wait_until_done() +result = job.get_result() + +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}, count={score.count}") +``` + + + +```python +from nemo_evaluator_sdk import RunConfig, NemoAgentToolkitRemoteMetric + +metric = NemoAgentToolkitRemoteMetric( + url="http://localhost:8001/evaluate_item", + evaluator_name="similarity_eval", + timeout_seconds=30.0, + max_retries=3, +) + + +job = evaluator.submit( + metric=metric, + dataset=[ + { + "id": "item_1", + "input_obj": "What is the capital of France?", + "expected_output_obj": "The capital of France is Paris.", + "output_obj": "Paris is the capital.", + "trajectory": [], + "expected_trajectory": [], + "full_dataset_entry": {}, + } + ], + config=RunConfig(parallelism=4), +) +job.wait_until_done() +result = job.get_result() +``` + + +{/* markdownlint-enable MD046 */} --- @@ -229,9 +233,9 @@ For production workloads, submit the same metric and dataset as a durable platfo If your remote endpoint requires authentication, store the API key as a platform secret and reference it from your metric: -For local `run` versus remote `submit` behavior of `api_key_secret`, see [Model API Authentication](model-configuration.md#model-api-authentication). +For local `run` versus remote `submit` behavior of `api_key_secret`, see [Model API Authentication](/evaluation/metrics/model-configuration#model-api-authentication). 
+ -{% raw %} ```python from nemo_evaluator_sdk import JSONScoreParser, RemoteScore, SecretRef, RemoteMetric @@ -244,9 +248,9 @@ metric = RemoteMetric( result = evaluator.run(metric=metric, dataset=[{"input": "test"}]) ``` -{% endraw %} -The API key is sent in the `Authorization: Bearer ` header. For local execution, the SDK resolves the key according to the local `api_key_secret` behavior. For durable remote jobs, the job runtime receives the secret securely. + +The API key is sent in the `Authorization: Bearer <key>` header. For local execution, the SDK resolves the key according to the local `api_key_secret` behavior. For durable remote jobs, the job runtime receives the secret securely. --- @@ -330,7 +334,7 @@ And must return: | `body` | dict | (Generic only) Jinja template for request payload | | `scores` | list | (Generic only) List of score configuration objects (refer to Score Configuration section) | | `evaluator_name` | string | (NAT only) Name of the NAT evaluator | -| `api_key_secret` | `SecretRef` | Optional API key reference. See [Model API Authentication](model-configuration.md#model-api-authentication) | +| `api_key_secret` | `SecretRef` | Optional API key reference. See [Model API Authentication](/evaluation/metrics/model-configuration#model-api-authentication) | | `timeout_seconds` | float | Request timeout (default: 30.0) | | `max_retries` | int | Max retry attempts (default: 3) | @@ -377,7 +381,8 @@ RemoteScore( 3. **Live evaluation limits**: Live evaluations are limited to 10 rows. Use job-based evaluation for larger datasets. -!!! 
info - - [Evaluation Results](results.md) - Understanding and downloading results - - [LLM-as-a-Judge](llm-as-a-judge.md) - Use an LLM to evaluate outputs - - [Agentic Evaluation](agentic.md) - Evaluate agent workflows + +- [Evaluation Results](/evaluation/metrics/metric-results) - Understanding and downloading results +- [LLM-as-a-Judge](/evaluation/metrics/llm-as-a-judge) - Use an LLM to evaluate outputs +- [Agentic Evaluation](/evaluation/metrics/agentic-metrics) - Evaluate agent workflows + diff --git a/docs/evaluator/metrics/results.mdx b/docs/evaluator/metrics/results.mdx index 9732170bfc..94eba18157 100644 --- a/docs/evaluator/metrics/results.mdx +++ b/docs/evaluator/metrics/results.mdx @@ -1,3 +1,7 @@ +--- +title: "Metric Results" +description: "" +--- # Metric Results @@ -12,7 +16,7 @@ An `EvaluationResult` contains: ## Get Results from a Local Run -{% raw %} + ```python import os @@ -36,11 +40,11 @@ result = evaluator.run( ], ) ``` -{% endraw %} + ## Get Results from a Submitted Job -{% raw %} + ```python from nemo_evaluator_sdk import RunConfig, ExactMatchMetric @@ -58,7 +62,7 @@ job = evaluator.submit( job.wait_until_done() ``` -{% endraw %} + ### Get Results in Memory diff --git a/docs/evaluator/metrics/similarity.mdx b/docs/evaluator/metrics/similarity.mdx index ca88c25c46..d0e0a960e0 100644 --- a/docs/evaluator/metrics/similarity.mdx +++ b/docs/evaluator/metrics/similarity.mdx @@ -1,11 +1,15 @@ +--- +title: "Similarity Metrics" +description: "" +--- # Similarity Metrics -{{platform_name}} offers [built-in metrics](index.md) that can be configured to evaluate on your custom data. Similarity metrics compare generated or precomputed text against references, labels, or numeric/string expectations. They support Jinja templates so you can map your dataset columns to the values each metric evaluates. +NeMo Platform offers [built-in metrics](/evaluation/metrics/overview) that can be configured to evaluate on your custom data. 
Similarity metrics compare generated or precomputed text against references, labels, or numeric/string expectations. They support Jinja templates so you can map your dataset columns to the values each metric evaluates. Template functionality provides maximum flexibility for evaluating your models on proprietary, domain-specific, or novel tasks. You can bring your own datasets, define your own prompts and templates using [Jinja](https://github.com/pallets/jinja), and select the metrics that matter most for your use case. This approach is ideal when: -- You want to evaluate on tasks, data, or formats not covered by [industry benchmarks](../benchmarks/industry.md) or built-in metrics. +- You want to evaluate on tasks, data, or formats not covered by [industry benchmarks](/evaluation/benchmarks/industry-benchmarks) or built-in metrics. - You need to measure model performance using custom or business-specific criteria. - You want to experiment with new evaluation methodologies, metrics, or workflows. - You need to create custom prompts and templates for specific use cases. @@ -37,13 +41,13 @@ result = job.get_result() All similarity metrics support Jinja templating with these variables: -- `{%raw%}{{item}}{%endraw%}` - Access dataset columns (e.g., `{%raw%}{{item.question}}{%endraw%}`, `{%raw%}{{item.answer}}{%endraw%}`) -- `{%raw%}{{sample.output_text}}{%endraw%}` - The model's generated output for online runs +- `{%raw%}{{item}}{%endraw%}` - Access dataset columns (e.g., `{%raw%}{{item.question}}{%endraw%}`, `{%raw%}{{item.answer}}{%endraw%}`) +- `{%raw%}{{sample.output_text}}{%endraw%}` - The model's generated output for online runs - Jinja filters: `lower`, `upper`, `trim`, `replace`, etc. 
Use Jinja filters to normalize text before comparison: -{% raw %} + ```python from nemo_evaluator_sdk import ExactMatchMetric @@ -52,7 +56,7 @@ metric = ExactMatchMetric( candidate="{{item.output | lower | trim}}", ) ``` -{% endraw %} + ## BLEU Metric @@ -65,92 +69,92 @@ BLEU (Bilingual Evaluation Understudy) measures the similarity between machine-g **Metric Output:** A score between 0 and 100, where 100 indicates perfect match with references. -=== "Local Evaluation" - - {% raw %} - ```python - from nemo_evaluator_sdk import BLEUMetric - - metric = BLEUMetric( - references=["{{item.reference_1}}", "{{item.reference_2}}"], - candidate="{{item.model_output}}", - ) - - result = evaluator.run( - metric=metric, - dataset=[ - { - "reference_1": "The cat sits on the mat.", - "reference_2": "A cat is sitting on the mat.", - "model_output": "The cat is on the mat.", - }, - { - "reference_1": "Hello world!", - "reference_2": "Hi world!", - "model_output": "Hello world!", - }, - ], - ) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - {% endraw %} - -=== "Remote Job" - - {% raw %} - ```python - from nemo_evaluator_sdk import BLEUMetric - - metric = BLEUMetric( - references=["{{item.reference_1}}", "{{item.reference_2}}"], - candidate="{{item.model_output}}", - description="BLEU score for translation quality", - ) - - job = evaluator.submit( - metric=metric, - dataset=[ - { - "reference_1": "The cat sits on the mat.", - "reference_2": "A cat is sitting on the mat.", - "model_output": "The cat is on the mat.", - }, - { - "reference_1": "Hello world!", - "reference_2": "Hi world!", - "model_output": "Hello world!", - }, - ], - ) - job.wait_until_done() - result = job.get_result() - ``` - {% endraw %} - -=== "Example Result" - {% raw %} - ```json - { - "scores": [ + + + +```python +from nemo_evaluator_sdk import BLEUMetric + +metric = BLEUMetric( + references=["{{item.reference_1}}", "{{item.reference_2}}"], + 
candidate="{{item.model_output}}", +) + +result = evaluator.run( + metric=metric, + dataset=[ + { + "reference_1": "The cat sits on the mat.", + "reference_2": "A cat is sitting on the mat.", + "model_output": "The cat is on the mat.", + }, { - "name": "sentence", - "count": 2, - "mean": 76.86, - "min": 53.73, - "max": 100.0 + "reference_1": "Hello world!", + "reference_2": "Hi world!", + "model_output": "Hello world!", }, + ], +) + +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + + + +```python +from nemo_evaluator_sdk import BLEUMetric + +metric = BLEUMetric( + references=["{{item.reference_1}}", "{{item.reference_2}}"], + candidate="{{item.model_output}}", + description="BLEU score for translation quality", +) + +job = evaluator.submit( + metric=metric, + dataset=[ { - "name": "corpus", - "count": 1, - "mean": 53.895 - } - ] + "reference_1": "The cat sits on the mat.", + "reference_2": "A cat is sitting on the mat.", + "model_output": "The cat is on the mat.", + }, + { + "reference_1": "Hello world!", + "reference_2": "Hi world!", + "model_output": "Hello world!", + }, + ], +) +job.wait_until_done() +result = job.get_result() +``` + + + + +```json +{ + "scores": [ + { + "name": "sentence", + "count": 2, + "mean": 76.86, + "min": 53.73, + "max": 100.0 + }, + { + "name": "corpus", + "count": 1, + "mean": 53.895 } - ``` - {% endraw %} + ] +} +``` + + ## Exact Match Metric Exact Match compares the candidate text with the reference text for perfect equality. This metric returns 1 if the strings match exactly and 0 otherwise. @@ -162,72 +166,71 @@ Exact Match compares the candidate text with the reference text for perfect equa **Metric Output:** Binary score (0 or 1). 
-=== "Local Evaluation" - - {% raw %} - ```python - from nemo_evaluator_sdk import ExactMatchMetric - - metric = ExactMatchMetric( - reference="{{item.correct_answer | lower | trim}}", - candidate="{{item.model_answer | lower | trim}}", - description="Exact match for question answering", - ) - - result = evaluator.run( - metric=metric, - dataset=[ - {"correct_answer": "Paris", "model_answer": "Paris"}, - {"correct_answer": "London", "model_answer": "london "}, - {"correct_answer": "Berlin", "model_answer": "Munich"}, - ], - ) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - {% endraw %} - -=== "Remote Job" - - {% raw %} - ```python - from nemo_evaluator_sdk import ExactMatchMetric - - metric = ExactMatchMetric( - reference="{{item.correct_answer | lower | trim}}", - candidate="{{item.model_answer | lower | trim}}", - ) - - job = evaluator.submit( - metric=metric, - dataset=[ - {"correct_answer": "Paris", "model_answer": "Paris"}, - {"correct_answer": "London", "model_answer": "london "}, - {"correct_answer": "Berlin", "model_answer": "Munich"}, - ], - ) - job.wait_until_done() - result = job.get_result() - ``` - {% endraw %} - -=== "Example Result" - - ```json + + + +```python +from nemo_evaluator_sdk import ExactMatchMetric + +metric = ExactMatchMetric( + reference="{{item.correct_answer | lower | trim}}", + candidate="{{item.model_answer | lower | trim}}", + description="Exact match for question answering", +) + +result = evaluator.run( + metric=metric, + dataset=[ + {"correct_answer": "Paris", "model_answer": "Paris"}, + {"correct_answer": "London", "model_answer": "london "}, + {"correct_answer": "Berlin", "model_answer": "Munich"}, + ], +) + +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + + + +```python +from nemo_evaluator_sdk import ExactMatchMetric + +metric = ExactMatchMetric( + reference="{{item.correct_answer | lower | trim}}", + 
candidate="{{item.model_answer | lower | trim}}", +) + +job = evaluator.submit( + metric=metric, + dataset=[ + {"correct_answer": "Paris", "model_answer": "Paris"}, + {"correct_answer": "London", "model_answer": "london "}, + {"correct_answer": "Berlin", "model_answer": "Munich"}, + ], +) +job.wait_until_done() +result = job.get_result() +``` + + + +```json +{ + "scores": [ { - "scores": [ - { - "name": "exact-match", - "count": 3, - "mean": 0.667, - "min": 0.0, - "max": 1.0 - } - ] + "name": "exact-match", + "count": 3, + "mean": 0.667, + "min": 0.0, + "max": 1.0 } - ``` - + ] +} +``` + + ## F1 Metric F1 measures token-level overlap between candidate and reference text. It balances precision and recall, making it useful when there are multiple acceptable ways to phrase a response. @@ -239,75 +242,74 @@ F1 measures token-level overlap between candidate and reference text. It balance **Metric Output:** A score between 0 and 1. -=== "Local Evaluation" - - {% raw %} - ```python - from nemo_evaluator_sdk import F1Metric - - metric = F1Metric( - reference="{{item.reference}}", - candidate="{{item.answer}}", - ) - - result = evaluator.run( - metric=metric, - dataset=[ - { - "reference": "the capital of France is Paris", - "answer": "Paris is the capital of France", - }, - {"reference": "a red apple", "answer": "red apple"}, - ], - ) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - {% endraw %} - -=== "Remote Job" - - {% raw %} - ```python - from nemo_evaluator_sdk import F1Metric - - metric = F1Metric( - reference="{{item.reference}}", - candidate="{{item.answer}}", - ) - - job = evaluator.submit( - metric=metric, - dataset=[ - { - "reference": "the capital of France is Paris", - "answer": "Paris is the capital of France", - }, - {"reference": "a red apple", "answer": "red apple"}, - ], - ) - job.wait_until_done() - result = job.get_result() - ``` - {% endraw %} - -=== "Example Result" - - ```json - { - "scores": [ + + 
+ +```python +from nemo_evaluator_sdk import F1Metric + +metric = F1Metric( + reference="{{item.reference}}", + candidate="{{item.answer}}", +) + +result = evaluator.run( + metric=metric, + dataset=[ { - "name": "f1", - "count": 2, - "mean": 0.75, - "min": 0.5, - "max": 1.0 - } - ] - } - ``` + "reference": "the capital of France is Paris", + "answer": "Paris is the capital of France", + }, + {"reference": "a red apple", "answer": "red apple"}, + ], +) + +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + + +```python +from nemo_evaluator_sdk import F1Metric + +metric = F1Metric( + reference="{{item.reference}}", + candidate="{{item.answer}}", +) + +job = evaluator.submit( + metric=metric, + dataset=[ + { + "reference": "the capital of France is Paris", + "answer": "Paris is the capital of France", + }, + {"reference": "a red apple", "answer": "red apple"}, + ], +) +job.wait_until_done() +result = job.get_result() +``` + + + +```json +{ + "scores": [ + { + "name": "f1", + "count": 2, + "mean": 0.75, + "min": 0.5, + "max": 1.0 + } + ] +} +``` + + ## Number Check Metric Number Check performs numerical comparisons and operations on extracted values. Supports equality, inequality, comparison operators, and absolute difference calculations. @@ -322,81 +324,80 @@ Number Check performs numerical comparisons and operations on extracted values. 
### Supported Operations - Equality: `"equals"`, `"=="` -- Inequality: `"!="`, `"<>"`, `"not equals"` -- Comparisons: `">"`, `"gt"`, `">="`, `"gte"`, `"<"`, `"lt"`, `"<="`, `"lte"` +- Inequality: `"!="`, `"<>"`, `"not equals"` +- Comparisons: `">"`, `"gt"`, `">="`, `"gte"`, `"<"`, `"lt"`, `"<="`, `"lte"` - Absolute difference: `"absolute difference"` (requires `epsilon` parameter) -=== "Local Evaluation" - - {% raw %} - ```python - from nemo_evaluator_sdk import NumberCheckMetric - - metric = NumberCheckMetric( - operation="absolute difference", - epsilon=0.5, - left_template="{{item.expected}}", - right_template="{{item.predicted}}", - description="Check if values match within tolerance", - ) - - result = evaluator.run( - metric=metric, - dataset=[ - {"expected": "100", "predicted": "100"}, - {"expected": "42.5", "predicted": "42.3"}, - {"expected": "99", "predicted": "101"}, - ], - ) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - {% endraw %} - -=== "Remote Job" - - {% raw %} - ```python - from nemo_evaluator_sdk import NumberCheckMetric - - metric = NumberCheckMetric( - operation=">", - left_template="{{item.predicted}}", - right_template="0.5", - description="Score must be greater than 0.5", - ) - - job = evaluator.submit( - metric=metric, - dataset=[ - {"predicted": "1"}, - {"predicted": "0.75"}, - {"predicted": "0.5"}, - {"predicted": "0.1"}, - ], - ) - job.wait_until_done() - result = job.get_result() - ``` - {% endraw %} - -=== "Example Result" - - ```json + + + +```python +from nemo_evaluator_sdk import NumberCheckMetric + +metric = NumberCheckMetric( + operation="absolute difference", + epsilon=0.5, + left_template="{{item.expected}}", + right_template="{{item.predicted}}", + description="Check if values match within tolerance", +) + +result = evaluator.run( + metric=metric, + dataset=[ + {"expected": "100", "predicted": "100"}, + {"expected": "42.5", "predicted": "42.3"}, + {"expected": "99", 
"predicted": "101"}, + ], +) + +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + + + +```python +from nemo_evaluator_sdk import NumberCheckMetric + +metric = NumberCheckMetric( + operation=">", + left_template="{{item.predicted}}", + right_template="0.5", + description="Score must be greater than 0.5", +) + +job = evaluator.submit( + metric=metric, + dataset=[ + {"predicted": "1"}, + {"predicted": "0.75"}, + {"predicted": "0.5"}, + {"predicted": "0.1"}, + ], +) +job.wait_until_done() +result = job.get_result() +``` + + + +```json +{ + "scores": [ { - "scores": [ - { - "name": "number-check", - "count": 3, - "mean": 0.6667, - "min": 0.0, - "max": 1.0 - } - ] + "name": "number-check", + "count": 3, + "mean": 0.6667, + "min": 0.0, + "max": 1.0 } - ``` - + ] +} +``` + + ## ROUGE Metric ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measures overlap between generated text and reference text. It is commonly used for summarization and long-form generation quality checks. @@ -408,94 +409,93 @@ ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measures overlap betwe **Metric Output:** ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L F1 scores between 0 and 1. 
-=== "Local Evaluation" - - {% raw %} - ```python - from nemo_evaluator_sdk import ROUGEMetric - - metric = ROUGEMetric( - reference="{{item.reference_summary}}", - candidate="{{item.model_summary}}", - ) - - result = evaluator.run( - metric=metric, - dataset=[ - { - "reference_summary": "The cat sat on the mat and looked out the window.", - "model_summary": "A cat sat on a mat near the window.", - }, - { - "reference_summary": "The launch was postponed because of high winds.", - "model_summary": "High winds delayed the launch.", - }, - ], - ) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - {% endraw %} - -=== "Remote Job" - - {% raw %} - ```python - from nemo_evaluator_sdk import ROUGEMetric - - metric = ROUGEMetric( - reference="{{item.reference_summary}}", - candidate="{{item.model_summary}}", - ) - - job = evaluator.submit( - metric=metric, - dataset=[ - { - "reference_summary": "The cat sat on the mat and looked out the window.", - "model_summary": "A cat sat on a mat near the window.", - }, - { - "reference_summary": "The launch was postponed because of high winds.", - "model_summary": "High winds delayed the launch.", - }, - ], - ) - job.wait_until_done() - result = job.get_result() - ``` - {% endraw %} - -=== "Example Result" - - ```json - { - "scores": [ + + + +```python +from nemo_evaluator_sdk import ROUGEMetric + +metric = ROUGEMetric( + reference="{{item.reference_summary}}", + candidate="{{item.model_summary}}", +) + +result = evaluator.run( + metric=metric, + dataset=[ { - "name": "rouge_1_score", - "count": 2, - "mean": 0.72 + "reference_summary": "The cat sat on the mat and looked out the window.", + "model_summary": "A cat sat on a mat near the window.", }, { - "name": "rouge_2_score", - "count": 2, - "mean": 0.43 + "reference_summary": "The launch was postponed because of high winds.", + "model_summary": "High winds delayed the launch.", }, + ], +) + +for score in 
result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + + + +```python +from nemo_evaluator_sdk import ROUGEMetric + +metric = ROUGEMetric( + reference="{{item.reference_summary}}", + candidate="{{item.model_summary}}", +) + +job = evaluator.submit( + metric=metric, + dataset=[ { - "name": "rouge_3_score", - "count": 2, - "mean": 0.31 + "reference_summary": "The cat sat on the mat and looked out the window.", + "model_summary": "A cat sat on a mat near the window.", }, { - "name": "rouge_L_score", - "count": 2, - "mean": 0.67 - } - ] - } - ``` + "reference_summary": "The launch was postponed because of high winds.", + "model_summary": "High winds delayed the launch.", + }, + ], +) +job.wait_until_done() +result = job.get_result() +``` + + +```json +{ + "scores": [ + { + "name": "rouge_1_score", + "count": 2, + "mean": 0.72 + }, + { + "name": "rouge_2_score", + "count": 2, + "mean": 0.43 + }, + { + "name": "rouge_3_score", + "count": 2, + "mean": 0.31 + }, + { + "name": "rouge_L_score", + "count": 2, + "mean": 0.67 + } + ] +} +``` + + ## String Check Metric String Check performs various string operations and comparisons. Supports equality, containment, and prefix/suffix checks. @@ -511,78 +511,77 @@ String Check performs various string operations and comparisons. 
Supports equali ### Supported Operations - Equality: `"equals"`, `"=="` -- Inequality: `"!="`, `"<>"`, `"not equals"` +- Inequality: `"!="`, `"<>"`, `"not equals"` - Containment: `"contains"`, `"not contains"` - Pattern: `"startswith"`, `"endswith"` -=== "Local Evaluation" - - {% raw %} - ```python - from nemo_evaluator_sdk import StringCheckMetric - - metric = StringCheckMetric( - operation="contains", - left_template="{{item.output | trim}}", - right_template="{{item.must_contain}}", - ) - - result = evaluator.run( - metric=metric, - dataset=[ - {"output": "The answer is: 42", "must_contain": "answer"}, - {"output": "Result: Success", "must_contain": "Success"}, - {"output": "Error occurred", "must_contain": "Success"}, - ], - ) - - for score in result.aggregate_scores.scores: - print(f"{score.name}: mean={score.mean}") - ``` - {% endraw %} - -=== "Remote Job" - - {% raw %} - ```python - from nemo_evaluator_sdk import StringCheckMetric - - metric = StringCheckMetric( - operation="startswith", - left_template="{{item.output}}", - right_template="Answer:", - description="Check if output starts with 'Answer:'", - ) - - job = evaluator.submit( - metric=metric, - dataset=[ - {"output": "Answer: 42"}, - {"output": "Answer: Success"}, - {"output": "Error occurred"}, - ], - ) - job.wait_until_done() - result = job.get_result() - ``` - {% endraw %} - -=== "Example Result" - - ```json + + + +```python +from nemo_evaluator_sdk import StringCheckMetric + +metric = StringCheckMetric( + operation="contains", + left_template="{{item.output | trim}}", + right_template="{{item.must_contain}}", +) + +result = evaluator.run( + metric=metric, + dataset=[ + {"output": "The answer is: 42", "must_contain": "answer"}, + {"output": "Result: Success", "must_contain": "Success"}, + {"output": "Error occurred", "must_contain": "Success"}, + ], +) + +for score in result.aggregate_scores.scores: + print(f"{score.name}: mean={score.mean}") +``` + + + + +```python +from nemo_evaluator_sdk 
import StringCheckMetric + +metric = StringCheckMetric( + operation="startswith", + left_template="{{item.output}}", + right_template="Answer:", + description="Check if output starts with 'Answer:'", +) + +job = evaluator.submit( + metric=metric, + dataset=[ + {"output": "Answer: 42"}, + {"output": "Answer: Success"}, + {"output": "Error occurred"}, + ], +) +job.wait_until_done() +result = job.get_result() +``` + + + +```json +{ + "scores": [ { - "scores": [ - { - "name": "string-check", - "count": 3, - "mean": 0.667, - "min": 0.0, - "max": 1.0 - } - ] + "name": "string-check", + "count": 3, + "mean": 0.667, + "min": 0.0, + "max": 1.0 } - ``` - + ] +} +``` + + ## Dataset Format @@ -592,10 +591,10 @@ The examples on this page use inline dataset rows with `dataset=[...]`. Template - `candidate` reads from an `item` field for offline rows when configured. - If `candidate` is omitted for BLEU, Exact Match, F1, or ROUGE, the metric uses `sample.output_text`, which is populated during online evaluations. -Keep field names consistent between the dataset rows and the templates you configure. For example, `{%raw%}{{item.expected}}{%endraw%}` requires each row to include an `expected` field. +Keep field names consistent between the dataset rows and the templates you configure. For example, `{%raw%}{{item.expected}}{%endraw%}` requires each row to include an `expected` field. 
## Related Topics -- [Metric Overview](index.md) -- [Model Configuration](model-configuration.md) -- [RAG Evaluation Metrics](rag.md) +- [Metric Overview](/evaluation/metrics/overview) +- [Model Configuration](/evaluation/metrics/model-configuration) +- [RAG Evaluation Metrics](/evaluation/metrics/rag-metrics) diff --git a/docs/evaluator/sdk-resources.mdx b/docs/evaluator/sdk-resources.mdx index 53ce5db735..ea39bd3243 100644 --- a/docs/evaluator/sdk-resources.mdx +++ b/docs/evaluator/sdk-resources.mdx @@ -1,15 +1,19 @@ +--- +title: "SDK Resources" +description: "" +--- -# Evaluator {{platform_name}} SDK Resources +# Evaluator NeMo Platform SDK Resources The `nemo_evaluator_sdk` package provides context-agnostic objects for defining metrics, datasets, evaluation configuration, and result handling. -When you want to execute those evaluations through the {{platform_name}} Evaluator plugin, use the Evaluator SDK resource mounted on the `nemo_platform` SDK. -This page explains the {{platform_name}}-specific objects used to run local plugin jobs, submit durable platform jobs, and retrieve evaluator job results. +When you want to execute those evaluations through the NeMo Platform Evaluator plugin, use the Evaluator SDK resource mounted on the `nemo_platform` SDK. +This page explains the NeMo Platform-specific objects used to run local plugin jobs, submit durable platform jobs, and retrieve evaluator job results. ## Evaluator -The `Evaluator` resource is the sync SDK object for working with the Evaluator plugin on {{platform_name}}. +The `Evaluator` resource is the sync SDK object for working with the Evaluator plugin on NeMo Platform. 
It is accessed directly from a `NeMoPlatform` instance: ```python @@ -61,7 +65,7 @@ The `dataset` argument accepts inline rows, local dataset paths, local glob path ### Run locally -{% raw %} + ```python from nemo_evaluator_sdk import ExactMatchMetric @@ -77,11 +81,11 @@ result = evaluator.run(metric=metric, dataset=dataset) print(result.aggregate_scores) ``` -{% endraw %} + ### Submit a platform job -{% raw %} + ```python from nemo_evaluator_sdk import ExactMatchMetric @@ -99,7 +103,7 @@ result = job.get_result() print(result.aggregate_scores) ``` -{% endraw %} + ## AsyncEvaluator @@ -129,7 +133,7 @@ evaluator: AsyncEvaluator = client.evaluator `AsyncEvaluator.run()` and `AsyncEvaluator.submit()` accept the same arguments as the sync methods [above](#run-arguments). -{% raw %} + ```python import asyncio @@ -154,7 +158,7 @@ async def main() -> None: asyncio.run(main()) ``` -{% endraw %} + ## EvaluatorJobResource diff --git a/docs/evaluator/tutorials/index.mdx b/docs/evaluator/tutorials/index.mdx index 9b5870b9bd..c4cc057615 100644 --- a/docs/evaluator/tutorials/index.mdx +++ b/docs/evaluator/tutorials/index.mdx @@ -1,15 +1,19 @@ +--- +title: "Overview" +description: "" +--- # Evaluation Tutorials -Use these tutorials to become familiar with [evaluation with {{platform_name}}](../index.md). +Use these tutorials to become familiar with [evaluation with NeMo Platform](/evaluation/about). ## Before You Start -Set up [a local instance of the platform](../../get-started/setup.md) for the following tutorials. +Set up [a local instance of the platform](/get-started/setup) for the following tutorials.
-- **[Run an LLM Judge Eval](run-llm-judge-evaluation.md)** +- **[Run an LLM Judge Eval](/evaluation/tutorials/run-llm-as-a-judge-evaluation)** --- @@ -21,4 +25,4 @@ Set up [a local instance of the platform](../../get-started/setup.md) for the fo ## How It Works -For the conceptual overview of how Evaluator separates definition (library) from execution (platform), see [About Evaluating → How It Works](../index.md#how-it-works-library-platform). For runnable SDK examples, see [SDK Resources](../sdk-resources.md). +For the conceptual overview of how Evaluator separates definition (library) from execution (platform), see [About Evaluating → How It Works](/evaluation/about#how-it-works-library-platform). For runnable SDK examples, see [SDK Resources](/evaluation/sdk-resources). diff --git a/docs/evaluator/tutorials/run-llm-judge-evaluation.mdx b/docs/evaluator/tutorials/run-llm-judge-evaluation.mdx index c7c8aadce5..4f0ff66abd 100644 --- a/docs/evaluator/tutorials/run-llm-judge-evaluation.mdx +++ b/docs/evaluator/tutorials/run-llm-judge-evaluation.mdx @@ -1,13 +1,17 @@ - - - - - +*/} # Evaluate Response Quality with LLM-as-a-Judge @@ -33,12 +37,11 @@ This tutorial shows you how to build, validate, and iterate on LLM judge metrics | Complexity | Sophistication and depth of the response | 0-4 | | Verbosity | Appropriate level of detail | 0-4 | -!!! tip - This tutorial takes approximately **20 minutes** to complete. +This tutorial takes approximately **20 minutes** to complete. ## Prerequisites -1. Install and start {{platform_name}} using the [Setup guide](../../get-started/setup.md). +1. Install and start NeMo Platform using the [Setup guide](/get-started/setup). ```bash ! pip install nemo-platform[all] @@ -59,7 +62,7 @@ Before you begin, here is a quick overview of the resources you will use: - **Evaluator resource**: The plugin SDK resource mounted at `client.evaluator`. Use it to run metrics locally or submit durable platform jobs. 
- **Metric**: An inline Python object that defines how to score model outputs. In this tutorial, we create LLM judge metrics that prompt a model to rate responses. -- **Fileset**: A dataset registered with {{platform_name}}. The evaluator plugin SDK accepts fileset references directly, so this tutorial passes the registered HelpSteer2 split to evaluations as a `FilesetRef`. +- **Fileset**: A dataset registered with NeMo Platform. The evaluator plugin SDK accepts fileset references directly, so this tutorial passes the registered HelpSteer2 split to evaluations as a `FilesetRef`. - **Workspace**: A workspace that isolates your resources. Secrets, filesets, and jobs belong to a workspace. - **Job**: A durable remote platform task created with `evaluator.submit(...)`. - **Evaluation**: The process of scoring model outputs using one or more metrics. Use `evaluator.run(...)` for local in-process execution, `evaluator.submit(...)` for durable jobs @@ -139,8 +142,9 @@ print( ) ``` -!!! note - Local runs resolve `Model(api_key_secret="NVIDIA_API_KEY")` from your local environment. Remote jobs run in the platform job runtime, so they use the platform secret name created above. + +Local runs resolve `Model(api_key_secret="NVIDIA_API_KEY")` from your local environment. Remote jobs run in the platform job runtime, so they use the platform secret name created above. + --- @@ -178,8 +182,9 @@ REMOTE_JUDGE_MODEL = Model( ) ``` -!!! tip - When using hosted APIs, keep `parallelism` low to avoid rate-limit errors. You can increase it for locally deployed models. + +When using hosted APIs, keep `parallelism` low to avoid rate-limit errors. You can increase it for locally deployed models. + --- @@ -227,8 +232,9 @@ except ConflictError: print(f"{fileset.workspace}/{fileset.name} dataset already registered") ``` -!!! note - HelpSteer2 contains prompt-response pairs with human ratings for helpfulness, correctness, coherence, complexity, and verbosity. Each rating is on a 0-4 scale. 
We'll use these human scores as ground truth to validate our LLM judge. + +HelpSteer2 contains prompt-response pairs with human ratings for helpfulness, correctness, coherence, complexity, and verbosity. Each rating is on a 0-4 scale. We'll use these human scores as ground truth to validate our LLM judge. + Create a fileset reference for the validation split. The evaluator SDK resolves the selected fileset path when it runs, so you do not need to download the split into memory yourself: @@ -249,11 +255,11 @@ Now let's create our first LLM judge metric. We'll start with a simple prompt fo A metric definition includes: - **Model**: Which LLM to use as the judge -- **Prompt template**: Instructions for the judge, with `{% raw %}{{item...}}{% endraw %}` fields filled from your dataset rows +- **Prompt template**: Instructions for the judge, with `{{item...}}` fields filled from your dataset rows - **Score definition**: The name, scale, and how to parse the judge's output ```python -{% raw %} + def create_helpfulness_metric(prompt_template: str, judge_model: Model) -> LLMJudgeMetric: """Create a helpfulness metric with the given system prompt.""" score = RangeScore( @@ -291,11 +297,12 @@ Respond with JSON only: {"helpfulness": <0-4>}""" metric_v1_local = create_helpfulness_metric(PROMPT_V1, LOCAL_JUDGE_MODEL) metric_v1_remote = create_helpfulness_metric(PROMPT_V1, REMOTE_JUDGE_MODEL) -{% endraw %} + ``` -!!! tip - Use low temperature for evaluation tasks. Low or zero temperature produces outputs with less variability, which is critical for reproducible scoring. This ensures the same response gets the same score across runs, making it easier to validate your judge and compare prompt versions. + +Use low temperature for evaluation tasks. Low or zero temperature produces outputs with less variability, which is critical for reproducible scoring. 
This ensures the same response gets the same score across runs, making it easier to validate your judge and compare prompt versions. + --- @@ -304,7 +311,7 @@ metric_v1_remote = create_helpfulness_metric(PROMPT_V1, REMOTE_JUDGE_MODEL) Before running a durable job, test your metric with a few examples using `evaluator.run(...)`. This runs locally in-process and returns results immediately, which is useful for prompt iteration. ```python -{% raw %} + quick_test_result = evaluator.run( metric=metric_v1_local, dataset=[ @@ -338,7 +345,7 @@ print("Quick test results:") for row in quick_test_result.row_scores: helpfulness = score_value(row, "helpfulness") print(f" Row {row.row_index}: helpfulness = {helpfulness}") -{% endraw %} + ``` **Expected output:** @@ -531,8 +538,9 @@ else: print("\nPrompt V1 performs better; simpler prompts can work well.") ``` -!!! note - More complex prompts do not always perform better. The best prompt depends on the model, task, and how well it aligns with the original annotation guidelines. If your V1 prompt outperforms V2, that's a valid result. Use what works best for your use case. + +More complex prompts do not always perform better. The best prompt depends on the model, task, and how well it aligns with the original annotation guidelines. If your V1 prompt outperforms V2, that's a valid result. Use what works best for your use case. + --- @@ -593,17 +601,18 @@ plt.show() print("Saved visualization to score_distributions.png") ``` -!!! 
tip - If you run this tutorial as a headless script instead of a notebook, configure a non-interactive Matplotlib backend before importing `pyplot`: + +If you run this tutorial as a headless script instead of a notebook, configure a non-interactive Matplotlib backend before importing `pyplot`: - ```python - import matplotlib +```python +import matplotlib - matplotlib.use("Agg") - import matplotlib.pyplot as plt - ``` +matplotlib.use("Agg") +import matplotlib.pyplot as plt +``` - In that mode, save the figure with `plt.savefig(...)` and skip `plt.show()`. +In that mode, save the figure with `plt.savefig(...)` and skip `plt.show()`. + *The code above generates a chart showing score distributions. Your results will vary depending on the model used and the specific samples evaluated.* @@ -630,8 +639,9 @@ for name, scores in [ ) ``` -!!! tip - If your judge's distribution looks very different from humans, such as always scoring 3-4 while humans use the full range, adjust your prompt to calibrate the scoring criteria. + +If your judge's distribution looks very different from humans, such as always scoring 3-4 while humans use the full range, adjust your prompt to calibrate the scoring criteria. + --- @@ -668,8 +678,9 @@ client.workspaces.delete(name=WORKSPACE) print("Cleanup complete!") ``` -!!! note - Workspaces cannot be deleted while they contain resources. The code above deletes resources in dependency order. + +Workspaces cannot be deleted while they contain resources. The code above deletes resources in dependency order. 
+ --- @@ -804,12 +815,12 @@ In this tutorial, you learned how to: ## Next Steps -- **Experiment with rubric scores**: Use [categorical rubrics](../metrics/llm-as-a-judge.md) instead of numeric ranges for more interpretable criteria +- **Experiment with rubric scores**: Use [categorical rubrics](/evaluation/metrics/llm-as-a-judge) instead of numeric ranges for more interpretable criteria - **Try different judge models**: Larger models often correlate better with human judgment -- **Explore other evaluation types**: [RAG evaluation](../metrics/rag.md) or [agentic evaluation](../metrics/agentic.md) +- **Explore other evaluation types**: [RAG evaluation](/evaluation/metrics/rag-metrics) or [agentic evaluation](/evaluation/metrics/agentic-metrics) ## Related -- [LLM-as-a-Judge Reference](../metrics/llm-as-a-judge.md) - Complete guide to judge configuration -- [SDK Resources](../sdk-resources.md) - Evaluator plugin SDK resource reference -- [Manage Metrics](../metrics/manage-metrics.md) - Using evaluator SDK metric objects +- [LLM-as-a-Judge Reference](/evaluation/metrics/llm-as-a-judge) - Complete guide to judge configuration +- [SDK Resources](/evaluation/sdk-resources) - Evaluator plugin SDK resource reference +- [Manage Metrics](/evaluation/metrics/manage-metrics) - Using evaluator SDK metric objects diff --git a/docs/example-applications/about.mdx b/docs/example-applications/about.mdx index a36c3da430..94b70baf97 100644 --- a/docs/example-applications/about.mdx +++ b/docs/example-applications/about.mdx @@ -1,3 +1,5 @@ -# Example Applications - -{{platform_name}} allows you to build complex AI applications by combining multiple components together. We've compiled a few examples that can be applied directly, extended and customized, or used as inspiration. \ No newline at end of file +--- +title: "Example Applications" +description: "" +--- +NeMo Platform allows you to build complex AI applications by combining multiple components together. 
We've compiled a few examples that can be applied directly, extended and customized, or used as inspiration. \ No newline at end of file diff --git a/docs/fern/_images/nemo-platform-architecture.svg b/docs/fern/_images/nemo-platform-architecture.svg new file mode 100644 index 0000000000..4d1bea95ac --- /dev/null +++ b/docs/fern/_images/nemo-platform-architecture.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/docs/fern/apis/nemo-platform/generators.yml b/docs/fern/apis/nemo-platform/generators.yml new file mode 100644 index 0000000000..8635608553 --- /dev/null +++ b/docs/fern/apis/nemo-platform/generators.yml @@ -0,0 +1,3 @@ +api: + specs: + - openapi: ./openapi.yaml diff --git a/docs/fern/apis/nemo-platform/openapi.yaml b/docs/fern/apis/nemo-platform/openapi.yaml new file mode 100644 index 0000000000..ed2a4cf8cc --- /dev/null +++ b/docs/fern/apis/nemo-platform/openapi.yaml @@ -0,0 +1,28781 @@ +openapi: 3.1.0 +info: + title: Nemo Platform API + description: API for Nemo Platform services + version: 0.1.1 +paths: + /apis/auth/discovery: + get: + tags: + - Discovery + summary: Discover auth configuration + description: "Return authentication configuration for CLI/SDK discovery.\n\n\ + This endpoint is unauthenticated and returns the information clients\nneed\ + \ to authenticate with this NeMo Platform deployment.\n\n**Response fields:**\n\ + \n- `auth_enabled`: Whether authentication is enabled on this cluster\n- `oidc`:\ + \ OIDC configuration (only present when OIDC is enabled)\n - `issuer`: The\ + \ OIDC issuer URL\n - `authorization_endpoint`: Authorization endpoint for\ + \ browser-based flows\n - `token_endpoint`: Token exchange endpoint\n -\ + \ `device_authorization_endpoint`: Device flow authorization endpoint (for\ + \ CLI)\n - `userinfo_endpoint`: UserInfo endpoint\n - `client_id`: OAuth\ + \ client ID to use\n - `default_scopes`: OAuth scopes to request during authentication\n\ + \ - `scope_prefix`: Prefix to prepend to custom scopes (those 
with ':' or\ + \ '.default')" + operationId: get_auth_discovery_apis_auth_discovery_get + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/AuthDiscoveryResponse' + /apis/auth/v2/iam/role-bindings: + get: + tags: + - IAM + summary: List role bindings + description: List all role bindings (Platform Admin only) + operationId: list_role_bindings_apis_auth_v2_iam_role_bindings_get + parameters: + - name: page + in: query + required: false + schema: + type: integer + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + description: Page size. + default: 10 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + type: string + description: The field to sort by. To sort in decreasing order, use `-` + in front of the field name. + default: created_at + title: Sort + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/RoleBindingFilter' + description: Filter role bindings by principal, workspace, role, granted_by, + is_active, granted_at, and revoked_at. 
+ responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/RoleBindingsPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + post: + tags: + - IAM + summary: Create role binding + description: Create a new role binding (Platform Admin only) + operationId: create_role_binding_apis_auth_v2_iam_role_bindings_post + parameters: + - name: wait_role_propagation + in: query + required: false + schema: + type: boolean + description: 'If true, wait for role to propagate before returning (default: + true). Set to false for bulk operations.' + default: true + title: Wait Role Propagation + description: 'If true, wait for role to propagate before returning (default: + true). Set to false for bulk operations.' + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/RoleBindingInput' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/RoleBinding' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/auth/v2/iam/role-bindings/{name}: + get: + tags: + - IAM + summary: Get role binding + description: Get a specific role binding (Platform Admin only) + operationId: get_role_binding_apis_auth_v2_iam_role_bindings__name__get + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/RoleBinding' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - IAM + summary: Revoke role binding + description: Revoke a role binding (Platform Admin only) + operationId: 
revoke_role_binding_apis_auth_v2_iam_role_bindings__name__delete + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: wait_role_propagation + in: query + required: false + schema: + type: boolean + description: 'If true, wait for role to propagate before returning (default: + true). Set to false for bulk operations.' + default: true + title: Wait Role Propagation + description: 'If true, wait for role to propagate before returning (default: + true). Set to false for bulk operations.' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/DeleteResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/entities/v2/entities/{id}: + get: + tags: + - Entity Store + summary: Get entity by ID (debug/internal) + description: 'Get a specific entity by its unique identifier. + + This endpoint is primarily for debugging and internal use. 
+ + + Example: + + ``` + + GET /apis/entities/v2/entities/customization-config-5Q2LoF8z8M9JZxZsHwJKNn + + ```' + operationId: get_entity_by_id_apis_entities_v2_entities__id__get + parameters: + - name: id + in: path + required: true + schema: + type: string + title: Id + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/Entity' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/entities/v2/workspaces: + post: + tags: + - Entity Store + summary: Create a new workspace + description: "Create a new workspace.\n\nThe creator is automatically granted\ + \ Admin role on the workspace.\nBy default, this endpoint waits for the Admin\ + \ role to propagate before returning.\nUse `wait_role_propagation=false` to\ + \ skip waiting (useful for bulk operations).\n\nExample:\n```\nPOST /apis/entities/v2/workspaces\n\ + {\n \"name\": \"ml-team\",\n \"description\": \"Machine Learning Team\ + \ workspace\"\n}\n```" + operationId: create_workspace_apis_entities_v2_workspaces_post + parameters: + - name: wait_role_propagation + in: query + required: false + schema: + type: boolean + description: 'If true, wait for Admin role to propagate before returning + (default: true). Set to false for bulk operations.' + default: true + title: Wait Role Propagation + description: 'If true, wait for Admin role to propagate before returning (default: + true). Set to false for bulk operations.' 
+ requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/WorkspaceInput' + responses: + '201': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/Workspace' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Entity Store + summary: List all workspaces + description: 'List all workspaces with pagination. + + + When authentication is enabled, only workspaces the principal has access to + + are returned. Service principals and platform admins have access to all workspaces. + + + Query Parameters: + + - page, page_size: Pagination + + - sort: Sort field + + - filter: Advanced filters (JSON, text, or bracket notation) + + + Example: + + ``` + + GET /apis/entities/v2/workspaces?sort=-created_at&page=1&page_size=10 + + ```' + operationId: list_workspaces_apis_entities_v2_workspaces_get + parameters: + - name: page + in: query + required: false + schema: + type: integer + minimum: 1 + description: Page number + default: 1 + title: Page + description: Page number + - name: page_size + in: query + required: false + schema: + type: integer + maximum: 1000 + minimum: 1 + description: Items per page + default: 100 + title: Page Size + description: Items per page + - name: sort + in: query + required: false + schema: + allOf: + - $ref: '#/components/schemas/GenericSortField' + description: Sort field + default: -created_at + description: Sort field + - name: filter + in: query + required: false + schema: + description: 'Query filter expression. 
Supports text and JSON syntaxes: + + - Text: name:"value" AND status>500 with operators : ~ > >= < <= IN NOT + IN AND OR and negation prefix - + + - Object (JSON): {"name":{"$like":"value"}} with operators $eq, $like, + $lt, $lte, $gt, $gte, $in, $nin, $and, $or, $not + + - Bracket notation: ?filter[name][$like]=value + + - Relationship traversal: ?filter[relationship][$exists]=true or ?filter[relationship][field]=value' + title: Filter + type: string + description: 'Query filter expression. Supports text and JSON syntaxes: + + - Text: name:"value" AND status>500 with operators : ~ > >= < <= IN NOT + IN AND OR and negation prefix - + + - Object (JSON): {"name":{"$like":"value"}} with operators $eq, $like, $lt, + $lte, $gt, $gte, $in, $nin, $and, $or, $not + + - Bracket notation: ?filter[name][$like]=value + + - Relationship traversal: ?filter[relationship][$exists]=true or ?filter[relationship][field]=value' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/WorkspacesPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/entities/v2/workspaces/{name}: + get: + tags: + - Entity Store + summary: Get workspace by ID + description: 'Get a specific workspace by ID. 
+ + + Example: + + ``` + + GET /apis/entities/v2/workspaces/ml-team + + ```' + operationId: get_workspace_apis_entities_v2_workspaces__name__get + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/Workspace' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + put: + tags: + - Entity Store + summary: Update workspace + description: "Update a workspace's description.\n\nExample:\n```\nPUT /apis/entities/v2/workspaces/ml-team\n\ + {\n \"description\": \"Updated description for ML Team\"\n}\n```" + operationId: update_workspace_apis_entities_v2_workspaces__name__put + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/WorkspaceUpdate' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/Workspace' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - Entity Store + summary: Delete workspace + description: 'Delete a workspace. + + + This marks the workspace for deletion and returns immediately. The workspace + + will no longer be accessible via the API. An asynchronous cleanup controller + + will handle deletion of all entities and external resources. + + + Role bindings are immediately deleted to revoke access. 
+ + + Example: + + ``` + + DELETE /apis/entities/v2/workspaces/ml-team + + ```' + operationId: delete_workspace_apis_entities_v2_workspaces__name__delete + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/DeleteResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/entities/v2/workspaces/{workspace}/entities/{entity_type}: + post: + tags: + - Entity Store + summary: Create a new entity + description: "Create a new entity of the specified type in the given workspace.\n\ + \nIf name is not provided, it will be auto-generated based on the entity type.\n\ + \nExample:\n```\nPOST /apis/entities/v2/workspaces/default/entities/customization_config\n\ + {\n \"name\": \"my-config\",\n \"data\": {\n \"target_id\": \"\ + llama-2-7b\",\n \"training_options\": {\"learning_rate\": 0.01}\n \ + \ }\n}\n```" + operationId: create_entity_apis_entities_v2_workspaces__workspace__entities__entity_type__post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: entity_type + in: path + required: true + schema: + type: string + title: Entity Type + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/EntityCreateInput' + responses: + '201': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/Entity' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Entity Store + summary: List entities + description: 'List all entities of a specific type in the given workspace. + + + Use workspace="-" to list entities across all workspaces the principal has + + access to. 
+ + + Query Parameters: + + - sort: Sort field + + - page, page_size: Pagination + + - filter: Advanced filters (JSON, text, or bracket notation) + + + Examples: + + ``` + + GET /apis/entities/v2/workspaces/default/entities/customization_config?sort=-created_at + + GET /apis/entities/v2/workspaces/-/entities/customization_config # Cross-workspace + query + + ```' + operationId: list_entities_apis_entities_v2_workspaces__workspace__entities__entity_type__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: entity_type + in: path + required: true + schema: + type: string + title: Entity Type + - name: page + in: query + required: false + schema: + type: integer + minimum: 1 + description: Page number + default: 1 + title: Page + description: Page number + - name: page_size + in: query + required: false + schema: + type: integer + maximum: 1000 + minimum: 1 + description: Items per page + default: 100 + title: Page Size + description: Items per page + - name: sort + in: query + required: false + schema: + type: string + description: Sort field + examples: + - -created_at + - created_at + - -updated_at + - updated_at + - -name + - name + default: -created_at + title: Sort + description: Sort field + - name: filter + in: query + required: false + schema: + description: 'Query filter expression. Supports text and JSON syntaxes: + + - Text: name:"value" AND status>500 with operators : ~ > >= < <= IN NOT + IN AND OR and negation prefix - + + - Object (JSON): {"name":{"$like":"value"}} with operators $eq, $like, + $lt, $lte, $gt, $gte, $in, $nin, $and, $or, $not + + - Bracket notation: ?filter[name][$like]=value + + - Relationship traversal: ?filter[relationship][$exists]=true or ?filter[relationship][field]=value' + title: Filter + type: string + description: 'Query filter expression. 
Supports text and JSON syntaxes: + + - Text: name:"value" AND status>500 with operators : ~ > >= < <= IN NOT + IN AND OR and negation prefix - + + - Object (JSON): {"name":{"$like":"value"}} with operators $eq, $like, $lt, + $lte, $gt, $gte, $in, $nin, $and, $or, $not + + - Bracket notation: ?filter[name][$like]=value + + - Relationship traversal: ?filter[relationship][$exists]=true or ?filter[relationship][field]=value' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/EntitiesPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/entities/v2/workspaces/{workspace}/entities/{entity_type}/{name}: + get: + tags: + - Entity Store + summary: Get entity by name + description: 'Get a specific entity by its workspace, type, and name. + + + Example: + + ``` + + GET /apis/entities/v2/workspaces/default/entities/customization_config/my-config + + ```' + operationId: get_entity_by_name_apis_entities_v2_workspaces__workspace__entities__entity_type___name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: entity_type + in: path + required: true + schema: + type: string + title: Entity Type + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: parent + in: query + required: false + schema: + description: Parent entity ID for nested entities + title: Parent + type: string + description: Parent entity ID for nested entities + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/Entity' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + put: + tags: + - Entity Store + summary: Update entity by name + description: "Update an entity by its name. 
Optionally change the entity's name.\n\ + \nExample:\n```\nPUT /apis/entities/v2/workspaces/default/entities/customization_config/my-config\n\ + {\n \"data\": {\n \"target_id\": \"llama-2-7b\",\n \"training_options\"\ + : {\"learning_rate\": 0.02}\n }\n}\n```" + operationId: update_entity_by_name_apis_entities_v2_workspaces__workspace__entities__entity_type___name__put + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: entity_type + in: path + required: true + schema: + type: string + title: Entity Type + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: parent + in: query + required: false + schema: + description: Parent entity ID for nested entities + title: Parent + type: string + description: Parent entity ID for nested entities + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/EntityUpdate' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/Entity' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - Entity Store + summary: Delete entity by name + description: 'Delete an entity by its name. 
+ + + Example: + + ``` + + DELETE /apis/entities/v2/workspaces/default/entities/customization_config/my-config + + ```' + operationId: delete_entity_by_name_apis_entities_v2_workspaces__workspace__entities__entity_type___name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: entity_type + in: path + required: true + schema: + type: string + title: Entity Type + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: parent + in: query + required: false + schema: + description: Parent entity ID for nested entities + title: Parent + type: string + description: Parent entity ID for nested entities + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/DeleteResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/entities/v2/workspaces/{workspace}/members: + get: + tags: + - Entity Store + summary: List workspace members + description: 'List all members of a workspace with their roles. + + + Returns a list of all principals with active role bindings in the workspace. 
+ + + Example: + + ``` + + GET /apis/entities/v2/workspaces/ml-team/members + + ```' + operationId: list_workspace_members_apis_entities_v2_workspaces__workspace__members_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/WorkspaceMemberListResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + post: + tags: + - Entity Store + summary: Add workspace member + description: "Add a new member to the workspace with specified roles.\n\nThis\ + \ creates role bindings for the specified principal with the given roles.\n\ + By default, this endpoint waits for the roles to propagate before returning.\n\ + Use `wait_role_propagation=false` to skip waiting (useful for bulk operations).\n\ + \nExample:\n```\nPOST /apis/entities/v2/workspaces/ml-team/members\n{\n \ + \ \"principal\": \"user@example.com\",\n \"roles\": [\"Editor\"]\n}\n\ + ```" + operationId: add_workspace_member_apis_entities_v2_workspaces__workspace__members_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: wait_role_propagation + in: query + required: false + schema: + type: boolean + description: 'If true, wait for roles to propagate before returning (default: + true). Set to false for bulk operations.' + default: true + title: Wait Role Propagation + description: 'If true, wait for roles to propagate before returning (default: + true). Set to false for bulk operations.' 
+ requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/WorkspaceMemberInput' + responses: + '201': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/WorkspaceMember' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/entities/v2/workspaces/{workspace}/members/{principal_id}: + put: + tags: + - Entity Store + summary: Update workspace member roles + description: "Update the roles for a workspace member.\n\nThis will revoke existing\ + \ roles not in the new list and add new roles.\nBy default, this endpoint\ + \ waits for the roles to propagate before returning.\nUse `wait_role_propagation=false`\ + \ to skip waiting (useful for bulk operations).\n\nExample:\n```\nPUT /apis/entities/v2/workspaces/ml-team/members/user@example.com\n\ + {\n \"roles\": [\"Viewer\", \"Editor\"]\n}\n```" + operationId: update_workspace_member_apis_entities_v2_workspaces__workspace__members__principal_id__put + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: principal_id + in: path + required: true + schema: + type: string + title: Principal Id + - name: wait_role_propagation + in: query + required: false + schema: + type: boolean + description: 'If true, wait for roles to propagate before returning (default: + true). Set to false for bulk operations.' + default: true + title: Wait Role Propagation + description: 'If true, wait for roles to propagate before returning (default: + true). Set to false for bulk operations.' 
+ requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/WorkspaceMemberUpdate' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/WorkspaceMember' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - Entity Store + summary: Remove workspace member + description: 'Remove a member from the workspace by revoking all their roles. + + + This revokes all active role bindings for the principal in the workspace. + + By default, this endpoint waits for all roles to be revoked before returning. + + Use `wait_role_propagation=false` to skip waiting (useful for bulk operations). + + + Example: + + ``` + + DELETE /apis/entities/v2/workspaces/ml-team/members/user@example.com + + ```' + operationId: remove_workspace_member_apis_entities_v2_workspaces__workspace__members__principal_id__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: principal_id + in: path + required: true + schema: + type: string + title: Principal Id + - name: wait_role_propagation + in: query + required: false + schema: + type: boolean + description: 'If true, wait for roles to propagate before returning (default: + true). Set to false for bulk operations.' + default: true + title: Wait Role Propagation + description: 'If true, wait for roles to propagate before returning (default: + true). Set to false for bulk operations.' 
+ responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/DeleteResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/entities/v2/workspaces/{workspace}/projects: + post: + tags: + - Entity Store + summary: Create a new project + description: "Create a new project in the given workspace.\n\nExample:\n```\n\ + POST /apis/entities/v2/workspaces/default/projects\n{\n \"name\": \"ml-project\"\ + ,\n \"description\": \"Machine Learning project\"\n}\n```" + operationId: create_project_apis_entities_v2_workspaces__workspace__projects_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/ProjectInput' + responses: + '201': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/Project' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Entity Store + summary: List all projects + description: 'List all projects in a workspace with pagination. 
+ + + Query Parameters: + + - page, page_size: Pagination + + - sort: Sort field + + - filter: Advanced filters + + + Example: + + ``` + + GET /apis/entities/v2/workspaces/default/projects?sort=-created_at&page=1&page_size=10 + + ```' + operationId: list_projects_apis_entities_v2_workspaces__workspace__projects_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + minimum: 1 + description: Page number + default: 1 + title: Page + description: Page number + - name: page_size + in: query + required: false + schema: + type: integer + maximum: 1000 + minimum: 1 + description: Items per page + default: 100 + title: Page Size + description: Items per page + - name: sort + in: query + required: false + schema: + allOf: + - $ref: '#/components/schemas/ProjectSortField' + description: Sort field + default: -created_at + description: Sort field + - name: filter + in: query + required: false + schema: + description: 'Query filter expression. Supports text and JSON syntaxes: + + - Text: name:"value" AND status>500 with operators : ~ > >= < <= IN NOT + IN AND OR and negation prefix - + + - Object (JSON): {"name":{"$like":"value"}} with operators $eq, $like, + $lt, $lte, $gt, $gte, $in, $nin, $and, $or, $not + + - Bracket notation: ?filter[name][$like]=value + + - Relationship traversal: ?filter[relationship][$exists]=true or ?filter[relationship][field]=value' + title: Filter + type: string + description: 'Query filter expression. 
Supports text and JSON syntaxes: + + - Text: name:"value" AND status>500 with operators : ~ > >= < <= IN NOT + IN AND OR and negation prefix - + + - Object (JSON): {"name":{"$like":"value"}} with operators $eq, $like, $lt, + $lte, $gt, $gte, $in, $nin, $and, $or, $not + + - Bracket notation: ?filter[name][$like]=value + + - Relationship traversal: ?filter[relationship][$exists]=true or ?filter[relationship][field]=value' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/ProjectsPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/entities/v2/workspaces/{workspace}/projects/{name}: + get: + tags: + - Entity Store + summary: Get project by name + description: 'Get a specific project by its workspace and name. + + + Example: + + ``` + + GET /apis/entities/v2/workspaces/default/projects/ml-project + + ```' + operationId: get_project_apis_entities_v2_workspaces__workspace__projects__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/Project' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + put: + tags: + - Entity Store + summary: Update project + description: "Update a project's description.\n\nExample:\n```\nPUT /apis/entities/v2/workspaces/default/projects/ml-project\n\ + {\n \"description\": \"Updated description for ML project\"\n}\n```" + operationId: update_project_apis_entities_v2_workspaces__workspace__projects__name__put + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: 
name + in: path + required: true + schema: + type: string + title: Name + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/ProjectUpdate' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/Project' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - Entity Store + summary: Delete project + description: 'Delete a project. + + + Example: + + ``` + + DELETE /apis/entities/v2/workspaces/default/projects/ml-project + + ```' + operationId: delete_project_apis_entities_v2_workspaces__workspace__projects__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/DeleteResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/benchmark-job-results: + get: + tags: + - Evaluator + summary: List Benchmark Job Results + description: List stored evaluation results for benchmark jobs. + operationId: list_benchmark_job_results_apis_evaluation_v2_workspaces__workspace__benchmark_job_results_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + description: Page size. + default: 100 + title: Page Size + description: Page size. 
+ - name: sort + in: query + required: false + schema: + enum: + - -created_at + - created_at + - -updated_at + - updated_at + - -name + - name + type: string + description: The field to sort by. To sort in decreasing order, use `-` + in front of the field name. + examples: + - -created_at + - created_at + - -updated_at + - updated_at + - -name + - name + default: -created_at + title: Sort + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. + - name: aggregate_fields + in: query + required: false + schema: + type: array + items: + enum: + - nan_count + - sum + - mean + - min + - max + - std_dev + - variance + - score_type + - percentiles + - histogram + - rubric_distribution + - mode_category + type: string + description: 'Aggregate score fields to include in the response (comma-separated + or repeated). Default: (''nan_count'', ''sum'', ''mean'', ''min'', ''max''). + Available: (''nan_count'', ''sum'', ''mean'', ''min'', ''max'', ''std_dev'', + ''variance'', ''score_type'', ''percentiles'', ''histogram'', ''rubric_distribution'', + ''mode_category'').' + default: [] + title: Aggregate Fields + description: 'Aggregate score fields to include in the response (comma-separated + or repeated). Default: (''nan_count'', ''sum'', ''mean'', ''min'', ''max''). + Available: (''nan_count'', ''sum'', ''mean'', ''min'', ''max'', ''std_dev'', + ''variance'', ''score_type'', ''percentiles'', ''histogram'', ''rubric_distribution'', + ''mode_category'').' + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + additionalProperties: false + description: Filter for list benchmark job results. + properties: + name: + description: Filter job results by name. + title: Name + type: string + benchmark: + allOf: + - $ref: '#/components/schemas/BenchmarkRef' + description: Filter results by benchmark reference. + metrics: + description: Filter results by metric reference. 
+ title: Metrics + type: string + dataset: + allOf: + - $ref: '#/components/schemas/FilesetRef' + description: Filter results by dataset if the benchmark job is configured + with the fileset reference. + model: + allOf: + - $ref: '#/components/schemas/ModelRef' + description: Filter results by model if the benchmark job is configured + with the model reference. + created_at: + allOf: + - $ref: '#/components/schemas/DatetimeFilter' + description: Filter job results by creation date range. + title: BenchmarkJobResultsListFilter + type: object + description: 'Filter benchmark job results by name, benchmark, metrics, dataset, + model, and dates. Supports JSON filter syntax with operators: $eq, $like, + $lt, $lte, $gt, $gte, $in, $nin, $and, $or, $not. Also supports text filter + syntax.' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/BenchmarkJobResultsListResponse' + '422': + description: Query Parameter Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '500': + description: Internal Server Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + /apis/evaluation/v2/workspaces/{workspace}/benchmark-job-results/{name}: + get: + tags: + - Evaluator + summary: Get Benchmark Job Result + description: Get a specific benchmark job result by workspace and job name. 
+ operationId: get_benchmark_job_result_apis_evaluation_v2_workspaces__workspace__benchmark_job_results__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: aggregate_fields + in: query + required: false + schema: + type: array + items: + enum: + - nan_count + - sum + - mean + - min + - max + - std_dev + - variance + - score_type + - percentiles + - histogram + - rubric_distribution + - mode_category + type: string + description: 'Aggregate score fields to include in the response (comma-separated + or repeated). Default: (''nan_count'', ''sum'', ''mean'', ''min'', ''max''). + Available: (''nan_count'', ''sum'', ''mean'', ''min'', ''max'', ''std_dev'', + ''variance'', ''score_type'', ''percentiles'', ''histogram'', ''rubric_distribution'', + ''mode_category'').' + default: [] + title: Aggregate Fields + description: 'Aggregate score fields to include in the response (comma-separated + or repeated). Default: (''nan_count'', ''sum'', ''mean'', ''min'', ''max''). + Available: (''nan_count'', ''sum'', ''mean'', ''min'', ''max'', ''std_dev'', + ''variance'', ''score_type'', ''percentiles'', ''histogram'', ''rubric_distribution'', + ''mode_category'').' 
+ responses: + '200': + description: Benchmark Job Result Found + content: + application/json: + schema: + $ref: '#/components/schemas/BenchmarkJobResult' + '404': + description: Benchmark Job Result Not Found + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '500': + description: Internal Server Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - Evaluator + summary: Delete Benchmark Job Result + description: Delete an evaluation benchmark job result. + operationId: delete_benchmark_job_result_apis_evaluation_v2_workspaces__workspace__benchmark_job_results__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Benchmark Job Result Deleted Successfully + content: + application/json: + schema: + $ref: '#/components/schemas/DeleteResponse' + '404': + description: Benchmark Job Result Not Found + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '500': + description: Internal Server Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/benchmark-jobs: + post: + tags: + - Evaluator + summary: Create Job + operationId: create_job_apis_evaluation_v2_workspaces__workspace__benchmark_jobs_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: 
'#/components/schemas/BenchmarkEvaluationJobRequest' + responses: + '201': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/BenchmarkEvaluationJob' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Evaluator + summary: List Jobs + operationId: list_jobs_apis_evaluation_v2_workspaces__workspace__benchmark_jobs_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + exclusiveMinimum: 0 + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + exclusiveMinimum: 0 + description: Page size. + default: 10 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + allOf: + - $ref: '#/components/schemas/BenchmarkEvaluationJobsSortField' + description: The field to sort by. To sort in decreasing order, use `-` + in front of the field name. + default: -created_at + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/BenchmarkEvaluationJobsListFilter' + description: Filter jobs on various criteria. 
+ responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/BenchmarkEvaluationJobsPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/benchmark-jobs/{job}/results/aggregate-scores/download: + get: + tags: + - Evaluator + summary: Download Job Result Aggregate-Scores + operationId: download_job_result_aggregate_scores_apis_evaluation_v2_workspaces__workspace__benchmark_jobs__job__results_aggregate_scores_download_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: job + in: path + required: true + schema: + type: string + title: Job + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/BenchmarkEvaluationResult' + '404': + description: Not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/benchmark-jobs/{job}/results/artifacts/download: + get: + tags: + - Evaluator + summary: Download Job Result Artifacts + operationId: download_job_result_artifacts_apis_evaluation_v2_workspaces__workspace__benchmark_jobs__job__results_artifacts_download_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: job + in: path + required: true + schema: + type: string + title: Job + responses: + '200': + description: Successful Response + content: + application/octet-stream: + schema: + type: string + format: binary + '404': + description: Not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + 
/apis/evaluation/v2/workspaces/{workspace}/benchmark-jobs/{job}/results/row-scores/download: + get: + tags: + - Evaluator + summary: Download Job Result Row-Scores + operationId: download_job_result_row_scores_apis_evaluation_v2_workspaces__workspace__benchmark_jobs__job__results_row_scores_download_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: job + in: path + required: true + schema: + type: string + title: Job + - name: limit + in: query + required: false + schema: + title: Limit + type: integer + responses: + '200': + description: Successful Response + content: + application/jsonl: + schema: + $ref: '#/components/schemas/RowScore' + '404': + description: Not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/benchmark-jobs/{job}/results/{name}: + get: + tags: + - Evaluator + summary: Get Job Result + operationId: get_job_result_apis_evaluation_v2_workspaces__workspace__benchmark_jobs__job__results__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: job + in: path + required: true + schema: + type: string + title: Job + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobResultResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/benchmark-jobs/{job}/results/{name}/download: + get: + tags: + - Evaluator + summary: Download Job Result + operationId: download_job_result_apis_evaluation_v2_workspaces__workspace__benchmark_jobs__job__results__name__download_get + parameters: + - name: workspace 
+ in: path + required: true + schema: + type: string + title: Workspace + - name: job + in: path + required: true + schema: + type: string + title: Job + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/octet-stream: + schema: + type: string + format: binary + '404': + description: Not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/benchmark-jobs/{name}: + get: + tags: + - Evaluator + summary: Get Job + operationId: get_job_apis_evaluation_v2_workspaces__workspace__benchmark_jobs__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/BenchmarkEvaluationJob' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - Evaluator + summary: Delete Job + operationId: delete_job_apis_evaluation_v2_workspaces__workspace__benchmark_jobs__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '204': + description: Successful Response + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/benchmark-jobs/{name}/cancel: + post: + tags: + - Evaluator + summary: Cancel Job + operationId: cancel_job_apis_evaluation_v2_workspaces__workspace__benchmark_jobs__name__cancel_post + parameters: + - 
name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/BenchmarkEvaluationJob' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/benchmark-jobs/{name}/logs: + get: + tags: + - Evaluator + summary: Get Job Logs + operationId: get_job_logs_apis_evaluation_v2_workspaces__workspace__benchmark_jobs__name__logs_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: limit + in: query + required: false + schema: + title: Limit + type: integer + - name: page_cursor + in: query + required: false + schema: + title: Page Cursor + type: string + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobLogPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/benchmark-jobs/{name}/results: + get: + tags: + - Evaluator + summary: List Job Results + operationId: list_job_results_apis_evaluation_v2_workspaces__workspace__benchmark_jobs__name__results_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobListResultResponse' + '422': + description: Validation Error + content: + application/json: + 
schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/benchmark-jobs/{name}/status: + get: + tags: + - Evaluator + summary: Get Job Status + operationId: get_job_status_apis_evaluation_v2_workspaces__workspace__benchmark_jobs__name__status_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobStatusResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/benchmarks: + get: + tags: + - Evaluator + summary: List Benchmarks + description: List all available evaluation benchmarks. + operationId: list_benchmarks_apis_evaluation_v2_workspaces__workspace__benchmarks_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: extended_response + in: query + required: false + schema: + type: boolean + description: Whether to return the extended benchmark. + default: false + title: Extended Response + description: Whether to return the extended benchmark. + - name: page + in: query + required: false + schema: + type: integer + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + description: Page size. + default: 100 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + enum: + - -created_at + - created_at + - -updated_at + - updated_at + - -name + - name + type: string + description: The field to sort by. To sort in decreasing order, use `-` + in front of the field name. 
+ examples: + - -created_at + - created_at + - -updated_at + - updated_at + - -name + - name + default: -created_at + title: Sort + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + additionalProperties: false + description: Filter for list benchmarks query. + properties: + name: + description: Filter benchmarks by name. + title: Name + type: string + description: + description: Filter benchmarks by description. + title: Description + type: string + dataset: + allOf: + - $ref: '#/components/schemas/FilesetRef' + description: Filter custom benchmarks by dataset used for evaluation + (format workspace/fileset-name). + project: + description: Filter benchmarks by project name. + title: Project + type: string + created_at: + allOf: + - $ref: '#/components/schemas/DatetimeFilter' + description: Filter benchmarks by creation date range. + updated_at: + allOf: + - $ref: '#/components/schemas/DatetimeFilter' + description: Filter benchmarks by last update date range. + title: BenchmarksListFilter + type: object + description: 'Filter benchmarks by name, description, dataset, project, and + dates. Supports JSON filter syntax with operators: $eq, $like, $lt, $lte, + $gt, $gte, $in, $nin, $and, $or, $not. Also supports text filter syntax.' 
+ responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/BenchmarksListResponse' + '400': + description: Invalid Request Body + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '500': + description: Internal Server Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + post: + tags: + - Evaluator + summary: Create Benchmark + description: 'Create a new custom evaluation benchmark. + + + Benchmarks can be reused across multiple evaluations. The benchmark type determines + + the evaluation method (currently only LLM-as-a-Judge is supported).' + operationId: create_benchmark_apis_evaluation_v2_workspaces__workspace__benchmarks_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: extended_response + in: query + required: false + schema: + type: boolean + description: Whether to return the extended benchmark. + default: false + title: Extended Response + description: Whether to return the extended benchmark. 
+ requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/BenchmarkRequest' + responses: + '201': + description: Successful Response + content: + application/json: + schema: + anyOf: + - $ref: '#/components/schemas/Benchmark' + - $ref: '#/components/schemas/ExtendedBenchmark' + title: Response Create Benchmark Apis Evaluation V2 Workspaces Workspace Benchmarks + Post + '400': + description: Invalid Request Body + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '403': + description: Operation Not Permitted + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '409': + description: Benchmark Already Exists + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '500': + description: Internal Server Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + /apis/evaluation/v2/workspaces/{workspace}/benchmarks/{name}: + get: + tags: + - Evaluator + summary: Get Benchmark + description: Get a specific evaluation benchmark by workspace and benchmark + name. + operationId: get_benchmark_apis_evaluation_v2_workspaces__workspace__benchmarks__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: extended_response + in: query + required: false + schema: + type: boolean + description: Whether to return the extended benchmark. + default: false + title: Extended Response + description: Whether to return the extended benchmark. 
+ responses: + '200': + description: Successful Response + content: + application/json: + schema: + anyOf: + - $ref: '#/components/schemas/Benchmark' + - $ref: '#/components/schemas/ExtendedBenchmark' + - $ref: '#/components/schemas/SystemBenchmark' + title: Response Get Benchmark Apis Evaluation V2 Workspaces Workspace Benchmarks Name Get + '404': + description: Benchmark Not Found + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '500': + description: Internal Server Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + delete: + tags: + - Evaluator + summary: Delete Benchmark + description: Delete a custom evaluation benchmark. Predefined benchmarks cannot + be deleted. + operationId: delete_benchmark_apis_evaluation_v2_workspaces__workspace__benchmarks__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Benchmark Deleted Successfully + content: + application/json: + schema: + $ref: '#/components/schemas/DeleteResponse' + '400': + description: Invalid Request Body + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '403': + description: Operation Not Permitted + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '404': + description: Benchmark Not Found + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '500': + description: Internal Server Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + 
/apis/evaluation/v2/workspaces/{workspace}/metric-evaluate: + post: + tags: + - Evaluator + summary: Evaluate Metric + description: 'Run a synchronous metric evaluation on a dataset. + + + This endpoint evaluates the given dataset using the specified metric and returns + + results immediately. Use this for quick, interactive evaluations with small + datasets + + (up to 10 rows). For larger evaluations, use the async job-based evaluation + endpoints. + + + The metric can be specified either as a URN reference to a stored metric + + (e.g., "workspace/metric_name") or as an inline metric definition. + + + The dataset must be provided inline with rows. + + + **Aggregate Score Fields:** + + The `name` and `count` fields are always included in aggregate scores. + + By default, additional fields returned are: nan_count, sum, mean, min, max. + + Use the `aggregate_fields` query parameter to customize which optional fields + + are included (e.g., std_dev, variance, percentiles, histogram, rubric_distribution, + mode_category).' + operationId: evaluate_metric_apis_evaluation_v2_workspaces__workspace__metric_evaluate_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: aggregate_fields + in: query + required: false + schema: + type: array + items: + enum: + - nan_count + - sum + - mean + - min + - max + - std_dev + - variance + - score_type + - percentiles + - histogram + - rubric_distribution + - mode_category + type: string + description: 'Aggregate score fields to include in the response (comma-separated + or repeated). Default: (''nan_count'', ''sum'', ''mean'', ''min'', ''max''). + Available: (''nan_count'', ''sum'', ''mean'', ''min'', ''max'', ''std_dev'', + ''variance'', ''score_type'', ''percentiles'', ''histogram'', ''rubric_distribution'', + ''mode_category'').' + default: [] + title: Aggregate Fields + description: 'Aggregate score fields to include in the response (comma-separated + or repeated). 
Default: (''nan_count'', ''sum'', ''mean'', ''min'', ''max''). + Available: (''nan_count'', ''sum'', ''mean'', ''min'', ''max'', ''std_dev'', + ''variance'', ''score_type'', ''percentiles'', ''histogram'', ''rubric_distribution'', + ''mode_category'').' + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/MetricEvaluationRequest' + responses: + '200': + description: Evaluation Completed Successfully + content: + application/json: + schema: + $ref: '#/components/schemas/MetricEvaluationResponse' + '400': + description: Invalid Request Body + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '404': + description: Metric Not Found + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '500': + description: Internal Server Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + /apis/evaluation/v2/workspaces/{workspace}/metric-job-results: + get: + tags: + - Evaluator + summary: List Metric Job Results + description: List stored evaluation results for metric jobs. + operationId: list_metric_job_results_apis_evaluation_v2_workspaces__workspace__metric_job_results_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + description: Page size. + default: 100 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + enum: + - -created_at + - created_at + - -updated_at + - updated_at + - -name + - name + type: string + description: The field to sort by. 
To sort in decreasing order, use `-` + in front of the field name. + examples: + - -created_at + - created_at + - -updated_at + - updated_at + - -name + - name + default: -created_at + title: Sort + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. + - name: aggregate_fields + in: query + required: false + schema: + type: array + items: + enum: + - nan_count + - sum + - mean + - min + - max + - std_dev + - variance + - score_type + - percentiles + - histogram + - rubric_distribution + - mode_category + type: string + description: 'Aggregate score fields to include in the response (comma-separated + or repeated). Default: (''nan_count'', ''sum'', ''mean'', ''min'', ''max''). + Available: (''nan_count'', ''sum'', ''mean'', ''min'', ''max'', ''std_dev'', + ''variance'', ''score_type'', ''percentiles'', ''histogram'', ''rubric_distribution'', + ''mode_category'').' + default: [] + title: Aggregate Fields + description: 'Aggregate score fields to include in the response (comma-separated + or repeated). Default: (''nan_count'', ''sum'', ''mean'', ''min'', ''max''). + Available: (''nan_count'', ''sum'', ''mean'', ''min'', ''max'', ''std_dev'', + ''variance'', ''score_type'', ''percentiles'', ''histogram'', ''rubric_distribution'', + ''mode_category'').' + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + additionalProperties: false + description: Filter for list metric job results. + properties: + name: + description: Filter job results by name. + title: Name + type: string + metric: + allOf: + - $ref: '#/components/schemas/MetricRef' + description: Filter results by metric reference. Jobs with inline metric + configuration will not be included when filtering by metric. + dataset: + allOf: + - $ref: '#/components/schemas/FilesetRef' + description: Filter results by dataset if the metric job is configured + with the fileset reference. 
+ model: + allOf: + - $ref: '#/components/schemas/ModelRef' + description: Filter results by model if the metric job is configured + with the model reference. + created_at: + allOf: + - $ref: '#/components/schemas/DatetimeFilter' + description: Filter job results by creation date range. + title: MetricJobResultsListFilter + type: object + description: 'Filter metric job results by name, metric, dataset, model, and + dates. Supports JSON filter syntax with operators: $eq, $like, $lt, $lte, + $gt, $gte, $in, $nin, $and, $or, $not. Also supports text filter syntax.' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/MetricJobResultsListResponse' + '422': + description: Query Parameter Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '500': + description: Internal Server Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + /apis/evaluation/v2/workspaces/{workspace}/metric-job-results/{name}: + get: + tags: + - Evaluator + summary: Get Metric Job Result + description: Get a specific metric job result by workspace and job name. + operationId: get_metric_job_result_apis_evaluation_v2_workspaces__workspace__metric_job_results__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: aggregate_fields + in: query + required: false + schema: + type: array + items: + enum: + - nan_count + - sum + - mean + - min + - max + - std_dev + - variance + - score_type + - percentiles + - histogram + - rubric_distribution + - mode_category + type: string + description: 'Aggregate score fields to include in the response (comma-separated + or repeated). Default: (''nan_count'', ''sum'', ''mean'', ''min'', ''max''). 
+ Available: (''nan_count'', ''sum'', ''mean'', ''min'', ''max'', ''std_dev'', + ''variance'', ''score_type'', ''percentiles'', ''histogram'', ''rubric_distribution'', + ''mode_category'').' + default: [] + title: Aggregate Fields + description: 'Aggregate score fields to include in the response (comma-separated + or repeated). Default: (''nan_count'', ''sum'', ''mean'', ''min'', ''max''). + Available: (''nan_count'', ''sum'', ''mean'', ''min'', ''max'', ''std_dev'', + ''variance'', ''score_type'', ''percentiles'', ''histogram'', ''rubric_distribution'', + ''mode_category'').' + responses: + '200': + description: Metric Job Result Found + content: + application/json: + schema: + $ref: '#/components/schemas/MetricJobResult' + '404': + description: Metric Job Result Not Found + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '500': + description: Internal Server Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - Evaluator + summary: Delete Metric Job Result + description: Delete an evaluation metric job result. 
+ operationId: delete_metric_job_result_apis_evaluation_v2_workspaces__workspace__metric_job_results__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Metric Job Result Deleted Successfully + content: + application/json: + schema: + $ref: '#/components/schemas/DeleteResponse' + '404': + description: Metric Job Result Not Found + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '500': + description: Internal Server Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/metric-jobs: + post: + tags: + - Evaluator + summary: Create Job + operationId: create_job_apis_evaluation_v2_workspaces__workspace__metric_jobs_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/MetricEvaluationJobRequest' + responses: + '201': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/MetricEvaluationJob' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Evaluator + summary: List Jobs + operationId: list_jobs_apis_evaluation_v2_workspaces__workspace__metric_jobs_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + exclusiveMinimum: 0 + description: Page number. + default: 1 + title: Page + description: Page number. 
+ - name: page_size + in: query + required: false + schema: + type: integer + exclusiveMinimum: 0 + description: Page size. + default: 10 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + allOf: + - $ref: '#/components/schemas/MetricEvaluationJobsSortField' + description: The field to sort by. To sort in decreasing order, use `-` + in front of the field name. + default: -created_at + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/MetricEvaluationJobsListFilter' + description: Filter jobs on various criteria. + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/MetricEvaluationJobsPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/metric-jobs/{job}/results/aggregate-scores/download: + get: + tags: + - Evaluator + summary: Download Job Result Aggregate-Scores + operationId: download_job_result_aggregate_scores_apis_evaluation_v2_workspaces__workspace__metric_jobs__job__results_aggregate_scores_download_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: job + in: path + required: true + schema: + type: string + title: Job + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/AggregatedMetricResult' + '404': + description: Not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/metric-jobs/{job}/results/artifacts/download: + get: + tags: + - Evaluator + summary: 
Download Job Result Artifacts + operationId: download_job_result_artifacts_apis_evaluation_v2_workspaces__workspace__metric_jobs__job__results_artifacts_download_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: job + in: path + required: true + schema: + type: string + title: Job + responses: + '200': + description: Successful Response + content: + application/octet-stream: + schema: + type: string + format: binary + '404': + description: Not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/metric-jobs/{job}/results/row-scores/download: + get: + tags: + - Evaluator + summary: Download Job Result Row-Scores + operationId: download_job_result_row_scores_apis_evaluation_v2_workspaces__workspace__metric_jobs__job__results_row_scores_download_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: job + in: path + required: true + schema: + type: string + title: Job + - name: limit + in: query + required: false + schema: + title: Limit + type: integer + responses: + '200': + description: Successful Response + content: + application/jsonl: + schema: + $ref: '#/components/schemas/RowScore' + '404': + description: Not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/metric-jobs/{job}/results/{name}: + get: + tags: + - Evaluator + summary: Get Job Result + operationId: get_job_result_apis_evaluation_v2_workspaces__workspace__metric_jobs__job__results__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: job + in: path + required: true + schema: + type: string + title: Job + - name: name + in: path + 
required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobResultResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/metric-jobs/{job}/results/{name}/download: + get: + tags: + - Evaluator + summary: Download Job Result + operationId: download_job_result_apis_evaluation_v2_workspaces__workspace__metric_jobs__job__results__name__download_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: job + in: path + required: true + schema: + type: string + title: Job + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/octet-stream: + schema: + type: string + format: binary + '404': + description: Not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/metric-jobs/{name}: + get: + tags: + - Evaluator + summary: Get Job + operationId: get_job_apis_evaluation_v2_workspaces__workspace__metric_jobs__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/MetricEvaluationJob' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - Evaluator + summary: Delete Job + operationId: 
delete_job_apis_evaluation_v2_workspaces__workspace__metric_jobs__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '204': + description: Successful Response + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/metric-jobs/{name}/cancel: + post: + tags: + - Evaluator + summary: Cancel Job + operationId: cancel_job_apis_evaluation_v2_workspaces__workspace__metric_jobs__name__cancel_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/MetricEvaluationJob' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/metric-jobs/{name}/logs: + get: + tags: + - Evaluator + summary: Get Job Logs + operationId: get_job_logs_apis_evaluation_v2_workspaces__workspace__metric_jobs__name__logs_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: limit + in: query + required: false + schema: + title: Limit + type: integer + - name: page_cursor + in: query + required: false + schema: + title: Page Cursor + type: string + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobLogPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: 
'#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/metric-jobs/{name}/results: + get: + tags: + - Evaluator + summary: List Job Results + operationId: list_job_results_apis_evaluation_v2_workspaces__workspace__metric_jobs__name__results_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobListResultResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/metric-jobs/{name}/status: + get: + tags: + - Evaluator + summary: Get Job Status + operationId: get_job_status_apis_evaluation_v2_workspaces__workspace__metric_jobs__name__status_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobStatusResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/evaluation/v2/workspaces/{workspace}/metrics: + get: + tags: + - Evaluator + summary: List Metrics + description: List evaluation metrics. + operationId: list_metrics_apis_evaluation_v2_workspaces__workspace__metrics_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + description: Page number. + default: 1 + title: Page + description: Page number. 
+ - name: page_size + in: query + required: false + schema: + type: integer + description: Page size. + default: 100 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + enum: + - -created_at + - created_at + - -updated_at + - updated_at + - -name + - name + type: string + description: The field to sort by. To sort in decreasing order, use `-` + in front of the field name. + examples: + - -created_at + - created_at + - -updated_at + - updated_at + - -name + - name + default: -created_at + title: Sort + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + additionalProperties: false + description: Filter for list metrics query. + properties: + name: + description: Filter metrics by name. + title: Name + type: string + description: + description: Filter metrics by description. + title: Description + type: string + type: + allOf: + - $ref: '#/components/schemas/MetricType' + description: Filter metrics by metric type (e.g. llm-judge, exact-match, + route, system) + project: + description: Filter metrics by project name. + title: Project + type: string + created_at: + allOf: + - $ref: '#/components/schemas/DatetimeFilter' + description: Filter metrics by creation date range. + updated_at: + allOf: + - $ref: '#/components/schemas/DatetimeFilter' + description: Filter metrics by last update date range. + title: MetricsListFilter + type: object + description: 'Filter metrics by name, description, type, project, and dates. + Supports JSON filter syntax with operators: $eq, $like, $lt, $lte, $gt, + $gte, $in, $nin, $and, $or, $not. Also supports text filter syntax.' 
+ responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/MetricsListResponse' + '400': + description: Invalid Request Body + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '500': + description: Internal Server Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + /apis/evaluation/v2/workspaces/{workspace}/metrics/{name}: + get: + tags: + - Evaluator + summary: Get Metric + description: Get a specific evaluation metric by workspace and metric name. + operationId: get_metric_apis_evaluation_v2_workspaces__workspace__metrics__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Metric Found + content: + application/json: + schema: + oneOf: + - $ref: '#/components/schemas/LLMJudgeMetricResponse' + - $ref: '#/components/schemas/TopicAdherenceMetricResponse' + - $ref: '#/components/schemas/AgentGoalAccuracyMetricResponse' + - $ref: '#/components/schemas/AnswerAccuracyMetricResponse' + - $ref: '#/components/schemas/ContextRelevanceMetricResponse' + - $ref: '#/components/schemas/ResponseGroundednessMetricResponse' + - $ref: '#/components/schemas/ContextRecallMetricResponse' + - $ref: '#/components/schemas/ContextPrecisionMetricResponse' + - $ref: '#/components/schemas/ContextEntityRecallMetricResponse' + - $ref: '#/components/schemas/ResponseRelevancyMetricResponse' + - $ref: '#/components/schemas/FaithfulnessMetricResponse' + - $ref: '#/components/schemas/NoiseSensitivityMetricResponse' + - $ref: '#/components/schemas/ToolCallAccuracyMetricResponse' + - $ref: '#/components/schemas/BLEUMetricResponse' + - $ref: 
'#/components/schemas/ExactMatchMetricResponse' + - $ref: '#/components/schemas/F1MetricResponse' + - $ref: '#/components/schemas/NumberCheckMetricResponse' + - $ref: '#/components/schemas/RemoteMetricResponse' + - $ref: '#/components/schemas/NemoAgentToolkitRemoteMetricResponse' + - $ref: '#/components/schemas/ROUGEMetricResponse' + - $ref: '#/components/schemas/StringCheckMetricResponse' + - $ref: '#/components/schemas/ToolCallingMetricResponse' + - $ref: '#/components/schemas/SystemMetricResponse' + - $ref: '#/components/schemas/LLMJudgeMetricResponse' + - $ref: '#/components/schemas/TopicAdherenceMetricResponse' + - $ref: '#/components/schemas/AgentGoalAccuracyMetricResponse' + - $ref: '#/components/schemas/AnswerAccuracyMetricResponse' + - $ref: '#/components/schemas/ContextRelevanceMetricResponse' + - $ref: '#/components/schemas/ResponseGroundednessMetricResponse' + - $ref: '#/components/schemas/ContextRecallMetricResponse' + - $ref: '#/components/schemas/ContextPrecisionMetricResponse' + - $ref: '#/components/schemas/ContextEntityRecallMetricResponse' + - $ref: '#/components/schemas/ResponseRelevancyMetricResponse' + - $ref: '#/components/schemas/FaithfulnessMetricResponse' + - $ref: '#/components/schemas/NoiseSensitivityMetricResponse' + - $ref: '#/components/schemas/ToolCallAccuracyMetricResponse' + - $ref: '#/components/schemas/BLEUMetricResponse' + - $ref: '#/components/schemas/ExactMatchMetricResponse' + - $ref: '#/components/schemas/F1MetricResponse' + - $ref: '#/components/schemas/NumberCheckMetricResponse' + - $ref: '#/components/schemas/RemoteMetricResponse' + - $ref: '#/components/schemas/NemoAgentToolkitRemoteMetricResponse' + - $ref: '#/components/schemas/ROUGEMetricResponse' + - $ref: '#/components/schemas/StringCheckMetricResponse' + - $ref: '#/components/schemas/ToolCallingMetricResponse' + - $ref: '#/components/schemas/SystemMetricResponse' + discriminator: + propertyName: type + mapping: + llm-judge: 
'#/components/schemas/LLMJudgeMetricResponse' + topic_adherence: '#/components/schemas/TopicAdherenceMetricResponse' + agent_goal_accuracy: '#/components/schemas/AgentGoalAccuracyMetricResponse' + answer_accuracy: '#/components/schemas/AnswerAccuracyMetricResponse' + context_relevance: '#/components/schemas/ContextRelevanceMetricResponse' + response_groundedness: '#/components/schemas/ResponseGroundednessMetricResponse' + context_recall: '#/components/schemas/ContextRecallMetricResponse' + context_precision: '#/components/schemas/ContextPrecisionMetricResponse' + context_entity_recall: '#/components/schemas/ContextEntityRecallMetricResponse' + response_relevancy: '#/components/schemas/ResponseRelevancyMetricResponse' + faithfulness: '#/components/schemas/FaithfulnessMetricResponse' + noise_sensitivity: '#/components/schemas/NoiseSensitivityMetricResponse' + tool_call_accuracy: '#/components/schemas/ToolCallAccuracyMetricResponse' + bleu: '#/components/schemas/BLEUMetricResponse' + exact-match: '#/components/schemas/ExactMatchMetricResponse' + f1: '#/components/schemas/F1MetricResponse' + number-check: '#/components/schemas/NumberCheckMetricResponse' + remote: '#/components/schemas/RemoteMetricResponse' + nemo-agent-toolkit-remote: '#/components/schemas/NemoAgentToolkitRemoteMetricResponse' + rouge: '#/components/schemas/ROUGEMetricResponse' + string-check: '#/components/schemas/StringCheckMetricResponse' + tool-calling: '#/components/schemas/ToolCallingMetricResponse' + system: '#/components/schemas/SystemMetricResponse' + system-retriever: '#/components/schemas/SystemMetricResponse' + title: Response 200 Get Metric Apis Evaluation V2 Workspaces Workspace Metrics Name Get + '404': + description: Metric Not Found + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '500': + description: Internal Server Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '422': + description: Validation 
Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + post: + tags: + - Evaluator + summary: Create Metric + description: 'Create a new custom evaluation metric. + + + Metrics can be reused across multiple evaluations. The metric type determines + + the evaluation method (currently only LLM-as-a-Judge is supported).' + operationId: create_metric_apis_evaluation_v2_workspaces__workspace__metrics__name__post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + requestBody: + required: true + content: + application/json: + schema: + oneOf: + - $ref: '#/components/schemas/LLMJudgeMetricInput' + - $ref: '#/components/schemas/TopicAdherenceMetricInput' + - $ref: '#/components/schemas/AgentGoalAccuracyMetricInput' + - $ref: '#/components/schemas/AnswerAccuracyMetricInput' + - $ref: '#/components/schemas/ContextRelevanceMetricInput' + - $ref: '#/components/schemas/ResponseGroundednessMetricInput' + - $ref: '#/components/schemas/ContextRecallMetricInput' + - $ref: '#/components/schemas/ContextPrecisionMetricInput' + - $ref: '#/components/schemas/ContextEntityRecallMetricInput' + - $ref: '#/components/schemas/ResponseRelevancyMetricInput' + - $ref: '#/components/schemas/FaithfulnessMetricInput' + - $ref: '#/components/schemas/NoiseSensitivityMetricInput' + - $ref: '#/components/schemas/ToolCallAccuracyMetricInput' + - $ref: '#/components/schemas/BLEUMetricInput' + - $ref: '#/components/schemas/ExactMatchMetricInput' + - $ref: '#/components/schemas/F1MetricInput' + - $ref: '#/components/schemas/NumberCheckMetricInput' + - $ref: '#/components/schemas/RemoteMetricInput' + - $ref: '#/components/schemas/NemoAgentToolkitRemoteMetricInput' + - $ref: '#/components/schemas/ROUGEMetricInput' + - $ref: '#/components/schemas/StringCheckMetricInput' + - $ref: '#/components/schemas/ToolCallingMetricInput' + 
discriminator: + propertyName: type + mapping: + llm-judge: '#/components/schemas/LLMJudgeMetricInput' + topic_adherence: '#/components/schemas/TopicAdherenceMetricInput' + agent_goal_accuracy: '#/components/schemas/AgentGoalAccuracyMetricInput' + answer_accuracy: '#/components/schemas/AnswerAccuracyMetricInput' + context_relevance: '#/components/schemas/ContextRelevanceMetricInput' + response_groundedness: '#/components/schemas/ResponseGroundednessMetricInput' + context_recall: '#/components/schemas/ContextRecallMetricInput' + context_precision: '#/components/schemas/ContextPrecisionMetricInput' + context_entity_recall: '#/components/schemas/ContextEntityRecallMetricInput' + response_relevancy: '#/components/schemas/ResponseRelevancyMetricInput' + faithfulness: '#/components/schemas/FaithfulnessMetricInput' + noise_sensitivity: '#/components/schemas/NoiseSensitivityMetricInput' + tool_call_accuracy: '#/components/schemas/ToolCallAccuracyMetricInput' + bleu: '#/components/schemas/BLEUMetricInput' + exact-match: '#/components/schemas/ExactMatchMetricInput' + f1: '#/components/schemas/F1MetricInput' + number-check: '#/components/schemas/NumberCheckMetricInput' + remote: '#/components/schemas/RemoteMetricInput' + nemo-agent-toolkit-remote: '#/components/schemas/NemoAgentToolkitRemoteMetricInput' + rouge: '#/components/schemas/ROUGEMetricInput' + string-check: '#/components/schemas/StringCheckMetricInput' + tool-calling: '#/components/schemas/ToolCallingMetricInput' + title: Metric Request + responses: + '200': + description: Successful Response + content: + application/json: + schema: + oneOf: + - $ref: '#/components/schemas/LLMJudgeMetricResponse' + - $ref: '#/components/schemas/TopicAdherenceMetricResponse' + - $ref: '#/components/schemas/AgentGoalAccuracyMetricResponse' + - $ref: '#/components/schemas/AnswerAccuracyMetricResponse' + - $ref: '#/components/schemas/ContextRelevanceMetricResponse' + - $ref: '#/components/schemas/ResponseGroundednessMetricResponse' + - 
$ref: '#/components/schemas/ContextRecallMetricResponse' + - $ref: '#/components/schemas/ContextPrecisionMetricResponse' + - $ref: '#/components/schemas/ContextEntityRecallMetricResponse' + - $ref: '#/components/schemas/ResponseRelevancyMetricResponse' + - $ref: '#/components/schemas/FaithfulnessMetricResponse' + - $ref: '#/components/schemas/NoiseSensitivityMetricResponse' + - $ref: '#/components/schemas/ToolCallAccuracyMetricResponse' + - $ref: '#/components/schemas/BLEUMetricResponse' + - $ref: '#/components/schemas/ExactMatchMetricResponse' + - $ref: '#/components/schemas/F1MetricResponse' + - $ref: '#/components/schemas/NumberCheckMetricResponse' + - $ref: '#/components/schemas/RemoteMetricResponse' + - $ref: '#/components/schemas/NemoAgentToolkitRemoteMetricResponse' + - $ref: '#/components/schemas/ROUGEMetricResponse' + - $ref: '#/components/schemas/StringCheckMetricResponse' + - $ref: '#/components/schemas/ToolCallingMetricResponse' + - $ref: '#/components/schemas/SystemMetricResponse' + discriminator: + propertyName: type + mapping: + llm-judge: '#/components/schemas/LLMJudgeMetricResponse' + topic_adherence: '#/components/schemas/TopicAdherenceMetricResponse' + agent_goal_accuracy: '#/components/schemas/AgentGoalAccuracyMetricResponse' + answer_accuracy: '#/components/schemas/AnswerAccuracyMetricResponse' + context_relevance: '#/components/schemas/ContextRelevanceMetricResponse' + response_groundedness: '#/components/schemas/ResponseGroundednessMetricResponse' + context_recall: '#/components/schemas/ContextRecallMetricResponse' + context_precision: '#/components/schemas/ContextPrecisionMetricResponse' + context_entity_recall: '#/components/schemas/ContextEntityRecallMetricResponse' + response_relevancy: '#/components/schemas/ResponseRelevancyMetricResponse' + faithfulness: '#/components/schemas/FaithfulnessMetricResponse' + noise_sensitivity: '#/components/schemas/NoiseSensitivityMetricResponse' + tool_call_accuracy: 
'#/components/schemas/ToolCallAccuracyMetricResponse' + bleu: '#/components/schemas/BLEUMetricResponse' + exact-match: '#/components/schemas/ExactMatchMetricResponse' + f1: '#/components/schemas/F1MetricResponse' + number-check: '#/components/schemas/NumberCheckMetricResponse' + remote: '#/components/schemas/RemoteMetricResponse' + nemo-agent-toolkit-remote: '#/components/schemas/NemoAgentToolkitRemoteMetricResponse' + rouge: '#/components/schemas/ROUGEMetricResponse' + string-check: '#/components/schemas/StringCheckMetricResponse' + tool-calling: '#/components/schemas/ToolCallingMetricResponse' + system: '#/components/schemas/SystemMetricResponse' + system-retriever: '#/components/schemas/SystemMetricResponse' + title: Response Create Metric Apis Evaluation V2 Workspaces Workspace Metrics Name Post + '201': + description: Metric Created Successfully + content: + application/json: + schema: + oneOf: + - $ref: '#/components/schemas/LLMJudgeMetricResponse' + - $ref: '#/components/schemas/TopicAdherenceMetricResponse' + - $ref: '#/components/schemas/AgentGoalAccuracyMetricResponse' + - $ref: '#/components/schemas/AnswerAccuracyMetricResponse' + - $ref: '#/components/schemas/ContextRelevanceMetricResponse' + - $ref: '#/components/schemas/ResponseGroundednessMetricResponse' + - $ref: '#/components/schemas/ContextRecallMetricResponse' + - $ref: '#/components/schemas/ContextPrecisionMetricResponse' + - $ref: '#/components/schemas/ContextEntityRecallMetricResponse' + - $ref: '#/components/schemas/ResponseRelevancyMetricResponse' + - $ref: '#/components/schemas/FaithfulnessMetricResponse' + - $ref: '#/components/schemas/NoiseSensitivityMetricResponse' + - $ref: '#/components/schemas/ToolCallAccuracyMetricResponse' + - $ref: '#/components/schemas/BLEUMetricResponse' + - $ref: '#/components/schemas/ExactMatchMetricResponse' + - $ref: '#/components/schemas/F1MetricResponse' + - $ref: '#/components/schemas/NumberCheckMetricResponse' + - $ref: 
'#/components/schemas/RemoteMetricResponse' + - $ref: '#/components/schemas/NemoAgentToolkitRemoteMetricResponse' + - $ref: '#/components/schemas/ROUGEMetricResponse' + - $ref: '#/components/schemas/StringCheckMetricResponse' + - $ref: '#/components/schemas/ToolCallingMetricResponse' + - $ref: '#/components/schemas/SystemMetricResponse' + discriminator: + propertyName: type + mapping: + llm-judge: '#/components/schemas/LLMJudgeMetricResponse' + topic_adherence: '#/components/schemas/TopicAdherenceMetricResponse' + agent_goal_accuracy: '#/components/schemas/AgentGoalAccuracyMetricResponse' + answer_accuracy: '#/components/schemas/AnswerAccuracyMetricResponse' + context_relevance: '#/components/schemas/ContextRelevanceMetricResponse' + response_groundedness: '#/components/schemas/ResponseGroundednessMetricResponse' + context_recall: '#/components/schemas/ContextRecallMetricResponse' + context_precision: '#/components/schemas/ContextPrecisionMetricResponse' + context_entity_recall: '#/components/schemas/ContextEntityRecallMetricResponse' + response_relevancy: '#/components/schemas/ResponseRelevancyMetricResponse' + faithfulness: '#/components/schemas/FaithfulnessMetricResponse' + noise_sensitivity: '#/components/schemas/NoiseSensitivityMetricResponse' + tool_call_accuracy: '#/components/schemas/ToolCallAccuracyMetricResponse' + bleu: '#/components/schemas/BLEUMetricResponse' + exact-match: '#/components/schemas/ExactMatchMetricResponse' + f1: '#/components/schemas/F1MetricResponse' + number-check: '#/components/schemas/NumberCheckMetricResponse' + remote: '#/components/schemas/RemoteMetricResponse' + nemo-agent-toolkit-remote: '#/components/schemas/NemoAgentToolkitRemoteMetricResponse' + rouge: '#/components/schemas/ROUGEMetricResponse' + string-check: '#/components/schemas/StringCheckMetricResponse' + tool-calling: '#/components/schemas/ToolCallingMetricResponse' + system: '#/components/schemas/SystemMetricResponse' + system-retriever: 
'#/components/schemas/SystemMetricResponse' + title: Response 201 Create Metric Apis Evaluation V2 Workspaces Workspace Metrics Name Post + '400': + description: Invalid Request Body + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '403': + description: Not Authorized to Create Metric. + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '409': + description: Metric Already Exists + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '500': + description: Internal Server Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + delete: + tags: + - Evaluator + summary: Delete Metric + description: Delete a custom evaluation metric. Predefined metrics cannot be + deleted. + operationId: delete_metric_apis_evaluation_v2_workspaces__workspace__metrics__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Metric Deleted Successfully + content: + application/json: + schema: + $ref: '#/components/schemas/DeleteResponse' + '403': + description: Not Authorized to Delete Metric. 
+ content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '404': + description: Metric Not Found + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '500': + description: Internal Server Error + content: + application/json: + schema: + $ref: '#/components/schemas/ErrorResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/files/v2/workspaces/{workspace}/filesets: + post: + tags: + - Files + summary: Create Fileset + description: 'Create a new fileset. + + + If no storage configuration is provided, the default storage backend will + be used.' + operationId: create_fileset_apis_files_v2_workspaces__workspace__filesets_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/CreateFilesetRequest' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/FilesetOutput' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Files + summary: List Filesets + description: 'List Filesets endpoint with filtering and pagination. + + + Supports filtering by name, description, purpose, storage_type, created_at, + and updated_at via query parameters. + + Returns paginated results with sorting options.' + operationId: list_filesets_apis_files_v2_workspaces__workspace__filesets_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + minimum: 1 + description: Page number. + default: 1 + title: Page + description: Page number. 
+ - name: page_size + in: query + required: false + schema: + type: integer + maximum: 100 + minimum: 1 + description: Page size. + default: 10 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + allOf: + - $ref: '#/components/schemas/GenericSortField' + description: The field to sort by. To sort in decreasing order, use `-` + in front of the field name. + default: -created_at + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/FilesetFilter' + description: Filter filesets by name, description, purpose, storage_type, + created_at, and updated_at. + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/FilesetOutputsPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/files/v2/workspaces/{workspace}/filesets/{name}: + get: + tags: + - Files + summary: Get Fileset by Workspace and Name + description: 'Get Fileset by Workspace and Name. + + + Returns the details of a specific fileset identified by its workspace and + name.' + operationId: retrieve_fileset_apis_files_v2_workspaces__workspace__filesets__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/FilesetOutput' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - Files + summary: Delete Fileset + description: 'Delete Fileset. 
+ + + Permanently deletes a fileset from the platform. + + Returns metadata about the deleted fileset. + + For local storage backends, this also deletes the underlying files.' + operationId: delete_fileset_apis_files_v2_workspaces__workspace__filesets__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/FilesetOutput' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + patch: + tags: + - Files + summary: Update Fileset Metadata + description: Update Fileset Metadata. + operationId: update_fileset_metadata_apis_files_v2_workspaces__workspace__filesets__name__patch + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/UpdateFilesetRequest' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/FilesetOutput' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/files/v2/workspaces/{workspace}/filesets/{name}/-/{path}: + head: + tags: + - Files + summary: Get File Metadata + description: 'Get file metadata without downloading content. + + + HEAD requests are often used before Range GETs to ensure the server + + supports partial downloads (e.g., DuckDB''s httpfs). + + Returns Accept-Ranges, Content-Length, and Content-Type headers.' 
+ operationId: head_file_apis_files_v2_workspaces__workspace__filesets__name_____path__head + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: path + in: path + required: true + schema: + type: string + title: Path + responses: + '200': + description: Successful Response + content: + application/json: + schema: {} + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Files + summary: Download File Content + description: 'Download file content from a fileset. + + + Supports HTTP Range requests for partial content retrieval (status 206). + + Returns the full file content (status 200) if no Range header is provided. + + For external resources (HuggingFace, NGC), content is cached locally on first + access.' + operationId: download_file_apis_files_v2_workspaces__workspace__filesets__name_____path__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: path + in: path + required: true + schema: + type: string + title: Path + responses: + '200': + description: Successful Response + content: + application/octet-stream: + schema: + type: string + format: binary + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + put: + tags: + - Files + summary: Upload Fileset Content + description: Upload file content to a fileset. 
+ operationId: upload_file_apis_files_v2_workspaces__workspace__filesets__name_____path__put + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: path + in: path + required: true + schema: + type: string + title: Path + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/FilesetFileOutput' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + requestBody: + content: + application/octet-stream: + schema: + type: string + format: binary + description: Raw binary file content + required: true + description: Upload the file as a raw octet stream. + delete: + tags: + - Files + summary: Delete a specific file from a fileset + description: 'Delete a specific file from a fileset. + + + Permanently deletes the file from the storage backend. + + Returns metadata about the deleted file.' + operationId: delete_file_apis_files_v2_workspaces__workspace__filesets__name_____path__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: path + in: path + required: true + schema: + type: string + title: Path + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/FilesetFileOutput' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/files/v2/workspaces/{workspace}/filesets/{name}/files: + get: + tags: + - Files + summary: List Fileset Files + description: 'List Files in Fileset. + + + Returns a list of files stored in the specified fileset.
+ + Optionally filter by path prefix to list files under a specific directory. + + + Each file includes a cache_status field: + + - "not_cacheable": File is on default storage, caching not applicable + + - "cached": File exists in cache storage + + - "caching": File is currently being downloaded and cached + + - "not_cached": File not in cache, will be cached on next download + + - null: External storage, but cache status not checked (use include_cache_status=true)' + operationId: list_fileset_files_apis_files_v2_workspaces__workspace__filesets__name__files_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: path + in: query + required: false + schema: + description: Filter files by path prefix + title: Path + type: string + description: Filter files by path prefix + - name: include_cache_status + in: query + required: false + schema: + type: boolean + description: Check and return cache status for each file. When false, storage + files return null for cache_status. + default: false + title: Include Cache Status + description: Check and return cache status for each file. When false, storage + files return null for cache_status. + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/ListFilesetFilesResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/files/v2/workspaces/{workspace}/filesets/{name}/otlp/v1/logs: + post: + tags: + - OTLP + summary: Upload OTLP Logs to Fileset + description: 'Upload OTLP logs to a specified fileset in JSON or Protobuf format. + + + Supports both application/json and application/x-protobuf content types.' 
+ operationId: upload_otlp_logs_apis_files_v2_workspaces__workspace__filesets__name__otlp_v1_logs_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: content-type + in: header + required: false + schema: + type: string + default: application/json + title: Content-Type + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/OtelExportLogsServiceResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/files/v2/workspaces/{workspace}/filesets/{name}/otlp/v1/logs/query: + post: + tags: + - OTLP + summary: Query OTLP Logs from Fileset + description: 'Query logs from parquet files in a fileset. + + + This is an internal endpoint that runs DuckDB queries with direct storage + + access.' + operationId: query_otlp_logs_apis_files_v2_workspaces__workspace__filesets__name__otlp_v1_logs_query_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/LogQueryRequest' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobLogPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/guardrails/v2/workspaces/{workspace}/checks: + post: + tags: + - Guardrails + summary: Guardrail check request + description: Chat completion for the provided conversation. 
+ operationId: check_apis_guardrails_v2_workspaces__workspace__checks_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/GuardrailCheckRequest' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/GuardrailCheckResponse' + '400': + description: Invalid Request Body + '422': + description: Validation Error + '500': + description: Internal Server Error + /apis/guardrails/v2/workspaces/{workspace}/configs: + get: + tags: + - Guardrails + summary: List Guardrail Configs + description: 'List available guardrail configs. + + + Lists guardrail configs for a specific workspace.' + operationId: list_guardrail_configs_apis_guardrails_v2_workspaces__workspace__configs_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + description: Page size. + default: 10 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + allOf: + - $ref: '#/components/schemas/GenericSortField' + description: The field to sort by. To sort in decreasing order, use `-` + in front of the field name. + default: created_at + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/GuardrailConfigFilter' + description: Filter guardrail configs by name, description, project, created_at, + and updated_at. 
+ responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/GuardrailConfigsPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + post: + tags: + - Guardrails + summary: Create Config + description: Create a new guardrail config. + operationId: create_config_apis_guardrails_v2_workspaces__workspace__configs_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/GuardrailConfigInput' + responses: + '201': + description: Config created successfully. + content: + application/json: + schema: + $ref: '#/components/schemas/GuardrailConfig' + '422': + description: Validation Error + '409': + description: Config already exists. + /apis/guardrails/v2/workspaces/{workspace}/configs/{name}: + get: + tags: + - Guardrails + summary: Get Guardrail Config + description: Get info about a guardrail configuration. + operationId: get_guardrail_config_apis_guardrails_v2_workspaces__workspace__configs__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/GuardrailConfig' + '404': + description: Config does not exist. + '422': + description: Validation Error + patch: + tags: + - Guardrails + summary: Update Config + description: 'Update model metadata. If the request body has an empty field, + + keep the old value.' 
+ operationId: update_config_apis_guardrails_v2_workspaces__workspace__configs__name__patch + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/GuardrailConfigUpdate' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/GuardrailConfig' + '404': + description: Config does not exist. + '422': + description: Validation Error + delete: + tags: + - Guardrails + summary: Delete Config + description: Delete a guardrail config. + operationId: delete_config_apis_guardrails_v2_workspaces__workspace__configs__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful model deletion. + content: + application/json: + schema: + $ref: '#/components/schemas/DeleteResponse' + '404': + description: Config does not exist. + '422': + description: Unable to delete config due to validation error while processing + request. + /apis/inference-gateway/v2/workspaces/{workspace}/model/{name}/-/{trailing_uri}: + patch: + tags: + - Inference Gateway + summary: Model Inference Proxy PATCH + description: 'Proxy requests to model entity inference endpoints. + + + All inference requests must resolve to a `VirtualModel`. The platform''s + + provider reconciler auto-creates an implicit `autoprovisioned` VirtualModel + + for every served model entity (named after the entity, with + + `default_model_entity` set to the entity ref) so this is the typical case; + + operators can also create custom VirtualModels for routing, plugin chains, + + LoRA escape-hatches, etc. 
Requests for which no VirtualModel can be found + + return `404`.' + operationId: gateway_proxy_patch + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: trailing_uri + in: path + required: true + schema: + type: string + title: Trailing Uri + responses: + '200': + description: Proxy PATCH request to model entity inference endpoint + content: + application/json: + schema: + type: object + additionalProperties: true + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + requestBody: + content: + application/json: + schema: + type: object + additionalProperties: true + required: false + delete: + tags: + - Inference Gateway + summary: Model Inference Proxy DELETE + description: 'Proxy requests to model entity inference endpoints. + + + All inference requests must resolve to a `VirtualModel`. The platform''s + + provider reconciler auto-creates an implicit `autoprovisioned` VirtualModel + + for every served model entity (named after the entity, with + + `default_model_entity` set to the entity ref) so this is the typical case; + + operators can also create custom VirtualModels for routing, plugin chains, + + LoRA escape-hatches, etc. Requests for which no VirtualModel can be found + + return `404`.' 
+ operationId: gateway_proxy_delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: trailing_uri + in: path + required: true + schema: + type: string + title: Trailing Uri + responses: + '200': + description: Proxy DELETE request to model entity inference endpoint + content: + application/json: + schema: {} + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + put: + tags: + - Inference Gateway + summary: Model Inference Proxy PUT + description: 'Proxy requests to model entity inference endpoints. + + + All inference requests must resolve to a `VirtualModel`. The platform''s + + provider reconciler auto-creates an implicit `autoprovisioned` VirtualModel + + for every served model entity (named after the entity, with + + `default_model_entity` set to the entity ref) so this is the typical case; + + operators can also create custom VirtualModels for routing, plugin chains, + + LoRA escape-hatches, etc. Requests for which no VirtualModel can be found + + return `404`.' 
+ operationId: gateway_proxy_put + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: trailing_uri + in: path + required: true + schema: + type: string + title: Trailing Uri + responses: + '200': + description: Proxy PUT request to model entity inference endpoint + content: + application/json: + schema: + type: object + additionalProperties: true + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + requestBody: + content: + application/json: + schema: + type: object + additionalProperties: true + required: false + post: + tags: + - Inference Gateway + summary: Model Inference Proxy POST + description: 'Proxy requests to model entity inference endpoints. + + + All inference requests must resolve to a `VirtualModel`. The platform''s + + provider reconciler auto-creates an implicit `autoprovisioned` VirtualModel + + for every served model entity (named after the entity, with + + `default_model_entity` set to the entity ref) so this is the typical case; + + operators can also create custom VirtualModels for routing, plugin chains, + + LoRA escape-hatches, etc. Requests for which no VirtualModel can be found + + return `404`.' 
+ operationId: gateway_proxy_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: trailing_uri + in: path + required: true + schema: + type: string + title: Trailing Uri + responses: + '200': + description: Proxy POST request to model entity inference endpoint + content: + application/json: + schema: + type: object + additionalProperties: true + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + requestBody: + content: + application/json: + schema: + type: object + additionalProperties: true + required: false + get: + tags: + - Inference Gateway + summary: Model Inference Proxy GET + description: 'Proxy requests to model entity inference endpoints. + + + All inference requests must resolve to a `VirtualModel`. The platform''s + + provider reconciler auto-creates an implicit `autoprovisioned` VirtualModel + + for every served model entity (named after the entity, with + + `default_model_entity` set to the entity ref) so this is the typical case; + + operators can also create custom VirtualModels for routing, plugin chains, + + LoRA escape-hatches, etc. Requests for which no VirtualModel can be found + + return `404`.' 
+ operationId: gateway_proxy_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: trailing_uri + in: path + required: true + schema: + type: string + title: Trailing Uri + responses: + '200': + description: Proxy GET request to model entity inference endpoint + content: + application/json: + schema: {} + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/inference-gateway/v2/workspaces/{workspace}/openai/-/v1/models: + get: + tags: + - Inference Gateway + summary: OpenAI List Models + description: 'This endpoint aggregates models from all model entities and returns + them + + in OpenAI''s list models format. Each model ID is the model entity identifier + + in format workspace/model_entity_name.' + operationId: openai_proxy_list_models + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + responses: + '200': + description: List models request to OpenAI-compatible endpoint + content: + application/json: + schema: + $ref: '#/components/schemas/OpenAIListModelsResp' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/inference-gateway/v2/workspaces/{workspace}/openai/-/v1/models/{name}: + get: + tags: + - Inference Gateway + summary: OpenAI Get Model + description: 'Retrieve information about a specific OpenAI-compatible model. + + Workspace is always taken from the URL path; name may be model_entity_name + + or workspace/model_entity_name (workspace prefix is ignored).' 
+ operationId: openai_proxy_get_model + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Get model request to OpenAI-compatible endpoint + content: + application/json: + schema: + $ref: '#/components/schemas/OpenAIModelResp' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/inference-gateway/v2/workspaces/{workspace}/openai/-/{trailing_uri}: + patch: + tags: + - Inference Gateway + summary: OpenAI Inference Proxy PATCH + description: 'Proxy requests to OpenAI-compatible inference endpoints. + + + All inference requests must resolve to a `VirtualModel`. The platform''s + + provider reconciler auto-creates an implicit `autoprovisioned` VirtualModel + + for every served model entity (named after the entity, with + + `default_model_entity` set to the entity ref) so this is the typical case; + + operators can also create custom VirtualModels for routing, plugin chains, + + LoRA escape-hatches, etc. Requests for which no VirtualModel can be found + + return `404`.' 
+ operationId: openai_proxy_patch + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: trailing_uri + in: path + required: true + schema: + type: string + title: Trailing Uri + responses: + '200': + description: Proxy PATCH request to OpenAI-compatible endpoint + content: + application/json: + schema: + type: object + additionalProperties: true + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + requestBody: + content: + application/json: + schema: + type: object + additionalProperties: true + required: false + delete: + tags: + - Inference Gateway + summary: OpenAI Inference Proxy DELETE + description: 'Proxy requests to OpenAI-compatible inference endpoints. + + + All inference requests must resolve to a `VirtualModel`. The platform''s + + provider reconciler auto-creates an implicit `autoprovisioned` VirtualModel + + for every served model entity (named after the entity, with + + `default_model_entity` set to the entity ref) so this is the typical case; + + operators can also create custom VirtualModels for routing, plugin chains, + + LoRA escape-hatches, etc. Requests for which no VirtualModel can be found + + return `404`.' + operationId: openai_proxy_delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: trailing_uri + in: path + required: true + schema: + type: string + title: Trailing Uri + responses: + '200': + description: Proxy DELETE request to OpenAI-compatible endpoint + content: + application/json: + schema: {} + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + put: + tags: + - Inference Gateway + summary: OpenAI Inference Proxy PUT + description: 'Proxy requests to OpenAI-compatible inference endpoints. 
+ + + All inference requests must resolve to a `VirtualModel`. The platform''s + + provider reconciler auto-creates an implicit `autoprovisioned` VirtualModel + + for every served model entity (named after the entity, with + + `default_model_entity` set to the entity ref) so this is the typical case; + + operators can also create custom VirtualModels for routing, plugin chains, + + LoRA escape-hatches, etc. Requests for which no VirtualModel can be found + + return `404`.' + operationId: openai_proxy_put + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: trailing_uri + in: path + required: true + schema: + type: string + title: Trailing Uri + responses: + '200': + description: Proxy PUT request to OpenAI-compatible endpoint + content: + application/json: + schema: + type: object + additionalProperties: true + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + requestBody: + content: + application/json: + schema: + type: object + additionalProperties: true + required: false + post: + tags: + - Inference Gateway + summary: OpenAI Inference Proxy POST + description: 'Proxy requests to OpenAI-compatible inference endpoints. + + + All inference requests must resolve to a `VirtualModel`. The platform''s + + provider reconciler auto-creates an implicit `autoprovisioned` VirtualModel + + for every served model entity (named after the entity, with + + `default_model_entity` set to the entity ref) so this is the typical case; + + operators can also create custom VirtualModels for routing, plugin chains, + + LoRA escape-hatches, etc. Requests for which no VirtualModel can be found + + return `404`.' 
+ operationId: openai_proxy_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: trailing_uri + in: path + required: true + schema: + type: string + title: Trailing Uri + responses: + '200': + description: Proxy POST request to OpenAI-compatible endpoint + content: + application/json: + schema: + type: object + additionalProperties: true + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + requestBody: + content: + application/json: + schema: + type: object + additionalProperties: true + required: false + get: + tags: + - Inference Gateway + summary: OpenAI Inference Proxy GET + description: 'Proxy requests to OpenAI-compatible inference endpoints. + + + All inference requests must resolve to a `VirtualModel`. The platform''s + + provider reconciler auto-creates an implicit `autoprovisioned` VirtualModel + + for every served model entity (named after the entity, with + + `default_model_entity` set to the entity ref) so this is the typical case; + + operators can also create custom VirtualModels for routing, plugin chains, + + LoRA escape-hatches, etc. Requests for which no VirtualModel can be found + + return `404`.' 
+ operationId: openai_proxy_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: trailing_uri + in: path + required: true + schema: + type: string + title: Trailing Uri + responses: + '200': + description: Proxy GET request to OpenAI-compatible endpoint + content: + application/json: + schema: {} + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/inference-gateway/v2/workspaces/{workspace}/provider/{name}/-/{trailing_uri}: + patch: + tags: + - Inference Gateway + summary: Provider Inference Proxy PATCH + description: Proxy requests to provider inference endpoints. + operationId: provider_proxy_patch + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: trailing_uri + in: path + required: true + schema: + type: string + title: Trailing Uri + responses: + '200': + description: Proxy PATCH request to provider inference endpoint + content: + application/json: + schema: + type: object + additionalProperties: true + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + requestBody: + content: + application/json: + schema: + type: object + additionalProperties: true + required: false + delete: + tags: + - Inference Gateway + summary: Provider Inference Proxy DELETE + description: Proxy requests to provider inference endpoints. 
+ operationId: provider_proxy_delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: trailing_uri + in: path + required: true + schema: + type: string + title: Trailing Uri + responses: + '200': + description: Proxy DELETE request to provider inference endpoint + content: + application/json: + schema: {} + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + put: + tags: + - Inference Gateway + summary: Provider Inference Proxy PUT + description: Proxy requests to provider inference endpoints. + operationId: provider_proxy_put + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: trailing_uri + in: path + required: true + schema: + type: string + title: Trailing Uri + responses: + '200': + description: Proxy PUT request to provider inference endpoint + content: + application/json: + schema: + type: object + additionalProperties: true + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + requestBody: + content: + application/json: + schema: + type: object + additionalProperties: true + required: false + post: + tags: + - Inference Gateway + summary: Provider Inference Proxy POST + description: Proxy requests to provider inference endpoints. 
+ operationId: provider_proxy_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: trailing_uri + in: path + required: true + schema: + type: string + title: Trailing Uri + responses: + '200': + description: Proxy POST request to provider inference endpoint + content: + application/json: + schema: + type: object + additionalProperties: true + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + requestBody: + content: + application/json: + schema: + type: object + additionalProperties: true + required: false + get: + tags: + - Inference Gateway + summary: Provider Inference Proxy GET + description: Proxy requests to provider inference endpoints. + operationId: provider_proxy_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: trailing_uri + in: path + required: true + schema: + type: string + title: Trailing Uri + responses: + '200': + description: Proxy GET request to provider inference endpoint + content: + application/json: + schema: {} + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/inference-gateway/v2/workspaces/{workspace}/provider/{name}/ready: + get: + tags: + - Inference Gateway + summary: Check Provider Readiness + description: "Check if a model provider is registered in the gateway's cache.\n\ + \nThis is a lightweight endpoint that only checks the gateway's internal state,\n\ + without making any requests to the actual provider backend. 
Use this to verify\n\ + the gateway is ready to route requests to a provider after deployment.\n\n\ + Returns:\n 200 OK with provider info if the provider is registered\n \ + \ 404 Not Found if the provider is not yet in the gateway's cache" + operationId: provider_ready + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Check if the gateway can route to a provider + content: + application/json: + schema: + type: object + additionalProperties: true + title: Response Provider Ready + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/inference-gateway/v2/workspaces/{workspace}/virtual-models: + post: + tags: + - Virtual Models + summary: Create VirtualModel + description: 'Create a new VirtualModel in the given workspace. + + + A VirtualModel defines an ordered middleware pipeline that IGW executes + + when an inference request arrives with ``model: "workspace/name"`` matching + + this entity.' + operationId: create_virtual_model + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/CreateVirtualModelRequest' + responses: + '201': + description: VirtualModel created successfully. + content: + application/json: + schema: + $ref: '#/components/schemas/VirtualModel' + '409': + description: A VirtualModel with that name already exists in the workspace. + '422': + description: Validation error. + get: + tags: + - Virtual Models + summary: List VirtualModels + description: 'List VirtualModels for the given workspace. + + + Use ``workspace=-`` to list across all workspaces accessible to the caller.' 
+ operationId: list_virtual_models + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + minimum: 1 + description: Page number (1-indexed). + default: 1 + title: Page + description: Page number (1-indexed). + - name: page_size + in: query + required: false + schema: + type: integer + maximum: 200 + minimum: 1 + description: Number of results per page. + default: 20 + title: Page Size + description: Number of results per page. + - name: sort + in: query + required: false + schema: + type: string + description: Sort field. Prefix with ``-`` for descending order. + default: -created_at + title: Sort + description: Sort field. Prefix with ``-`` for descending order. + responses: + '200': + description: Paginated list of virtual models + content: + application/json: + schema: + $ref: '#/components/schemas/VirtualModelsPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/inference-gateway/v2/workspaces/{workspace}/virtual-models/{name}: + get: + tags: + - Virtual Models + summary: Get VirtualModel + description: Get a VirtualModel by workspace and name. + operationId: get_virtual_model + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: VirtualModel details + content: + application/json: + schema: + $ref: '#/components/schemas/VirtualModel' + '404': + description: VirtualModel not found. + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + patch: + tags: + - Virtual Models + summary: Update VirtualModel + description: 'Partially update a VirtualModel. 
+ + + Only fields present in the request body are modified. Fields absent from + + the request body retain their current values.' + operationId: update_virtual_model + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/UpdateVirtualModelRequest' + responses: + '200': + description: Updated virtual model + content: + application/json: + schema: + $ref: '#/components/schemas/VirtualModel' + '404': + description: VirtualModel not found. + '409': + description: Concurrent modification conflict. + '422': + description: Validation error. + delete: + tags: + - Virtual Models + summary: Delete VirtualModel + description: 'Permanently delete a VirtualModel. + + + This does not affect any in-flight requests already being routed through + + this VirtualModel. IGW''s model cache is refreshed on its next polling cycle.' + operationId: delete_virtual_model + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '204': + description: VirtualModel deleted + '404': + description: VirtualModel not found. 
+ '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/intake/v2/workspaces/{workspace}/annotations: + post: + tags: + - Annotations + summary: Create Annotation + operationId: create_annotation_apis_intake_v2_workspaces__workspace__annotations_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/AnnotationInput' + responses: + '201': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/Annotation' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Annotations + summary: List Annotations + operationId: list_annotations_apis_intake_v2_workspaces__workspace__annotations_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + minimum: 1 + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + maximum: 1000 + minimum: 1 + description: Page size. + default: 10 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + allOf: + - $ref: '#/components/schemas/AnnotationSortField' + default: -created_at + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/AnnotationFilter' + description: Filter annotations by span_id, session_id, kind, name, created_by, + and created_at range. 
+ responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/AnnotationsPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/intake/v2/workspaces/{workspace}/annotations/{annotation_id}: + get: + tags: + - Annotations + summary: Get Annotation + operationId: get_annotation_apis_intake_v2_workspaces__workspace__annotations__annotation_id__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: annotation_id + in: path + required: true + schema: + type: string + title: Annotation Id + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/Annotation' + '404': + description: Annotation not found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - Annotations + summary: Delete Annotation + operationId: delete_annotation_apis_intake_v2_workspaces__workspace__annotations__annotation_id__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: annotation_id + in: path + required: true + schema: + type: string + title: Annotation Id + responses: + '204': + description: Successful Response + '404': + description: Annotation not found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/intake/v2/workspaces/{workspace}/evaluator-results: + post: + tags: + - Evaluator Results + summary: Create Evaluator Result + operationId: create_evaluator_result_apis_intake_v2_workspaces__workspace__evaluator_results_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + 
required: true + content: + application/json: + schema: + $ref: '#/components/schemas/EvaluatorResultInput' + responses: + '201': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/EvaluatorResult' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Evaluator Results + summary: List Evaluator Results + operationId: list_evaluator_results_apis_intake_v2_workspaces__workspace__evaluator_results_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + minimum: 1 + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + maximum: 1000 + minimum: 1 + description: Page size. + default: 10 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + allOf: + - $ref: '#/components/schemas/EvaluatorResultSortField' + default: -created_at + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/EvaluatorResultFilter' + description: Filter evaluator results by span_id, session_id, name, data_type, + created_by, value range, and created_at range. 
+ responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/EvaluatorResultsPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/intake/v2/workspaces/{workspace}/evaluator-results/{evaluator_result_id}: + get: + tags: + - Evaluator Results + summary: Get Evaluator Result + operationId: get_evaluator_result_apis_intake_v2_workspaces__workspace__evaluator_results__evaluator_result_id__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: evaluator_result_id + in: path + required: true + schema: + type: string + title: Evaluator Result Id + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/EvaluatorResult' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/intake/v2/workspaces/{workspace}/experiment-groups: + post: + tags: + - Experiment Groups + summary: Create Experiment Group + operationId: create_experiment_group_apis_intake_v2_workspaces__workspace__experiment_groups_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/ExperimentGroupRequest' + responses: + '201': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/ExperimentGroupResponse' + '409': + description: Experiment group already exists + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Experiment Groups + summary: List Experiment Groups + operationId: 
list_experiment_groups_apis_intake_v2_workspaces__workspace__experiment_groups_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + minimum: 1 + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + maximum: 1000 + minimum: 1 + description: Page size. + default: 100 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + enum: + - -created_at + - created_at + - -updated_at + - updated_at + - -name + - name + type: string + description: Sort field; prefix with '-' for descending. + default: -created_at + title: Sort + description: Sort field; prefix with '-' for descending. + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/ExperimentGroupFilter' + description: Filter experiment groups by name. 
+ responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/ExperimentGroupResponsesPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/intake/v2/workspaces/{workspace}/experiment-groups/{name}: + get: + tags: + - Experiment Groups + summary: Get Experiment Group + operationId: get_experiment_group_apis_intake_v2_workspaces__workspace__experiment_groups__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/ExperimentGroupResponse' + '404': + description: Experiment group not found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + put: + tags: + - Experiment Groups + summary: Update Experiment Group + operationId: update_experiment_group_apis_intake_v2_workspaces__workspace__experiment_groups__name__put + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/ExperimentGroupRequest' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/ExperimentGroupResponse' + '404': + description: Experiment group not found + '409': + description: Attempt to rename the group + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - Experiment Groups + summary: Delete 
Experiment Group + operationId: delete_experiment_group_apis_intake_v2_workspaces__workspace__experiment_groups__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '204': + description: Successful Response + '404': + description: Experiment group not found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/intake/v2/workspaces/{workspace}/experiments: + post: + tags: + - Experiments + summary: Create Experiment + operationId: create_experiment_apis_intake_v2_workspaces__workspace__experiments_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/ExperimentRequest' + responses: + '201': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/ExperimentResponse' + '409': + description: Experiment already exists + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Experiments + summary: List Experiments + operationId: list_experiments_apis_intake_v2_workspaces__workspace__experiments_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + minimum: 1 + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + maximum: 1000 + minimum: 1 + description: Page size. + default: 100 + title: Page Size + description: Page size. 
+ - name: sort + in: query + required: false + schema: + enum: + - -created_at + - created_at + - -updated_at + - updated_at + - -name + - name + type: string + description: Sort field; prefix with '-' for descending. + default: -created_at + title: Sort + description: Sort field; prefix with '-' for descending. + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/ExperimentFilter' + description: Filter experiments by name, experiment_group_id, agent_name, + and dataset_name. + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/ExperimentResponsesPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/intake/v2/workspaces/{workspace}/experiments/{name}: + get: + tags: + - Experiments + summary: Get Experiment + operationId: get_experiment_apis_intake_v2_workspaces__workspace__experiments__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/ExperimentResponse' + '404': + description: Experiment not found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + put: + tags: + - Experiments + summary: Update Experiment + operationId: update_experiment_apis_intake_v2_workspaces__workspace__experiments__name__put + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + requestBody: + required: true + content: + application/json: + schema: + $ref: 
'#/components/schemas/ExperimentRequest' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/ExperimentResponse' + '404': + description: Experiment not found + '409': + description: Attempt to change an immutable field + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - Experiments + summary: Delete Experiment + operationId: delete_experiment_apis_intake_v2_workspaces__workspace__experiments__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '204': + description: Successful Response + '404': + description: Experiment not found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/intake/v2/workspaces/{workspace}/ingest/atif: + post: + tags: + - Ingest + summary: Ingest Atif + operationId: ingest_atif_apis_intake_v2_workspaces__workspace__ingest_atif_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/AtifIngestRequest' + responses: + '201': + description: Successful Response + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/intake/v2/workspaces/{workspace}/ingest/chat-completions: + post: + tags: + - Ingest + summary: Ingest Chat Completion + operationId: ingest_chat_completion_apis_intake_v2_workspaces__workspace__ingest_chat_completions_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + 
application/json: + schema: + $ref: '#/components/schemas/ChatCompletionsIngestRequest' + responses: + '201': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/ChatCompletionsIngestResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/intake/v2/workspaces/{workspace}/ingest/otlp/v1/traces: + post: + tags: + - Ingest + summary: Ingest Otlp Traces + operationId: ingest_otlp_traces_apis_intake_v2_workspaces__workspace__ingest_otlp_v1_traces_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: content-type + in: header + required: false + schema: + type: string + default: application/octet-stream + title: Content-Type + - name: content-length + in: header + required: false + schema: + title: Content-Length + type: integer + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/IngestResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/intake/v2/workspaces/{workspace}/spans: + get: + tags: + - Spans + summary: List Spans + operationId: list_spans_apis_intake_v2_workspaces__workspace__spans_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + minimum: 1 + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + maximum: 1000 + minimum: 1 + description: Page size. + default: 10 + title: Page Size + description: Page size. 
+ - name: sort + in: query + required: false + schema: + allOf: + - $ref: '#/components/schemas/SpanSortField' + default: -started_at + - name: mode + in: query + required: false + schema: + enum: + - summary + - detailed + type: string + default: detailed + title: Mode + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/SpanFilter' + description: Filter spans by session_id, trace_id, parent_span_id, project, + evaluation context fields, source, kind, status, model, tool_name, provider, + agent_id, agent_name, prompt_name, prompt_version, and started_at. + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/SpansPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/intake/v2/workspaces/{workspace}/spans/{span_id}: + get: + tags: + - Spans + summary: Get Span + operationId: get_span_apis_intake_v2_workspaces__workspace__spans__span_id__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: span_id + in: path + required: true + schema: + type: string + title: Span Id + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/Span' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/intake/v2/workspaces/{workspace}/spans/{span_id}/evaluator-results: + get: + tags: + - Evaluator Results + summary: List Evaluator Results For Span + operationId: list_evaluator_results_for_span_apis_intake_v2_workspaces__workspace__spans__span_id__evaluator_results_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: span_id + in: path + required: true + schema: + type: 
string + title: Span Id + responses: + '200': + description: Successful Response + content: + application/json: + schema: + type: array + items: + $ref: '#/components/schemas/EvaluatorResult' + title: Response List Evaluator Results For Span Apis Intake V2 Workspaces Workspace Spans Span + Id Evaluator Results Get + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/intake/v2/workspaces/{workspace}/traces: + get: + tags: + - Traces + summary: List Traces + operationId: list_traces_apis_intake_v2_workspaces__workspace__traces_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + minimum: 1 + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + maximum: 1000 + minimum: 1 + description: Page size. + default: 10 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + allOf: + - $ref: '#/components/schemas/TraceSortField' + default: -started_at + - name: mode + in: query + required: false + schema: + enum: + - summary + - detailed + type: string + description: Use summary for root-span trace fields only, or detailed to + include token, cost, and span-count rollups. + default: detailed + title: Mode + description: Use summary for root-span trace fields only, or detailed to include + token, cost, and span-count rollups. + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/TraceFilter' + description: Filter root-span-backed traces by id, session_id, rolled-up status, + root span started_at, and root-span evaluation context fields. 
+ responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/TracesPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/intake/v2/workspaces/{workspace}/traces/{id}: + get: + tags: + - Traces + summary: Get Trace + operationId: get_trace_apis_intake_v2_workspaces__workspace__traces__id__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: id + in: path + required: true + schema: + type: string + title: Id + - name: mode + in: query + required: false + schema: + enum: + - summary + - detailed + type: string + description: Use summary for root-span trace fields only, or detailed to + include token, cost, and span-count rollups. + default: detailed + title: Mode + description: Use summary for root-span trace fields only, or detailed to include + token, cost, and span-count rollups. + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/Trace' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/jobs/v2/execution-profiles: + get: + tags: + - Jobs + summary: Get Execution Profiles + description: Get all currently configured execution profiles. 
+ operationId: get_execution_profiles_apis_jobs_v2_execution_profiles_get + responses: + '200': + description: Successful Response + content: + application/json: + schema: + items: + anyOf: + - $ref: '#/components/schemas/DockerJobExecutionProfile' + - $ref: '#/components/schemas/KubernetesJobExecutionProfile' + - $ref: '#/components/schemas/VolcanoJobExecutionProfile' + - $ref: '#/components/schemas/SubprocessJobExecutionProfile' + - $ref: '#/components/schemas/E2EJobExecutionProfile' + type: array + title: Response Get Execution Profiles Apis Jobs V2 Execution Profiles + Get + /apis/jobs/v2/workspaces/{workspace}/jobs: + post: + tags: + - Jobs + summary: Create Job + description: Create a new platform job. + operationId: create_job_apis_jobs_v2_workspaces__workspace__jobs_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/CreatePlatformJobRequest' + responses: + '201': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Jobs + summary: List Jobs + description: List platform jobs with filtering and pagination. + operationId: list_jobs_apis_jobs_v2_workspaces__workspace__jobs_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + exclusiveMinimum: 0 + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + exclusiveMinimum: 0 + description: Page size. + default: 10 + title: Page Size + description: Page size. 
+ - name: sort + in: query + required: false + schema: + allOf: + - $ref: '#/components/schemas/PlatformJobSortField' + description: The field to sort by. To sort in decreasing order, use `-` + in front of the field name. + default: -created_at + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/PlatformJobsListFilter' + description: Filter jobs by workspace, project, name, status, source, created_at, + and updated_at. + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobResponsesPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/jobs/v2/workspaces/{workspace}/jobs/{job}/results/{name}: + post: + tags: + - Jobs + summary: Create Job Result + description: Create a new result for a job. + operationId: create_job_result_apis_jobs_v2_workspaces__workspace__jobs__job__results__name__post + parameters: + - name: job + in: path + required: true + schema: + type: string + title: Job + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobResultCreateRequest' + responses: + '201': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobResultResponse' + '404': + description: Job not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Jobs + summary: Get Job Result + description: Get a specific job result. 
+ operationId: get_job_result_apis_jobs_v2_workspaces__workspace__jobs__job__results__name__get + parameters: + - name: job + in: path + required: true + schema: + type: string + title: Job + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobResultResponse' + '404': + description: Not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/jobs/v2/workspaces/{workspace}/jobs/{job}/results/{name}/download: + get: + tags: + - Jobs + summary: Download Job Result + description: Download a job result file. + operationId: download_job_result_apis_jobs_v2_workspaces__workspace__jobs__job__results__name__download_get + parameters: + - name: job + in: path + required: true + schema: + type: string + title: Job + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + responses: + '200': + description: Successful Response + content: + application/octet-stream: + schema: + type: string + format: binary + '404': + description: Not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/jobs/v2/workspaces/{workspace}/jobs/{job}/steps/{name}: + get: + tags: + - Jobs + summary: Get Job Step + description: Get a specific job step. 
+ operationId: get_job_step_apis_jobs_v2_workspaces__workspace__jobs__job__steps__name__get + parameters: + - name: job + in: path + required: true + schema: + type: string + title: Job + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobStep' + '404': + description: Not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/jobs/v2/workspaces/{workspace}/jobs/{job}/steps/{name}/status: + patch: + tags: + - Jobs + summary: Update Job Step Status + description: Update a job step status. + operationId: update_job_step_status_apis_jobs_v2_workspaces__workspace__jobs__job__steps__name__status_patch + parameters: + - name: job + in: path + required: true + schema: + type: string + title: Job + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobStatusUpdateRequest' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobStep' + '404': + description: Not Found + '409': + description: Conflict + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/jobs/v2/workspaces/{workspace}/jobs/{job}/steps/{name}/tasks: + get: + tags: + - Jobs + summary: List Job Step Tasks + description: List tasks for a job step. 
+ operationId: list_job_step_tasks_apis_jobs_v2_workspaces__workspace__jobs__job__steps__name__tasks_get + parameters: + - name: job + in: path + required: true + schema: + type: string + title: Job + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobListTaskResponse' + '404': + description: Not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/jobs/v2/workspaces/{workspace}/jobs/{job}/steps/{step}/tasks/{name}: + put: + tags: + - Jobs + summary: Update Job Step Task + description: Update a job step task. + operationId: update_job_step_task_apis_jobs_v2_workspaces__workspace__jobs__job__steps__step__tasks__name__put + parameters: + - name: job + in: path + required: true + schema: + type: string + title: Job + - name: step + in: path + required: true + schema: + type: string + title: Step + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobTaskUpdate' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobTask' + '404': + description: Not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Jobs + summary: Get Job Step Task + description: Get a specific job step task. 
+ operationId: get_job_step_task_apis_jobs_v2_workspaces__workspace__jobs__job__steps__step__tasks__name__get + parameters: + - name: job + in: path + required: true + schema: + type: string + title: Job + - name: step + in: path + required: true + schema: + type: string + title: Step + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobTask' + '404': + description: Not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/jobs/v2/workspaces/{workspace}/jobs/{name}: + get: + tags: + - Jobs + summary: Get Job + description: Get a platform job by name. + operationId: get_job_apis_jobs_v2_workspaces__workspace__jobs__name__get + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobResponse' + '404': + description: Job not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - Jobs + summary: Delete Job + description: Delete a platform job. 
+ operationId: delete_job_apis_jobs_v2_workspaces__workspace__jobs__name__delete + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + responses: + '204': + description: Successful Response + '404': + description: Job not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/jobs/v2/workspaces/{workspace}/jobs/{name}/cancel: + post: + tags: + - Jobs + summary: Cancel Job + description: Cancel a platform job. + operationId: cancel_job_apis_jobs_v2_workspaces__workspace__jobs__name__cancel_post + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobResponse' + '404': + description: Job not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/jobs/v2/workspaces/{workspace}/jobs/{name}/logs: + get: + tags: + - Jobs + summary: Page Job Logs + description: Get paginated logs for a platform job. 
+ operationId: page_job_logs_apis_jobs_v2_workspaces__workspace__jobs__name__logs_get + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: limit + in: query + required: false + schema: + type: integer + exclusiveMinimum: 0 + description: Maximum number of logs to return + default: 100 + title: Limit + description: Maximum number of logs to return + - name: page_cursor + in: query + required: false + schema: + type: string + description: Page cursor + title: Page Cursor + description: Page cursor + - name: attempt_id + in: query + required: false + schema: + description: Filter logs by job attempt ID + title: Attempt Id + type: integer + description: Filter logs by job attempt ID + - name: step_id + in: query + required: false + schema: + description: Filter logs by step name + title: Step Id + type: string + description: Filter logs by step name + - name: task_id + in: query + required: false + schema: + description: Filter logs by task ID + title: Task Id + type: string + description: Filter logs by task ID + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobLogPage' + '404': + description: Job not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/jobs/v2/workspaces/{workspace}/jobs/{name}/pause: + post: + tags: + - Jobs + summary: Pause Job + description: Pause a platform job. 
+ operationId: pause_job_apis_jobs_v2_workspaces__workspace__jobs__name__pause_post + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobResponse' + '404': + description: Job not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/jobs/v2/workspaces/{workspace}/jobs/{name}/results: + get: + tags: + - Jobs + summary: List Job Results + description: List results for a job. + operationId: list_job_results_apis_jobs_v2_workspaces__workspace__jobs__name__results_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: sort + in: query + required: false + schema: + allOf: + - $ref: '#/components/schemas/PlatformJobSortField' + description: The field to sort by. To sort in decreasing order, use `-` + in front of the field name. + default: -created_at + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobListResultResponse' + '404': + description: Job not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/jobs/v2/workspaces/{workspace}/jobs/{name}/resume: + post: + tags: + - Jobs + summary: Resume Job + description: Resume a paused platform job. 
+ operationId: resume_job_apis_jobs_v2_workspaces__workspace__jobs__name__resume_post + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobResponse' + '404': + description: Job not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/jobs/v2/workspaces/{workspace}/jobs/{name}/status: + get: + tags: + - Jobs + summary: Get Job Status + description: Get the status of a platform job. + operationId: get_job_status_apis_jobs_v2_workspaces__workspace__jobs__name__status_get + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobStatusResponse' + '404': + description: Job not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/jobs/v2/workspaces/{workspace}/jobs/{name}/status-details: + patch: + tags: + - Jobs + summary: Update Job Status Details + description: Update the status details of a platform job. 
+ operationId: update_job_status_details_apis_jobs_v2_workspaces__workspace__jobs__name__status_details_patch + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + type: object + additionalProperties: true + title: Request + responses: + '200': + description: Successful Response + content: + application/json: + schema: {} + '404': + description: Job not Found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/jobs/v2/workspaces/{workspace}/jobs/{name}/steps: + get: + tags: + - Jobs + summary: List Steps + description: List job steps with pagination and filtering. + operationId: list_steps_apis_jobs_v2_workspaces__workspace__jobs__name__steps_get + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + exclusiveMinimum: 0 + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + exclusiveMinimum: 0 + description: Page size. + default: 25 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + allOf: + - $ref: '#/components/schemas/PlatformJobSortField' + description: The field to sort by. To sort in decreasing order, use `-` + in front of the field name. + default: created_at + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. 
+ - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/PlatformJobStepsListFilter' + description: Filter steps by job, status, and source. + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformJobStepWithContextsPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/adapters: + post: + tags: + - Adapters + summary: Create Adapter + description: Create an adapter under a base model specified by the "model" field + in the body. + operationId: create_adapter_apis_models_v2_workspaces__workspace__adapters_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/CreateAdapterRequest' + responses: + '201': + description: Create a new adapter for a model + content: + application/json: + schema: + $ref: '#/components/schemas/Adapter' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Adapters + summary: List Adapters + operationId: list_adapters_apis_models_v2_workspaces__workspace__adapters_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + maximum: 1000 + minimum: 1 + description: Page size. + default: 100 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + type: string + description: The field to sort by. 
To sort in decreasing order, use `-` + in front of the field name. + default: created_at + title: Sort + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/AdapterEntityFilter' + description: Filter adapters by name, model (parent model ref string, stored + on the adapter), description, fileset, finetuning_type, enabled, created_at, + and updated_at. + responses: + '200': + description: List adapters in the workspace + content: + application/json: + schema: + $ref: '#/components/schemas/AdaptersPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/adapters/{name}: + get: + tags: + - Adapters + summary: Get Adapter + operationId: get_adapter_apis_models_v2_workspaces__workspace__adapters__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Get one adapter by name. Parent model is taken from the adapter's + stored parent (entity `parent` field). + content: + application/json: + schema: + $ref: '#/components/schemas/Adapter' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - Adapters + summary: Delete Adapter + operationId: delete_adapter_apis_models_v2_workspaces__workspace__adapters__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '204': + description: Delete adapter by name. 
The entity store delete uses the adapter's + stored parent (entity `parent` field). + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + patch: + tags: + - Adapters + summary: Update Adapter + operationId: update_adapter_apis_models_v2_workspaces__workspace__adapters__name__patch + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/UpdateAdapterRequest' + responses: + '200': + description: Update adapter metadata. Updates are applied to the row identified + by the adapter's stored parent (entity `parent` field). + content: + application/json: + schema: + $ref: '#/components/schemas/Adapter' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/deployment-configs: + get: + tags: + - ModelDeploymentConfigs + summary: List ModelDeploymentConfigs By Workspace + description: 'List ModelDeploymentConfigs for a specific workspace. + + Returns only the latest version of each config.' + operationId: list_deployment_configs_apis_models_v2_workspaces__workspace__deployment_configs_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + description: Page size. + default: 100 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + type: string + description: The field to sort by. 
To sort in decreasing order, use `-` + in front of the field name. + default: created_at + title: Sort + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/ModelDeploymentConfigFilter' + description: Filter deployment configs by workspace, project, model_entity_id, + name, description, created_at, and updated_at. + responses: + '200': + description: Return model deployment configurations for a workspace + content: + application/json: + schema: + $ref: '#/components/schemas/ModelDeploymentConfigsPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + post: + tags: + - ModelDeploymentConfigs + summary: Create ModelDeploymentConfig + description: Create a new ModelDeploymentConfig (version 1). + operationId: create_deployment_config_apis_models_v2_workspaces__workspace__deployment_configs_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/CreateModelDeploymentConfigRequest' + responses: + '201': + description: Create a new model deployment configuration + content: + application/json: + schema: + $ref: '#/components/schemas/ModelDeploymentConfig' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/deployment-configs/{config}/versions/{name}: + get: + tags: + - ModelDeploymentConfigs + summary: Get Specific ModelDeploymentConfig Version + description: Get a specific version of a ModelDeploymentConfig. 
+ operationId: get_deployment_config_version_apis_models_v2_workspaces__workspace__deployment_configs__config__versions__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: config + in: path + required: true + schema: + type: string + title: Config + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Return a specific version of a model deployment configuration + content: + application/json: + schema: + $ref: '#/components/schemas/ModelDeploymentConfig' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - ModelDeploymentConfigs + summary: Delete Specific ModelDeploymentConfig Version + description: 'Delete a specific version of a ModelDeploymentConfig. + + + This operation will fail with 409 Conflict if any ModelDeployments currently + + reference this specific version and are not in DELETED status. Delete or wait + for + + dependent deployments to reach DELETED status before deleting the config version.' 
+ operationId: delete_deployment_config_version_apis_models_v2_workspaces__workspace__deployment_configs__config__versions__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: config + in: path + required: true + schema: + type: string + title: Config + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '204': + description: Deployment config version deleted successfully + '404': + description: Deployment config version not found + '409': + description: Cannot delete - dependent ModelDeployments exist + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/deployment-configs/{name}: + get: + tags: + - ModelDeploymentConfigs + summary: Get Latest ModelDeploymentConfig Version + description: Get the latest version of a ModelDeploymentConfig. + operationId: get_latest_deployment_config_apis_models_v2_workspaces__workspace__deployment_configs__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Return the latest version of a model deployment configuration + content: + application/json: + schema: + $ref: '#/components/schemas/ModelDeploymentConfig' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + post: + tags: + - ModelDeploymentConfigs + summary: Update ModelDeploymentConfig + description: Update a ModelDeploymentConfig (creates a new immutable version). 
+ operationId: update_deployment_config_apis_models_v2_workspaces__workspace__deployment_configs__name__post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/UpdateModelDeploymentConfigRequest' + responses: + '201': + description: Update a model deployment configuration (creates new version) + content: + application/json: + schema: + $ref: '#/components/schemas/ModelDeploymentConfig' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - ModelDeploymentConfigs + summary: Delete All ModelDeploymentConfig Versions + description: 'Delete all versions of a ModelDeploymentConfig. + + + This operation will fail with 409 Conflict if any ModelDeployments currently + + reference this config and are not in DELETED status. Delete or wait for + + dependent deployments to reach DELETED status before deleting the config.' 
+ operationId: delete_all_deployment_config_versions_apis_models_v2_workspaces__workspace__deployment_configs__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '204': + description: Deployment config deleted successfully + '404': + description: Deployment config not found + '409': + description: Cannot delete - dependent ModelDeployments exist + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/deployment-configs/{name}/versions: + get: + tags: + - ModelDeploymentConfigs + summary: List ModelDeploymentConfig Versions + description: List all versions of a ModelDeploymentConfig. + operationId: list_deployment_config_versions_apis_models_v2_workspaces__workspace__deployment_configs__name__versions_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Return all versions of a model deployment configuration + content: + application/json: + schema: + type: array + items: + $ref: '#/components/schemas/ModelDeploymentConfig' + title: Response List Deployment Config Versions Apis Models V2 Workspaces Workspace Deployment + Configs Name Versions Get + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/deployments: + get: + tags: + - ModelDeployments + summary: List ModelDeployments + description: 'List ModelDeployments for a specific workspace. + + + By default, returns only the latest version of each deployment.' 
+ operationId: list_deployments_apis_models_v2_workspaces__workspace__deployments_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + description: Page size. + default: 100 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + type: string + description: The field to sort by. To sort in decreasing order, use `-` + in front of the field name. + default: created_at + title: Sort + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. + - name: all_versions + in: query + required: false + schema: + type: boolean + description: If true, return all versions of each deployment. If false (default), + return only the latest version. + default: false + title: All Versions + description: If true, return all versions of each deployment. If false (default), + return only the latest version. + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/ModelDeploymentFilter' + description: Filter deployments by workspace, project, status, config, model_provider_id, + name, status_message, created_at, and updated_at. + responses: + '200': + description: Return model deployments for a workspace + content: + application/json: + schema: + $ref: '#/components/schemas/ModelDeploymentsPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + post: + tags: + - ModelDeployments + summary: Create ModelDeployment + description: Create a new ModelDeployment (version 1). 
+ operationId: create_deployment_apis_models_v2_workspaces__workspace__deployments_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/CreateModelDeploymentRequest' + responses: + '201': + description: Create a new model deployment + content: + application/json: + schema: + $ref: '#/components/schemas/ModelDeployment' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/deployments/{deployment}/versions/{name}: + get: + tags: + - ModelDeployments + summary: Get Specific ModelDeployment Version + description: Get a specific version of a ModelDeployment. + operationId: get_deployment_version_apis_models_v2_workspaces__workspace__deployments__deployment__versions__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: deployment + in: path + required: true + schema: + type: string + title: Deployment + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Return a specific version of a model deployment + content: + application/json: + schema: + $ref: '#/components/schemas/ModelDeployment' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - ModelDeployments + summary: Delete Specific ModelDeployment Version + description: 'Delete a specific version of a ModelDeployment. + + + If the deployment is in any state other than DELETED, this will set its status + to DELETING. + + The models controller will then: + + 1. Delete the infrastructure (e.g., K8s NimService) + + 2. 
Update the status to DELETED + + + If the deployment is already in DELETED status, calling delete again will + permanently + + remove it from the database. + + + Returns: + + - 202 Accepted: Deployment version marked for deletion (status set to DELETING) + + - 204 No Content: Deployment version permanently removed from database (was + already DELETED) + + - 404 Not Found: Deployment version doesn''t exist' + operationId: delete_deployment_version_apis_models_v2_workspaces__workspace__deployments__deployment__versions__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: deployment + in: path + required: true + schema: + type: string + title: Deployment + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '202': + description: Deployment version marked for deletion (DELETING status) + content: + application/json: + schema: {} + '204': + description: Deployment version hard-deleted from database (was already + DELETED) + '404': + description: Deployment version not found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/deployments/{name}: + get: + tags: + - ModelDeployments + summary: Get Latest ModelDeployment + description: Get the latest version of a ModelDeployment. 
+ operationId: get_latest_deployment_apis_models_v2_workspaces__workspace__deployments__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Return the latest version of a model deployment + content: + application/json: + schema: + $ref: '#/components/schemas/ModelDeployment' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + post: + tags: + - ModelDeployments + summary: Update ModelDeployment + description: Update a ModelDeployment (creates a new immutable version). + operationId: update_deployment_apis_models_v2_workspaces__workspace__deployments__name__post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/UpdateModelDeploymentRequest' + responses: + '201': + description: Update a model deployment (creates new version) + content: + application/json: + schema: + $ref: '#/components/schemas/ModelDeployment' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - ModelDeployments + summary: Delete All ModelDeployment Versions + description: 'Delete all versions of a ModelDeployment. + + + If the deployment is in any state other than DELETED, this will set its status + to DELETING. + + The models controller will then: + + 1. Delete the infrastructure (e.g., K8s NimService) + + 2. Update the status to DELETED + + + If the deployment is already in DELETED status, calling delete again will + permanently + + remove it from the database. 
+ + + Returns: + + - 202 Accepted: Deployment marked for deletion (status set to DELETING) + + - 204 No Content: Deployment permanently removed from database (was already + DELETED) + + - 404 Not Found: Deployment doesn''t exist' + operationId: delete_all_deployment_versions_apis_models_v2_workspaces__workspace__deployments__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '202': + description: Deployment marked for deletion (DELETING status) + content: + application/json: + schema: {} + '204': + description: Deployment hard-deleted from database (was already DELETED) + '404': + description: Deployment not found + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/deployments/{name}/models: + get: + tags: + - ModelDeployments + summary: Get Latest ModelDeployment's Model Entities + description: 'Get Latest ModelDeployment''s Model Entities from Entity Store. + + This provides the API contract that NIMs expect from Entity Store today, for + pulling LoRAs, + + but enables us to enforce AuthZ boundaries. + + + TODO: Implement model entity retrieval based on deployment config.' 
+ operationId: get_deployment_models_apis_models_v2_workspaces__workspace__deployments__name__models_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Return model entities from Entity Store for the latest deployment + content: + application/json: + schema: + type: object + additionalProperties: true + title: Response Get Deployment Models Apis Models V2 Workspaces Workspace Deployments Name Models + Get + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/deployments/{name}/status: + post: + tags: + - ModelDeployments + summary: Update ModelDeployment Status + description: 'Update the status of a ModelDeployment (mutable operation). + + If version is not specified, updates the latest version.' 
+ operationId: update_deployment_status_apis_models_v2_workspaces__workspace__deployments__name__status_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: version + in: query + required: false + schema: + title: Version + type: string + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/UpdateModelDeploymentStatusRequest' + responses: + '200': + description: Update ModelDeployment status and status_message + content: + application/json: + schema: + $ref: '#/components/schemas/ModelDeployment' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/deployments/{name}/versions: + get: + tags: + - ModelDeployments + summary: List ModelDeployment Versions + description: List all versions of a ModelDeployment. + operationId: list_deployment_versions_apis_models_v2_workspaces__workspace__deployments__name__versions_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Return all versions of a model deployment + content: + application/json: + schema: + type: array + items: + $ref: '#/components/schemas/ModelDeployment' + title: Response List Deployment Versions Apis Models V2 Workspaces Workspace Deployments Name Versions + Get + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/models: + post: + tags: + - Models + summary: Create Model + description: 'Create a new model entity. + + + This endpoint creates a new Model Entity in the Models service database. 
+ + The Model Entity will be registered for use within the platform.' + operationId: create_model_apis_models_v2_workspaces__workspace__models_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/CreateModelEntityRequest' + responses: + '201': + description: Create a new model entity + content: + application/json: + schema: + $ref: '#/components/schemas/ModelEntity' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Models + summary: List Models + description: 'List Models endpoint with filtering, pagination, and sorting. + + + Supports filter parameters for various criteria (including peft, custom fields), + + pagination (page, page_size), sorting, and workspace filtering via query parameter.' + operationId: list_models_apis_models_v2_workspaces__workspace__models_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + description: Page size. + default: 100 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + allOf: + - $ref: '#/components/schemas/ModelEntitySortField' + description: The field to sort by. To sort in decreasing order, use `-` + in front of the field name. + default: created_at + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. 
+ - name: verbose + in: query + required: false + schema: + type: boolean + description: Whether to include full spec details + default: false + title: Verbose + description: Whether to include full spec details + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/ModelEntityFilter' + description: Filter models by name, project, workspace, base_model, adapters, + finetuning_type, prompt, lora_enabled, description, created_at, and updated_at. + responses: + '200': + description: Return a list of models + content: + application/json: + schema: + $ref: '#/components/schemas/ModelEntitysPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/models/{model_name}/adapters: + post: + tags: + - Models + summary: Add Model Adapter + description: Adds an Adapter to the Model + operationId: create_model_adapter_apis_models_v2_workspaces__workspace__models__model_name__adapters_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: model_name + in: path + required: true + schema: + type: string + title: Model Name + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/CreateModelAdapterRequest' + responses: + '201': + description: Register a new adapter to the model + content: + application/json: + schema: + $ref: '#/components/schemas/Adapter' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/models/{model_name}/adapters/{adapter}: + delete: + tags: + - Models + summary: Delete Model Adapter + description: 'Delete Adapter from Model entity. 
+ + + Permanently deletes an adapter from a model entity. If it was deployed, it + will be cleaned up automatically.' + operationId: delete_model_adapter_apis_models_v2_workspaces__workspace__models__model_name__adapters__adapter__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: model_name + in: path + required: true + schema: + type: string + title: Model Name + - name: adapter + in: path + required: true + schema: + type: string + title: Adapter + responses: + '204': + description: Delete model adapter by name + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + patch: + tags: + - Models + summary: Update Adapter + description: Update Adapter deployment or description. + operationId: update_model_adapter_apis_models_v2_workspaces__workspace__models__model_name__adapters__adapter__patch + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: model_name + in: path + required: true + schema: + type: string + title: Model Name + - name: adapter + in: path + required: true + schema: + type: string + title: Adapter + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/UpdateAdapterRequest' + responses: + '200': + description: Update adapter metadata + content: + application/json: + schema: + $ref: '#/components/schemas/Adapter' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/models/{name}: + get: + tags: + - Models + summary: Get Model by Workspace and Name + description: 'Get Model by Workspace and Name. + + + Returns the details of a specific model entity identified by its workspace + and name.'
+ operationId: get_model_apis_models_v2_workspaces__workspace__models__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: verbose + in: query + required: false + schema: + type: boolean + description: Whether to include full spec details + default: false + title: Verbose + description: Whether to include full spec details + responses: + '200': + description: Return model details + content: + application/json: + schema: + $ref: '#/components/schemas/ModelEntity' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + patch: + tags: + - Models + summary: Update Model + description: 'Update Model metadata. + + + Updates the metadata of an existing model entity. If the request body has + an empty field, + + the old value is kept.' + operationId: update_model_apis_models_v2_workspaces__workspace__models__name__patch + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: verbose + in: query + required: false + schema: + type: boolean + description: Whether to include full spec details + default: false + title: Verbose + description: Whether to include full spec details + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/UpdateModelEntityRequest' + responses: + '200': + description: Update model metadata + content: + application/json: + schema: + $ref: '#/components/schemas/ModelEntity' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - Models + summary: Delete Model + description: 'Delete Model entity. 
+ + + Permanently deletes a model entity from the platform.' + operationId: delete_model_apis_models_v2_workspaces__workspace__models__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '204': + description: Delete model entity + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/providers: + get: + tags: + - ModelProviders + summary: List ModelProviders By Workspace + description: List model providers for a specific workspace. + operationId: list_providers_apis_models_v2_workspaces__workspace__providers_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + description: Page size. + default: 100 + title: Page Size + description: Page size. + - name: sort + in: query + required: false + schema: + allOf: + - $ref: '#/components/schemas/ModelProviderSort' + description: The field to sort by. To sort in decreasing order, use `-` + in front of the field name. + default: created_at + description: The field to sort by. To sort in decreasing order, use `-` in + front of the field name. + - in: query + name: filter + style: deepObject + required: false + explode: true + schema: + $ref: '#/components/schemas/ModelProviderFilter' + description: Filter model providers by workspace, project, status, model_deployment_id, + name, description, host_url, created_at, and updated_at. 
+ responses: + '200': + description: Return model providers for a workspace + content: + application/json: + schema: + $ref: '#/components/schemas/ModelProvidersPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + post: + tags: + - ModelProviders + summary: Create ModelProvider + description: Create a new model provider. + operationId: create_provider_apis_models_v2_workspaces__workspace__providers_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/CreateModelProviderRequest' + responses: + '201': + description: Create a new model provider + content: + application/json: + schema: + $ref: '#/components/schemas/ModelProvider' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/providers/{name}: + put: + tags: + - ModelProviders + summary: Upsert ModelProvider + description: Create or update a model provider. 
+ operationId: upsert_provider_apis_models_v2_workspaces__workspace__providers__name__put + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/UpsertModelProviderRequest' + responses: + '200': + description: Create or update a model provider + content: + application/json: + schema: + $ref: '#/components/schemas/ModelProvider' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - ModelProviders + summary: Get ModelProvider + description: Get a model provider by workspace and name. + operationId: get_provider_apis_models_v2_workspaces__workspace__providers__name__get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '200': + description: Return model provider details + content: + application/json: + schema: + $ref: '#/components/schemas/ModelProvider' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - ModelProviders + summary: Delete ModelProvider + description: Delete a model provider by workspace and name. 
+ operationId: delete_provider_apis_models_v2_workspaces__workspace__providers__name__delete + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + responses: + '204': + description: Delete a model provider + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/models/v2/workspaces/{workspace}/providers/{name}/status: + put: + tags: + - ModelProviders + summary: Update ModelProvider Status Fields + description: 'Update status-related fields of a model provider. + + + This endpoint supports partial updates for fields managed by Models Controller: + + - model_deployment_id + + - served_models + + - status + + - status_message + + + If status is provided without status_message, status_message will be set to + empty string.' + operationId: update_provider_status_apis_models_v2_workspaces__workspace__providers__name__status_put + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: name + in: path + required: true + schema: + type: string + title: Name + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/UpdateModelProviderStatusRequest' + responses: + '200': + description: Update status-related fields of a model provider + content: + application/json: + schema: + $ref: '#/components/schemas/ModelProvider' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/secrets/v2/rotate-encryption-keys: + post: + tags: + - Secrets Admin + summary: Admin Rotate Encryption Keys + description: Rotate encryption keys for all platform secrets. 
+ operationId: admin_rotate_encryption_keys_apis_secrets_v2_rotate_encryption_keys_post + responses: + '202': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformSecretAdminRotationResponse' + /apis/secrets/v2/workspaces/{workspace}/secrets: + post: + tags: + - Secrets + summary: Create Secret + description: Create a new secret. + operationId: create_secret_apis_secrets_v2_workspaces__workspace__secrets_post + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformSecretCreateRequest' + responses: + '201': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformSecretResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + get: + tags: + - Secrets + summary: List Secrets + description: List available secrets + operationId: list_secrets_apis_secrets_v2_workspaces__workspace__secrets_get + parameters: + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + - name: page + in: query + required: false + schema: + type: integer + exclusiveMinimum: 0 + description: Page number. + default: 1 + title: Page + description: Page number. + - name: page_size + in: query + required: false + schema: + type: integer + exclusiveMinimum: 0 + description: Page size. + default: 10 + title: Page Size + description: Page size. 
+ responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformSecretResponsesPage' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/secrets/v2/workspaces/{workspace}/secrets/{name}: + get: + tags: + - Secrets + summary: Get Secret + description: Retrieve a secret by its name. + operationId: get_secret_apis_secrets_v2_workspaces__workspace__secrets__name__get + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformSecretResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + patch: + tags: + - Secrets + summary: Update Secret + description: Update a secret's metadata. + operationId: update_secret_apis_secrets_v2_workspaces__workspace__secrets__name__patch + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformSecretUpdateRequest' + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformSecretResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + delete: + tags: + - Secrets + summary: Delete Secret + description: Delete a secret. 
+ operationId: delete_secret_apis_secrets_v2_workspaces__workspace__secrets__name__delete + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + responses: + '204': + description: Successful Response + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' + /apis/secrets/v2/workspaces/{workspace}/secrets/{name}/access: + get: + tags: + - Secrets + summary: Access Secret + description: Access the value of a secret. + operationId: access_secret_apis_secrets_v2_workspaces__workspace__secrets__name__access_get + parameters: + - name: name + in: path + required: true + schema: + type: string + title: Name + - name: workspace + in: path + required: true + schema: + type: string + title: Workspace + responses: + '200': + description: Successful Response + content: + application/json: + schema: + $ref: '#/components/schemas/PlatformSecretAccessResponse' + '422': + description: Validation Error + content: + application/json: + schema: + $ref: '#/components/schemas/HTTPValidationError' +components: + schemas: + AIDefenseRailConfig: + properties: + timeout: + type: number + title: Timeout + description: Timeout in seconds for API requests to AI Defense service + default: 30.0 + fail_open: + type: boolean + title: Fail Open + description: If True, allow content when AI Defense API call fails (fail + open). If False, block content when API call fails (fail closed). Does + not affect missing configuration validation. 
+ default: false + type: object + title: AIDefenseRailConfig + description: Configuration data for the Cisco AI Defense API + APIEndpointData: + properties: + url: + title: Url + description: Endpoint URL + type: string + minLength: 1 + format: uri + model_id: + title: Model Id + description: Model identifier at the endpoint + type: string + api_key: + title: Api Key + description: API key for authentication + type: string + format: + title: Format + description: API format (e.g., openai, nvidia) + type: string + type: object + title: APIEndpointData + description: Data about an inference endpoint. + ActionRails: + properties: + instant_actions: + title: Instant Actions + description: The names of all actions which should finish instantly. + items: + type: string + type: array + type: object + title: ActionRails + description: 'Configuration of action rails. + + + Action rails control various options related to the execution of actions. + + Currently, only + + + In the future multiple options will be added, e.g., what input validation + should be + + performed per action, output validation, throttling, disabling, etc.' + ActivatedRail: + properties: + type: + type: string + title: Type + description: The type of the rail that was activated, e.g., input, output, + dialog. + name: + type: string + title: Name + description: The name of the rail, i.e., the name of the flow implementing + the rail. + decisions: + items: + type: string + type: array + title: Decisions + description: A sequence of decisions made by the rail, e.g., 'bot refuse + to respond', 'stop', 'continue'. + executed_actions: + items: + $ref: '#/components/schemas/ExecutedAction' + type: array + title: Executed Actions + description: The list of actions executed by the rail. + stop: + type: boolean + title: Stop + description: Whether the rail decided to stop any further processing. + default: false + additional_info: + title: Additional Info + description: Additional information coming from rail. 
+ additionalProperties: true + type: object + started_at: + title: Started At + description: Timestamp for when the rail started. + type: number + finished_at: + title: Finished At + description: Timestamp for when the rail finished. + type: number + duration: + title: Duration + description: The duration in seconds for applying the rail. Some rails are + applied instantly, e.g., dialog rails, so they don't have a duration. + type: number + type: object + required: + - type + - name + title: ActivatedRail + description: A rail that was activated during the generation. + Adapter: + properties: + name: + type: string + maxLength: 255 + pattern: ^[\w\-.]+$ + title: Name + description: 'Name of the adapter. Name must be unique in the workspace + for all Adapters and match the following regex: Allowed characters: letters + (a-z, A-Z), digits (0-9), underscores, hyphens, and dots.' + examples: + - lora-adapter-v1 + - my-finetune + workspace: + type: string + maxLength: 255 + pattern: ^[\w\-.]+$ + title: Workspace + description: 'Workspace of the adapter. Allowed characters: letters (a-z, + A-Z), digits (0-9), underscores, hyphens, and dots.' + description: + title: Description + description: Optional description of the adapter + type: string + maxLength: 1000 + fileset: + type: string + title: Fileset + description: Fileset where the adapter files are stored expected format + {workspace}/{fileset_name} + finetuning_type: + allOf: + - $ref: '#/components/schemas/FinetuningType' + description: Type of finetuning (LORA, P_TUNING, etc.) + enabled: + type: boolean + title: Enabled + description: Whether to make this adapter available for inference post training + default: true + lora_config: + allOf: + - $ref: '#/components/schemas/Lora' + description: Lora configuration specifics + model: + title: Model + description: Parent model entity reference. 
A single name (2-63 characters) + or 'workspace/model_name' where each segment is a valid name (lowercase, + digits, hyphens, and temporarily @ . + _; no leading/trailing or consecutive + hyphens). If one slash, both sides must be non-empty. + type: string + maxLength: 127 + created_at: + type: string + format: date-time + title: Created At + updated_at: + type: string + format: date-time + title: Updated At + type: object + required: + - name + - workspace + - fileset + - finetuning_type + title: Adapter + AdapterEntityFilter: + additionalProperties: false + description: Filter for Adapter list queries. + properties: + name: + description: Filter by adapter name. + title: Name + type: string + model: + description: Filter by parent (base) model entity reference in the form + {workspace}/{model_name}. + title: Model + type: string + description: + description: Filter by description. + title: Description + type: string + fileset: + description: Filter by fileset reference in the form {workspace}/{fileset_name}. + title: Fileset + type: string + finetuning_type: + allOf: + - $ref: '#/components/schemas/FinetuningType' + description: Filter by fine-tuning / PEFT type. + enabled: + description: Filter by whether the adapter is enabled for inference after + training. + title: Enabled + type: boolean + created_at: + allOf: + - $ref: '#/components/schemas/DatetimeFilter' + description: Filter entities based on creation date. + updated_at: + allOf: + - $ref: '#/components/schemas/DatetimeFilter' + description: Filter entities based on update date. + title: AdapterEntityFilter + type: object + AdaptersPage: + properties: + data: + items: + $ref: '#/components/schemas/Adapter' + type: array + title: Data + pagination: + allOf: + - $ref: '#/components/schemas/PaginationData' + description: Pagination information. + sort: + title: Sort + description: The field on which the results are sorted. + type: string + filter: + title: Filter + description: Filtering information. 
+ additionalProperties: true + type: object + type: object + required: + - data + title: AdaptersPage + Agent: + properties: + url: + type: string + title: Url + description: Base URL of the agent endpoint. + name: + type: string + title: Name + description: Agent name / identifier. + format: + type: string + enum: + - generic + - nemo_agent_toolkit + title: Format + description: Agent format that determines the execution path. + default: generic + api_key_secret: + allOf: + - $ref: '#/components/schemas/SecretRef' + description: 'API key secret reference for the agent. Format: workspace/secret_name + or secret_name within the job workspace.' + body: + title: Body + description: Jinja template for the request payload. Required for generic + agents. + additionalProperties: true + type: object + response_path: + title: Response Path + description: JSONPath expression to extract the response text from the agent's + response body. Required for generic agents. + type: string + trajectory_path: + title: Trajectory Path + description: JSONPath expression to extract the trajectory from the agent's + response body. Optional. + type: string + additionalProperties: false + type: object + required: + - url + - name + title: Agent + description: "Agent definition for inference in online evaluation jobs.\n\n\ + An agent is an endpoint that accepts a request and returns a response,\npotentially\ + \ with a trajectory. Two formats are supported:\n\n- ``generic``: configurable\ + \ HTTP POST with Jinja-templated body and\n JSONPath extraction for response\ + \ and trajectory.\n- ``nemo_agent_toolkit``: NeMo Agent Toolkit SSE streaming\ + \ protocol\n (``/generate/full?filter_steps=none``)." 
+ AgentGoalAccuracyMetric: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. + type: string + judge_model: + allOf: + - $ref: '#/components/schemas/Evaluator.Model' + description: The LLM model to use as judge. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: agent_goal_accuracy + title: Type + default: agent_goal_accuracy + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + use_reference: + type: boolean + title: Use Reference + description: Whether to use reference for goal accuracy evaluation. 
+ default: true + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - judge_model + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: AgentGoalAccuracyMetric + description: RAGAS metric for measuring agent goal accuracy. + AgentGoalAccuracyMetricInput: + properties: + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: agent_goal_accuracy + title: Type + default: agent_goal_accuracy + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. 
+ supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + use_reference: + type: boolean + title: Use Reference + description: Whether to use reference for goal accuracy evaluation. + default: true + type: object + required: + - judge_model + title: AgentGoalAccuracyMetricInput + description: Request type for AgentGoalAccuracy metrics. + AgentGoalAccuracyMetricResponse: + properties: + name: + title: Name + description: Entity name within the workspace + type: string + workspace: + title: Workspace + description: Workspace identifier + type: string + project: + title: Project + description: The name of the project associated with this entity. + type: string + id: + title: Id + description: Entity name within the workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + parent: + title: Parent + type: string + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. 
+ default: false + type: + type: string + const: agent_goal_accuracy + title: Type + default: agent_goal_accuracy + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + use_reference: + type: boolean + title: Use Reference + description: Whether to use reference for goal accuracy evaluation. + default: true + type: object + required: + - judge_model + title: AgentGoalAccuracyMetricResponse + description: Response type for AgentGoalAccuracy metrics. + AggregateRangeScore: + properties: + name: + type: string + title: Name + description: Name of the score. + count: + type: integer + title: Count + description: Number of samples evaluated (excluding NaN). + nan_count: + type: integer + title: Nan Count + description: Number of samples that produced NaN scores. + sum: + title: Sum + description: Sum of all score values. + type: number + mean: + title: Mean + description: Mean score value. + type: number + min: + title: Min + description: Minimum score value. + type: number + max: + title: Max + description: Maximum score value. + type: number + std_dev: + title: Std Dev + description: Standard deviation of the scores. + type: number + variance: + title: Variance + description: Variance of the scores. 
+ type: number + score_type: + type: string + const: range + title: Score Type + description: Type of score. + default: range + percentiles: + allOf: + - $ref: '#/components/schemas/Percentiles' + description: Percentile distribution of scores. + histogram: + allOf: + - $ref: '#/components/schemas/Histogram' + description: Histogram of score distribution. + additionalProperties: false + type: object + required: + - name + - count + - nan_count + title: AggregateRangeScore + description: Aggregated statistics for a range-type score with percentiles and + histogram. + AggregateRubricScore: + properties: + name: + type: string + title: Name + description: Name of the score. + count: + type: integer + title: Count + description: Number of samples evaluated (excluding NaN). + nan_count: + type: integer + title: Nan Count + description: Number of samples that produced NaN scores. + sum: + title: Sum + description: Sum of all score values. + type: number + mean: + title: Mean + description: Mean score value. + type: number + min: + title: Min + description: Minimum score value. + type: number + max: + title: Max + description: Maximum score value. + type: number + std_dev: + title: Std Dev + description: Standard deviation of the scores. + type: number + variance: + title: Variance + description: Variance of the scores. + type: number + score_type: + type: string + const: rubric + title: Score Type + description: Type of score. + default: rubric + rubric_distribution: + items: + $ref: '#/components/schemas/RubricScoreStat' + type: array + title: Rubric Distribution + description: Distribution of rubric categories. + mode_category: + title: Mode Category + description: Most frequent rubric category. + type: string + additionalProperties: false + type: object + required: + - name + - count + - nan_count + - rubric_distribution + title: AggregateRubricScore + description: Aggregated statistics for a rubric-type score with category distribution. 
+ AggregatedMetricResult: + properties: + scores: + items: + anyOf: + - $ref: '#/components/schemas/AggregateRangeScore' + - $ref: '#/components/schemas/AggregateRubricScore' + type: array + title: Scores + description: The list of aggregated scores. + additionalProperties: false + type: object + required: + - scores + title: AggregatedMetricResult + description: Result of aggregating metric scores with full statistics. + Annotation: + oneOf: + - $ref: '#/components/schemas/FeedbackAnnotation' + - $ref: '#/components/schemas/NoteAnnotation' + - $ref: '#/components/schemas/MetadataAnnotation' + - $ref: '#/components/schemas/LabelAnnotation' + title: Annotation + description: Discriminated annotation read response. The shape varies by `kind`. + discriminator: + propertyName: kind + mapping: + feedback: '#/components/schemas/FeedbackAnnotation' + label: '#/components/schemas/LabelAnnotation' + metadata: '#/components/schemas/MetadataAnnotation' + note: '#/components/schemas/NoteAnnotation' + AnnotationFilter: + additionalProperties: false + properties: + span_id: + description: Return only annotations attached to this span. + title: Span Id + type: string + session_id: + description: Return only annotations attached to this session. + title: Session Id + type: string + kind: + allOf: + - $ref: '#/components/schemas/AnnotationKind' + description: Return only annotations of this kind (`feedback`, `note`, `label`, + or `metadata`). + name: + description: Return only `label` annotations with this `name` (e.g., `severity`, + `helpfulness`). + title: Name + type: string + value_text: + description: Return only annotations with this text value. For `feedback` + annotations this is `positive` or `negative`; for `label` annotations + with `value_type=text` this is the label's value. 
+ title: Value Text + type: string + value_numeric: + allOf: + - $ref: '#/components/schemas/NumericFilter' + description: Return only `label` annotations whose numeric value falls within + the given range. Applies to labels with `value_type=numeric`. + created_by: + description: Return only annotations created by this user. + title: Created By + type: string + created_at: + allOf: + - $ref: '#/components/schemas/DatetimeFilter' + description: Return only annotations created within the given time range. + title: AnnotationFilter + type: object + AnnotationInput: + oneOf: + - $ref: '#/components/schemas/FeedbackAnnotationInput' + - $ref: '#/components/schemas/NoteAnnotationInput' + - $ref: '#/components/schemas/MetadataAnnotationInput' + - $ref: '#/components/schemas/LabelAnnotationInput' + title: AnnotationInput + description: Discriminated annotation create body. The shape varies by `kind`. + discriminator: + propertyName: kind + mapping: + feedback: '#/components/schemas/FeedbackAnnotationInput' + label: '#/components/schemas/LabelAnnotationInput' + metadata: '#/components/schemas/MetadataAnnotationInput' + note: '#/components/schemas/NoteAnnotationInput' + AnnotationKind: + enum: + - feedback + - label + - note + - metadata + title: AnnotationKind + type: string + AnnotationSortField: + type: string + enum: + - created_at + - -created_at + title: AnnotationSortField + AnnotationsPage: + properties: + data: + items: + $ref: '#/components/schemas/Annotation' + type: array + title: Data + pagination: + allOf: + - $ref: '#/components/schemas/PaginationData' + description: Pagination information. + sort: + title: Sort + description: The field on which the results are sorted. + type: string + filter: + title: Filter + description: Filtering information. 
+ additionalProperties: true + type: object + type: object + required: + - data + title: AnnotationsPage + AnswerAccuracyMetric: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. + type: string + judge_model: + allOf: + - $ref: '#/components/schemas/Evaluator.Model' + description: The LLM model to use as judge. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: answer_accuracy + title: Type + default: answer_accuracy + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. 
+ additionalProperties: true + type: object + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - judge_model + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: AnswerAccuracyMetric + description: RAGAS metric for measuring answer accuracy. + AnswerAccuracyMetricInput: + properties: + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: answer_accuracy + title: Type + default: answer_accuracy + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. 
+ supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + type: object + required: + - judge_model + title: AnswerAccuracyMetricInput + description: Request type for AnswerAccuracy metrics. + AnswerAccuracyMetricResponse: + properties: + name: + title: Name + description: Entity name within the workspace + type: string + workspace: + title: Workspace + description: Workspace identifier + type: string + project: + title: Project + description: The name of the project associated with this entity. + type: string + id: + title: Id + description: Entity name within the workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + parent: + title: Parent + type: string + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: answer_accuracy + title: Type + default: answer_accuracy + description: + title: Description + description: Human-readable description of the metric. 
+ type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + type: object + required: + - judge_model + title: AnswerAccuracyMetricResponse + description: Response type for AnswerAccuracy metrics. + AtifAgent: + properties: + name: + type: string + title: Name + version: + type: string + title: Version + model_name: + title: Model Name + type: string + tool_definitions: + title: Tool Definitions + items: + additionalProperties: true + type: object + type: array + extra: + title: Extra + additionalProperties: true + type: object + additionalProperties: false + type: object + required: + - name + - version + title: AtifAgent + AtifContentPart: + oneOf: + - $ref: '#/components/schemas/AtifContentPartText' + - $ref: '#/components/schemas/AtifContentPartImage' + discriminator: + propertyName: type + mapping: + image: '#/components/schemas/AtifContentPartImage' + text: '#/components/schemas/AtifContentPartText' + title: AtifContentPart + AtifContentPartImage: + properties: + type: + type: string + const: image + title: Type + source: + $ref: '#/components/schemas/AtifImageSource' + additionalProperties: false + type: object + required: + - type + - source + title: AtifContentPartImage + AtifContentPartText: + properties: + type: + type: string + const: text + title: Type + text: + type: string + title: Text + additionalProperties: false + type: object + required: + - type + - text + title: 
AtifContentPartText + AtifFinalMetrics: + properties: + total_prompt_tokens: + title: Total Prompt Tokens + type: integer + total_completion_tokens: + title: Total Completion Tokens + type: integer + total_cached_tokens: + title: Total Cached Tokens + type: integer + total_cost_usd: + title: Total Cost Usd + type: number + total_steps: + title: Total Steps + type: integer + minimum: 0.0 + extra: + title: Extra + additionalProperties: true + type: object + additionalProperties: false + type: object + title: AtifFinalMetrics + AtifImageSource: + properties: + media_type: + type: string + enum: + - image/jpeg + - image/png + - image/gif + - image/webp + title: Media Type + path: + type: string + title: Path + additionalProperties: false + type: object + required: + - media_type + - path + title: AtifImageSource + AtifIngestRequest: + properties: + experiment_context: + $ref: '#/components/schemas/ExperimentContext' + evaluation_context: + allOf: + - $ref: '#/components/schemas/EvaluationContext' + description: Deprecated. Use experiment_context; when both are sent, experiment_context + takes precedence. + deprecated: true + schema_version: + type: string + enum: + - ATIF-v1.0 + - ATIF-v1.1 + - ATIF-v1.2 + - ATIF-v1.3 + - ATIF-v1.4 + - ATIF-v1.5 + - ATIF-v1.6 + - ATIF-v1.7 + title: Schema Version + session_id: + title: Session Id + type: string + agent: + $ref: '#/components/schemas/AtifAgent' + final_metrics: + $ref: '#/components/schemas/AtifFinalMetrics' + continued_trajectory_ref: + title: Continued Trajectory Ref + type: string + notes: + title: Notes + type: string + extra: + title: Extra + additionalProperties: true + type: object + steps: + items: + $ref: '#/components/schemas/AtifStep' + type: array + title: Steps + additionalProperties: false + type: object + required: + - schema_version + - agent + title: AtifIngestRequest + description: 'Span-based ATIF ingest request. 
+ + + ATIF project scoping is intentionally not accepted here; use the workspace + + route and ``experiment_context`` for experiment identity.' + AtifMetrics: + properties: + prompt_tokens: + title: Prompt Tokens + type: integer + completion_tokens: + title: Completion Tokens + type: integer + cached_tokens: + title: Cached Tokens + type: integer + cost_usd: + title: Cost Usd + type: number + prompt_token_ids: + title: Prompt Token Ids + items: + type: integer + type: array + completion_token_ids: + title: Completion Token Ids + items: + type: integer + type: array + logprobs: + title: Logprobs + items: + type: number + type: array + extra: + title: Extra + additionalProperties: true + type: object + additionalProperties: false + type: object + title: AtifMetrics + AtifObservation: + properties: + results: + items: + $ref: '#/components/schemas/AtifObservationResult' + type: array + title: Results + additionalProperties: false + type: object + title: AtifObservation + AtifObservationResult: + properties: + source_call_id: + title: Source Call Id + type: string + content: + anyOf: + - type: string + - items: + $ref: '#/components/schemas/AtifContentPart' + type: array + title: Content + subagent_trajectory_ref: + title: Subagent Trajectory Ref + items: + $ref: '#/components/schemas/AtifSubagentTrajectoryRef' + type: array + extra: + title: Extra + additionalProperties: true + type: object + additionalProperties: false + type: object + title: AtifObservationResult + AtifStep: + oneOf: + - $ref: '#/components/schemas/AtifStepSystem' + - $ref: '#/components/schemas/AtifStepUser' + - $ref: '#/components/schemas/AtifStepAgent' + discriminator: + propertyName: source + mapping: + agent: '#/components/schemas/AtifStepAgent' + system: '#/components/schemas/AtifStepSystem' + user: '#/components/schemas/AtifStepUser' + title: AtifStep + AtifStepAgent: + properties: + step_id: + type: integer + minimum: 1.0 + title: Step Id + timestamp: + format: date-time + title: Timestamp + 
type: string + message: + anyOf: + - type: string + - items: + $ref: '#/components/schemas/AtifContentPart' + type: array + title: Message + default: '' + is_copied_context: + title: Is Copied Context + type: boolean + extra: + title: Extra + additionalProperties: true + type: object + llm_call_count: + title: Llm Call Count + type: integer + minimum: 0.0 + source: + type: string + const: agent + title: Source + model_name: + title: Model Name + type: string + reasoning_effort: + anyOf: + - type: string + - type: number + title: Reasoning Effort + reasoning_content: + title: Reasoning Content + type: string + tool_calls: + title: Tool Calls + items: + $ref: '#/components/schemas/AtifToolCall' + type: array + observation: + $ref: '#/components/schemas/AtifObservation' + metrics: + $ref: '#/components/schemas/AtifMetrics' + additionalProperties: false + type: object + required: + - step_id + - source + title: AtifStepAgent + AtifStepSystem: + properties: + step_id: + type: integer + minimum: 1.0 + title: Step Id + timestamp: + format: date-time + title: Timestamp + type: string + message: + anyOf: + - type: string + - items: + $ref: '#/components/schemas/AtifContentPart' + type: array + title: Message + default: '' + is_copied_context: + title: Is Copied Context + type: boolean + extra: + title: Extra + additionalProperties: true + type: object + llm_call_count: + title: Llm Call Count + type: integer + minimum: 0.0 + source: + type: string + const: system + title: Source + additionalProperties: false + type: object + required: + - step_id + - source + title: AtifStepSystem + AtifStepUser: + properties: + step_id: + type: integer + minimum: 1.0 + title: Step Id + timestamp: + format: date-time + title: Timestamp + type: string + message: + anyOf: + - type: string + - items: + $ref: '#/components/schemas/AtifContentPart' + type: array + title: Message + default: '' + is_copied_context: + title: Is Copied Context + type: boolean + extra: + title: Extra + 
additionalProperties: true + type: object + llm_call_count: + title: Llm Call Count + type: integer + minimum: 0.0 + source: + type: string + const: user + title: Source + additionalProperties: false + type: object + required: + - step_id + - source + title: AtifStepUser + AtifSubagentTrajectoryRef: + anyOf: + - required: + - trajectory_id + - required: + - trajectory_path + - required: + - session_id + properties: + trajectory_id: + title: Trajectory Id + type: string + trajectory_path: + title: Trajectory Path + type: string + session_id: + title: Session Id + type: string + extra: + title: Extra + additionalProperties: true + type: object + additionalProperties: false + type: object + title: AtifSubagentTrajectoryRef + AtifToolCall: + properties: + tool_call_id: + type: string + title: Tool Call Id + function_name: + type: string + title: Function Name + arguments: + additionalProperties: true + type: object + title: Arguments + additionalProperties: false + type: object + required: + - tool_call_id + - function_name + title: AtifToolCall + AuthContext: + properties: + principal_id: + type: string + title: Principal Id + description: The principal's unique identifier + principal_email: + title: Principal Email + description: The principal's email address + type: string + principal_groups: + items: + type: string + type: array + title: Principal Groups + description: Groups the principal belongs to + principal_on_behalf_of: + title: Principal On Behalf Of + description: If acting on behalf of another principal, their principal ID + type: string + principal_on_behalf_of_groups: + title: Principal On Behalf Of Groups + description: Groups the on-behalf-of principal belongs to + items: + type: string + type: array + principal_on_behalf_of_email: + title: Principal On Behalf Of Email + description: The on-behalf-of principal's email address + type: string + type: object + required: + - principal_id + title: AuthContext + description: 'Auth context captured at 
resource creation for delegated access. + + + Stores a snapshot of the creating principal''s identity so that controllers + + can later act on their behalf (e.g., accessing secrets).' + AuthDiscoveryResponse: + properties: + auth_enabled: + type: boolean + title: Auth Enabled + oidc: + $ref: '#/components/schemas/OIDCDiscoveryResponse' + type: object + required: + - auth_enabled + title: AuthDiscoveryResponse + description: Auth discovery response for CLI/SDK. + AutoAlignOptions: + properties: + guardrails_config: + additionalProperties: true + type: object + title: Guardrails Config + description: The guardrails configuration that is passed to the AutoAlign + endpoint + type: object + title: AutoAlignOptions + description: List of guardrails that are activated + AutoAlignRailConfig: + properties: + parameters: + title: Parameters + additionalProperties: true + type: object + input: + allOf: + - $ref: '#/components/schemas/AutoAlignOptions' + description: Input configuration for AutoAlign guardrails + output: + allOf: + - $ref: '#/components/schemas/AutoAlignOptions' + description: Output configuration for AutoAlign guardrails + type: object + title: AutoAlignRailConfig + description: Configuration data for the AutoAlign API + BLEUMetric: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. + type: string + type: + type: string + const: bleu + title: Type + default: bleu + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. 
+ supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + references: + items: + type: string + type: array + title: References + description: The templates for the ground truth references to calculate + BLEU metric with. + candidate: + title: Candidate + description: The template for the candidate to calculate BLEU metric on. + If not provided, the output text from the model is used. + type: string + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - references + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: BLEUMetric + description: Persisted BLEU metric. + BLEUMetricInput: + properties: + type: + type: string + const: bleu + title: Type + default: bleu + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. 
+ supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + references: + items: + type: string + type: array + title: References + description: The templates for the ground truth references to calculate + BLEU metric with. + candidate: + title: Candidate + description: The template for the candidate to calculate BLEU metric on. + If not provided, the output text from the model is used. + type: string + type: object + required: + - references + title: BLEUMetricInput + description: Request type for BLEUMetric. + BLEUMetricResponse: + properties: + name: + title: Name + description: Entity name within the workspace + type: string + workspace: + title: Workspace + description: Workspace identifier + type: string + project: + title: Project + description: The name of the project associated with this entity. + type: string + id: + title: Id + description: Entity name within the workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + parent: + title: Parent + type: string + type: + type: string + const: bleu + title: Type + default: bleu + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. 
+ default: + - online + - offline + references: + items: + type: string + type: array + title: References + description: The templates for the ground truth references to calculate + BLEU metric with. + candidate: + title: Candidate + description: The template for the candidate to calculate BLEU metric on. + If not provided, the output text from the model is used. + type: string + type: object + required: + - references + title: BLEUMetricResponse + description: Response type for BLEUMetric. + BackendFormat: + type: string + enum: + - OPENAI_CHAT + - ANTHROPIC_MESSAGES + title: BackendFormat + description: Inference backend API wire formats understood by IGW and middleware + plugins. + BaseModelFilter: + additionalProperties: false + description: Filter for base model properties. + properties: + name: + description: Filter by name of the base model. + title: Name + type: string + title: BaseModelFilter + type: object + Benchmark: + properties: + name: + type: string + title: Name + description: Benchmark name + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. + type: string + description: + title: Description + description: Human-readable description of the benchmark. + type: string + metrics: + items: + $ref: '#/components/schemas/MetricRef' + type: array + title: Metrics + description: 'The metrics that comprise this benchmark (format: workspace/metric_name).' + dataset: + allOf: + - $ref: '#/components/schemas/FilesetRef' + description: 'Reference to a Fileset in the Files API (format: workspace/fileset-name). + The fileset contains the test cases for this benchmark.' + field_mapping: + allOf: + - $ref: '#/components/schemas/FieldMapping' + description: Maps canonical evaluator fields such as 'input' and 'output' + to dataset column paths for this benchmark. 
+ labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - name + - workspace + - metrics + - dataset + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: Benchmark + description: Benchmark response schema. 
+ BenchmarkEvaluationJob: + properties: + id: + title: Id + type: string + name: + type: string + title: Name + description: + title: Description + type: string + project: + title: Project + type: string + workspace: + title: Workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + spec: + oneOf: + - $ref: '#/components/schemas/BenchmarkOfflineJob' + - $ref: '#/components/schemas/BenchmarkOnlineJob' + - $ref: '#/components/schemas/BenchmarkOnlineAgentJob' + - $ref: '#/components/schemas/SystemBenchmarkOfflineJob' + - $ref: '#/components/schemas/SystemBenchmarkOnlineJob' + title: Spec + status: + $ref: '#/components/schemas/PlatformJobStatus' + status_details: + title: Status Details + additionalProperties: true + type: object + error_details: + title: Error Details + additionalProperties: true + type: object + ownership: + title: Ownership + additionalProperties: true + type: object + custom_fields: + title: Custom Fields + additionalProperties: true + type: object + type: object + required: + - name + - spec + title: BenchmarkEvaluationJob + BenchmarkEvaluationJobRequest: + properties: + name: + title: Name + type: string + description: + title: Description + type: string + project: + title: Project + type: string + spec: + oneOf: + - $ref: '#/components/schemas/BenchmarkOfflineJob' + - $ref: '#/components/schemas/BenchmarkOnlineJob' + - $ref: '#/components/schemas/BenchmarkOnlineAgentJob' + - $ref: '#/components/schemas/SystemBenchmarkOfflineJob' + - $ref: '#/components/schemas/SystemBenchmarkOnlineJob' + title: Spec + ownership: + title: Ownership + additionalProperties: true + type: object + custom_fields: + title: Custom Fields + additionalProperties: true + type: object + type: object + required: + - spec + title: BenchmarkEvaluationJobRequest + BenchmarkEvaluationJobsListFilter: + additionalProperties: false + properties: + created_at: + allOf: + - 
$ref: '#/components/schemas/DatetimeFilter' + description: Jobs created at 'gte' datetime or 'lte' datetime. + name: + description: Name of the job. + title: Name + type: string + workspace: + description: Workspace of the job. + title: Workspace + type: string + project: + description: Project containing the job. + title: Project + type: string + status: + allOf: + - $ref: '#/components/schemas/PlatformJobStatus' + description: The current status. + updated_at: + allOf: + - $ref: '#/components/schemas/DatetimeFilter' + description: Jobs updated at 'gte' datetime or 'lte' datetime. + title: BenchmarkEvaluationJobsListFilter + type: object + BenchmarkEvaluationJobsPage: + properties: + data: + items: + $ref: '#/components/schemas/BenchmarkEvaluationJob' + type: array + title: Data + pagination: + allOf: + - $ref: '#/components/schemas/PaginationData' + description: Pagination information. + sort: + title: Sort + description: The field on which the results are sorted. + type: string + filter: + title: Filter + description: Filtering information. + additionalProperties: true + type: object + type: object + required: + - data + title: BenchmarkEvaluationJobsPage + BenchmarkEvaluationJobsSortField: + type: string + enum: + - created_at + - -created_at + - updated_at + - -updated_at + title: BenchmarkEvaluationJobsSortField + BenchmarkEvaluationResult: + properties: + results: + items: + $ref: '#/components/schemas/BenchmarkMetricResult' + type: array + title: Results + description: Results for each metric in the benchmark. + type: object + required: + - results + title: BenchmarkEvaluationResult + description: Aggregated results for a benchmark evaluation. 
+ BenchmarkJobResult: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. + type: string + dataset: + allOf: + - $ref: '#/components/schemas/FilesetRef' + description: The dataset used for the evaluation job to generate the result. + This field is only populated when the job specifies a FilesetRef. + model: + allOf: + - $ref: '#/components/schemas/ModelRef' + description: The model evaluated for the job to generate the result. This + field is only populated when the job specifies a ModelRef. + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + benchmark: + allOf: + - $ref: '#/components/schemas/BenchmarkRef' + description: The benchmark used for the evaluation job to generate the result. + metrics: + title: Metrics + description: The list of metrics used for the evaluation job to generate + the result. + items: + $ref: '#/components/schemas/MetricRef' + type: array + results: + items: + $ref: '#/components/schemas/BenchmarkMetricResult' + type: array + title: Results + description: Results for each metric in the benchmark. + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. 
+ readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - benchmark + - results + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: BenchmarkJobResult + description: Response type for benchmark job result. + BenchmarkJobResultsListResponse: + properties: + data: + items: + $ref: '#/components/schemas/BenchmarkJobResult' + type: array + title: Data + pagination: + allOf: + - $ref: '#/components/schemas/PaginationData' + description: Pagination information. + sort: + title: Sort + description: The field on which the results are sorted. + type: string + filter: + title: Filter + description: Filtering information. + additionalProperties: true + type: object + type: object + required: + - data + title: BenchmarkJobResultsListResponse + BenchmarkMetricResult: + properties: + scores: + items: + anyOf: + - $ref: '#/components/schemas/AggregateRangeScore' + - $ref: '#/components/schemas/AggregateRubricScore' + type: array + title: Scores + description: The list of aggregated scores. + metric: + allOf: + - $ref: '#/components/schemas/MetricRef' + description: The metric used for the evaluation job to generate the result. + additionalProperties: false + type: object + required: + - scores + title: BenchmarkMetricResult + description: Aggregated results for a single metric within a benchmark. + BenchmarkOfflineJob: + properties: + benchmark: + allOf: + - $ref: '#/components/schemas/BenchmarkRef' + description: 'Reference to the benchmark for evaluation (format: workspace/name).' + params: + allOf: + - $ref: '#/components/schemas/RunConfig' + description: Execution parameters for the benchmark job. + additionalProperties: false + type: object + required: + - benchmark + title: BenchmarkOfflineJob + description: 'Input for an offline benchmark evaluation job. 
+ + + Evaluates the benchmark''s dataset against all metrics in the benchmark.' + BenchmarkOnlineAgentJob: + properties: + benchmark: + allOf: + - $ref: '#/components/schemas/BenchmarkRef' + description: 'Reference to the benchmark for evaluation (format: workspace/name).' + agent: + allOf: + - $ref: '#/components/schemas/Agent' + description: The agent to evaluate. + params: + allOf: + - $ref: '#/components/schemas/RunConfigOnline' + description: Execution parameters for the benchmark job. + prompt_template: + anyOf: + - type: string + - additionalProperties: true + type: object + title: Prompt Template + description: The jinja template to prompt the agent for evaluation. Can + be either a simple string or a structured object (e.g., OpenAI messages + format). Use Jinja template variables like {{input}}, {{output}}, {{context}}, + {{reference}} to reference input columns. + examples: + - content: 'Question: {{input}} + + Answer: ' + type: string + - content: + messages: + - content: 'Question: {{input}} + + Answer: ' + role: user + type: object + additionalProperties: true + optional_fields: + items: + type: string + minLength: 1 + type: array + title: Optional Fields + description: Prompt template fields that should remain available to the + prompt template but not be required by dataset schema validation. + additionalProperties: false + type: object + required: + - benchmark + - agent + - prompt_template + title: BenchmarkOnlineAgentJob + description: 'Input for an online benchmark evaluation job targeting an agent. + + + Evaluates an agent by prompting it with the benchmark''s dataset and then + evaluating + + the responses against all metrics in the benchmark.' + BenchmarkOnlineJob: + properties: + benchmark: + allOf: + - $ref: '#/components/schemas/BenchmarkRef' + description: 'Reference to the benchmark for evaluation (format: workspace/name).' 
+ model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Model + description: The model to evaluate. + params: + allOf: + - $ref: '#/components/schemas/RunConfigOnlineModel' + description: Execution parameters for the benchmark job. + prompt_template: + anyOf: + - type: string + - additionalProperties: true + type: object + title: Prompt Template + description: The jinja template to prompt the model for evaluation. Can + be either a simple string or a structured object (e.g., OpenAI messages + format). Use Jinja template variables like {{input}}, {{output}}, {{context}}, + {{reference}} to reference input columns. + examples: + - content: 'Question: {{input}} + + Answer: ' + type: string + - content: + messages: + - content: 'Question: {{input}} + + Answer: ' + role: user + type: object + additionalProperties: true + optional_fields: + items: + type: string + minLength: 1 + type: array + title: Optional Fields + description: Prompt template fields that should remain available to the + prompt template but not be required by dataset schema validation. + additionalProperties: false + type: object + required: + - benchmark + - model + - prompt_template + title: BenchmarkOnlineJob + description: 'Input for an online benchmark evaluation job. + + + Evaluates a model by prompting it with the benchmark''s dataset and then evaluating + + the responses against all metrics in the benchmark.' + BenchmarkRef: + type: string + pattern: ^[a-z0-9_-]+/[a-z0-9_-]+$ + title: BenchmarkRef + description: 'Reference to a benchmark in the Benchmarks API. + + + A reference is a string with format ''workspace/benchmark-name'' that points + to a + + persisted benchmark entity. See [Entity references](docs/get-started/concepts/entity-references.md) + for the + + general entity reference pattern used across the platform.' + BenchmarkRequest: + properties: + name: + type: string + title: Name + description: The name of the benchmark. 
+ description: + title: Description + description: The description of the benchmark. + type: string + metrics: + items: + $ref: '#/components/schemas/MetricRef' + type: array + title: Metrics + description: 'The metrics that comprise this benchmark (format: workspace/metric_name).' + dataset: + allOf: + - $ref: '#/components/schemas/FilesetRef' + description: 'The Fileset containing test data (format: workspace/fileset-name).' + field_mapping: + allOf: + - $ref: '#/components/schemas/FieldMapping' + description: Maps canonical evaluator fields such as 'input' and 'output' + to dataset column paths for this benchmark. + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + additionalProperties: false + type: object + required: + - name + - description + - metrics + - dataset + title: BenchmarkRequest + description: Request schema for creating a benchmark. Workspace comes from route + parameter. + BenchmarksListResponse: + properties: + data: + items: + anyOf: + - $ref: '#/components/schemas/Benchmark' + - $ref: '#/components/schemas/ExtendedBenchmark' + - $ref: '#/components/schemas/SystemBenchmark' + type: array + title: Data + pagination: + allOf: + - $ref: '#/components/schemas/PaginationData' + description: Pagination information. + sort: + title: Sort + description: The field on which the results are sorted. + type: string + filter: + title: Filter + description: Filtering information. 
+ additionalProperties: true + type: object + type: object + required: + - data + title: BenchmarksListResponse + BuiltInDataset: + type: string + enum: + - beir/climate-fever + - beir/cqadupstack + - beir/dbpedia-entity + - beir/fever + - beir/fiqa + - beir/germanquad + - beir/hotpotqa + - beir/mmarco + - beir/mrtydi + - beir/msmarco-v2 + - beir/msmarco + - beir/nfcorpus + - beir/nq-train + - beir/nq + - beir/quora + - beir/scidocs + - beir/scifact + - beir/trec-covid-beir + - beir/trec-covid-v2 + - beir/trec-covid + - beir/vihealthqa + - beir/webis-touche2020 + - ragas/amnesty_qa + title: BuiltInDataset + description: Well-known dataset (BEIR or RAGAS) referenced by its identifier. + CPUExecutionProviderInput: + properties: + provider: + type: string + const: cpu + title: Provider + default: cpu + profile: + type: string + title: Profile + default: default + container: + $ref: '#/components/schemas/ContainerSpec' + resources: + allOf: + - $ref: '#/components/schemas/ComputeResources' + description: Resource requests and limits for CPU execution. + type: object + required: + - container + title: CPUExecutionProviderInput + description: 'CPU-based execution provider. + + + Provides configuration for running jobs on CPU resources with + + resource requests and limits.' + CPUExecutionProviderOutput: + properties: + provider: + type: string + const: cpu + title: Provider + default: cpu + profile: + type: string + title: Profile + default: default + container: + $ref: '#/components/schemas/ContainerSpec' + resources: + allOf: + - $ref: '#/components/schemas/ComputeResources' + description: Resource requests and limits for CPU execution. + type: object + required: + - container + title: CPUExecutionProviderOutput + description: 'CPU-based execution provider. + + + Provides configuration for running jobs on CPU resources with + + resource requests and limits.' 
+ CacheStatsConfig: + properties: + enabled: + type: boolean + title: Enabled + description: Whether cache statistics tracking is enabled + default: false + log_interval: + title: Log Interval + description: Seconds between periodic cache stats logging to logs (None + disables logging) + type: number + type: object + title: CacheStatsConfig + description: Configuration for cache statistics tracking and logging. + CacheStatus: + type: string + enum: + - cached + - caching + - not_cached + - not_cacheable + title: CacheStatus + description: Cache status for files in external storage backends. + CapturedChatCompletionsRequest: + properties: + messages: + items: + $ref: '#/components/schemas/CapturedChatMessage' + type: array + title: Messages + description: Messages comprising the conversation. + model: + type: string + title: Model + description: The model identifier used for this request. + additionalProperties: true + type: object + required: + - messages + - model + title: CapturedChatCompletionsRequest + description: Flexible captured chat-completions request. + CapturedChatCompletionsResponse: + oneOf: + - required: + - choices + - required: + - error + properties: + choices: + title: Choices + items: + additionalProperties: true + type: object + type: array + error: + title: Error + additionalProperties: true + type: object + additionalProperties: true + type: object + title: CapturedChatCompletionsResponse + description: Flexible captured chat-completions response. + CapturedChatMessage: + properties: + role: + allOf: + - $ref: '#/components/schemas/ChatMessageRole' + description: The role of the message sender. + additionalProperties: true + type: object + required: + - role + title: CapturedChatMessage + description: A flexible message model that requires a valid role field but allows + provider-specific fields. 
+ ChatCompletionAssistantMessageParam: + properties: + role: + type: string + const: assistant + title: Role + description: The role of the messages author, in this case `assistant`. + content: + title: Content + description: The contents of the assistant message. + type: string + function_call: + allOf: + - $ref: '#/components/schemas/FunctionCall' + description: Deprecated and replaced by `tool_calls`. + name: + title: Name + description: An optional name for the participant. + type: string + tool_calls: + title: Tool Calls + description: The tool calls generated by the model, such as function calls. + items: + $ref: '#/components/schemas/ChatCompletionMessageToolCallParam' + type: array + additionalProperties: false + type: object + required: + - role + title: ChatCompletionAssistantMessageParam + description: Assistant message parameter for chat completion. + ChatCompletionContentPartImageParam: + properties: + image_url: + allOf: + - $ref: '#/components/schemas/ImageURL' + description: The image URL information. + type: + type: string + const: image_url + title: Type + description: The type of the content part. + additionalProperties: false + type: object + required: + - image_url + - type + title: ChatCompletionContentPartImageParam + description: Image content part for chat messages. + ChatCompletionContentPartTextParam: + properties: + text: + type: string + title: Text + description: The text content. + type: + type: string + const: text + title: Type + description: The type of the content part. + additionalProperties: false + type: object + required: + - text + - type + title: ChatCompletionContentPartTextParam + description: Text content part for chat messages. + ChatCompletionFunctionMessageParam: + properties: + content: + title: Content + description: The contents of the function message. + type: string + name: + type: string + title: Name + description: The name of the function to call. 
+ role: + type: string + const: function + title: Role + description: The role of the messages author, in this case `function`. + additionalProperties: false + type: object + required: + - content + - name + - role + title: ChatCompletionFunctionMessageParam + description: Function message parameter for chat completion. + ChatCompletionMessageToolCallParam: + properties: + id: + type: string + title: Id + description: The ID of the tool call. + function: + allOf: + - $ref: '#/components/schemas/Function' + description: The function that the model called. + type: + type: string + const: function + title: Type + description: The type of the tool. Currently, only `function` is supported. + additionalProperties: false + type: object + required: + - id + - function + - type + title: ChatCompletionMessageToolCallParam + description: Tool call parameter for chat completion messages. + ChatCompletionSystemMessageParam: + properties: + content: + type: string + title: Content + description: The contents of the system message. + role: + type: string + const: system + title: Role + description: The role of the messages author, in this case `system`. + name: + title: Name + description: An optional name for the participant. + type: string + additionalProperties: false + type: object + required: + - content + - role + title: ChatCompletionSystemMessageParam + description: System message parameter for chat completion. + ChatCompletionToolMessageParam: + properties: + content: + type: string + title: Content + description: The contents of the tool message. + role: + type: string + const: tool + title: Role + description: The role of the messages author, in this case `tool`. + tool_call_id: + type: string + title: Tool Call Id + description: Tool call that this message is responding to. + additionalProperties: false + type: object + required: + - content + - role + - tool_call_id + title: ChatCompletionToolMessageParam + description: Tool message parameter for chat completion. 
+ ChatCompletionUserMessageParam: + properties: + content: + anyOf: + - type: string + - items: + anyOf: + - $ref: '#/components/schemas/ChatCompletionContentPartTextParam' + - $ref: '#/components/schemas/ChatCompletionContentPartImageParam' + type: array + title: Content + description: The contents of the user message. + role: + type: string + const: user + title: Role + description: The role of the messages author, in this case `user`. + name: + title: Name + description: An optional name for the participant. + type: string + additionalProperties: false + type: object + required: + - content + - role + title: ChatCompletionUserMessageParam + description: User message parameter for chat completion. + ChatCompletionsIngestRequest: + properties: + experiment_context: + $ref: '#/components/schemas/ExperimentContext' + evaluation_context: + allOf: + - $ref: '#/components/schemas/EvaluationContext' + description: Deprecated. Use experiment_context; when both are sent, experiment_context + takes precedence. + deprecated: true + request: + $ref: '#/components/schemas/CapturedChatCompletionsRequest' + response: + $ref: '#/components/schemas/CapturedChatCompletionsResponse' + session_id: + title: Session Id + description: Groups related chat-completions calls without forcing them + into the same trace. + type: string + trace_id: + title: Trace Id + description: Opt into joining an existing trace built via OTel or ATIF. + This is not a grouping mechanism for chat-completions calls; use session_id + to group related calls. + type: string + provider: + title: Provider + type: string + cost_usd: + title: Cost Usd + description: Total estimated cost of this model call in USD. This matches + ATIF step metrics; Intake stores it as semantic cost_total_usd on spans. + type: number + minimum: 0.0 + cost_input_usd: + title: Cost Input Usd + description: Estimated input-token cost of this model call in USD. 
+ type: number + minimum: 0.0 + cost_output_usd: + title: Cost Output Usd + description: Estimated output-token cost of this model call in USD. + type: number + minimum: 0.0 + cost_details: + additionalProperties: + type: number + minimum: 0.0 + type: object + title: Cost Details + description: Additional estimated cost breakdown fields in USD. + additionalProperties: false + type: object + required: + - request + - response + title: ChatCompletionsIngestRequest + ChatCompletionsIngestResponse: + properties: + session_id: + type: string + title: Session Id + span_id: + type: string + title: Span Id + type: object + required: + - session_id + - span_id + title: ChatCompletionsIngestResponse + ChatMessageRole: + type: string + enum: + - user + - system + - assistant + - developer + - tool + - function + title: ChatMessageRole + description: Valid role values for captured chat-completions messages. + ClavataRailConfig: + properties: + server_endpoint: + type: string + title: Server Endpoint + description: The endpoint for the Clavata API + default: https://gateway.app.clavata.ai:8443 + policies: + additionalProperties: + type: string + type: object + title: Policies + description: A dictionary of policy aliases and their corresponding IDs. + label_match_logic: + type: string + enum: + - ANY + - ALL + title: Label Match Logic + description: "The logic to use when deciding whether the evaluation matched.\n\ + \ If ANY, only one of the configured labels needs to be found in\ + \ the input or output.\n If ALL, all configured labels must be\ + \ found in the input or output." 
+ default: ANY + input: + allOf: + - $ref: '#/components/schemas/ClavataRailOptions' + description: Clavata configuration for an Input Guardrail + output: + allOf: + - $ref: '#/components/schemas/ClavataRailOptions' + description: Clavata configuration for an Output Guardrail + type: object + title: ClavataRailConfig + description: Configuration data for the Clavata API + ClavataRailOptions: + properties: + policy: + type: string + title: Policy + description: The policy alias to use when evaluating inputs or outputs. + labels: + items: + type: string + type: array + title: Labels + description: "A list of labels to match against the policy.\n If\ + \ no labels are provided, the overall policy result will be returned.\n\ + \ If labels are provided, only hits on the provided labels will\ + \ be considered a hit." + type: object + required: + - policy + title: ClavataRailOptions + description: Configuration data for the Clavata API + ComputeResourceSpec: + properties: + cpu: + title: Cpu + description: CPU specification (e.g., '250m', '1', '2.5'). + type: string + memory: + title: Memory + description: Memory specification (e.g., '128Mi', '1Gi', '512M'). + type: string + type: object + title: ComputeResourceSpec + description: Resource specification. + ComputeResources: + properties: + requests: + allOf: + - $ref: '#/components/schemas/ComputeResourceSpec' + description: Minimum resources requested for the container. + limits: + allOf: + - $ref: '#/components/schemas/ComputeResourceSpec' + description: Maximum resources the container can use. + num_nodes: + type: integer + minimum: 1.0 + title: Num Nodes + description: Number of nodes to use. + default: 1 + num_gpus: + title: Num Gpus + description: Step requesting number of GPUs. + type: integer + shm_size: + title: Shm Size + description: Shared memory (/dev/shm) size as a Kubernetes quantity (e.g. + '1Gi', '4Gi'). Used for GPU and distributed-GPU job executors. When unset, + defaults to 1Gi per allocated GPU. 
+ type: string + type: object + title: ComputeResources + description: Resource requirements matching k8s ResourceRequirements format. + ContainerSpec: + properties: + image: + type: string + title: Image + entrypoint: + items: + type: string + type: array + title: Entrypoint + command: + items: + type: string + type: array + title: Command + type: object + required: + - image + title: ContainerSpec + description: 'Specification for a container configuration. + + + Defines the container image and related configuration for job execution.' + ContentSafetyConfig: + properties: + multilingual: + $ref: '#/components/schemas/MultilingualConfig' + reasoning: + $ref: '#/components/schemas/ReasoningConfig' + type: object + title: ContentSafetyConfig + description: Configuration data for content safety rails. + ContextEntityRecallMetric: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. + type: string + judge_model: + allOf: + - $ref: '#/components/schemas/Evaluator.Model' + description: The LLM model to use as judge. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: context_entity_recall + title: Type + default: context_entity_recall + description: + title: Description + description: Human-readable description of the metric. 
+ type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - judge_model + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: ContextEntityRecallMetric + description: RAGAS metric for measuring context entity recall. + ContextEntityRecallMetricInput: + properties: + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. 
+ ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: context_entity_recall + title: Type + default: context_entity_recall + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + type: object + required: + - judge_model + title: ContextEntityRecallMetricInput + description: Request type for ContextEntityRecall metrics. + ContextEntityRecallMetricResponse: + properties: + name: + title: Name + description: Entity name within the workspace + type: string + workspace: + title: Workspace + description: Workspace identifier + type: string + project: + title: Project + description: The name of the project associated with this entity. 
+ type: string + id: + title: Id + description: Entity name within the workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + parent: + title: Parent + type: string + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: context_entity_recall + title: Type + default: context_entity_recall + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + type: object + required: + - judge_model + title: ContextEntityRecallMetricResponse + description: Response type for ContextEntityRecall metrics. 
+ ContextPrecisionMetric: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. + type: string + judge_model: + allOf: + - $ref: '#/components/schemas/Evaluator.Model' + description: The LLM model to use as judge. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: context_precision + title: Type + default: context_precision + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. 
+ additionalProperties: true + type: object + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - judge_model + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: ContextPrecisionMetric + description: RAGAS metric for measuring context precision. + ContextPrecisionMetricInput: + properties: + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: context_precision + title: Type + default: context_precision + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. 
+ supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + type: object + required: + - judge_model + title: ContextPrecisionMetricInput + description: Request type for ContextPrecision metrics. + ContextPrecisionMetricResponse: + properties: + name: + title: Name + description: Entity name within the workspace + type: string + workspace: + title: Workspace + description: Workspace identifier + type: string + project: + title: Project + description: The name of the project associated with this entity. + type: string + id: + title: Id + description: Entity name within the workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + parent: + title: Parent + type: string + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: context_precision + title: Type + default: context_precision + description: + title: Description + description: Human-readable description of the metric. 
+ type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + type: object + required: + - judge_model + title: ContextPrecisionMetricResponse + description: Response type for ContextPrecision metrics. + ContextRecallMetric: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. + type: string + judge_model: + allOf: + - $ref: '#/components/schemas/Evaluator.Model' + description: The LLM model to use as judge. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: context_recall + title: Type + default: context_recall + description: + title: Description + description: Human-readable description of the metric. 
+ type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - judge_model + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: ContextRecallMetric + description: RAGAS metric for measuring context recall. + ContextRecallMetricInput: + properties: + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. 
+ ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: context_recall + title: Type + default: context_recall + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + type: object + required: + - judge_model + title: ContextRecallMetricInput + description: Request type for ContextRecall metrics. + ContextRecallMetricResponse: + properties: + name: + title: Name + description: Entity name within the workspace + type: string + workspace: + title: Workspace + description: Workspace identifier + type: string + project: + title: Project + description: The name of the project associated with this entity. + type: string + id: + title: Id + description: Entity name within the workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + parent: + title: Parent + type: string + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. 
+ inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: context_recall + title: Type + default: context_recall + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + type: object + required: + - judge_model + title: ContextRecallMetricResponse + description: Response type for ContextRecall metrics. + ContextRelevanceMetric: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. + type: string + judge_model: + allOf: + - $ref: '#/components/schemas/Evaluator.Model' + description: The LLM model to use as judge. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. 
+ ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: context_relevance + title: Type + default: context_relevance + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - judge_model + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: ContextRelevanceMetric + description: RAGAS metric for measuring context relevance. 
+ ContextRelevanceMetricInput: + properties: + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: context_relevance + title: Type + default: context_relevance + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + type: object + required: + - judge_model + title: ContextRelevanceMetricInput + description: Request type for ContextRelevance metrics. + ContextRelevanceMetricResponse: + properties: + name: + title: Name + description: Entity name within the workspace + type: string + workspace: + title: Workspace + description: Workspace identifier + type: string + project: + title: Project + description: The name of the project associated with this entity. 
+ type: string + id: + title: Id + description: Entity name within the workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + parent: + title: Parent + type: string + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: context_relevance + title: Type + default: context_relevance + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + type: object + required: + - judge_model + title: ContextRelevanceMetricResponse + description: Response type for ContextRelevance metrics. + CreateAdapterRequest: + properties: + name: + type: string + maxLength: 255 + pattern: ^[\w\-.]+$ + title: Name + description: 'Name of the adapter. 
Name must be unique in the workspace. + Allowed characters: letters (a-z, A-Z), digits (0-9), underscores, hyphens, + and dots.' + examples: + - lora-adapter-v1 + - my-finetune + description: + title: Description + description: Optional description of the adapter + type: string + maxLength: 1000 + fileset: + type: string + title: Fileset + description: Location where adapter files are stored - expected format {workspace}/{fileset_name} + finetuning_type: + allOf: + - $ref: '#/components/schemas/FinetuningType' + description: Type of finetuning (LORA, P_TUNING, etc.) + enabled: + type: boolean + title: Enabled + description: Whether to make this adapter available for inference post training + default: true + lora_config: + allOf: + - $ref: '#/components/schemas/Lora' + description: Lora configuration specifics + model: + type: string + maxLength: 127 + title: Model + description: "Base model entity.\n Use `{workspace}/{model_name}`\ + \ to reference a model in any workspace, or a single `{model_name}` resolved\ + \ in the path workspace. A single name (2-63 characters) or 'workspace/model_name'\ + \ where each segment is a valid name (lowercase, digits, hyphens, and\ + \ temporarily @ . + _; no leading/trailing or consecutive hyphens). If\ + \ one slash, both sides must be non-empty." + examples: + - llama-3-8b-instruct + - shared-tenant/base-llm + type: object + required: + - name + - fileset + - finetuning_type + - model + title: CreateAdapterRequest + description: Request body for Adapter creation. + CreateFilesetRequest: + properties: + name: + type: string + maxLength: 255 + pattern: ^[\w\-.]+$ + title: Name + description: 'The name of the fileset. Allowed characters: letters (a-z, + A-Z), digits (0-9), underscores, hyphens, and dots.' + examples: + - training-data-v1 + - llama-checkpoint + description: + title: Description + description: The description of the fileset. 
+ type: string + maxLength: 255 + project: + title: Project + description: The name of the project associated with this fileset. + type: string + storage: + anyOf: + - $ref: '#/components/schemas/LocalStorageConfig' + - $ref: '#/components/schemas/NGCStorageConfig' + - $ref: '#/components/schemas/HuggingfaceStorageConfig' + - $ref: '#/components/schemas/S3StorageConfig' + title: Storage + description: The storage configuration for the fileset. If not provided, + uses default storage. + purpose: + allOf: + - $ref: '#/components/schemas/FilesetPurpose' + description: The purpose of the fileset. + default: generic + metadata: + allOf: + - $ref: '#/components/schemas/FilesetMetadataInput' + description: 'Purpose-specific metadata. Use the purpose as the key (e.g., + {dataset: {...}}).' + custom_fields: + additionalProperties: true + type: object + title: Custom Fields + description: Custom fields for the fileset. + cache: + type: boolean + title: Cache + description: Cache all files after creation. Only applies to external storage. + default: false + type: object + required: + - name + title: CreateFilesetRequest + CreateModelAdapterRequest: + properties: + name: + type: string + maxLength: 255 + pattern: ^[\w\-.]+$ + title: Name + description: 'Name of the adapter. Name must be unique in the workspace. + Allowed characters: letters (a-z, A-Z), digits (0-9), underscores, hyphens, + and dots.' + examples: + - lora-adapter-v1 + - my-finetune + description: + title: Description + description: Optional description of the adapter + type: string + maxLength: 1000 + fileset: + type: string + title: Fileset + description: Location where adapter files are stored - expected format {workspace}/{fileset_name} + finetuning_type: + allOf: + - $ref: '#/components/schemas/FinetuningType' + description: Type of finetuning (LORA, P_TUNING, etc.) 
+ enabled: + type: boolean + title: Enabled + description: Whether to make this adapter available for inference post training + default: true + lora_config: + allOf: + - $ref: '#/components/schemas/Lora' + description: Lora configuration specifics + type: object + required: + - name + - fileset + - finetuning_type + title: CreateModelAdapterRequest + description: Request body for nested Adapter creation. The base model comes + from the URL path, not the body. + CreateModelDeploymentConfigRequest: + properties: + name: + type: string + maxLength: 255 + pattern: ^[\w\-.]+$ + title: Name + description: 'Name of the deployment configuration. Allowed characters: + letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots.' + examples: + - nim-config-v1 + - production-config + project: + title: Project + description: The URN of the project associated with this deployment configuration + type: string + maxLength: 255 + pattern: ^[\w\-./]+$ + description: + title: Description + description: Optional description of the deployment configuration + type: string + maxLength: 1000 + nim_deployment: + allOf: + - $ref: '#/components/schemas/NIMDeployment' + description: Configuration for NIM-based deployment + model_entity_id: + title: Model Entity Id + description: Optional reference to the base model entity ID for this deployment + type: string + maxLength: 255 + type: object + required: + - name + - nim_deployment + title: CreateModelDeploymentConfigRequest + description: Request model for creating a ModelDeploymentConfig. + CreateModelDeploymentRequest: + properties: + name: + type: string + maxLength: 255 + pattern: ^[\w\-.]+$ + title: Name + description: 'Name of the deployment. Allowed characters: letters (a-z, + A-Z), digits (0-9), underscores, hyphens, and dots.' 
+ examples: + - llama-deploy-v1 + - production-nim + project: + title: Project + description: The URN of the project associated with this deployment + type: string + maxLength: 255 + pattern: ^[\w\-./]+$ + config: + type: string + maxLength: 255 + title: Config + description: Reference to the ModelDeploymentConfig name + config_version: + title: Config Version + description: Reference to a specific ModelDeploymentConfig version. If not + specified, uses latest. + type: integer + type: object + required: + - name + - config + title: CreateModelDeploymentRequest + description: Request model for creating a ModelDeployment. + CreateModelEntityRequest: + properties: + name: + type: string + maxLength: 255 + pattern: ^[\w\-.]+$ + title: Name + description: 'Name of the model entity. Allowed characters: letters (a-z, + A-Z), digits (0-9), underscores, hyphens, and dots.' + examples: + - llama-3.1-8b + - my-custom-model + project: + title: Project + description: The URN of the project associated with this model entity + type: string + maxLength: 255 + pattern: ^[\w\-./]+$ + description: + title: Description + description: Optional description of the model + type: string + maxLength: 1000 + spec: + allOf: + - $ref: '#/components/schemas/ModelSpec' + description: Detailed specification for the model - Automatically generated + by the platform at creation when fileset provided. 
+ finetuning_type: + allOf: + - $ref: '#/components/schemas/FinetuningType' + description: Set for full weight finetuned models + fileset: + title: Fileset + description: A set of checkpoint files, configs, and other auxiliary info + associated with this model - expected format {workspace}/{fileset_name} + type: string + base_model: + title: Base Model + description: Link to another model which is used as a base for the current + model + type: string + api_endpoint: + allOf: + - $ref: '#/components/schemas/APIEndpointData' + description: Data about the inference endpoint for this model + backend_format: + allOf: + - $ref: '#/components/schemas/BackendFormat' + description: Inference API wire format expected by the backend. If unset, + inference routing treats the model as OPENAI_CHAT. + nullable: true + prompt: + allOf: + - $ref: '#/components/schemas/PromptData' + description: Configuration for prompt engineering + custom_fields: + title: Custom Fields + description: Custom fields for additional metadata + additionalProperties: true + type: object + ownership: + title: Ownership + description: Ownership information for the model + additionalProperties: true + type: object + model_providers: + title: Model Providers + description: List of ModelProvider workspace/name resource names that provide + inference for this Model Entity + items: + type: string + type: array + trust_remote_code: + type: boolean + title: Trust Remote Code + description: "Whether to trust remote code for the checkpoint.\n \ + \ Some models without support in certain libraries such as Transformers\ + \ require additional custom Python code to execute.\n Due to security\ + \ ramifications of running arbitrary code, this can only be set to true\ + \ on one of the following conditions:\n (1) the model's fileset's\ + \ source is pre-approved in the platform config, or\n (2) the user\ + \ creating this model is an administrator.\n " + default: false + type: object + required: + - name + title: 
CreateModelEntityRequest + description: Request model for creating a Model Entity. + CreateModelProviderRequest: + properties: + name: + type: string + maxLength: 255 + pattern: ^[\w\-.]+$ + title: Name + description: 'Name of the model provider. Allowed characters: letters (a-z, + A-Z), digits (0-9), underscores, hyphens, and dots.' + examples: + - my-nim-provider + - openai-endpoint + project: + title: Project + description: The URN of the project associated with this model provider + type: string + maxLength: 255 + pattern: ^[\w\-./]+$ + description: + title: Description + description: Optional description of the model provider + type: string + maxLength: 1000 + host_url: + type: string + maxLength: 2048 + title: Host Url + description: The network endpoint URL for the model provider + api_key_secret_name: + title: Api Key Secret Name + description: Reference to an API key secret stored in the Secrets service. + Create the secret first via secrets API, then pass the secret name here. + type: string + maxLength: 255 + enabled_models: + title: Enabled Models + description: Optional list of specific models to enable from this provider + items: + type: string + type: array + default_extra_body: + title: Default Extra Body + description: Default body parameters for inference requests. Can be overridden + by user requests. + additionalProperties: true + type: object + default_extra_headers: + title: Default Extra Headers + description: Default headers for inference requests. Can be overridden by + user requests. + additionalProperties: + type: string + type: object + required_extra_body: + title: Required Extra Body + description: Required body parameters for inference requests. Cannot be + overridden by user requests. + additionalProperties: true + type: object + required_extra_headers: + title: Required Extra Headers + description: Required headers for inference requests. Cannot be overridden + by user requests. 
+ additionalProperties: + type: string + type: object + model_deployment_id: + title: Model Deployment Id + description: Optional reference to the ModelDeployment ID if this provider + is being auto-created for a deployment + type: string + maxLength: 255 + status: + allOf: + - $ref: '#/components/schemas/ModelProviderStatus' + description: Status of the model provider + status_message: + title: Status Message + description: Status message + type: string + maxLength: 1000 + auth_header_format: + title: Auth Header Format + description: 'Jinja2 template string controlling how the API key secret + is sent to the upstream. Must contain exactly one variable named `auth_secret`, + which is substituted with the resolved secret value at request time. Example: + `''X-Api-Key: {{ auth_secret }}''`. If not set, defaults to `''Authorization: + Bearer {{ auth_secret }}''`.' + type: string + maxLength: 1024 + type: object + required: + - name + - host_url + title: CreateModelProviderRequest + description: Request model for creating a ModelProvider. + CreatePlatformJobRequest: + properties: + name: + title: Name + type: string + description: + title: Description + type: string + project: + title: Project + type: string + spec: + additionalProperties: true + type: object + title: Spec + platform_spec: + $ref: '#/components/schemas/PlatformJobSpecInput' + source: + type: string + title: Source + ownership: + title: Ownership + additionalProperties: true + type: object + custom_fields: + title: Custom Fields + additionalProperties: true + type: object + type: object + required: + - spec + - platform_spec + - source + title: CreatePlatformJobRequest + description: Request model for creating a new platform job. + CreateVirtualModelRequest: + properties: + default_model_entity: + title: Default Model Entity + description: Model entity to route to, in "workspace/name" format. Written + into request["model"] before the request middleware pipeline runs. 
If + omitted, a request middleware plugin must handle backend routing itself. + Set to null to clear an existing value. + type: string + autoprovisioned: + type: boolean + title: Autoprovisioned + description: Marks this VirtualModel as controller-managed. The Models controller + will delete it once no ModelProvider serves the matching entity. Setting + this manually opts the VirtualModel into that cleanup behavior. + default: false + models: + items: + $ref: '#/components/schemas/VirtualModelInferenceConfig' + type: array + title: Models + description: Model entity references used by this VirtualModel. A per-entry + backend_format overrides the referenced ModelEntity backend_format when + IGW resolves the backend format for a request. + request_middleware: + items: + $ref: '#/components/schemas/MiddlewareCall' + type: array + title: Request Middleware + description: Ordered list of middleware plugins applied before proxying + to the backend. Each entry is a MiddlewareCall with a "name" (plugin identifier) + and optional "config_type" and "config_id" fields that reference a stored + plugin configuration. + response_middleware: + items: + $ref: '#/components/schemas/MiddlewareCall' + type: array + title: Response Middleware + description: Ordered list of middleware plugins applied after the backend + response is received, before returning it to the caller. + post_response_middleware: + items: + $ref: '#/components/schemas/MiddlewareCall' + type: array + title: Post Response Middleware + description: Ordered list of middleware plugins invoked after the response + has been returned to the caller. Intended for fire-and-forget work (logging, + analytics) that must not block or modify the response. + override_proxy: + title: Override Proxy + description: 'Plugin-provided proxy implementation for IGW to use instead + of its default aiohttp proxy. Format: "plugin-name.proxy-name". Leave + unset to use the default IGW proxy. Set to null to clear an existing value.' 
+ type: string + name: + type: string + title: Name + description: Name of the virtual model within the workspace. Must be unique + per workspace. + type: object + required: + - name + title: CreateVirtualModelRequest + description: Request body for creating a new VirtualModel. + CrowdStrikeAIDRRailConfig: + properties: + timeout: + type: number + title: Timeout + description: Timeout in seconds for API requests to CrowdStrike AIDR + default: 30.0 + type: object + title: CrowdStrikeAIDRRailConfig + description: Configuration data for the CrowdStrike AIDR API + DatasetMetadataContent: + properties: + schema: + anyOf: + - additionalProperties: true + type: object + - type: string + title: Schema + description: Default row schema for files in this fileset, either inline + JSON Schema or a schema_defs key. + schema_defs: + additionalProperties: + additionalProperties: true + type: object + type: object + title: Schema Defs + description: Reusable JSON Schema definitions keyed by name for deduplicating + per-file dataset schemas. + schemas_by_path: + additionalProperties: + anyOf: + - additionalProperties: true + type: object + - type: string + type: object + title: Schemas By Path + description: Optional per-file row schemas keyed by relative path within + the fileset. Each value may be inline JSON Schema or a schema_defs key. + type: object + title: DatasetMetadataContent + description: Content for dataset-type filesets. + DatasetRows: + properties: + rows: + items: + additionalProperties: true + type: object + type: array + minItems: 1 + title: Rows + description: Array of data rows. Each row can be any valid JSON value (object, + string, array, etc.). + additionalProperties: false + type: object + required: + - rows + title: DatasetRows + description: 'Inline dataset definition with embedded rows. + + + Use this for quick evaluations without persisting the dataset first.' + DateRangeFilter: + description: Filter for date ranges. 
+ properties: + gte: + description: Greater than or equal to this date + title: Gte + format: date-time + type: string + lte: + description: Less than or equal to this date + title: Lte + format: date-time + type: string + title: DateRangeFilter + type: object + DatetimeFilter: + additionalProperties: false + properties: + $gte: + description: Filter for results greater than or equal to this datetime. + title: $Gte + format: date-time + type: string + $lte: + description: Filter for results less than or equal to this datetime. + title: $Lte + format: date-time + type: string + title: DatetimeFilter + type: object + DeleteResponse: + properties: + message: + type: string + title: Message + default: Resource deleted successfully. + id: + title: Id + description: The ID of the deleted resource. + type: string + deleted_at: + title: Deleted At + description: The timestamp when the resource was deleted. + type: string + format: date-time + type: object + title: DeleteResponse + DialogRails: + properties: + single_call: + allOf: + - $ref: '#/components/schemas/SingleCallConfig' + description: Configuration for the single LLM call option. + user_messages: + $ref: '#/components/schemas/UserMessagesConfig' + type: object + title: DialogRails + description: Configuration of topical rails. + DistributedGPUExecutionProviderInput: + properties: + provider: + type: string + const: gpu_distributed + title: Provider + default: gpu_distributed + profile: + type: string + title: Profile + default: default + container: + $ref: '#/components/schemas/ContainerSpec' + resources: + allOf: + - $ref: '#/components/schemas/ComputeResources' + description: Resource requests and limits for distributed GPU execution. + type: object + required: + - container + title: DistributedGPUExecutionProviderInput + description: 'GPU-based execution provider. + + + Provides configuration for running jobs on GPU resources with + + resource requests and limits.' 
+ DistributedGPUExecutionProviderOutput: + properties: + provider: + type: string + const: gpu_distributed + title: Provider + default: gpu_distributed + profile: + type: string + title: Profile + default: default + container: + $ref: '#/components/schemas/ContainerSpec' + resources: + allOf: + - $ref: '#/components/schemas/ComputeResources' + description: Resource requests and limits for distributed GPU execution. + type: object + required: + - container + title: DistributedGPUExecutionProviderOutput + description: 'GPU-based execution provider. + + + Provides configuration for running jobs on GPU resources with + + resource requests and limits.' + DockerJobExecutionProfile: + properties: + provider: + type: string + title: Provider + description: The compute provider for the executor, e.g., cpu, gpu + default: cpu + profile: + type: string + title: Profile + description: The profile name for the executor, e.g., high_priority_a100, + low_priority, etc. + default: default + backend: + type: string + const: docker + title: Backend + default: docker + config: + allOf: + - $ref: '#/components/schemas/DockerJobExecutionProfileConfig' + description: Additional configuration for the docker executor + type: object + required: + - config + title: DockerJobExecutionProfile + description: 'Execution configuration for a Docker Job. 
+ + This is used to define the executor type, provider, profile, and any additional + configuration + + required for the executor to run the job on Docker' + DockerJobExecutionProfileConfig: + properties: + ttl_seconds_before_active: + type: integer + title: Ttl Seconds Before Active + default: 1800 + ttl_seconds_active: + type: integer + title: Ttl Seconds Active + default: 86400 + ttl_seconds_after_finished: + type: integer + title: Ttl Seconds After Finished + default: 3600 + cleanup_completed_jobs_immediately: + type: boolean + title: Cleanup Completed Jobs Immediately + default: true + launcher_tool_path: + type: string + title: Launcher Tool Path + description: Path to the jobs launcher tool + default: /tools/jobs-launcher + env: + additionalProperties: + type: string + type: object + title: Env + description: Optional env vars applied to all jobs (e.g. HOME=/tmp). Keys + must not conflict with platform-reserved names. Job steps may override + these variables. + storage: + allOf: + - $ref: '#/components/schemas/DockerJobStorageConfig' + description: Docker storage configuration + networking: + allOf: + - $ref: '#/components/schemas/DockerJobNetworkConfig' + description: Docker networking configuration + type: object + title: DockerJobExecutionProfileConfig + description: Configuration for Docker Job execution profile. 
+ DockerJobNetworkConfig: + properties: + job_container_network: + type: string + title: Job Container Network + description: Docker network for the job container + default: host + type: object + title: DockerJobNetworkConfig + DockerJobStorageConfig: + properties: + volume_name: + type: string + title: Volume Name + description: Name of the Docker volume for persistent storage + default: nemo-jobs-storage + volume_permissions_image: + type: string + title: Volume Permissions Image + description: Docker image used to set permissions on the volume + default: busybox + additional_volume_mounts: + items: + $ref: '#/components/schemas/DockerVolumeMount' + type: array + title: Additional Volume Mounts + description: List of additional Docker volume mounts for the job + type: object + title: DockerJobStorageConfig + description: Configuration for persistent storage in Docker jobs. + DockerVolumeMount: + properties: + volume_name: + type: string + title: Volume Name + description: Name of the Docker volume to mount + mount_path: + type: string + title: Mount Path + description: Path inside the container where the volume will be mounted + kind: + type: string + enum: + - volume + - tmpfs + title: Kind + description: 'Type of the Docker volume to mount. Options are ''volume'' + or ''tmpfs'' (default: ''volume''). tmpfs volumes are only supported on + Linux hosts.' + default: volume + options: + title: Options + description: Additional options for the volume + additionalProperties: true + type: object + allow_create_volume: + type: boolean + title: Allow Create Volume + description: 'Whether to allow the creation of the volume if it does not + exist (default: false).' 
+ default: false + type: object + required: + - volume_name + - mount_path + title: DockerVolumeMount + E2EJobExecutionProfile: + properties: + provider: + type: string + title: Provider + description: The compute provider for the executor, e.g., cpu, gpu + default: cpu + profile: + type: string + title: Profile + description: The profile name for the executor, e.g., high_priority_a100, + low_priority, etc. + default: default + backend: + type: string + const: e2e + title: Backend + default: e2e + config: + allOf: + - $ref: '#/components/schemas/JobExecutionProfileConfig' + description: Configuration for the e2e test executor + type: object + title: E2EJobExecutionProfile + description: 'Execution configuration for E2E testing. + + This backend auto-completes jobs without actually running containers, + + making tests fast and deterministic.' + EntitiesPage: + properties: + data: + items: + $ref: '#/components/schemas/Entity' + type: array + title: Data + pagination: + allOf: + - $ref: '#/components/schemas/PaginationData' + description: Pagination information. + sort: + title: Sort + description: The field on which the results are sorted. + type: string + filter: + title: Filter + description: Filtering information. 
+ additionalProperties: true + type: object + type: object + required: + - data + title: EntitiesPage + Entity: + properties: + entity_type: + type: string + title: Entity Type + description: Entity type identifier + id: + type: string + title: Id + description: UUID identifier + workspace: + type: string + title: Workspace + description: Workspace identifier + parent: + title: Parent + description: Parent entity ID for nested entities + type: string + project: + title: Project + description: The name of the project associated with this entity + type: string + name: + type: string + title: Name + description: Entity name + data: + additionalProperties: true + type: object + title: Data + description: Entity data + created_at: + type: string + format: date-time + title: Created At + description: Timestamp of entity creation + created_by: + title: Created By + description: Principal id for entity creator + type: string + updated_at: + type: string + format: date-time + title: Updated At + description: Timestamp of last entity update + updated_by: + title: Updated By + description: Principal id for last entity update + type: string + db_version: + type: integer + title: Db Version + description: Database version of the entity for optimistic locking. + additionalProperties: false + type: object + required: + - entity_type + - id + - workspace + - name + - data + - created_at + - updated_at + - db_version + title: Entity + description: Entity schema for API responses. + EntityCreateInput: + properties: + name: + title: Name + description: Entity name (optional - auto-generated if not provided). Name + must start with a lowercase letter, be 2-63 characters, and contain only + lowercase letters, digits, and hyphens (no consecutive hyphens, cannot + end with a hyphen). + examples: + - my-config + - baseline-model-v1 + type: string + pattern: ^[a-z](?!.*--)[a-z0-9\-@.+_]{1,62}(? 
+ - not equals + - '>=' + - gte + - greater than or equal + - '>' + - gt + - greater than + - <= + - lte + - less than or equal + - < + - lt + - less than + - absolute difference + title: Operation + description: The operation to compute for the metric. + left_template: + type: string + title: Left Template + description: The template to use for rendering the left value of the operator + to compute the metric. + examples: + - '{{item.dataset_column_name}}' + right_template: + type: string + title: Right Template + description: The template to use for rendering the right value of the operator + to compute the metric. + examples: + - '{{sample.output_text}}' + epsilon: + anyOf: + - type: integer + - type: number + title: Epsilon + description: Specify the tolerance for the absolute difference of values. + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - operation + - left_template + - right_template + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: NumberCheckMetric + description: Persisted number check metric. + NumberCheckMetricInput: + properties: + type: + type: string + const: number-check + title: Type + default: number-check + description: + title: Description + description: Human-readable description of the metric. 
+ type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + operation: + type: string + enum: + - equals + - == + - '!=' + - <> + - not equals + - '>=' + - gte + - greater than or equal + - '>' + - gt + - greater than + - <= + - lte + - less than or equal + - < + - lt + - less than + - absolute difference + title: Operation + description: The operation to compute for the metric. + left_template: + type: string + title: Left Template + description: The template to use for rendering the left value of the operator + to compute the metric. + examples: + - '{{item.dataset_column_name}}' + right_template: + type: string + title: Right Template + description: The template to use for rendering the right value of the operator + to compute the metric. + examples: + - '{{sample.output_text}}' + epsilon: + anyOf: + - type: integer + - type: number + title: Epsilon + description: Specify the tolerance for the absolute difference of values. + type: object + required: + - operation + - left_template + - right_template + title: NumberCheckMetricInput + description: Request type for NumberCheckMetric. Numeric-comparison metric with + template-driven operands. + NumberCheckMetricResponse: + properties: + name: + title: Name + description: Entity name within the workspace + type: string + workspace: + title: Workspace + description: Workspace identifier + type: string + project: + title: Project + description: The name of the project associated with this entity. 
+ type: string + id: + title: Id + description: Entity name within the workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + parent: + title: Parent + type: string + type: + type: string + const: number-check + title: Type + default: number-check + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + operation: + type: string + enum: + - equals + - == + - '!=' + - <> + - not equals + - '>=' + - gte + - greater than or equal + - '>' + - gt + - greater than + - <= + - lte + - less than or equal + - < + - lt + - less than + - absolute difference + title: Operation + description: The operation to compute for the metric. + left_template: + type: string + title: Left Template + description: The template to use for rendering the left value of the operator + to compute the metric. + examples: + - '{{item.dataset_column_name}}' + right_template: + type: string + title: Right Template + description: The template to use for rendering the right value of the operator + to compute the metric. + examples: + - '{{sample.output_text}}' + epsilon: + anyOf: + - type: integer + - type: number + title: Epsilon + description: Specify the tolerance for the absolute difference of values. + type: object + required: + - operation + - left_template + - right_template + title: NumberCheckMetricResponse + description: Response type for NumberCheckMetric. 
+    NumericFilter:
+      additionalProperties: false
+      description: "Range filter for numeric annotation values.\n\nAt least one of\
+        \ `$gte` or `$lte` must be supplied \u2014 an empty `{}` is not a\nmeaningful\
+        \ filter and is rejected."
+      minProperties: 1
+      properties:
+        $gte:
+          description: Include only values greater than or equal to this number.
+          title: $Gte
+          type: number
+        $lte:
+          description: Include only values less than or equal to this number.
+          title: $Lte
+          type: number
+      title: NumericFilter
+      type: object
+    OIDCDiscoveryResponse:
+      properties:
+        issuer:
+          type: string
+          title: Issuer
+        authorization_endpoint:
+          title: Authorization Endpoint
+          type: string
+        token_endpoint:
+          title: Token Endpoint
+          type: string
+        device_authorization_endpoint:
+          title: Device Authorization Endpoint
+          type: string
+        userinfo_endpoint:
+          title: Userinfo Endpoint
+          type: string
+        client_id:
+          type: string
+          title: Client Id
+        default_scopes:
+          type: string
+          title: Default Scopes
+          default: openid profile email offline_access
+        scope_prefix:
+          title: Scope Prefix
+          type: string
+      type: object
+      required:
+      - issuer
+      - client_id
+      title: OIDCDiscoveryResponse
+      description: OIDC discovery response for CLI/SDK.
+    OpenAIListModelsResp:
+      properties:
+        data:
+          items:
+            $ref: '#/components/schemas/OpenAIModelResp'
+          type: array
+          title: Data
+        object:
+          type: string
+          title: Object
+          default: list
+      type: object
+      required:
+      - data
+      title: OpenAIListModelsResp
+      description: Duplicated structure for an OpenAI /v1/models response.
+    OpenAIModelResp:
+      properties:
+        id:
+          type: string
+          title: Id
+        owned_by:
+          type: string
+          title: Owned By
+        object:
+          type: string
+          title: Object
+          default: model
+        created:
+          type: integer
+          title: Created
+          default: 0
+      type: object
+      required:
+      - id
+      - owned_by
+      title: OpenAIModelResp
+      description: Duplicated structure for an OpenAI /v1/models individual model
+        response.
+ OtelExportLogsPartialSuccess: + properties: + rejectedLogRecords: + type: integer + title: Rejectedlogrecords + description: Number of rejected log records + default: 0 + errorMessage: + title: Errormessage + description: Human-readable error message + type: string + type: object + title: OtelExportLogsPartialSuccess + description: Partial success response details. + OtelExportLogsServiceResponse: + properties: + partialSuccess: + $ref: '#/components/schemas/OtelExportLogsPartialSuccess' + type: object + title: OtelExportLogsServiceResponse + description: 'Response for log export requests. + + + Per OTLP spec, successful responses should be empty or contain partial_success + info.' + OutputRails: + properties: + parallel: + title: Parallel + description: If True, the output rails are executed in parallel. + default: false + type: boolean + flows: + items: + type: string + type: array + title: Flows + description: The names of all the flows that implement output rails. + streaming: + allOf: + - $ref: '#/components/schemas/OutputRailsStreamingConfig' + description: Configuration for streaming output rails. + apply_to_reasoning_traces: + title: Apply To Reasoning Traces + description: If True, output rails will apply guardrails to both reasoning + traces and output response. If False, output rails will only apply guardrails + to the output response excluding the reasoning traces, thus keeping reasoning + traces unaltered. + default: false + type: boolean + type: object + title: OutputRails + description: Configuration of output rails. + OutputRailsStreamingConfig: + properties: + enabled: + type: boolean + title: Enabled + description: Enables streaming mode when True. + default: true + chunk_size: + type: integer + title: Chunk Size + description: The number of tokens in each processing chunk. This is the + size of the token block on which output rails are applied. 
+ default: 200 + context_size: + type: integer + title: Context Size + description: The number of tokens carried over from the previous chunk to + provide context for continuity in processing. + default: 50 + stream_first: + type: boolean + title: Stream First + description: If True, token chunks are streamed immediately before output + rails are applied. + default: true + type: object + title: OutputRailsStreamingConfig + description: Configuration for managing streaming output of LLM tokens. + PaginationData: + properties: + page: + type: integer + title: Page + description: The current page number. + page_size: + type: integer + title: Page Size + description: The page size used for the query. + current_page_size: + type: integer + title: Current Page Size + description: The size for the current page. + total_pages: + type: integer + title: Total Pages + description: The total number of pages. + total_results: + type: integer + title: Total Results + description: The total number of results. + type: object + required: + - page + - page_size + - current_page_size + - total_pages + - total_results + title: PaginationData + PangeaRailConfig: + properties: + input: + allOf: + - $ref: '#/components/schemas/PangeaRailOptions' + description: Pangea configuration for an Input Guardrail + output: + allOf: + - $ref: '#/components/schemas/PangeaRailOptions' + description: Pangea configuration for an Output Guardrail + type: object + title: PangeaRailConfig + description: Configuration data for the Pangea AI Guard API + PangeaRailOptions: + properties: + recipe: + type: string + title: Recipe + description: "Recipe key of a configuration of data types and settings defined\ + \ in the Pangea User Console. It\n specifies the rules that are\ + \ to be applied to the text, such as defang malicious URLs." 
+ type: object + required: + - recipe + title: PangeaRailOptions + description: Configuration data for the Pangea AI Guard API + Parameter: + properties: + name: + type: string + title: Name + description: Name of the parameter. + type: + type: string + enum: + - boolean + - string + - number + - integer + - object + - secret + title: Type + description: The value type of the parameter. + description: + title: Description + description: Description of the parameter. + type: string + default: + anyOf: + - type: boolean + - type: string + - type: number + - type: integer + title: Default + description: The default value of the parameter. + schema: + title: Schema + description: The JSON schema for parameters with object type. + additionalProperties: true + type: object + type: object + required: + - name + - type + title: Parameter + PatronusEvaluateApiParams: + properties: + success_strategy: + allOf: + - $ref: '#/components/schemas/PatronusEvaluationSuccessStrategy' + description: Strategy to determine whether the Patronus Evaluate API Guardrail + passes or not. 
+ default: all_pass + params: + additionalProperties: true + type: object + title: Params + description: Parameters to the Patronus Evaluate API + type: object + title: PatronusEvaluateApiParams + description: Config to parameterize the Patronus Evaluate API call + PatronusEvaluateConfigInput: + properties: + evaluate_config: + allOf: + - $ref: '#/components/schemas/PatronusEvaluateApiParams' + description: Configuration passed to the Patronus Evaluate API + type: object + title: PatronusEvaluateConfigInput + description: Config for the Patronus Evaluate API call + PatronusEvaluateConfigOutput: + properties: + evaluate_config: + allOf: + - $ref: '#/components/schemas/PatronusEvaluateApiParams' + description: Configuration passed to the Patronus Evaluate API + type: object + title: PatronusEvaluateConfigOutput + description: Config for the Patronus Evaluate API call + PatronusEvaluationSuccessStrategy: + type: string + enum: + - all_pass + - any_pass + title: PatronusEvaluationSuccessStrategy + description: 'Strategy for determining whether a Patronus Evaluation API + + request should pass, especially when multiple evaluators + + are called in a single request. + + ALL_PASS requires all evaluators to pass for success. + + ANY_PASS requires only one evaluator to pass for success.' 
+ PatronusRailConfigInput: + properties: + input: + allOf: + - $ref: '#/components/schemas/PatronusEvaluateConfigInput' + description: Patronus Evaluate API configuration for an Input Guardrail + output: + allOf: + - $ref: '#/components/schemas/PatronusEvaluateConfigInput' + description: Patronus Evaluate API configuration for an Output Guardrail + type: object + title: PatronusRailConfigInput + description: Configuration data for the Patronus Evaluate API + PatronusRailConfigOutput: + properties: + input: + allOf: + - $ref: '#/components/schemas/PatronusEvaluateConfigOutput' + description: Patronus Evaluate API configuration for an Input Guardrail + output: + allOf: + - $ref: '#/components/schemas/PatronusEvaluateConfigOutput' + description: Patronus Evaluate API configuration for an Output Guardrail + type: object + title: PatronusRailConfigOutput + description: Configuration data for the Patronus Evaluate API + Percentiles: + properties: + p10: + anyOf: + - type: number + - type: integer + title: P10 + description: 10th percentile. + p20: + anyOf: + - type: number + - type: integer + title: P20 + description: 20th percentile. + p30: + anyOf: + - type: number + - type: integer + title: P30 + description: 30th percentile. + p40: + anyOf: + - type: number + - type: integer + title: P40 + description: 40th percentile. + p50: + anyOf: + - type: number + - type: integer + title: P50 + description: 50th percentile (median). + p60: + anyOf: + - type: number + - type: integer + title: P60 + description: 60th percentile. + p70: + anyOf: + - type: number + - type: integer + title: P70 + description: 70th percentile. + p80: + anyOf: + - type: number + - type: integer + title: P80 + description: 80th percentile. + p90: + anyOf: + - type: number + - type: integer + title: P90 + description: 90th percentile. + p100: + anyOf: + - type: number + - type: integer + title: P100 + description: 100th percentile. 
+ additionalProperties: false + type: object + required: + - p10 + - p20 + - p30 + - p40 + - p50 + - p60 + - p70 + - p80 + - p90 + - p100 + title: Percentiles + description: Percentile distribution of scores. + PlatformJobEnvironmentVariable: + properties: + name: + type: string + title: Name + description: The environment variable name + value: + title: Value + description: The environment variable value + type: string + from_secret: + allOf: + - $ref: '#/components/schemas/PlatformJobSecretEnvironmentVariableRef' + description: Reference to a secret environment variable to populate the + environment variable + type: object + required: + - name + title: PlatformJobEnvironmentVariable + description: Environment variable for a job step + PlatformJobListResultResponse: + properties: + data: + items: + $ref: '#/components/schemas/PlatformJobResultResponse' + type: array + title: Data + type: object + required: + - data + title: PlatformJobListResultResponse + PlatformJobListTaskResponse: + properties: + data: + items: + $ref: '#/components/schemas/PlatformJobTask' + type: array + title: Data + type: object + required: + - data + title: PlatformJobListTaskResponse + description: Response model for listing job tasks. 
+ PlatformJobLog: + properties: + timestamp: + type: string + format: date-time + title: Timestamp + job: + type: string + title: Job + job_step: + type: string + title: Job Step + job_task: + type: string + title: Job Task + message: + type: string + title: Message + type: object + required: + - timestamp + - job + - job_step + - job_task + - message + title: PlatformJobLog + PlatformJobLogPage: + properties: + data: + items: + $ref: '#/components/schemas/PlatformJobLog' + type: array + title: Data + total: + type: integer + title: Total + next_page: + title: Next Page + type: string + prev_page: + title: Prev Page + type: string + type: object + required: + - data + - total + - next_page + - prev_page + title: PlatformJobLogPage + PlatformJobResponse: + properties: + id: + type: string + title: Id + attempt_id: + type: string + title: Attempt Id + name: + type: string + title: Name + workspace: + type: string + title: Workspace + description: Workspace identifier + project: + title: Project + description: Project URN + type: string + description: + title: Description + type: string + source: + type: string + title: Source + spec: + additionalProperties: true + type: object + title: Spec + description: Job Spec + platform_spec: + $ref: '#/components/schemas/PlatformJobSpecOutput' + fileset: + type: string + title: Fileset + description: Fileset ID for storing job artifacts + status: + $ref: '#/components/schemas/PlatformJobStatus' + status_details: + additionalProperties: true + type: object + title: Status Details + description: Details about the job status + error_details: + title: Error Details + additionalProperties: true + type: object + created_at: + type: string + format: date-time + title: Created At + updated_at: + type: string + format: date-time + title: Updated At + ownership: + title: Ownership + additionalProperties: true + type: object + custom_fields: + title: Custom Fields + description: Custom Fields + additionalProperties: true + type: object + 
type: object + required: + - id + - attempt_id + - name + - workspace + - source + - platform_spec + - fileset + - status + title: PlatformJobResponse + description: Response model for a platform job. + PlatformJobResponsesPage: + properties: + data: + items: + $ref: '#/components/schemas/PlatformJobResponse' + type: array + title: Data + pagination: + allOf: + - $ref: '#/components/schemas/PaginationData' + description: Pagination information. + sort: + title: Sort + description: The field on which the results are sorted. + type: string + filter: + title: Filter + description: Filtering information. + additionalProperties: true + type: object + type: object + required: + - data + title: PlatformJobResponsesPage + PlatformJobResultCreateRequest: + properties: + artifact_url: + type: string + title: Artifact Url + artifact_storage_type: + $ref: '#/components/schemas/FileStorageType' + type: object + required: + - artifact_url + - artifact_storage_type + title: PlatformJobResultCreateRequest + PlatformJobResultResponse: + properties: + name: + type: string + title: Name + job: + type: string + title: Job + workspace: + type: string + title: Workspace + project: + title: Project + type: string + created_at: + type: string + format: date-time + title: Created At + updated_at: + type: string + format: date-time + title: Updated At + artifact_url: + type: string + title: Artifact Url + artifact_storage_type: + $ref: '#/components/schemas/FileStorageType' + download_url: + title: Download Url + type: string + type: object + required: + - name + - job + - workspace + - artifact_url + - artifact_storage_type + title: PlatformJobResultResponse + PlatformJobSecretEnvironmentVariableRef: + properties: + name: + type: string + title: Name + description: The name of the secret to reference + type: object + required: + - name + title: PlatformJobSecretEnvironmentVariableRef + description: Reference to a secret to populate an environment variable for a + job step. 
+    PlatformJobSortField:
+      type: string
+      enum:
+      - created_at
+      - -created_at
+      - updated_at
+      - -updated_at
+      title: PlatformJobSortField
+    PlatformJobSpecInput:
+      properties:
+        steps:
+          items:
+            $ref: '#/components/schemas/PlatformJobStepSpecInput'
+          type: array
+          title: Steps
+          description: List of steps to be executed in the job
+      type: object
+      required:
+      - steps
+      title: PlatformJobSpecInput
+      description: Specification for a platform job, containing steps and secrets.
+    PlatformJobSpecOutput:
+      properties:
+        steps:
+          items:
+            $ref: '#/components/schemas/PlatformJobStepSpecOutput'
+          type: array
+          title: Steps
+          description: List of steps to be executed in the job
+      type: object
+      required:
+      - steps
+      title: PlatformJobSpecOutput
+      description: Specification for a platform job, containing steps and secrets.
+    PlatformJobStatus:
+      type: string
+      enum:
+      - created
+      - pending
+      - active
+      - cancelled
+      - cancelling
+      - error
+      - completed
+      - paused
+      - pausing
+      - resuming
+      title: PlatformJobStatus
+      description: 'Enumeration of possible job statuses.
+
+
+        This enum represents the various states a job can be in during its lifecycle,
+
+        from creation to a terminal state.'
+ PlatformJobStatusResponse: + properties: + id: + type: string + title: Id + name: + type: string + title: Name + status: + $ref: '#/components/schemas/PlatformJobStatus' + status_details: + additionalProperties: true + type: object + title: Status Details + error_details: + title: Error Details + additionalProperties: true + type: object + steps: + items: + $ref: '#/components/schemas/PlatformJobStepStatusResponse' + type: array + title: Steps + created_at: + type: string + format: date-time + title: Created At + updated_at: + type: string + format: date-time + title: Updated At + type: object + required: + - id + - name + - status + - status_details + - error_details + - steps + - created_at + - updated_at + title: PlatformJobStatusResponse + PlatformJobStatusUpdateRequest: + properties: + status: + allOf: + - $ref: '#/components/schemas/PlatformJobStatus' + description: The new status to set for the job. + status_details: + title: Status Details + description: Optional status details related to the status update. + additionalProperties: true + type: object + error_details: + title: Error Details + description: Optional error details related to the status update. + additionalProperties: true + type: object + type: object + required: + - status + title: PlatformJobStatusUpdateRequest + description: Request model for updating job status. + PlatformJobStep: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. 
+ type: string + attempt_id: + type: string + title: Attempt Id + description: Parent attempt ID + config: + additionalProperties: true + type: object + title: Config + description: Configuration for the step + status: + allOf: + - $ref: '#/components/schemas/PlatformJobStatus' + description: Step status + default: created + status_details: + additionalProperties: true + type: object + title: Status Details + description: Status details + error_details: + title: Error Details + description: Error details if applicable + additionalProperties: true + type: object + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - attempt_id + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: PlatformJobStep + description: 'A single step within an attempt. + + + Parent-scoped: unique within (workspace, entity_type, parent=attempt_id).' + PlatformJobStepSpecInput: + properties: + name: + type: string + pattern: ^[a-z](?!.*--)[a-z0-9\-@.+_]{1,62}(? traces) for content + safety models. If False, use low-latency mode without reasoning traces. + default: false + type: object + title: ReasoningConfig + description: Configuration for reasoning mode in content safety models. + ReasoningParams: + properties: + end_token: + title: End Token + description: 'Configure the end token to trim reasoning context based on + the model''s reasoning API. 
Example for Nemotron models: ''''' + type: string + include_if_not_finished: + title: Include If Not Finished + description: Configure whether to include reasoning context if the model + has not finished reasoning. + type: boolean + effort: + title: Effort + description: Option for OpenAI models to specify low, medium, or high reasoning + effort. + type: string + type: object + title: ReasoningParams + description: Custom settings that control the model's reasoning behavior. + RegexDetection: + properties: + input: + allOf: + - $ref: '#/components/schemas/RegexDetectionOptions' + description: Configuration for regex patterns to detect on user input. + output: + allOf: + - $ref: '#/components/schemas/RegexDetectionOptions' + description: Configuration for regex patterns to detect on bot output. + retrieval: + allOf: + - $ref: '#/components/schemas/RegexDetectionOptions' + description: Configuration for regex patterns to detect on retrieved relevant + chunks. + type: object + title: RegexDetection + description: Configuration for regex pattern detection. + RegexDetectionOptions: + properties: + patterns: + items: + type: string + type: array + title: Patterns + description: List of regex patterns to match against the text. + case_insensitive: + type: boolean + title: Case Insensitive + description: Whether to perform case-insensitive matching. + default: false + type: object + title: RegexDetectionOptions + description: Configuration options for regex pattern detection on a specific + source. + RegexScoreParser: + properties: + type: + type: string + const: regex + title: Type + default: regex + pattern: + type: string + title: Pattern + description: The regular expression to parse the score from the judge response. + method: + type: string + enum: + - search + - match + title: Method + description: 'The regex method to use: ''search'' looks anywhere in the + string, ''match'' starts from the beginning.' 
+ default: match + type: object + required: + - pattern + title: RegexScoreParser + description: Parse a score from content in any format using regular expression. + RemoteMetric: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. + type: string + type: + type: string + const: remote + title: Type + default: remote + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + url: + type: string + title: Url + description: The URL of the remote endpoint. + api_key_secret: + allOf: + - $ref: '#/components/schemas/SecretRef' + description: 'Optional secret reference of an API key for authentication. + Format: workspace/secret_name or secret_name within the job workspace.' + timeout_seconds: + type: number + title: Timeout Seconds + description: Request timeout in seconds. + default: 30.0 + max_retries: + type: integer + title: Max Retries + description: Maximum number of retry attempts. 
+ default: 3 + body: + additionalProperties: true + type: object + title: Body + description: Jinja template for request payload + scores: + items: + $ref: '#/components/schemas/RemoteScore' + type: array + title: Scores + description: List of scores to extract from the remote response + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - url + - body + - scores + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: RemoteMetric + description: Persisted Remote metric. + RemoteMetricInput: + properties: + type: + type: string + const: remote + title: Type + default: remote + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + url: + type: string + title: Url + description: The URL of the remote endpoint. 
+ api_key_secret: + allOf: + - $ref: '#/components/schemas/SecretRef' + description: 'Optional secret reference of an API key for authentication. + Format: workspace/secret_name or secret_name within the job workspace.' + timeout_seconds: + type: number + title: Timeout Seconds + description: Request timeout in seconds. + default: 30.0 + max_retries: + type: integer + title: Max Retries + description: Maximum number of retry attempts. + default: 3 + body: + additionalProperties: true + type: object + title: Body + description: Jinja template for request payload + scores: + items: + $ref: '#/components/schemas/RemoteScore' + type: array + title: Scores + description: List of scores to extract from the remote response + type: object + required: + - url + - body + - scores + title: RemoteMetricInput + description: Request type for RemoteMetric. A metric that computes scores via + a remote endpoint. + RemoteMetricResponse: + properties: + name: + title: Name + description: Entity name within the workspace + type: string + workspace: + title: Workspace + description: Workspace identifier + type: string + project: + title: Project + description: The name of the project associated with this entity. + type: string + id: + title: Id + description: Entity name within the workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + parent: + title: Parent + type: string + type: + type: string + const: remote + title: Type + default: remote + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. 
+ supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + url: + type: string + title: Url + description: The URL of the remote endpoint. + api_key_secret: + allOf: + - $ref: '#/components/schemas/SecretRef' + description: 'Optional secret reference of an API key for authentication. + Format: workspace/secret_name or secret_name within the job workspace.' + timeout_seconds: + type: number + title: Timeout Seconds + description: Request timeout in seconds. + default: 30.0 + max_retries: + type: integer + title: Max Retries + description: Maximum number of retry attempts. + default: 3 + body: + additionalProperties: true + type: object + title: Body + description: Jinja template for request payload + scores: + items: + $ref: '#/components/schemas/RemoteScore' + type: array + title: Scores + description: List of scores to extract from the remote response + type: object + required: + - url + - body + - scores + title: RemoteMetricResponse + description: Response type for RemoteMetric. + RemoteScore: + properties: + name: + type: string + pattern: ^[a-z0-9_]+$ + title: Name + description: The name of the score. Only lowercase letters, numbers, and + underscores allowed. + description: + title: Description + description: Human-readable description of the score. + type: string + parser: + allOf: + - $ref: '#/components/schemas/JSONScoreParser' + description: The method to parse the score. Only JSON parsing is supported + for remote metrics. + minimum: + anyOf: + - type: number + - type: integer + title: Minimum + description: Minimum value for the score range. Defaults to None (no lower + bound). + maximum: + anyOf: + - type: number + - type: integer + title: Maximum + description: Maximum value for the score range. 
Defaults to None (no upper + bound). + additionalProperties: false + type: object + required: + - name + title: RemoteScore + description: 'Score configuration for remote metrics. + + + Unlike RangeScore, minimum and maximum are optional (default to None = no + bounds). + + This avoids JSON serialization issues with infinity values.' + ResponseGroundednessMetric: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. + type: string + judge_model: + allOf: + - $ref: '#/components/schemas/Evaluator.Model' + description: The LLM model to use as judge. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: response_groundedness + title: Type + default: response_groundedness + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. 
+ default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - judge_model + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: ResponseGroundednessMetric + description: RAGAS metric for measuring response groundedness. + ResponseGroundednessMetricInput: + properties: + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: response_groundedness + title: Type + default: response_groundedness + description: + title: Description + description: Human-readable description of the metric. 
+ type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + type: object + required: + - judge_model + title: ResponseGroundednessMetricInput + description: Request type for ResponseGroundedness metrics. + ResponseGroundednessMetricResponse: + properties: + name: + title: Name + description: Entity name within the workspace + type: string + workspace: + title: Workspace + description: Workspace identifier + type: string + project: + title: Project + description: The name of the project associated with this entity. + type: string + id: + title: Id + description: Entity name within the workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + parent: + title: Parent + type: string + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. 
+ default: false + type: + type: string + const: response_groundedness + title: Type + default: response_groundedness + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + type: object + required: + - judge_model + title: ResponseGroundednessMetricResponse + description: Response type for ResponseGroundedness metrics. + ResponseRelevancyMetric: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. + type: string + embeddings_model: + allOf: + - $ref: '#/components/schemas/Evaluator.Model' + description: The embeddings model to use. + judge_model: + allOf: + - $ref: '#/components/schemas/Evaluator.Model' + description: The LLM model to use as judge. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. 
+ default: false + type: + type: string + const: response_relevancy + title: Type + default: response_relevancy + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + strictness: + type: integer + title: Strictness + description: Number of parallel questions generated. NIM can only generate + 1. + default: 1 + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - embeddings_model + - judge_model + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: ResponseRelevancyMetric + description: RAGAS metric for measuring response relevancy. 
+ ResponseRelevancyMetricInput: + properties: + embeddings_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Embeddings Model + description: The embeddings model configuration. + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: response_relevancy + title: Type + default: response_relevancy + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + strictness: + type: integer + title: Strictness + description: Number of parallel questions generated. NIM can only generate + 1. + default: 1 + type: object + required: + - embeddings_model + - judge_model + title: ResponseRelevancyMetricInput + description: Request type for ResponseRelevancy metrics. 
+ ResponseRelevancyMetricResponse: + properties: + name: + title: Name + description: Entity name within the workspace + type: string + workspace: + title: Workspace + description: Workspace identifier + type: string + project: + title: Project + description: The name of the project associated with this entity. + type: string + id: + title: Id + description: Entity name within the workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + parent: + title: Parent + type: string + embeddings_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Embeddings Model + description: The embeddings model configuration. + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: response_relevancy + title: Type + default: response_relevancy + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. 
+ default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + strictness: + type: integer + title: Strictness + description: Number of parallel questions generated. NIM can only generate + 1. + default: 1 + type: object + required: + - embeddings_model + - judge_model + title: ResponseRelevancyMetricResponse + description: Response type for ResponseRelevancy metrics. + RetrievalRails: + properties: + flows: + items: + type: string + type: array + title: Flows + description: The names of all the flows that implement retrieval rails. + type: object + title: RetrievalRails + description: Configuration of retrieval rails. + RetrieverPipelineInput: + properties: + embeddings_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Embeddings Model + description: The embeddings model configuration. + additionalProperties: false + type: object + required: + - embeddings_model + title: RetrieverPipelineInput + description: Pipeline configuration for retriever-based evaluations. + RoleBinding: + properties: + id: + type: string + title: Id + name: + type: string + title: Name + principal: + type: string + title: Principal + workspace: + title: Workspace + type: string + role: + type: string + title: Role + granted_by: + type: string + title: Granted By + granted_at: + type: string + format: date-time + title: Granted At + revoked_at: + title: Revoked At + type: string + format: date-time + type: object + required: + - id + - name + - principal + - workspace + - role + - granted_by + - granted_at + - revoked_at + title: RoleBinding + description: Role binding response model. + RoleBindingFilter: + additionalProperties: false + description: Filter for role bindings. 
+ properties: + principal: + description: Filter by principal ID + title: Principal + type: string + workspace: + description: Filter by workspace + title: Workspace + type: string + role: + description: Filter by role + title: Role + type: string + granted_by: + description: Filter by who granted the role + title: Granted By + type: string + is_active: + description: Filter for active (True) or revoked (False) bindings + title: Is Active + type: boolean + granted_at: + allOf: + - $ref: '#/components/schemas/DateRangeFilter' + description: Filter by granted date range + revoked_at: + allOf: + - $ref: '#/components/schemas/DateRangeFilter' + description: Filter by revoked date range + title: RoleBindingFilter + type: object + RoleBindingInput: + properties: + principal: + type: string + title: Principal + description: The principal identifier (email, user ID, or group ID) + workspace: + title: Workspace + description: The workspace this binding applies to. None for platform-level + roles. + type: string + role: + type: string + title: Role + description: The role name (e.g., 'Viewer', 'Editor', 'Admin') + type: object + required: + - principal + - role + title: RoleBindingInput + description: Input schema for creating a role binding. + RoleBindingsPage: + properties: + data: + items: + $ref: '#/components/schemas/RoleBinding' + type: array + title: Data + pagination: + allOf: + - $ref: '#/components/schemas/PaginationData' + description: Pagination information. + sort: + title: Sort + description: The field on which the results are sorted. + type: string + filter: + title: Filter + description: Filtering information. + additionalProperties: true + type: object + type: object + required: + - data + title: RoleBindingsPage + RowScore: + properties: + row_index: + title: Row Index + description: Stable row position used for result alignment. 
+ type: integer + minimum: 0.0 + item: + additionalProperties: true + type: object + title: Item + description: Input item metadata for the evaluated row. + sample: + additionalProperties: true + type: object + title: Sample + description: Sample output payload for the evaluated row. + metrics: + additionalProperties: + items: + $ref: '#/components/schemas/MetricOutput' + type: array + type: object + title: Metrics + description: Metric-level row outputs by metric key. + requests: + items: + additionalProperties: true + type: object + type: array + title: Requests + description: Request details captured during evaluation. + metric_errors: + title: Metric Errors + description: Full row-level error text keyed by metric for summary rendering. + additionalProperties: + type: string + type: object + additionalProperties: true + type: object + required: + - item + - sample + - metrics + - requests + title: RowScore + description: Normalized row-level score payload for metric/benchmark job results. + Rubric: + properties: + label: + type: string + title: Label + description: The label to use for the level of the rubric grading criteria. + (e.g., "helpful", "not_helpful", "positive") + description: + title: Description + description: Describe the semantic meaning of each criteria for the given + rubric. If no judge template is set, the input description for labels + are included in the generated judge prompt. + type: string + value: + anyOf: + - type: number + - type: integer + title: Value + description: The score value to assign for the criteria used for aggregation + and ranking. + additionalProperties: false + type: object + required: + - label + - value + title: Rubric + RubricScore: + properties: + name: + type: string + pattern: ^[a-z0-9_]+$ + title: Name + description: The name of the score. Only lowercase letters, numbers, and + underscores allowed. + description: + title: Description + description: Human-readable description of the score. 
+ type: string + parser: + anyOf: + - $ref: '#/components/schemas/JSONScoreParser' + - $ref: '#/components/schemas/RegexScoreParser' + title: Parser + description: The method to parse the score. When used with llm-judge metric, + and no parser is set, JSONScoreParser is the default parser inferred from + the score parameters. + rubric: + items: + $ref: '#/components/schemas/Rubric' + type: array + minItems: 2 + title: Rubric + description: The rubric for the score. + additionalProperties: false + type: object + required: + - name + - rubric + title: RubricScore + description: Score definition for a rubric with optional parser. If no parser + is set, JSONScoreParser is the default parser inferred from the score parameters + RubricScoreStat: + properties: + label: + type: string + title: Label + description: The label to use for the level of the rubric grading criteria. + description: + title: Description + description: Describe the semantic meaning of each criteria for the given + rubric. + type: string + value: + anyOf: + - type: number + - type: integer + title: Value + description: The score value to assign for the criteria. + count: + type: integer + title: Count + description: The number of samples evaluated with the rubric level. + default: 0 + type: object + required: + - label + - value + title: RubricScoreStat + description: Rubric score with count statistics. + RunConfig: + properties: + parallelism: + type: integer + minimum: 1.0 + title: Parallelism + description: Parallelism to be used for the evaluation job. Typically, this + represents the maximum number of concurrent requests made to the model. + default: 8 + limit_samples: + title: Limit Samples + description: Limit number of evaluation samples, taking the first `limit` + samples from the dataset. + type: integer + minimum: 1.0 + additionalProperties: false + type: object + title: RunConfig + description: Job parameters. 
+ RunConfigOnline: + properties: + parallelism: + type: integer + minimum: 1.0 + title: Parallelism + description: Parallelism to be used for the evaluation job. Typically, this + represents the maximum number of concurrent requests made to the model. + default: 8 + limit_samples: + title: Limit Samples + description: Limit number of evaluation samples, taking the first `limit` + samples from the dataset. + type: integer + minimum: 1.0 + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures will be ignored and the result will + be marked as NaN. If False (default), request failures will raise an exception. + default: false + request_timeout: + title: Request Timeout + description: The timeout to be used for requests made to the model. + type: integer + max_retries: + type: integer + minimum: 0.0 + title: Max Retries + description: Maximum number of retries for failed requests. + default: 3 + additionalProperties: false + type: object + title: RunConfigOnline + description: Job parameters for online evaluation. + RunConfigOnlineModel: + properties: + parallelism: + type: integer + minimum: 1.0 + title: Parallelism + description: Parallelism to be used for the evaluation job. Typically, this + represents the maximum number of concurrent requests made to the model. + default: 8 + limit_samples: + title: Limit Samples + description: Limit number of evaluation samples, taking the first `limit` + samples from the dataset. + type: integer + minimum: 1.0 + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures will be ignored and the result will + be marked as NaN. If False (default), request failures will raise an exception. + default: false + request_timeout: + title: Request Timeout + description: The timeout to be used for requests made to the model. 
+ type: integer + max_retries: + type: integer + minimum: 0.0 + title: Max Retries + description: Maximum number of retries for failed requests. + default: 3 + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Custom settings that control the model's text generation behavior. + system_prompt: + title: System Prompt + description: Initial instructions that define the model's role and behavior + for the conversation. + type: string + reasoning: + allOf: + - $ref: '#/components/schemas/ReasoningParams' + description: Custom settings that control the model's reasoning behavior. + structured_output: + title: Structured Output + description: JSON schema to apply structured output for the model. + additionalProperties: true + type: object + additionalProperties: false + type: object + title: RunConfigOnlineModel + description: Job parameters for model online evaluation. + S3StorageConfig: + properties: + read_chunk_size: + type: integer + title: Read Chunk Size + description: 'Chunk size in bytes for reading/streaming files. Larger chunks + reduce async overhead but increase memory per concurrent download. Default: + 1MB.' + default: 1048576 + type: + type: string + const: s3 + title: Type + default: s3 + bucket: + type: string + title: Bucket + description: S3 bucket name + prefix: + type: string + title: Prefix + description: Optional prefix (folder path) within the bucket. All operations + will be relative to this prefix. + default: '' + region: + title: Region + description: AWS region. If not specified, uses SDK default (env vars, instance + metadata, etc.) + type: string + endpoint_url: + title: Endpoint Url + description: Custom endpoint URL for S3-compatible storage (e.g., MinIO, + Garage, RustFS). If not specified, uses AWS S3. + type: string + use_sdk_auth: + type: boolean + title: Use Sdk Auth + description: Use AWS SDK credential chain for authentication (env vars like + AWS_ACCESS_KEY_ID, IAM roles, instance profiles, etc.). 
This option is + only available for the platform's default storage backend. User-provided + S3 storage must use explicit credentials via access_key_id_secret and + secret_access_key_secret. + default: false + access_key_id_secret: + allOf: + - $ref: '#/components/schemas/SecretRef' + description: Secret reference for AWS access key ID. Requires use_sdk_auth=False. + secret_access_key_secret: + allOf: + - $ref: '#/components/schemas/SecretRef' + description: Secret reference for AWS secret access key. Requires use_sdk_auth=False. + signature_version: + type: string + enum: + - s3v4 + - s3 + title: Signature Version + description: AWS signature version for request signing. Use 's3' for legacy + systems that only support signature v2. + default: s3v4 + type: object + required: + - bucket + title: S3StorageConfig + SecretRef: + type: string + pattern: ^[a-z0-9_-]+(/[a-z0-9_-]+)?$ + title: SecretRef + description: 'Reference to a secret. Format: ''secret_name'' (uses request workspace) + or ''workspace/secret_name'' (explicit workspace).' + SensitiveDataDetection: + properties: + recognizers: + title: Recognizers + description: Additional custom recognizers. Check out https://microsoft.github.io/presidio/tutorial/08_no_code/ + for more details. + items: + additionalProperties: true + type: object + type: array + input: + allOf: + - $ref: '#/components/schemas/SensitiveDataDetectionOptions' + description: Configuration of the entities to be detected on the user input. + output: + allOf: + - $ref: '#/components/schemas/SensitiveDataDetectionOptions' + description: Configuration of the entities to be detected on the bot output. + retrieval: + allOf: + - $ref: '#/components/schemas/SensitiveDataDetectionOptions' + description: Configuration of the entities to be detected on retrieved relevant + chunks. + type: object + title: SensitiveDataDetection + description: Configuration of what sensitive data should be detected. 
+ SensitiveDataDetectionOptions: + properties: + entities: + items: + type: string + type: array + title: Entities + description: The list of entities that should be detected. Check out https://microsoft.github.io/presidio/supported_entities/ + for the list of supported entities. + mask_token: + type: string + title: Mask Token + description: The token that should be used to mask the sensitive data. + default: '*' + score_threshold: + type: number + title: Score Threshold + description: The score threshold that should be used to detect the sensitive + data. + default: 0.2 + type: object + title: SensitiveDataDetectionOptions + ServedModelMapping: + properties: + model_entity_id: + type: string + maxLength: 255 + title: Model Entity Id + description: Model Entity identifier as workspace/name (e.g., 'my-ws/my-model') + served_model_name: + type: string + maxLength: 255 + title: Served Model Name + description: The actual model name to send to the backend endpoint in the + 'model' field + type: object + required: + - model_entity_id + - served_model_name + title: ServedModelMapping + description: Mapping between a Model Entity and how it's served by this provider. + SingleCallConfig: + properties: + enabled: + type: boolean + title: Enabled + default: false + fallback_to_multiple_calls: + type: boolean + title: Fallback To Multiple Calls + description: Whether to fall back to multiple calls if a single call is + not possible. + default: true + type: object + title: SingleCallConfig + description: Configuration for the single LLM call option for topical rails. + SlidingWindowConfig: + properties: + window_size: + type: integer + title: Window Size + description: Sliding window size (attends to last N tokens) + type: object + required: + - window_size + title: SlidingWindowConfig + description: Sliding window attention configuration.
+ Span: + properties: + span_id: + type: string + title: Span Id + session_id: + type: string + title: Session Id + workspace: + type: string + title: Workspace + project: + title: Project + type: string + evaluation_context: + $ref: '#/components/schemas/SpanEvaluationContext' + parent_span_id: + title: Parent Span Id + type: string + kind: + $ref: '#/components/schemas/SpanKind' + name: + title: Name + type: string + source: + type: string + title: Source + trace_id: + title: Trace Id + type: string + started_at: + type: string + format: date-time + title: Started At + ended_at: + title: Ended At + type: string + format: date-time + status: + $ref: '#/components/schemas/SpanStatus' + error_type: + title: Error Type + type: string + error_message: + title: Error Message + type: string + provider: + title: Provider + type: string + model: + title: Model + type: string + prompt_id: + title: Prompt Id + type: string + prompt_name: + title: Prompt Name + type: string + prompt_version: + title: Prompt Version + type: string + agent_id: + title: Agent Id + type: string + agent_name: + title: Agent Name + type: string + tool_name: + title: Tool Name + type: string + input_tokens: + title: Input Tokens + type: integer + minimum: 0.0 + output_tokens: + title: Output Tokens + type: integer + minimum: 0.0 + cached_tokens: + title: Cached Tokens + type: integer + minimum: 0.0 + total_tokens: + title: Total Tokens + type: integer + minimum: 0.0 + usage_details: + additionalProperties: + type: integer + type: object + title: Usage Details + cost_total_usd: + title: Cost Total Usd + type: number + cost_input_usd: + title: Cost Input Usd + type: number + cost_output_usd: + title: Cost Output Usd + type: number + cost_details: + additionalProperties: + type: number + type: object + title: Cost Details + input: + title: Input + type: string + output: + title: Output + type: string + raw_attributes: + title: Raw Attributes + type: string + ingested_at: + type: string + format: 
date-time + title: Ingested At + type: object + required: + - span_id + - session_id + - workspace + - kind + - source + - started_at + - status + - ingested_at + title: Span + SpanEvaluationContext: + properties: + evaluation_id: + title: Evaluation Id + type: string + evaluation_sha: + title: Evaluation Sha + type: string + evaluation_run_id: + title: Evaluation Run Id + type: string + dataset_id: + title: Dataset Id + type: string + dataset_name: + title: Dataset Name + type: string + dataset_version: + title: Dataset Version + type: string + test_case_id: + title: Test Case Id + type: string + metadata: + additionalProperties: true + type: object + title: Metadata + additionalProperties: false + type: object + title: SpanEvaluationContext + SpanFilter: + properties: + session_id: + description: Filter by span session id. + title: Session Id + type: string + trace_id: + description: Filter by canonical trace id. + title: Trace Id + type: string + project: + description: Filter by project name. + title: Project + type: string + evaluation_id: + description: Filter by evaluation id. + title: Evaluation Id + type: string + evaluation_sha: + description: Filter by evaluation sha. + title: Evaluation Sha + type: string + evaluation_run_id: + description: Filter by evaluation run id. ATIF evaluation context is stored + on root trajectory spans; use session_id from a matched root to fetch + the full trace tree. + title: Evaluation Run Id + type: string + dataset_id: + description: Filter by dataset id. + title: Dataset Id + type: string + dataset_name: + description: Filter by dataset name. + title: Dataset Name + type: string + dataset_version: + description: Filter by dataset version. + title: Dataset Version + type: string + test_case_id: + description: Filter by dataset test case id. + title: Test Case Id + type: string + source: + description: Filter by ingest source (e.g. 'otel', 'atif', 'chat_completions'). 
+ title: Source + type: string + kind: + allOf: + - $ref: '#/components/schemas/SpanKind' + description: Filter by normalized span kind. + status: + allOf: + - $ref: '#/components/schemas/SpanStatus' + description: Filter by normalized span status. + model: + description: Filter by model name. + title: Model + type: string + tool_name: + description: Filter by tool name. + title: Tool Name + type: string + provider: + description: Filter by provider (e.g. 'openai', 'nim', 'anthropic'). + title: Provider + type: string + agent_id: + description: Filter by agent identifier. + title: Agent Id + type: string + agent_name: + description: Filter by agent application name (e.g. 'claude-code', 'codex'). + title: Agent Name + type: string + prompt_name: + description: Filter by prompt template name. + title: Prompt Name + type: string + prompt_version: + description: Filter by prompt template version. + title: Prompt Version + type: string + parent_span_id: + description: Filter by parent span id. Use to fetch direct children of a + span. + title: Parent Span Id + type: string + started_at: + allOf: + - $ref: '#/components/schemas/DatetimeFilter' + description: Filter by span start timestamp. + title: SpanFilter + type: object + SpanKind: + type: string + enum: + - LLM + - CHAIN + - TOOL + - RETRIEVER + - EMBEDDING + - AGENT + - RERANKER + - EVALUATOR + - GUARDRAIL + - UNKNOWN + title: SpanKind + SpanSortField: + type: string + enum: + - started_at + - -started_at + title: SpanSortField + SpanStatus: + type: string + enum: + - success + - error + - cancelled + - unknown + title: SpanStatus + SpansPage: + properties: + data: + items: + $ref: '#/components/schemas/Span' + type: array + title: Data + pagination: + allOf: + - $ref: '#/components/schemas/PaginationData' + description: Pagination information. + sort: + title: Sort + description: The field on which the results are sorted. + type: string + filter: + title: Filter + description: Filtering information. 
+ additionalProperties: true + type: object + type: object + required: + - data + title: SpansPage + StatusEnum: + type: string + enum: + - blocked + - success + - unknown + title: StatusEnum + StepLifecycle: + properties: + staleness_timeout_seconds: + type: integer + title: Staleness Timeout Seconds + description: If every active task in the step goes this many seconds without + an update, the step is terminated. A value of 0 disables staleness detection. + default: 0 + type: object + title: StepLifecycle + description: 'Controller-level lifecycle configuration for a job step. + + + These settings control how the jobs controller manages the step, + + as opposed to ``config`` which is the task payload forwarded to + + the container.' + StorageConfigType: + enum: + - local + - ngc + - huggingface + - s3 + title: StorageConfigType + type: string + StringCheckMetric: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. + type: string + type: + type: string + const: string-check + title: Type + default: string-check + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. 
+ default: + - online + - offline + operation: + type: string + enum: + - equals + - == + - '!=' + - <> + - not equals + - contains + - not contains + - startswith + - endswith + title: Operation + description: The operation to compute for the metric. + left_template: + type: string + title: Left Template + description: The template to use for rendering the left value of the operator + to compute the metric. + examples: + - '{{item.dataset_column_name}}' + right_template: + type: string + title: Right Template + description: The template to use for rendering the right value of the operator + to compute the metric. + examples: + - '{{sample.output_text | trim}}' + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - operation + - left_template + - right_template + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: StringCheckMetric + description: Persisted string check metric. + StringCheckMetricInput: + properties: + type: + type: string + const: string-check + title: Type + default: string-check + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. 
+ supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + operation: + type: string + enum: + - equals + - == + - '!=' + - <> + - not equals + - contains + - not contains + - startswith + - endswith + title: Operation + description: The operation to compute for the metric. + left_template: + type: string + title: Left Template + description: The template to use for rendering the left value of the operator + to compute the metric. + examples: + - '{{item.dataset_column_name}}' + right_template: + type: string + title: Right Template + description: The template to use for rendering the right value of the operator + to compute the metric. + examples: + - '{{sample.output_text | trim}}' + type: object + required: + - operation + - left_template + - right_template + title: StringCheckMetricInput + description: Request type for StringCheckMetric. String-comparison metric with + operator-based checks. + StringCheckMetricResponse: + properties: + name: + title: Name + description: Entity name within the workspace + type: string + workspace: + title: Workspace + description: Workspace identifier + type: string + project: + title: Project + description: The name of the project associated with this entity. + type: string + id: + title: Id + description: Entity name within the workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + parent: + title: Parent + type: string + type: + type: string + const: string-check + title: Type + default: string-check + description: + title: Description + description: Human-readable description of the metric. 
+ type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + operation: + type: string + enum: + - equals + - == + - '!=' + - <> + - not equals + - contains + - not contains + - startswith + - endswith + title: Operation + description: The operation to compute for the metric. + left_template: + type: string + title: Left Template + description: The template to use for rendering the left value of the operator + to compute the metric. + examples: + - '{{item.dataset_column_name}}' + right_template: + type: string + title: Right Template + description: The template to use for rendering the right value of the operator + to compute the metric. + examples: + - '{{sample.output_text | trim}}' + type: object + required: + - operation + - left_template + - right_template + title: StringCheckMetricResponse + description: Response type for StringCheckMetric. + SubprocessExecutionProvider: + properties: + provider: + type: string + const: subprocess + title: Provider + default: subprocess + profile: + type: string + title: Profile + default: default + command: + items: + type: string + type: array + title: Command + type: object + required: + - command + title: SubprocessExecutionProvider + description: Host subprocess execution provider. + SubprocessJobExecutionProfile: + properties: + provider: + type: string + const: subprocess + title: Provider + default: subprocess + profile: + type: string + title: Profile + description: The profile name for the executor, e.g., high_priority_a100, + low_priority, etc. 
+ default: default + backend: + type: string + const: subprocess + title: Backend + default: subprocess + config: + allOf: + - $ref: '#/components/schemas/SubprocessJobExecutionProfileConfig' + description: Additional configuration for the subprocess executor + type: object + title: SubprocessJobExecutionProfile + SubprocessJobExecutionProfileConfig: + properties: + ttl_seconds_before_active: + type: integer + title: Ttl Seconds Before Active + default: 1800 + ttl_seconds_active: + type: integer + title: Ttl Seconds Active + default: 86400 + ttl_seconds_after_finished: + type: integer + title: Ttl Seconds After Finished + default: 3600 + cleanup_completed_jobs_immediately: + type: boolean + title: Cleanup Completed Jobs Immediately + description: Keep subprocess working directories by default so runs remain + inspectable. + default: false + launcher_tool_path: + type: string + title: Launcher Tool Path + description: Path to the jobs launcher tool + default: /tools/jobs-launcher + env: + additionalProperties: + type: string + type: object + title: Env + description: Optional env vars applied to all jobs (e.g. HOME=/tmp). Keys + must not conflict with platform-reserved names. Job steps may override + these variables. + working_directory: + type: string + title: Working Directory + description: Root directory for subprocess job state, config, storage, and + logs. + default: /tmp/nmp-subprocess-jobs + graceful_shutdown_timeout_seconds: + type: integer + title: Graceful Shutdown Timeout Seconds + description: How long to wait after SIGTERM before force killing the process + group. + default: 10 + type: object + title: SubprocessJobExecutionProfileConfig + SystemBenchmark: + properties: + name: + type: string + title: Name + description: Benchmark name + workspace: + type: string + title: Workspace + default: system + project: + title: Project + description: The name of the project associated with this entity. 
+ type: string + description: + title: Description + description: Human-readable description of the benchmark. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + required_params: + items: + $ref: '#/components/schemas/Parameter' + type: array + title: Required Params + description: List of required parameters for running an evaluation with + the benchmark. + optional_params: + items: + $ref: '#/components/schemas/Parameter' + type: array + title: Optional Params + description: List of optional parameters for running an evaluation with + the benchmark. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A benchmark can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - name + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: SystemBenchmark + description: System Benchmark response schema. + SystemBenchmarkOfflineJob: + properties: + benchmark: + allOf: + - $ref: '#/components/schemas/BenchmarkRef' + description: 'Reference to the benchmark for evaluation (format: workspace/name).'
+ dataset: + allOf: + - $ref: '#/components/schemas/FilesetRef' + description: 'Reference to a Fileset in the Files API (format: workspace/fileset-name). + The fileset contains the pre-generated outputs to evaluate this benchmark + on.' + params: + allOf: + - $ref: '#/components/schemas/RunConfig' + description: Execution parameters for the benchmark job. + benchmark_params: + additionalProperties: true + type: object + title: Benchmark Params + description: Additional parameters specific to the benchmark. + additionalProperties: false + type: object + required: + - benchmark + - dataset + title: SystemBenchmarkOfflineJob + description: 'Input for an offline system benchmark evaluation job. + + + Evaluates the benchmark''s standard dataset against all pre-defined metrics + in the benchmark.' + SystemBenchmarkOnlineJob: + properties: + benchmark: + allOf: + - $ref: '#/components/schemas/BenchmarkRef' + description: 'Reference to the benchmark for evaluation (format: workspace/name).' + model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Model + description: The model to evaluate. + params: + allOf: + - $ref: '#/components/schemas/RunConfigOnlineModel' + description: Execution parameters for the benchmark job. + benchmark_params: + additionalProperties: true + type: object + title: Benchmark Params + description: Additional parameters specific to the benchmark. + additionalProperties: false + type: object + required: + - benchmark + - model + title: SystemBenchmarkOnlineJob + description: 'Input for an online system benchmark evaluation job. + + + Evaluates the benchmark''s standard dataset against all pre-defined metrics + in the benchmark.' + SystemMetric: + properties: + name: + type: string + title: Name + default: Metric name + workspace: + type: string + title: Workspace + default: system + project: + title: Project + description: The name of the project associated with this entity. 
+ type: string + type: + type: string + enum: + - system + - system-retriever + title: Type + default: system + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + - retriever + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + required_params: + items: + $ref: '#/components/schemas/Parameter' + type: array + title: Required Params + description: List of required parameters for running an evaluation with + the metric. + optional_params: + items: + $ref: '#/components/schemas/Parameter' + type: array + title: Optional Params + description: List of optional parameters for running an evaluation with + the metric. + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. 
+ readOnly: true + type: string + type: object + required: + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: SystemMetric + SystemMetricInput: + properties: + type: + type: string + enum: + - system + - system-retriever + title: Type + default: system + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + - retriever + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + name: + type: string + title: Name + default: Metric name + required_params: + items: + $ref: '#/components/schemas/Parameter' + type: array + title: Required Params + description: List of required parameters for running an evaluation with + the metric. + optional_params: + items: + $ref: '#/components/schemas/Parameter' + type: array + title: Optional Params + description: List of optional parameters for running an evaluation with + the metric. + type: object + title: SystemMetricInput + description: Metric entity for system metrics that have a pre-defined dataset. + SystemMetricResponse: + properties: + name: + type: string + title: Name + default: Metric name + workspace: + title: Workspace + description: Workspace identifier + type: string + project: + title: Project + description: The name of the project associated with this entity.
+ type: string + id: + title: Id + description: Entity name within the workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + parent: + title: Parent + type: string + type: + type: string + enum: + - system + - system-retriever + title: Type + default: system + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + - retriever + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + required_params: + items: + $ref: '#/components/schemas/Parameter' + type: array + title: Required Params + description: List of required parameters for running an evaluation with + the metric. + optional_params: + items: + $ref: '#/components/schemas/Parameter' + type: array + title: Optional Params + description: List of optional parameters for running an evaluation with + the metric. + type: object + title: SystemMetricResponse + description: Response type for SystemMetric. + TaskPrompt: + properties: + task: + type: string + title: Task + description: The id of the task associated with this prompt. + content: + title: Content + description: The content of the prompt, if it's a string. + type: string + messages: + title: Messages + description: The list of messages included in the prompt. Used for chat + models. + items: + anyOf: + - $ref: '#/components/schemas/MessageTemplate' + - type: string + type: array + models: + title: Models + description: 'If specified, the prompt will be used only for the given LLM + engines/models. 
The format is a list of strings with the format: or /.' + items: + type: string + type: array + output_parser: + title: Output Parser + description: The name of the output parser to use for this prompt. + type: string + max_length: + title: Max Length + description: The maximum length of the prompt in number of characters. + default: 16000 + type: integer + minimum: 1.0 + mode: + title: Mode + description: Corresponds to the `prompting_mode` for which this prompt is + fetched. Default is 'standard'. + default: standard + type: string + stop: + title: Stop + description: If specified, configures stop tokens for models that + support this. + items: + type: string + type: array + max_tokens: + title: Max Tokens + description: The maximum number of tokens that can be generated in the chat + completion. + type: integer + minimum: 1.0 + type: object + required: + - task + title: TaskPrompt + description: Configuration for prompts that will be used for a specific task. + ToolCallAccuracyMetric: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. + type: string + type: + type: string + const: tool_call_accuracy + title: Type + default: tool_call_accuracy + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations.
+ default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: ToolCallAccuracyMetric + description: RAGAS metric for measuring tool call accuracy. + ToolCallAccuracyMetricInput: + properties: + type: + type: string + const: tool_call_accuracy + title: Type + default: tool_call_accuracy + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. 
+ additionalProperties: true + type: object + type: object + title: ToolCallAccuracyMetricInput + description: Request type for ToolCallAccuracy metrics (no judge required). + ToolCallAccuracyMetricResponse: + properties: + name: + title: Name + description: Entity name within the workspace + type: string + workspace: + title: Workspace + description: Workspace identifier + type: string + project: + title: Project + description: The name of the project associated with this entity. + type: string + id: + title: Id + description: Entity name within the workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + parent: + title: Parent + type: string + type: + type: string + const: tool_call_accuracy + title: Type + default: tool_call_accuracy + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + type: object + title: ToolCallAccuracyMetricResponse + description: Response type for ToolCallAccuracy metrics (no judge required). + ToolCallConfig: + properties: + tool_call_parser: + title: Tool Call Parser + description: Name of the tool call parser to use (e.g., 'openai', 'hermes', + 'pythonic', 'llama3_json', 'mistral'). 
+ type: string + maxLength: 255 + tool_call_plugin: + title: Tool Call Plugin + description: 'Reference to a fileset containing the custom tool call plugin + Python file. Expected format: ''{workspace}/{fileset_name}''. The fileset + is mounted separately from the model checkpoint at deployment time.' + type: string + maxLength: 255 + auto_tool_choice: + title: Auto Tool Choice + description: Whether to enable automatic tool choice. When enabled, the + model can decide to call tools without explicit user instruction. + type: boolean + type: object + title: ToolCallConfig + description: Configuration for tool calling support in NIM deployments. + ToolCallingMetadataContent: + properties: + chat_template: + title: Chat Template + description: Jinja2 chat template for the model. + type: string + tool_call_parser: + title: Tool Call Parser + description: Name of the tool call parser (e.g., 'openai', 'hermes', 'pythonic', + 'llama3_json', 'mistral'). + type: string + tool_call_plugin: + title: Tool Call Plugin + description: 'Reference to a fileset containing a custom tool call plugin + Python file. Expected format: ''{workspace}/{fileset_name}''.' + type: string + auto_tool_choice: + title: Auto Tool Choice + description: Whether to enable automatic tool choice. + type: boolean + type: object + title: ToolCallingMetadataContent + description: 'Content for tool-calling configuration on model filesets. + + + Stores chat template and tool calling settings that are merged into + + the ModelSpec during checkpoint analysis.' + ToolCallingMetric: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. 
+ type: string + type: + type: string + const: tool-calling + title: Type + default: tool-calling + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + reference: + type: string + title: Reference + description: The template for the ground truth reference to evaluate tool + calling accuracy. + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - reference + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: ToolCallingMetric + description: Persisted Tool Calling metric. + ToolCallingMetricInput: + properties: + type: + type: string + const: tool-calling + title: Type + default: tool-calling + description: + title: Description + description: Human-readable description of the metric. 
+ type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + reference: + type: string + title: Reference + description: The template for the ground truth reference to evaluate tool + calling accuracy. + type: object + required: + - reference + title: ToolCallingMetricInput + description: Request type for ToolCallingMetric. Tool-calling accuracy metric + for structured function calls. + ToolCallingMetricResponse: + properties: + name: + title: Name + description: Entity name within the workspace + type: string + workspace: + title: Workspace + description: Workspace identifier + type: string + project: + title: Project + description: The name of the project associated with this entity. + type: string + id: + title: Id + description: Entity name within the workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + parent: + title: Parent + type: string + type: + type: string + const: tool-calling + title: Type + default: tool-calling + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. 
+ default: + - online + - offline + reference: + type: string + title: Reference + description: The template for the ground truth reference to evaluate tool + calling accuracy. + type: object + required: + - reference + title: ToolCallingMetricResponse + description: Response type for ToolCallingMetric. + ToolInputRails: + properties: + flows: + items: + type: string + type: array + title: Flows + description: The names of all the flows that implement tool input rails. + parallel: + title: Parallel + description: If True, the tool input rails are executed in parallel. + default: false + type: boolean + type: object + title: ToolInputRails + description: 'Configuration of tool input rails. + + Tool input rails are applied to tool results before they are processed. + + They can validate, filter, or transform tool outputs for security and safety.' + ToolOutputRails: + properties: + flows: + items: + type: string + type: array + title: Flows + description: The names of all the flows that implement tool output rails. + parallel: + title: Parallel + description: If True, the tool output rails are executed in parallel. + default: false + type: boolean + type: object + title: ToolOutputRails + description: 'Configuration of tool output rails. + + Tool output rails are applied to tool calls before they are executed. + + They can validate tool names, parameters, and context to ensure safe tool + usage.' + TopicAdherenceMetric: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. + type: string + judge_model: + allOf: + - $ref: '#/components/schemas/Evaluator.Model' + description: The LLM model to use as judge. 
+ inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: topic_adherence + title: Type + default: topic_adherence + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + metric_mode: + type: string + enum: + - f1 + - precision + - recall + title: Metric Mode + description: The mode for computing topic adherence score. + default: f1 + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. 
+ readOnly: true + type: string + type: object + required: + - workspace + - judge_model + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: TopicAdherenceMetric + description: RAGAS metric for measuring topic adherence. + TopicAdherenceMetricInput: + properties: + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: topic_adherence + title: Type + default: topic_adherence + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. + default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + metric_mode: + type: string + enum: + - f1 + - precision + - recall + title: Metric Mode + description: The mode for computing topic adherence score. 
+ default: f1 + type: object + required: + - judge_model + title: TopicAdherenceMetricInput + description: Request type for TopicAdherence metrics. + TopicAdherenceMetricResponse: + properties: + name: + title: Name + description: Entity name within the workspace + type: string + workspace: + title: Workspace + description: Workspace identifier + type: string + project: + title: Project + description: The name of the project associated with this entity. + type: string + id: + title: Id + description: Entity name within the workspace + type: string + created_at: + title: Created At + type: string + format: date-time + updated_at: + title: Updated At + type: string + format: date-time + parent: + title: Parent + type: string + judge_model: + anyOf: + - $ref: '#/components/schemas/Evaluator.Model' + - $ref: '#/components/schemas/ModelRef' + title: Judge Model + description: The judge model configuration. + inference: + allOf: + - $ref: '#/components/schemas/InferenceParams' + description: Inference parameters for the judge. + ignore_request_failure: + type: boolean + title: Ignore Request Failure + description: If True, request failures to the judge model are ignored and + the metric result is marked as NaN. Parse/output formatting failures are + always converted to NaN. + default: false + type: + type: string + const: topic_adherence + title: Type + default: topic_adherence + description: + title: Description + description: Human-readable description of the metric. + type: string + labels: + additionalProperties: + type: string + type: object + title: Labels + description: Labels are key-value pairs that can be used for grouping and + filtering. + supported_job_types: + items: + type: string + enum: + - online + - offline + type: array + title: Supported Job Types + description: A metric can evaluate model outputs for online evaluations + or pre-generated outputs for offline evaluations. 
+ default: + - online + - offline + input_template: + title: Input Template + description: Optional Jinja template for rendering the input payload for + RAGAS evaluation. + additionalProperties: true + type: object + metric_mode: + type: string + enum: + - f1 + - precision + - recall + title: Metric Mode + description: The mode for computing topic adherence score. + default: f1 + type: object + required: + - judge_model + title: TopicAdherenceMetricResponse + description: Response type for TopicAdherence metrics. + Trace: + properties: + id: + type: string + title: Id + root_span_id: + title: Root Span Id + type: string + session_id: + type: string + title: Session Id + workspace: + type: string + title: Workspace + name: + title: Name + type: string + evaluation_context: + $ref: '#/components/schemas/SpanEvaluationContext' + started_at: + type: string + format: date-time + title: Started At + ended_at: + title: Ended At + type: string + format: date-time + duration_ms: + title: Duration Ms + type: number + status: + $ref: '#/components/schemas/SpanStatus' + input_tokens: + title: Input Tokens + type: integer + minimum: 0.0 + output_tokens: + title: Output Tokens + type: integer + minimum: 0.0 + cached_tokens: + title: Cached Tokens + type: integer + minimum: 0.0 + total_tokens: + title: Total Tokens + type: integer + minimum: 0.0 + cost_usd: + title: Cost Usd + type: number + cost_input_usd: + title: Cost Input Usd + type: number + cost_output_usd: + title: Cost Output Usd + type: number + span_count: + title: Span Count + type: integer + minimum: 0.0 + error_count: + title: Error Count + type: integer + minimum: 0.0 + type: object + required: + - id + - session_id + - workspace + - started_at + - status + title: Trace + TraceFilter: + properties: + id: + description: Filter by canonical Intake trace id. + title: Id + type: string + session_id: + description: Filter by session id. 
+ title: Session Id + type: string + status: + allOf: + - $ref: '#/components/schemas/SpanStatus' + description: Filter by rolled-up trace status. + started_at: + allOf: + - $ref: '#/components/schemas/DatetimeFilter' + description: Filter by root span start timestamp. + evaluation_id: + description: Filter by root-span evaluation id. + title: Evaluation Id + type: string + evaluation_sha: + description: Filter by root-span evaluation sha. + title: Evaluation Sha + type: string + evaluation_run_id: + description: Filter by root-span evaluation run id. + title: Evaluation Run Id + type: string + dataset_id: + description: Filter by root-span dataset id. + title: Dataset Id + type: string + dataset_name: + description: Filter by root-span dataset name. + title: Dataset Name + type: string + dataset_version: + description: Filter by root-span dataset version. + title: Dataset Version + type: string + test_case_id: + description: Filter by root-span dataset test case id. + title: Test Case Id + type: string + title: TraceFilter + type: object + TraceSortField: + type: string + enum: + - started_at + - -started_at + title: TraceSortField + TracesPage: + properties: + data: + items: + $ref: '#/components/schemas/Trace' + type: array + title: Data + pagination: + allOf: + - $ref: '#/components/schemas/PaginationData' + description: Pagination information. + sort: + title: Sort + description: The field on which the results are sorted. + type: string + filter: + title: Filter + description: Filtering information. + additionalProperties: true + type: object + type: object + required: + - data + title: TracesPage + TracingConfig: + properties: + enabled: + type: boolean + title: Enabled + default: false + adapters: + items: + $ref: '#/components/schemas/LogAdapterConfig' + type: array + title: Adapters + description: The list of tracing adapters to use. If not specified, the + default adapters are used. 
+ span_format: + type: string + title: Span Format + description: The span format to use. Options are 'legacy' (simple metrics) + or 'opentelemetry' (OpenTelemetry semantic conventions). + default: opentelemetry + enable_content_capture: + type: boolean + title: Enable Content Capture + description: 'Capture prompts and responses (user/assistant/tool message + content) in tracing/telemetry events. Disabled by default for privacy + and alignment with OpenTelemetry GenAI semantic conventions. WARNING: + Enabling this may include PII and sensitive data in your telemetry backend.' + default: false + type: object + title: TracingConfig + TrendMicroRailConfig: + properties: + v1_url: + type: string + title: V1 Url + description: 'The endpoint for the Trend Micro AI Guard API. For other regions, + use: https://api.{region}.xdr.trendmicro.com/v3.0/aiSecurity/applyGuardrails + where region is eu, jp, au, in, sg, or mea.' + default: https://api.xdr.trendmicro.com/v3.0/aiSecurity/applyGuardrails + api_key_env_var: + title: Api Key Env Var + description: Environment variable containing API key for Trend Micro AI + Guard + type: string + application_name: + type: string + maxLength: 64 + pattern: ^[a-zA-Z0-9_-]+$ + title: Application Name + description: Application name for TMV1-Application-Name header (REQUIRED). + Must contain only letters, numbers, hyphens, and underscores, with a maximum + length of 64 characters. + default: nemo-guardrails + detailed_response: + type: boolean + title: Detailed Response + description: 'If True, returns detailed AI Guard results with confidence + scores (Prefer: return=representation). If False, returns minimal response + with only action and reasons (Prefer: return=minimal).' 
+ default: false + type: object + title: TrendMicroRailConfig + description: Configuration data for the Trend Micro AI Guard API + UpdateAdapterRequest: + properties: + description: + title: Description + description: Optional description of the adapter + type: string + maxLength: 1000 + enabled: + title: Enabled + description: Whether to make this adapter available for inference post training + type: boolean + fileset: + title: Fileset + description: Updated fileset for the adapter + type: string + type: object + title: UpdateAdapterRequest + description: Request model for updating Adapter Sub Entity metadata. + UpdateFilesetRequest: + properties: + description: + title: Description + description: The description of the fileset. + type: string + maxLength: 255 + project: + title: Project + description: The name of the project associated with this fileset. + type: string + purpose: + allOf: + - $ref: '#/components/schemas/FilesetPurpose' + description: The purpose of the fileset. + metadata: + allOf: + - $ref: '#/components/schemas/FilesetMetadataInput' + description: 'Purpose-specific metadata. Use the purpose as the key (e.g., + {dataset: {...}}).' + custom_fields: + title: Custom Fields + description: Custom fields for the fileset. + additionalProperties: true + type: object + type: object + title: UpdateFilesetRequest + UpdateModelDeploymentConfigRequest: + properties: + description: + title: Description + description: Optional description of the deployment configuration + type: string + maxLength: 1000 + nim_deployment: + allOf: + - $ref: '#/components/schemas/NIMDeployment' + description: Configuration for NIM-based deployment + model_entity_id: + title: Model Entity Id + description: Optional reference to the base model entity ID for this deployment + type: string + maxLength: 255 + type: object + required: + - nim_deployment + title: UpdateModelDeploymentConfigRequest + description: Request model for updating a ModelDeploymentConfig (creates new + version). 
+ UpdateModelDeploymentRequest: + properties: + config: + type: string + maxLength: 255 + title: Config + description: Reference to the ModelDeploymentConfig name + config_version: + title: Config Version + description: Reference to a specific ModelDeploymentConfig version. If not + specified, uses latest. + type: integer + type: object + required: + - config + title: UpdateModelDeploymentRequest + description: Request model for updating a ModelDeployment (creates new version). + UpdateModelDeploymentStatusRequest: + properties: + status: + allOf: + - $ref: '#/components/schemas/ModelDeploymentStatus' + description: New status for the deployment + status_message: + type: string + maxLength: 1000 + title: Status Message + description: Detailed status message + default: '' + model_provider_id: + title: Model Provider Id + description: 'Optional reference to the auto-created ModelProvider workspace/name + (format: workspace/name)' + type: string + maxLength: 255 + type: object + required: + - status + title: UpdateModelDeploymentStatusRequest + description: Request model for updating ModelDeployment status. 
+ UpdateModelEntityRequest: + properties: + description: + title: Description + description: Optional description of the model + type: string + maxLength: 1000 + spec: + allOf: + - $ref: '#/components/schemas/ModelSpec' + description: Detailed specification for the model + fileset: + title: Fileset + description: A set of checkpoint files, configs, and other auxiliary info + associated with this model - expected format {workspace}/{fileset_name} + type: string + finetuning_type: + allOf: + - $ref: '#/components/schemas/FinetuningType' + description: Set for full weight finetuned models + base_model: + title: Base Model + description: Link to another model which is used as a base for the current + model + type: string + api_endpoint: + allOf: + - $ref: '#/components/schemas/APIEndpointData' + description: Data about the inference endpoint for this model + backend_format: + allOf: + - $ref: '#/components/schemas/BackendFormat' + description: Inference API wire format expected by the backend. If unset, + inference routing treats the model as OPENAI_CHAT. 
+ nullable: true + prompt: + allOf: + - $ref: '#/components/schemas/PromptData' + description: Configuration for prompt engineering + custom_fields: + title: Custom Fields + description: Custom fields for additional metadata + additionalProperties: true + type: object + ownership: + title: Ownership + description: Ownership information for the model + additionalProperties: true + type: object + model_providers: + title: Model Providers + description: List of ModelProvider workspace/name resource names that provide + inference for this Model Entity + items: + type: string + type: array + trust_remote_code: + title: Trust Remote Code + description: "Whether to trust remote code for the checkpoint.\n \ + \ Some models without support in certain libraries such as Transformers\ + \ require additional custom Python code to execute.\n Due to security\ + \ ramifications of running arbitrary code, this can only be set to true\ + \ on one of the following conditions:\n (1) the model's fileset's\ + \ source is pre-approved in the platform config, or\n (2) the user\ + \ creating this model is an administrator.\n " + type: boolean + type: object + title: UpdateModelEntityRequest + description: Request model for updating Model Entity metadata. + UpdateModelProviderStatusRequest: + properties: + model_deployment_id: + title: Model Deployment Id + description: Reference to the ModelDeployment ID if this provider is associated + with a deployment + type: string + maxLength: 255 + served_models: + title: Served Models + description: List of models served by this provider with routing information + for IGW + items: + $ref: '#/components/schemas/ServedModelMapping' + type: array + status: + allOf: + - $ref: '#/components/schemas/ModelProviderStatus' + description: Status of the model provider + status_message: + title: Status Message + description: Status message. If status is provided without status_message, + defaults to empty string. 
+ type: string + maxLength: 1000 + type: object + title: UpdateModelProviderStatusRequest + description: 'Request model for updating ModelProvider status and autodiscovery + fields. + + + This endpoint supports partial updates for fields managed by Models Controller.' + UpdateVirtualModelRequest: + properties: + default_model_entity: + title: Default Model Entity + description: Model entity to route to, in "workspace/name" format. Written + into request["model"] before the request middleware pipeline runs. If + omitted, a request middleware plugin must handle backend routing itself. + Set to null to clear an existing value. + type: string + autoprovisioned: + type: boolean + title: Autoprovisioned + description: Marks this VirtualModel as controller-managed. The Models controller + will delete it once no ModelProvider serves the matching entity. Setting + this manually opts the VirtualModel into that cleanup behavior. + default: false + models: + items: + $ref: '#/components/schemas/VirtualModelInferenceConfig' + type: array + title: Models + description: Model entity references used by this VirtualModel. A per-entry + backend_format overrides the referenced ModelEntity backend_format when + IGW resolves the backend format for a request. + request_middleware: + items: + $ref: '#/components/schemas/MiddlewareCall' + type: array + title: Request Middleware + description: Ordered list of middleware plugins applied before proxying + to the backend. Each entry is a MiddlewareCall with a "name" (plugin identifier) + and optional "config_type" and "config_id" fields that reference a stored + plugin configuration. + response_middleware: + items: + $ref: '#/components/schemas/MiddlewareCall' + type: array + title: Response Middleware + description: Ordered list of middleware plugins applied after the backend + response is received, before returning it to the caller. 
+ post_response_middleware: + items: + $ref: '#/components/schemas/MiddlewareCall' + type: array + title: Post Response Middleware + description: Ordered list of middleware plugins invoked after the response + has been returned to the caller. Intended for fire-and-forget work (logging, + analytics) that must not block or modify the response. + override_proxy: + title: Override Proxy + description: 'Plugin-provided proxy implementation for IGW to use instead + of its default aiohttp proxy. Format: "plugin-name.proxy-name". Leave + unset to use the default IGW proxy. Set to null to clear an existing value.' + type: string + type: object + title: UpdateVirtualModelRequest + description: 'Request body for partially updating an existing VirtualModel (PATCH). + + + Only fields present in the request body are updated. Omitted fields + + retain their current values. ``model_fields_set`` is used in the handler + + to distinguish an intentional ``[]`` (clear the list) from a missing field + + (leave unchanged). Set ``default_model_entity`` or ``override_proxy`` to + + ``null`` explicitly to clear them.' + UpsertModelProviderRequest: + properties: + project: + title: Project + description: The URN of the project associated with this model provider + type: string + maxLength: 255 + pattern: ^[\w\-./]+$ + description: + title: Description + description: Optional description of the model provider + type: string + maxLength: 1000 + host_url: + type: string + maxLength: 2048 + title: Host Url + description: The network endpoint URL for the model provider + api_key_secret_name: + title: Api Key Secret Name + description: Reference to an API key secret stored in the Secrets service. + Create the secret first via secrets API, then pass the secret name here. 
+ type: string + maxLength: 255 + enabled_models: + title: Enabled Models + description: Optional list of specific models to enable from this provider + items: + type: string + type: array + default_extra_body: + title: Default Extra Body + description: Default body parameters for inference requests. Can be overridden + by user requests. + additionalProperties: true + type: object + default_extra_headers: + title: Default Extra Headers + description: Default headers for inference requests. Can be overridden by + user requests. + additionalProperties: + type: string + type: object + required_extra_body: + title: Required Extra Body + description: Required body parameters for inference requests. Cannot be + overridden by user requests. + additionalProperties: true + type: object + required_extra_headers: + title: Required Extra Headers + description: Required headers for inference requests. Cannot be overridden + by user requests. + additionalProperties: + type: string + type: object + model_deployment_id: + title: Model Deployment Id + description: Optional reference to the ModelDeployment ID if this provider + is associated with a deployment + type: string + maxLength: 255 + status: + allOf: + - $ref: '#/components/schemas/ModelProviderStatus' + description: Status of the model provider + status_message: + title: Status Message + description: Status message + type: string + maxLength: 1000 + auth_header_format: + title: Auth Header Format + description: 'Jinja2 template string controlling how the API key secret + is sent to the upstream. Must contain exactly one variable named `auth_secret`, + which is substituted with the resolved secret value at request time. Example: + `''X-Api-Key: {{ auth_secret }}''`. If not set, defaults to `''Authorization: + Bearer {{ auth_secret }}''`.' 
+ type: string + maxLength: 1024 + type: object + required: + - host_url + title: UpsertModelProviderRequest + description: 'Request model for upserting a ModelProvider (PUT /apis/models/v2/workspaces/{workspace}/providers/{name}). + + + All fields must be provided - partial updates are not supported for security + reasons. + + Use PUT /status endpoint to update status-related fields only.' + UserMessagesConfig: + properties: + embeddings_only: + type: boolean + title: Embeddings Only + description: Whether to use only embeddings for computing the user canonical + form messages. + default: false + embeddings_only_similarity_threshold: + title: Embeddings Only Similarity Threshold + description: The similarity threshold to use when using only embeddings + for computing the user canonical form messages. + type: number + maximum: 1.0 + minimum: 0.0 + embeddings_only_fallback_intent: + title: Embeddings Only Fallback Intent + description: Defines the fallback intent when the similarity is below the + threshold. If set to None, the user intent is computed normally using + the LLM. If set to a string value, that string is used as the intent. + type: string + type: object + title: UserMessagesConfig + description: Configuration for how the user messages are interpreted. + ValidationError: + properties: + loc: + items: + anyOf: + - type: string + - type: integer + type: array + title: Location + msg: + type: string + title: Message + type: + type: string + title: Error Type + input: + title: Input + ctx: + type: object + title: Context + additionalProperties: true + type: object + required: + - loc + - msg + - type + title: ValidationError + VirtualModel: + properties: + name: + type: string + title: Name + description: Entity name within the workspace + default: '' + workspace: + type: string + pattern: ^[\w\-\+.@:]+$ + title: Workspace + description: Workspace identifier + project: + title: Project + description: The name of the project associated with this entity. 
+ type: string + default_model_entity: + title: Default Model Entity + type: string + autoprovisioned: + type: boolean + title: Autoprovisioned + description: Marks this VirtualModel as controller-managed. The Models controller + will delete it once no ModelProvider serves the matching entity. Setting + this manually opts the VirtualModel into that cleanup behavior. + default: false + models: + items: + $ref: '#/components/schemas/VirtualModelInferenceConfig' + type: array + title: Models + request_middleware: + items: + $ref: '#/components/schemas/MiddlewareCall' + type: array + title: Request Middleware + default: [] + response_middleware: + items: + $ref: '#/components/schemas/MiddlewareCall' + type: array + title: Response Middleware + default: [] + post_response_middleware: + items: + $ref: '#/components/schemas/MiddlewareCall' + type: array + title: Post Response Middleware + default: [] + override_proxy: + title: Override Proxy + type: string + id: + type: string + title: Id + readOnly: true + created_at: + title: Created At + readOnly: true + type: string + format: date-time + created_by: + title: Created By + readOnly: true + nullable: true + type: string + updated_at: + title: Updated At + readOnly: true + type: string + format: date-time + updated_by: + title: Updated By + readOnly: true + nullable: true + type: string + entity_id: + type: string + title: Entity Id + description: Alias for id for backwards compatibility. + readOnly: true + parent: + title: Parent + description: Parent entity ID for nested entities. + readOnly: true + type: string + type: object + required: + - workspace + - id + - created_at + - created_by + - updated_at + - updated_by + - entity_id + - parent + title: VirtualModel + description: 'Logical inference route. + + + Maps a user-facing model name to an optional default model entity and + + defines ordered middleware pipelines for the request, response, and + + post-response phases. 
+ + + When a caller sets ``model: "workspace/my-virtual-model"`` in an inference + + request, IGW resolves the ``VirtualModel`` instead of a ``ModelEntity`` + + directly. If ``default_model_entity`` is set, IGW writes it into + + ``request["model"]`` before the request middleware pipeline runs. Middleware + + may mutate ``request["model"]`` freely. After the pipeline completes, IGW + + reads ``request["model"]``, resolves it to a ``ModelProvider`` via the + + ``ModelCache``, and proxies. + + + The ``ModelProviderReconciler`` auto-creates a passthrough ``VirtualModel`` + + for each discovered model (same workspace and name as the ``ModelEntity``, + + empty middleware lists, ``default_model_entity`` pointing to that entity). + + All existing inference requests continue to work without changes.' + VirtualModelInferenceConfig: + properties: + model: + type: string + title: Model + backend_format: + allOf: + - $ref: '#/components/schemas/BackendFormat' + description: Optional backend format override for this VirtualModel entry. + nullable: true + type: object + required: + - model + title: VirtualModelInferenceConfig + description: Inference configuration for one model entity referenced by a VirtualModel. + VirtualModelsPage: + properties: + data: + items: + $ref: '#/components/schemas/VirtualModel' + type: array + title: Data + pagination: + allOf: + - $ref: '#/components/schemas/PaginationData' + description: Pagination information. + sort: + title: Sort + description: The field on which the results are sorted. + type: string + filter: + title: Filter + description: Filtering information. 
+ additionalProperties: true + type: object + type: object + required: + - data + title: VirtualModelsPage + VolcanoJobExecutionProfile: + properties: + provider: + type: string + title: Provider + description: The compute provider for the executor, e.g., cpu, gpu + default: cpu + profile: + type: string + title: Profile + description: The profile name for the executor, e.g., high_priority_a100, + low_priority, etc. + default: default + backend: + type: string + const: volcano_job + title: Backend + default: volcano_job + config: + allOf: + - $ref: '#/components/schemas/VolcanoJobExecutionProfileConfig' + description: Additional configuration for the kubernetes executor + type: object + required: + - config + title: VolcanoJobExecutionProfile + description: Volcano Job Execution Profile + VolcanoJobExecutionProfileConfig: + properties: + ttl_seconds_before_active: + type: integer + title: Ttl Seconds Before Active + default: 1800 + ttl_seconds_active: + type: integer + title: Ttl Seconds Active + default: 86400 + ttl_seconds_after_finished: + type: integer + title: Ttl Seconds After Finished + default: 3600 + cleanup_completed_jobs_immediately: + type: boolean + title: Cleanup Completed Jobs Immediately + default: true + launcher_tool_path: + type: string + title: Launcher Tool Path + description: Path to the jobs launcher tool + default: /tools/jobs-launcher + env: + additionalProperties: + type: string + type: object + title: Env + description: Optional env vars applied to all jobs (e.g. HOME=/tmp). Keys + must not conflict with platform-reserved names. Job steps may override + these variables. + namespace: + title: Namespace + description: Kubernetes namespace to submit the job to. If not set, it will + be determined from the environment. + type: string + service_account_name: + type: string + title: Service Account Name + description: Kubernetes service account name for job pods. Uses the Kubernetes + default service account when set to 'default'. 
+ default: default + tolerations: + items: + additionalProperties: true + type: object + type: array + title: Tolerations + description: Tolerations for the Kubernetes job pods. + node_selector: + additionalProperties: + type: string + type: object + title: Node Selector + description: Node selector for the Kubernetes job pods. + affinity: + additionalProperties: true + type: object + title: Affinity + description: Affinity for the Kubernetes job pods. + resources: + allOf: + - $ref: '#/components/schemas/ComputeResources' + description: Resource requests and limits for the Kubernetes job pods. + pod_security_context: + additionalProperties: true + type: object + title: Pod Security Context + description: Pod security context for the Kubernetes job pods. + image_pull_secrets: + items: + $ref: '#/components/schemas/ImagePullSecret' + type: array + title: Image Pull Secrets + description: Image pull secrets for the Kubernetes job pods. + job_metadata: + allOf: + - $ref: '#/components/schemas/KubernetesObjectMetadata' + description: Metadata to add to each job object in the Kubernetes job. + pod_metadata: + allOf: + - $ref: '#/components/schemas/KubernetesObjectMetadata' + description: Metadata to add to each pod in the Kubernetes job. + storage: + allOf: + - $ref: '#/components/schemas/KubernetesJobStorageConfig' + description: Storage configuration for the Kubernetes job pods. + num_gpus: + type: integer + title: Num Gpus + description: Number of GPUs to request for the job + default: 1 + scheduler_name: + type: string + title: Scheduler Name + description: The scheduler name to use for the Volcano job. + default: volcano + launcher_image: + type: string + title: Launcher Image + description: Container image that contains the jobs-launcher binary. + default: nvcr.io/nvidia/nemo-microservices/jobs-launcher:latest + queue: + type: string + title: Queue + description: The Volcano queue to submit the job to. 
+ default: default + max_retry: + type: integer + title: Max Retry + description: maxRetry indicates the maximum number of retries allowed by + the job + default: 0 + plugins: + additionalProperties: true + type: object + title: Plugins + description: plugins indicates the plugins used by Volcano when the job + is scheduled. We always add the pytorch plugin if more than one node. + enable_multi_node_networking: + type: boolean + title: Enable Multi Node Networking + description: Enable multi-node networking injection. Sets annotations to + trigger Kyverno policy mutations. + default: true + type: object + title: VolcanoJobExecutionProfileConfig + description: Configuration for Volcano Job Execution Profile + Workspace: + properties: + id: + type: string + title: Id + description: System-generated UUID + name: + type: string + title: Name + description: Workspace name (user-provided) + description: + title: Description + description: Optional description + type: string + created_at: + type: string + format: date-time + title: Created At + description: Timestamp of workspace creation + created_by: + title: Created By + description: Principal id for workspace creator + type: string + updated_at: + type: string + format: date-time + title: Updated At + description: Timestamp of last workspace update + updated_by: + title: Updated By + description: Principal id for last workspace update + type: string + additionalProperties: false + type: object + required: + - id + - name + - created_at + - updated_at + title: Workspace + description: Workspace schema for API responses. + WorkspaceInput: + properties: + name: + type: string + pattern: ^[a-z](?!.*--)[a-z0-9\-@.+_]{1,62}(? 
+ + ███╗ ██╗███████╗███╗ ███╗ ██████╗ + ████╗ ██║██╔════╝████╗ ████║██╔═══██╗ + ██╔██╗ ██║█████╗ ██╔████╔██║██║ ██║ + ██║╚██╗██║██╔══╝ ██║╚██╔╝██║██║ ██║ + ██║ ╚████║███████╗██║ ╚═╝ ██║╚██████╔╝ + ╚═╝ ╚═══╝╚══════╝╚═╝ ╚═╝ ╚═════╝ + PLATFORM + diff --git a/docs/fern/assets/nvidia-logo-white.png b/docs/fern/assets/nvidia-logo-white.png new file mode 100644 index 0000000000..f6c73aed43 --- /dev/null +++ b/docs/fern/assets/nvidia-logo-white.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:00b499146c0d590c254ff8ab555298cfa88605938bf883a880a58067eccdb571 +size 47302 diff --git a/docs/fern/components/Authors.tsx b/docs/fern/components/Authors.tsx new file mode 100644 index 0000000000..0c71197953 --- /dev/null +++ b/docs/fern/components/Authors.tsx @@ -0,0 +1,56 @@ +/** + * SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + * SPDX-License-Identifier: Apache-2.0 + */ + +/** + * Authors - Renders author byline with avatars for dev notes / blog posts. + * + * Uses authors data from components/devnotes/authors-data.ts (synced with .authors.yml). + * NOTE: Fern's custom component pipeline uses the automatic JSX runtime. + * + * Usage in MDX (authors from frontmatter): + * --- + * authors: + * - jdoe + * - asmith + * --- + * + * import { Authors } from "@/components/Authors"; + * + */ + +import { authors } from "./devnotes/authors-data"; + +export interface AuthorsProps { + /** Author IDs from .authors.yml. From frontmatter: ids={authors} */ + ids?: string[]; +} + +export const Authors = ({ ids }: AuthorsProps) => { + const validAuthors = (ids ?? []) + .map((id) => authors[id]) + .filter(Boolean); + + if (validAuthors.length === 0) return null; + + return ( +
+ {validAuthors.map((author, i) => ( +
+ +
+ {author.name} + {author.description} +
+
+ ))} +
+ ); +}; diff --git a/docs/fern/components/BadgeLinks.tsx b/docs/fern/components/BadgeLinks.tsx new file mode 100644 index 0000000000..c4a5949c33 --- /dev/null +++ b/docs/fern/components/BadgeLinks.tsx @@ -0,0 +1,37 @@ +/** + * SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + * SPDX-License-Identifier: Apache-2.0 + */ + +/** + * Badge links for GitHub, License, PyPI, etc. + * Uses a custom wrapper to avoid Fern's external-link icon stacking under badges. + * + * `badges` is required — there is intentionally no default. A previous + * version shipped placeholder URLs that could land in production for + * sites that rendered the component without props. See README-BadgeLinks.md. + */ +export type BadgeItem = { + href: string; + src: string; + alt: string; +}; + +export interface BadgeLinksProps { + badges: BadgeItem[]; +} + +export function BadgeLinks({ badges }: BadgeLinksProps) { + return ( +
+ {badges.map((b) => ( + + {b.alt} + + ))} +
+ ); +} diff --git a/docs/fern/components/CustomCard.tsx b/docs/fern/components/CustomCard.tsx new file mode 100644 index 0000000000..f120898f68 --- /dev/null +++ b/docs/fern/components/CustomCard.tsx @@ -0,0 +1,34 @@ +/** + * SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + * SPDX-License-Identifier: Apache-2.0 + */ + +/** + * CustomCard - Simple card with title, text, link, and optional sparkle. + * + * Alternative to Fern's built-in when you need custom styling + * (e.g. devnotes/blog landing pages). + * NOTE: Fern's custom component pipeline uses the automatic JSX runtime. + * + * Usage in MDX: + * import { CustomCard } from "@/components/CustomCard"; + * + */ + +export interface CustomCardProps { + title: string; + text: string; + link: string; + sparkle?: boolean; +} + +export const CustomCard = ({ title, text, link, sparkle = false }: CustomCardProps) => { + return ( + +

+ {title} {sparkle && "✨"} +

+

{text}

+
+ ); +}; diff --git a/docs/fern/components/MetricsTable.tsx b/docs/fern/components/MetricsTable.tsx new file mode 100644 index 0000000000..f95dc18646 --- /dev/null +++ b/docs/fern/components/MetricsTable.tsx @@ -0,0 +1,106 @@ +/** + * SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + * SPDX-License-Identifier: Apache-2.0 + */ + +/** + * MetricsTable - Styled comparison table for benchmark results. + * + * Optional: highlights best values per column (bold). + * NOTE: Fern's custom component pipeline uses the automatic JSX runtime. + * Do NOT import React -- the `react` module is not resolvable in Fern's build. + * + * Usage in MDX: + * import { MetricsTable } from "@/components/MetricsTable"; + * + * + */ + +export interface MetricsTableProps { + headers: string[]; + rows: (string | number)[][]; + /** Column indices where lower is better (for highlighting) */ + lowerIsBetter?: number[]; + /** Column indices where higher is better (default for non-lowerIsBetter) */ + higherIsBetter?: number[]; +} + +function findBestIndices( + rows: (string | number)[][], + colIndex: number, + lowerIsBetter: boolean +): Set<number> { + const values = rows.map((r) => { + const v = r[colIndex]; + if (typeof v === "number") return v; + const parsed = parseFloat(String(v)); + return isNaN(parsed) ? (lowerIsBetter ? Infinity : -Infinity) : parsed; + }); + const best = lowerIsBetter ? 
Math.min(...values) : Math.max(...values); + const bestIndices = new Set<number>(); + values.forEach((v, i) => { + if (v === best) bestIndices.add(i); + }); + return bestIndices; +} + +export const MetricsTable = ({ + headers, + rows, + lowerIsBetter = [], + higherIsBetter = [], +}: MetricsTableProps) => { + const lowerSet = new Set(lowerIsBetter); + const bestByCol: Record<number, Set<number>> = {}; + + for (let c = 0; c < headers.length; c++) { + if (lowerSet.has(c)) { + bestByCol[c] = findBestIndices(rows, c, true); + } else if (higherIsBetter.includes(c)) { + bestByCol[c] = findBestIndices(rows, c, false); + } else { + const numLike = rows.every((r) => { + const v = r[c]; + return typeof v === "number" || !isNaN(parseFloat(String(v))); + }); + if (numLike) { + bestByCol[c] = findBestIndices(rows, c, false); + } + } + } + + return ( +
+ + + + {headers.map((h, i) => ( + + ))} + + + + {rows.map((row, rowIdx) => ( + + {row.map((cell, colIdx) => { + const isBest = bestByCol[colIdx]?.has(rowIdx); + return ( + + ); + })} + + ))} + +
{h}
+ {cell} +
+
+ ); +}; diff --git a/docs/fern/components/NotebookViewer.tsx b/docs/fern/components/NotebookViewer.tsx new file mode 100644 index 0000000000..e3842891e1 --- /dev/null +++ b/docs/fern/components/NotebookViewer.tsx @@ -0,0 +1,399 @@ +/** + * SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + * SPDX-License-Identifier: Apache-2.0 + */ + +import type { ReactNode } from "react"; + +/** + * NotebookViewer - Renders Jupyter notebook content in Fern docs. + * + * Uses Fern's code block structure (fern-code, fern-code-block, etc.) so input + * and output cells match the default Fern code block styling. + * + * Accepts notebook cells (markdown + code) and optionally a Colab URL. + * Designed to work with notebooks converted via `scripts/converters/ipynb_to_fern_json.py` + * (NeMo Data Designer–compatible pipeline; sources may be plain `.ipynb` or Jupytext). + * + * NOTE: Fern's custom component pipeline uses the automatic JSX runtime. + * Only type-only imports from "react" are used (erased at compile time). + * + * Usage in MDX: + * import { NotebookViewer } from "@/components/NotebookViewer"; + * import notebook from "@/components/notebooks/1-the-basics"; + * + * + */ + +export interface CellOutput { + type: "text" | "image"; + data: string; + format?: "plain" | "html"; +} + +export interface NotebookCell { + type: "markdown" | "code"; + source: string; + /** Pre-rendered syntax-highlighted HTML (from Pygments). When present, used instead of escaped source. */ + source_html?: string; + language?: string; + outputs?: CellOutput[]; +} + +export interface NotebookData { + cells: NotebookCell[]; +} + +export interface NotebookViewerProps { + /** Notebook data with cells array. If import fails, this may be undefined. 
*/ + notebook?: NotebookData | null; + /** Optional Colab URL for "Run in Colab" badge */ + colabUrl?: string; + /** Show code cell outputs (default: true) */ + showOutputs?: boolean; +} + +function NotebookViewerError({ message, detail }: { message: string; detail?: string }) { + return ( +
+ NotebookViewer error: {message} + {detail && ( +
+          {detail}
+        
+ )} +
+ ); +} + +function escapeHtml(text: string): string { + if (typeof text !== "string") return ""; + return text + .replace(/&/g, "&amp;") + .replace(/</g, "&lt;") + .replace(/>/g, "&gt;") + .replace(/"/g, "&quot;"); +} + +// Sprint 4.2: markdown is rendered server-side in ipynb_to_fern_json.py +// (markdown-it-py) and emitted as cell.source_html. The component renders +// that HTML directly. The hand-rolled JS parser previously here mishandled +// blockquotes, fenced code, tables, and nested lists; it has been removed. + +function handleCopy(content: string, button: HTMLButtonElement) { + navigator.clipboard.writeText(content).catch(() => {}); + const originalHtml = button.innerHTML; + const originalLabel = button.getAttribute("aria-label") ?? "Copy code"; + button.innerHTML = "Copied!"; + button.setAttribute("aria-label", "Copied to clipboard"); + setTimeout(() => { + button.innerHTML = originalHtml; + button.setAttribute("aria-label", originalLabel); + }, 1500); +} + +const FLAG_ICON = ( + + + + +); + +const SCROLL_AREA_STYLE = `[data-radix-scroll-area-viewport]{scrollbar-width:none;-ms-overflow-style:none;-webkit-overflow-scrolling:touch;}[data-radix-scroll-area-viewport]::-webkit-scrollbar{display:none}`; + +const BUTTON_BASE_CLASS = + "focus-visible:ring-(color:--accent) rounded-2 inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-colors hover:transition-none focus-visible:outline-none focus-visible:ring-1 disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none [&_svg]:size-4 [&_svg]:shrink-0 text-(color:--grayscale-a11) hover:bg-(color:--accent-a3) hover:text-(color:--accent-11) pointer-coarse:size-9 size-7"; + +/** Fern code block structure – matches Fern docs (header with language + buttons, pre with scroll area). 
*/ +function FernCodeBlock({ + title, + children, + className = "", + asPre = true, + copyContent, + showLineNumbers = false, + codeHtml, +}: { + title: string; + children: ReactNode; + className?: string; + /** Use div instead of pre for content (needed when children include block elements like img/div). */ + asPre?: boolean; + /** Raw text to copy when copy button is clicked. When provided, shows a copy button. */ + copyContent?: string; + /** Show line numbers in a table layout (matches Fern's code block structure). */ + showLineNumbers?: boolean; + /** Pre-rendered HTML for each line when showLineNumbers is true. Lines are split by newline. */ + codeHtml?: string; +}) { + const headerLabel = title === "Output" ? "Output" : title.charAt(0).toUpperCase() + title.slice(1); + const wrapperClasses = + "fern-code fern-code-block bg-card-background border-card-border rounded-3 shadow-card-grayscale relative mb-6 mt-4 flex w-full min-w-0 max-w-full flex-col border first:mt-0"; + const preStyle = { + backgroundColor: "rgb(255, 255, 255)", + ["--shiki-dark-bg" as string]: "#212121", + color: "rgb(36, 41, 46)", + ["--shiki-dark" as string]: "#EEFFFF", + }; + + const scrollAreaContent = () => { + if (codeHtml == null) return null; + const lines = codeHtml.split("\n"); + return ( +
+ +``` + +# NeMo Platform Helm Chart ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) -For deployment guide, see [Admin Setup](../set-up/index.md) in the {{platform_name}} documentation. +For deployment guide, see [Admin Setup](/platform/about) in the NeMo Platform documentation. ## Values -The following is the complete `values.yaml` file for the {{platform_name}} Helm Chart. +The following is the complete `values.yaml` file for the NeMo Platform Helm Chart. All configuration options are documented inline with comments. ---8<-- "_values.yaml" +`--8<-- "_values.yaml"` diff --git a/docs/index.mdx b/docs/index.mdx index eae8c12333..04ec86c45c 100644 --- a/docs/index.mdx +++ b/docs/index.mdx @@ -1,8 +1,10 @@ -# {{platform_name}} - +--- +title: "Home" +description: "" +--- Make the agents you ship faster, more accurate, and safer. -{{platform_name}} brings NVIDIA NeMo libraries together under one CLI, Python SDK, and web UI. Hardening, evaluation, and tuning for the agents you put in production. +NeMo Platform brings NVIDIA NeMo libraries together under one CLI, Python SDK, and web UI. Hardening, evaluation, and tuning for the agents you put in production. ## What You Can Do @@ -17,18 +19,18 @@ Make the agents you ship faster, more accurate, and safer. The platform provides several shared capabilities to NeMo libraries. -- [Models and Inference](run-inference/about.md) for model providers, virtual +- [Models and Inference](/models-and-inference/about) for model providers, virtual models, and gateway calls. -- [Files](get-started/concepts/manage-files.md), [Secrets](get-started/concepts/manage-secrets.md), - [Entities](get-started/concepts/entities.md), and jobs for storing state and +- [Files](/get-started/core-concepts/manage-files), [Secrets](/get-started/core-concepts/manage-secrets), + [Entities](/get-started/core-concepts/entities), and jobs for storing state and running local work. 
## Where to Go Next -- [Setup](get-started/setup.md) - install, configure providers, and run +- [Setup](/get-started/setup) - install, configure providers, and run local services. -- [About Agents](agents/index.md) - learn the managed agent lifecycle. -- [Optimize Agents](agents/optimization.md) - improve cost, quality, and model +- [About Agents](/agents) - learn the managed agent lifecycle. +- [Optimize Agents](/agents/optimize-agents) - improve cost, quality, and model routing. -- [Secure Agents](agents/security.md) - harden agents with guardrails and data +- [Secure Agents](/agents/secure-agents) - harden agents with guardrails and data safety checks. diff --git a/docs/javascripts/api-filter.js b/docs/javascripts/api-filter.js deleted file mode 100644 index b42ea02938..0000000000 --- a/docs/javascripts/api-filter.js +++ /dev/null @@ -1,185 +0,0 @@ -// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -// SPDX-License-Identifier: Apache-2.0 - -// API Reference page: chip-based tag filtering for Swagger UI iframe -(function() { - function initApiFilter() { - var chips = document.querySelectorAll('.api-chip'); - var chipContainer = document.querySelector('.api-filter-chips'); - var iframe = document.querySelector('iframe.swagger-ui-iframe'); - if (!chipContainer || !iframe) return; - var hiddenTags = {}; - if (chipContainer.dataset.hiddenTags) { - chipContainer.dataset.hiddenTags.split(',').forEach(function(tag) { - tag = tag.trim(); - if (tag) hiddenTags[tag] = true; - }); - } - var hiddenTagIds = {}; - Object.keys(hiddenTags).forEach(function(tag) { - hiddenTagIds['operations-tag-' + tag.replace(/ /g, '_')] = true; - }); - var retryTimer = null; - var swaggerObserver = null; - - function tagSlug(tag) { - return tag.toLowerCase().trim().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, ''); - } - - function chipHash(chip) { - var id = chip.getAttribute('id'); - if (id) return '#' + id; - - var tag = 
chip.getAttribute('data-tag') || ''; - return tag ? '#tag-' + tagSlug(tag) : '#tag-all'; - } - - function tagFromHash() { - var hash = window.location.hash; - if (!hash) return ''; - - var selectedTag = ''; - var normalizedHash = decodeURIComponent(hash).toLowerCase(); - chips.forEach(function(chip) { - var tag = chip.getAttribute('data-tag') || ''; - if (chipHash(chip).toLowerCase() === normalizedHash) { - selectedTag = tag; - } - }); - - return hiddenTags[selectedTag] ? '' : selectedTag; - } - - function getDoc() { - try { return iframe.contentDocument || iframe.contentWindow.document; } - catch(e) { return null; } - } - - function applyFilter(tag) { - var doc = getDoc(); - if (!doc || !doc.querySelectorAll('.opblock-tag-section').length) return false; - if (hiddenTags[tag]) tag = ''; - var tagId = tag ? 'operations-tag-' + tag.replace(/ /g, '_') : ''; - doc.querySelectorAll('.opblock-tag-section').forEach(function(s) { - var heading = s.querySelector('.opblock-tag'); - var id = heading ? heading.getAttribute('id') : ''; - s.style.display = (!hiddenTagIds[id] && (!tag || id === tagId)) ? 
'' : 'none'; - }); - iframe.style.visibility = ''; - return true; - } - - function setActiveChip(tag) { - chips.forEach(function(chip) { - var isActive = (chip.getAttribute('data-tag') || '') === tag; - chip.classList.toggle('active', isActive); - }); - } - - function updateUrl(chip) { - var nextUrl = window.location.pathname + window.location.search + chipHash(chip); - if (window.location.href !== window.location.origin + nextUrl) { - window.history.pushState(null, '', nextUrl); - } - } - - function selectTag(tag, chip, shouldUpdateUrl) { - if (hiddenTags[tag]) tag = ''; - setActiveChip(tag); - if (!applyFilter(tag)) scheduleTryInit(50); - if (shouldUpdateUrl && chip) updateUrl(chip); - } - - function syncFromHash() { - selectTag(tagFromHash(), null, false); - } - - function bindChips() { - chips.forEach(function(chip) { - if (hiddenTags[chip.getAttribute('data-tag')]) { - chip.style.display = 'none'; - return; - } - if (chip.dataset.apiFilterBound) return; - chip.dataset.apiFilterBound = 'true'; - chip.addEventListener('click', function() { - selectTag(chip.getAttribute('data-tag') || '', chip, true); - }); - }); - } - - function scheduleTryInit(delayMs) { - if (retryTimer) return; - retryTimer = setTimeout(function() { - retryTimer = null; - tryInit(); - }, delayMs); - } - - function disconnectObserver() { - if (!swaggerObserver) return; - swaggerObserver.disconnect(); - swaggerObserver = null; - } - - function watchSwaggerDoc(doc) { - if (swaggerObserver || !doc || !doc.body || typeof MutationObserver === 'undefined') return; - swaggerObserver = new MutationObserver(function() { - if (doc.querySelectorAll('.opblock-tag-section').length) { - disconnectObserver(); - tryInit(); - } - }); - swaggerObserver.observe(doc.body, { childList: true, subtree: true }); - } - - function tryInit() { - var doc = getDoc(); - if (!doc || !doc.body) { - scheduleTryInit(50); - return; - } - if (!doc.querySelectorAll('.opblock-tag-section').length) { - watchSwaggerDoc(doc); - 
scheduleTryInit(250); - return; - } - - disconnectObserver(); - var filterBar = doc.querySelector('.filter-container'); - if (filterBar) filterBar.style.display = 'none'; - - bindChips(); - syncFromHash(); - } - - window.__apiFilterSyncFromHash = syncFromHash; - if (!window.__apiFilterHashListenerBound) { - window.__apiFilterHashListenerBound = true; - window.addEventListener('hashchange', function() { - if (window.__apiFilterSyncFromHash) window.__apiFilterSyncFromHash(); - }); - window.addEventListener('popstate', function() { - if (window.__apiFilterSyncFromHash) window.__apiFilterSyncFromHash(); - }); - } - - if (tagFromHash()) iframe.style.visibility = 'hidden'; - bindChips(); - syncFromHash(); - iframe.addEventListener('load', function() { - disconnectObserver(); - scheduleTryInit(0); - }); - scheduleTryInit(0); - } - - // MkDocs Material instant navigation re-renders pages without full reloads - if (typeof document$ !== 'undefined') { - document$.subscribe(function() { setTimeout(initApiFilter, 0); }); - } else { - document.addEventListener('DOMContentLoaded', function() { - setTimeout(initApiFilter, 0); - }); - } -})(); diff --git a/docs/notebooks/ndd_evaluator.md b/docs/notebooks/ndd_evaluator.md deleted file mode 100644 index 7b7623981b..0000000000 --- a/docs/notebooks/ndd_evaluator.md +++ /dev/null @@ -1,655 +0,0 @@ -# Emergency Triage LLM Evaluation - -**Problem:** Accurately labeling Emergency Severity Index (ESI) levels from nurse triage notes is critical for clinical research and model development, but real-world data is often sensitive and access is restricted. Additionally, obtaining high-quality human annotations is costly and slow, making it difficult to build large, diverse datasets for robust evaluation. - -**Opportunity:** Synthetic data generation offers a scalable, privacy-preserving alternative. 
By simulating realistic triage notes and ESI labels, we can create rich datasets without exposing patient information or relying on scarce human annotators. This enables rapid iteration, benchmarking, and model improvement in a domain where data scarcity is a major bottleneck. - -- **Use case:** Predict ESI levels from synthetic nurse triage notes using LLMs -- **Goal:** Evaluate model accuracy and the quality/complexity of generated notes across a range of clinical scenarios -- **Pipeline:** Synthetic data ➜ LLM-as-a-Judge scoring ➜ Filtering ➜ Evaluation - -```text - ┌───────────────────────────────┐ ┌─────────────────────────────┐ - │ Nemo Data Designer │ │ Nemo Evaluator │ - │ +------------------------+ │ │ +-----------------------+ │ - │ | Nurse Triage Note |───┼───────▶│ | LLM predicts ESI | │ - │ +------------------------+ │ │ +-----------------------+ │ - │ + │ │ | │ - │ │ │ v │ - │ +------------------------+ │ │ +-----------------------+ │ - │ | Ground Truth (ESI) |───┼───────▶│ | Predicted ESI | │ - │ +------------------------+ │ │ +-----------------------+ │ - └───────────────────────────────┘ │ | │ - │ v │ - │ +-----------------------+ │ - │ | Metrics (Accuracy) | │ - │ +-----------------------+ │ - └─────────────────────────────┘ -``` - - -- **What happens below**: - - 🏗️ Generate realistic, privacy-safe triage notes, score their quality with LLM-as-a-Judge, and filter for high-signal examples using Data Designer. - - ⬆️ Upload dataset to the Files service - - 📈 Run Evaluator to compute ESI classification accuracy and other metrics - -Tip: Run the cells in order. You can re-run preview/generation to explore different scenarios and difficulty levels. - -## **Step 1**: 🎨 NeMo Data Designer - -First, create a client for interacting with {{platform_name}}. 
- -```python -import os -from nemo_platform import NeMoPlatform - -# Base URL for the platform -BASE_URL = os.getenv("NMP_BASE_URL", "http://localhost:8080") -WORKSPACE = "default" - -# Initialize NeMoPlatform client (does not trigger any action yet) -client = NeMoPlatform(base_url=BASE_URL, workspace=WORKSPACE) -``` - -Create a Secret and Model Provider to use models hosted on `build.nvidia.com`. -Note: these resources may already exist if you are running {{platform_name}} Quickstart. - -```python -SECRET_NAME = "nvidia-build-api-key" -PROVIDER_NAME = "build" -PROVIDER_API_KEY = os.getenv("NVIDIA_API_KEY") - -existing_secrets = client.secrets.list() -if SECRET_NAME not in [secret.name for secret in existing_secrets.data]: - if PROVIDER_API_KEY is None: - raise ValueError( - "Set NVIDIA_API_KEY to a key with access to NVIDIA Build models" - ) - client.secrets.create( - name=SECRET_NAME, value=PROVIDER_API_KEY, description="NVIDIA Build API key" - ) - -existing_providers = client.inference.providers.list() -if PROVIDER_NAME not in [provider.name for provider in existing_providers.data]: - provider = client.inference.providers.create( - name=PROVIDER_NAME, - host_url="https://integrate.api.nvidia.com", - api_key_secret_name=SECRET_NAME, - description="External provider for build.nvidia.com", - ) -``` - -Create a `DataDesignerConfigBuilder` with the models you intend to use. 
- -```python -import data_designer.config as dd - - -MODEL_ALIAS_GENERATOR = "content_generator" -MODEL_ID_GENERATOR = "nvidia/nemotron-3-nano-30b-a3b" - -MODEL_ALIAS_JUDGE = "judge" -MODEL_ID_JUDGE = "openai/gpt-oss-120b" - -model_configs = [ - dd.ModelConfig( - provider=PROVIDER_NAME, - alias=MODEL_ALIAS_GENERATOR, - model=MODEL_ID_GENERATOR, - inference_parameters=dd.ChatCompletionInferenceParams( - max_tokens=8000, - temperature=0.7, - top_p=0.95, - ), - ), - dd.ModelConfig( - provider=PROVIDER_NAME, - alias=MODEL_ALIAS_JUDGE, - model=MODEL_ID_JUDGE, - inference_parameters=dd.ChatCompletionInferenceParams( - max_tokens=4096, - temperature=0.1, - top_p=0.95, - ), - ), -] - -config_builder = dd.DataDesignerConfigBuilder(model_configs=model_configs) -``` - -## 🎲 Sampler columns - -```python -# ESI levels -ESI_LEVELS = [ - "ESI 1: Resuscitation", - "ESI 2: Emergency", - "ESI 3: Urgent", - "ESI 4: Less Urgent", - "ESI 5: Non-urgent", -] - -# Unique record ID -config_builder.add_column( - dd.SamplerColumnConfig( - name="record_id", - sampler_type=dd.SamplerType.UUID, - params=dd.UUIDSamplerParams( - short_form=True, - uppercase=True, - ), - ) -) - -# ESI level (balanced sampling) -config_builder.add_column( - dd.SamplerColumnConfig( - name="esi_level_description", - sampler_type=dd.SamplerType.CATEGORY, - params=dd.CategorySamplerParams( - values=ESI_LEVELS, - ), - ) -) - -# Clinical scenario (conditioned on ESI level) -config_builder.add_column( - dd.SamplerColumnConfig( - name="clinical_scenario", - sampler_type=dd.SamplerType.SUBCATEGORY, - params=dd.SubcategorySamplerParams( - category="esi_level_description", - values={ - ESI_LEVELS[0]: [ - "Cardiac arrest", - "Unresponsive with no pulse", - "Severe respiratory distress", - "Major trauma with signs of shock", - "Suspected narcotic overdose with shallow respirations", - ], - ESI_LEVELS[1]: [ - "Crushing substernal chest pain radiating to the left arm", - "Sudden onset of facial droop and arm weakness", - "New 
onset confusion in an elderly patient", - "Active suicidal ideation with a plan", - "High-speed motor vehicle accident", - "Severe abdominal pain in a patient with a history of aortic aneurysm", - ], - ESI_LEVELS[2]: [ - "Abdominal pain with fever and nausea", - "High fever with a productive cough and history of COPD", - "Displaced fracture with visible deformity", - "Asthma attack, responsive to initial treatment", - "Vaginal bleeding in a pregnant patient", - "Head injury with brief loss of consciousness", - ], - ESI_LEVELS[3]: [ - "Simple laceration requiring sutures", - "Twisted ankle, unable to bear weight", - "Sore throat with fever", - "Symptoms of a urinary tract infection", - "Painful ear with fever in a child", - ], - ESI_LEVELS[4]: [ - "Request for a prescription refill", - "Suture removal", - "Minor rash present for several days", - "Common cold symptoms", - "Follow-up for a minor wound check", - ], - }, - ), - ) -) - -# Synthetic patient info -config_builder.add_column( - dd.SamplerColumnConfig( - name="patient", - sampler_type=dd.SamplerType.PERSON, - params=dd.PersonSamplerParams(age_range=[18, 70]), - ) -) - -# Triage note writing style (captures range from poor to best quality notes) -config_builder.add_column( - dd.SamplerColumnConfig( - name="writing_style", - sampler_type=dd.SamplerType.CATEGORY, - params=dd.CategorySamplerParams(values=["Draft", "Adequate", "Polished"]), - ) -) -``` - -## 🦜 LLM-generated columns - -```python -{% raw %} -# LLM-generated triage note -config_builder.add_column( - dd.LLMTextColumnConfig( - name="content", - prompt=( - "You are an experienced triage nurse in a busy Emergency Department writing a draft note. " - "Write a realistic, concise triage note in a telegraphic style using common medical abbreviations. " - "The note is for a {{ patient.age }} y/o {{ 'M' if patient.sex == 'Male' else 'F' }}. " - "Triage classification: '{{ esi_level_description }}'. " - "Reason for visit: '{{ clinical_scenario }}'. 
" - "Desired writing style: '{{ writing_style }}'. " - "Structure the note with 'CC:' and 'HPI:'. " - "Adjust the style and level of clinical detail based on the 'writing_style': " - "- Draft: Use minimal structure, brief statements, and omit some details; clinical indicators may be less clear. " - "- Adequate: Use complete sentences, include all relevant clinical indicators, but avoid excessive detail. " - "- Polished: Be thorough, precise, and clear; include nuanced or subtle signs and show strong clinical reasoning. " - "Also, adjust level of detail based on urgency (ESI 1 is always brief). " - "Respond with ONLY the note text, starting with 'CC:'." - ), - model_alias=MODEL_ALIAS_GENERATOR, - ) -) -{% endraw %} -``` - -## ⚖️ LLM-as-a-Judge Evaluation Step - -```python -{% raw %} -# Rubric: clinical coherence -clinical_coherence_rubric = dd.Score( - name="Clinical Coherence", - description="Evaluates how well the clinical details in the triage note align with the assigned ESI level and scenario.", - options={ - "5": "Note is perfectly aligned with the ESI level and scenario; details are clinically plausible and specific.", - "4": "Note is well-aligned, with only minor details that might be slightly inconsistent.", - "3": "Note is generally consistent, but some key clinical indicators are missing or don't fully match the ESI level.", - "2": "Note shows significant inconsistency between the clinical details and the assigned ESI level.", - "1": "Note is clinically incoherent and does not reflect the assigned ESI level or scenario at all." - } -) - -# Rubric: ESI level complexity (reduced to 3 levels: Simple, Moderate, Complex) -esi_level_complexity_rubric = dd.Score( - name="ESI Level Complexity", - description="Evaluates how difficult it is to infer the correct ESI level from the note. 
Higher scores indicate greater complexity, which is desirable for creating a challenging dataset.", - options={ - "Complex": "Note contains subtle or conflicting information, requiring clinical reasoning to distinguish between ESI levels.", - "Moderate": "Note requires some clinical inference; indicators are present but not always immediately obvious.", - "Simple": "Note uses clear, direct, or textbook indicators that make the ESI level obvious." - } -) - -# LLM judge: triage note quality -EVAL_TRIAGE_NOTE_PROMPT = """\ -You are an expert ER physician responsible for quality control. Your task is to evaluate a synthetic triage note for its realism and complexity. - -**Triage Situation:** -- ESI Level: '{{ esi_level_description }}' -- Clinical Scenario: '{{ clinical_scenario }}' -- Desired Writing Style: '{{ writing_style }}' -- Patient: {{ patient.age }}-year-old {{ patient.sex }} - -**Generated Triage Note:** -"{{ content }}" - -Take a deep breath and carefully evaluate the "Generated Triage Note". Assess its clinical coherence with the situation and how well it matches the desired complexity. The goal is to create a challenging dataset, so higher complexity scores are desirable. -""" - -config_builder.add_column( - dd.LLMJudgeColumnConfig( - name="triage_note_quality", - model_alias=MODEL_ALIAS_JUDGE, - prompt=EVAL_TRIAGE_NOTE_PROMPT, - scores=[clinical_coherence_rubric, esi_level_complexity_rubric], - ) -) -{% endraw %} -``` - -### 🧪 Generate & Preview - -Tip: Re-run preview to cycle examples; adjust prompts, temperatures, or scenarios to tune realism and difficulty. - -```python -data_designer = client.data_designer -preview = data_designer.preview(config_builder) -``` - -```python -# Run this cell multiple times to cycle through the 10 preview records. -preview.display_sample_record() -``` - -```python -# The preview dataset is available as a pandas DataFrame. 
-preview.dataset -``` - -### 🚀 Scale Up Generations - -Once satisfied with the preview results, scale up to generate the full dataset. - -```python -# Submit batch job -job = data_designer.create(config_builder, num_records=100) - -job.wait_until_done() -``` - -```python -import tempfile - -# Change this to a persistent path if you want to keep the full job artifacts locally -with tempfile.TemporaryDirectory() as tmpdir: - results = job.download_artifacts(path=tmpdir) - dataset = results.load_dataset() - -print("\nGenerated dataset shape:", dataset.shape) -dataset.head() -``` - -### 🧹 Refinement [Optional] - -Filter the generated dataset to retain only higher-quality triage notes: - -- Keeps only notes with **Clinical Coherence ≥ 2** (as judged by LLM). -- Retrieves ESI level complexity directly from the LLM judge column (`triage_note_quality`). - -```python -import ast -from rich import print - -def filter_by_scores(df, min_coherence=3, samples_per_complexity=100): - indices = [] - for idx, k in enumerate(df['triage_note_quality']): - # If k is a string, parse it to dict - if isinstance(k, str): - try: - k_dict = ast.literal_eval(k) - except Exception: - continue - else: - k_dict = k - try: - coherence_score = int(k_dict['Clinical Coherence']['score']) - if coherence_score >= min_coherence: - indices.append(idx) - except Exception: - continue - filtered_df = df.iloc[indices] - filtered_df = filtered_df[["esi_level_description", "content", "triage_note_quality"]] - filtered_df['esi_level_complexity'] = filtered_df['triage_note_quality'].apply( - lambda k: (ast.literal_eval(k) if isinstance(k, str) else k).get('ESI Level Complexity', {}).get('score') - ) - filtered_df.drop(columns=['triage_note_quality'], inplace=True) - percent_filtered = 100 * len(filtered_df) / len(df) if len(df) > 0 else 0 - print(f"Filtered {len(filtered_df)} out of {len(df)} records ({percent_filtered:.1f}%)") - # Sample up to N per complexity - sampled_df = ( - filtered_df - 
.groupby('esi_level_complexity', group_keys=False) - .apply(lambda x: x.sample(min(len(x), samples_per_complexity), random_state=42)) - .reset_index(drop=True) - ) - print(f"Sampled {len(sampled_df)} records total, {samples_per_complexity} (or less) per complexity level.") - return sampled_df - -filtered_df = filter_by_scores(dataset, samples_per_complexity=100) -``` - -## 👀 Inspect results - -```python -def show_example_triage_notes(filtered_df, num_examples=5): - from rich.console import Console - from rich.panel import Panel - from rich.text import Text - - console = Console() - examples = filtered_df.sample(num_examples) - - console.print( - f"[italic]Showing last {num_examples} filtered triage notes:[/italic]\n" - ) - for idx, row in examples.iterrows(): - esi_level = str(row.get("esi_level_description", "")) - esi_level_complexity = str(row.get("esi_level_complexity", "")) - content = str(row.get("content", "")) - # Use blue for the complexity level - panel_title = ( - f"ESI Level: {esi_level} [bold][blue]({esi_level_complexity})[/blue][/bold]" - ) - panel = Panel( - Text(content, style="green"), - title=panel_title, - border_style="cyan", - expand=False, - padding=(1, 2), - ) - console.print(panel) - console.print() # Extra newline for separation - - -# Show some example records from the bottom using rich -show_example_triage_notes(filtered_df, num_examples=3) -``` - -## **Step 2**: 📊 Nemo Evaluator - -We evaluate the model on filtered triage notes to see if it predicts the correct ESI level. - -- **Dataset**: JSONL served by the Files service -- **Task**: Completion with structured output `{ "esi_level_description": "..." 
}` -- **Metric**: String containment check against ground-truth ESI - -```python -from nemo_platform import ConflictError -from nemo_platform.filesets import build_fileset_ref - -# Split the filtered dataframe into different complexity levels -df_complexities = { - "simple": filtered_df[filtered_df["esi_level_complexity"] == "Simple"], - "moderate": filtered_df[filtered_df["esi_level_complexity"] == "Moderate"], - "complex": filtered_df[filtered_df["esi_level_complexity"] == "Complex"], -} - -# Create a dict to store files_url for each complexity level -files_url_dict = {} - -# Create one fileset and store each complexity split as a separate file inside it -FILESET_NAME = "nurse-triage-notes-eval" -try: - client.files.filesets.create( - name=FILESET_NAME, description="Triage evaluator datasets by complexity" - ) -except ConflictError: - pass - -# Loop over each complexity level, preparing, saving, and uploading evaluation datasets -for level, df in df_complexities.items(): - file_name = f"dataset_{level}.jsonl" - df.to_json(file_name, orient="records", lines=True) - print( - f"Dataset prepared with {len(df)} samples for complexity '{level.capitalize()}'" - ) - - # Upload the dataset file to the shared fileset - result = client.files.upload( - fileset=FILESET_NAME, - local_path=file_name, - remote_path=file_name, - ) - - print(f"Dataset uploaded: {result}") # Print result with the uploaded file URL/info - - # Construct files_url for this complexity file within the shared fileset - files_url = build_fileset_ref(file_name, workspace=WORKSPACE, fileset=FILESET_NAME) - files_url_dict[level] = files_url -``` - -## 🧪 Evaluator Flow -This section defines the evaluation configuration used to assess model performance on triage note classification using a custom evaluator. - -```python -{% raw %} -EVALUATOR_CONFIG = { - "prompt_template": { - "messages": [ - { - "role": "system", - "content": ( - "You are an expert ER triage nurse. 
Your task is to classify the following triage note into one of the five Emergency Severity Index (ESI) levels." - f" The possible levels are: {', '.join([repr(level) for level in ESI_LEVELS])}." - " Carefully analyze the clinical details in the triage note, focusing on patient acuity, resource needs, and risk of rapid deterioration." - " Respond with only the selected ESI level description, exactly matching one of the listed possibilities. Do not provide extra text or explanation." - ) - }, - { - "role": "user", - "content": ( - "Triage Note: {{item.content}}\n" - "Classify the ESI level for this note based on the provided definitions." - " Respond in JSON format only: { \"esi_level_description\": \"...\" }" - ) - } - ], - } -} - -{% endraw %} -``` - -## 🔍 Model evaluation loop and configuration - -This section compares multiple models (A/B testing) on the triage note classification task **across each complexity level** (Simple, Moderate, Complex). - -The models evaluated are: - - **Qwen2.5-7B** (`qwen/qwen2.5-7b-instruct`) - - **Meta Llama3.3 70b** (`meta/llama-3.3-70b-instruct`) - -For *each* complexity level, the accuracy score for each model is printed, allowing for side-by-side evaluation of how each model performs at every complexity. 
- -```python -{% raw %} -import time -import pandas as pd -from typing import Any -from nemo_platform.types.evaluation import ( - FilesetRef, - MetricOnlineJobParam, - ModelParam, - StringCheckMetricParam, -) - -# This code assumes EVALUATOR_CONFIG is available in the notebook scope - -MODEL_SPECS = [ - { - "name": "Qwen2.5-7B", - "model_id": "qwen/qwen2.5-7b-instruct", - "url": "https://integrate.api.nvidia.com/v1/chat/completions" - }, - { - "name": "Meta Llama3.3 70b", - "model_id": "meta/llama-3.3-70b-instruct", - "url": "https://integrate.api.nvidia.com/v1/chat/completions" - } -] - -COMPLEXITIES = ["simple", "moderate", "complex"] - -def run_evaluation( - client: Any, - evaluator_config: dict[str, Any], - model_spec: dict[str, str], - complexity: str, - files_url_dict: dict[str, FilesetRef], - secret_key_name: str | None = None, -) -> float: - """ - Populates the evaluator_config, filling in the files_url and endpoint, then runs evaluation. - Returns accuracy for the given model+complexity. 
- """ - fileset_ref: FilesetRef = files_url_dict[complexity] - model_kwargs = { - "url": model_spec["url"], - "name": model_spec["model_id"], - "format": "openai", - } - if secret_key_name: - model_kwargs["api_key_secret"] = secret_key_name - - job_spec = MetricOnlineJobParam( - metric=StringCheckMetricParam( - type="string-check", - operation="contains", - left_template="{{sample.output_text}}", - right_template="{{item.esi_level_description}}", - ), - dataset=fileset_ref, - model=ModelParam(**model_kwargs), - prompt_template=evaluator_config["prompt_template"], - ) - job = client.evaluation.metric_jobs.create( - spec=job_spec, - ) - print(f"Submitted evaluation job for model '{model_spec['name']}' on complexity '{complexity.capitalize()}' (job name: {job.name})") - max_wait_seconds = 7200 - elapsed = 0 - while True: - time.sleep(5) - elapsed += 5 - status = client.evaluation.metric_jobs.retrieve(job.name) - if status.status in ("completed", "success", "error", "failed", "cancelled"): - break - if elapsed >= max_wait_seconds: - raise TimeoutError(f"Job {job.name} did not complete within {max_wait_seconds}s") - if status.status in ("error", "failed", "cancelled"): - raise RuntimeError( - f"Job {job.name} for model '{model_spec['name']}' on complexity '{complexity}' ended with status: {status.status}" - ) - print(f" ✔️ Job done for model '{model_spec['name']}' on complexity '{complexity.capitalize()}'") - - # Fetch results and extract accuracy - aggregate_scores = client.evaluation.metric_jobs.results.aggregate_scores.download( - job=job.name, - ) - if not aggregate_scores.scores: - raise ValueError(f"No scores returned for job {job.name}") - accuracy_value = next( - (s.mean for s in aggregate_scores.scores if s.name in ("string-check", "accuracy")), - aggregate_scores.scores[0].mean, - ) - return accuracy_value - -results_dict = {model_spec['name']: {} for model_spec in MODEL_SPECS} - -print("Starting evaluation jobs (per model, per complexity)...") -for complexity 
in COMPLEXITIES: - for spec in MODEL_SPECS: - accuracy = run_evaluation( - client, - EVALUATOR_CONFIG, - spec, - complexity, - files_url_dict, - secret_key_name=SECRET_NAME if PROVIDER_API_KEY else None, - ) - results_dict[spec['name']][complexity.capitalize()] = 100 * accuracy # Store as percentage - print(f" --> DONE: {spec['name']}, {complexity.capitalize()} (Accuracy: {100*accuracy:.2f}%)\n") - -{% endraw %} -``` - -```python -df_results = pd.DataFrame(results_dict).T -df_results = df_results[[c.capitalize() for c in COMPLEXITIES]] - -print("\nModel Accuracy Table (%):") -display(df_results.style.format("{:.2f}")) -``` - -## Next Steps - -- Explore more [Evaluator metrics](../evaluator/metrics/index.md) for additional evaluation scenarios. -- Learn more [Data Designer workflows](../data-designer/tutorials/index.md) for advanced synthetic generation pipelines. -- Review [Files service management](../get-started/concepts/manage-files.md) for organizing datasets and artifacts. diff --git a/docs/pysdk/client/index.mdx b/docs/pysdk/client/index.mdx index 404403934c..f661a4816d 100644 --- a/docs/pysdk/client/index.mdx +++ b/docs/pysdk/client/index.mdx @@ -1,6 +1,8 @@ -# Client APIs - -The following reference provides detailed documentation for the synchronous and asynchronous clients of the {{platform_name}} Python SDK. +--- +title: "Client APIs" +description: "" +--- +The following reference provides detailed documentation for the synchronous and asynchronous clients of the NeMo Platform Python SDK. ## Synchronous Client @@ -39,7 +41,7 @@ NeMoPlatform( | Parameter | Description | | --- | --- | | `workspace` | Workspace name used by workspace-scoped routes. You can set it on the client or pass it to individual methods that accept a workspace argument. | -| `base_url` | Base URL for the {{platform_name}} API. If omitted, the client reads it from the active CLI context. | +| `base_url` | Base URL for the NeMo Platform API. 
If omitted, the client reads it from the active CLI context. | | `inference_base_url` | Optional override for inference gateway requests. Defaults to `base_url`. | | `config_path` | Path to the CLI config file. Defaults to `~/.config/nmp/config.yaml`. | | `context_name` | Named CLI context to read from the config file. | @@ -145,8 +147,8 @@ for SDK-level errors. ## Client Attributes -The {{platform_name}} clients provide access to API resources through the following attributes. -For endpoint details, see the [REST API Reference](../../api/index.md). +The NeMo Platform clients provide access to API resources through the following attributes. +For endpoint details, see the [REST API Reference](/reference/api-reference). ### Organization diff --git a/docs/pysdk/index.mdx b/docs/pysdk/index.mdx index 27eddca53a..5bfaf866a4 100644 --- a/docs/pysdk/index.mdx +++ b/docs/pysdk/index.mdx @@ -1,25 +1,32 @@ +--- +title: "Overview" +description: "" +--- -# {{platform_name}} Python SDK Reference +# NeMo Platform Python SDK Reference -The [{{platform_name}} Python SDK](https://pypi.org/project/nemo-platform/) is a library for building and deploying AI models, abstracting the underlying infrastructure and providing a high-level interface for the {{platform_name}} APIs. +The [NeMo Platform Python SDK](https://pypi.org/project/nemo-platform/) is a library for building and deploying AI models, abstracting the underlying infrastructure and providing a high-level interface for the NeMo Platform APIs. ## Installation -Install the {{platform_name}} Python SDK using `pip`: +Install the NeMo Platform Python SDK using `pip`: ```bash pip install nemo-platform[all] ``` -!!! note "This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use." 
- If you previously installed the `nemo-microservices` package, uninstall it first to avoid conflicts: + +This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use. +If you previously installed the `nemo-microservices` package, uninstall it first to avoid conflicts: - ```bash - pip uninstall nemo-microservices - ``` ---8<-- "sdk/python/overrides/nemo-platform/README/03_usage.md" +```bash +pip uninstall nemo-microservices +``` + + +`--8<-- "sdk/python/overrides/nemo-platform/README/03_usage.md"` ## Next Steps -- Read more about connecting with the [client APIs](./client/index.md). -- Browse the [REST API Reference](../api/index.md). +- Read more about connecting with the [client APIs](/reference/python-sdk/client-apis). +- Browse the [REST API Reference](/reference/api-reference). diff --git a/docs/requirements-mkdocs.txt b/docs/requirements-mkdocs.txt deleted file mode 100644 index ddbfa947e5..0000000000 --- a/docs/requirements-mkdocs.txt +++ /dev/null @@ -1,41 +0,0 @@ -# MkDocs and core theme -mkdocs>=1.6.0 -mkdocs-material>=9.5.0 - -# Navigation -mkdocs-awesome-pages-plugin>=2.9.0 - -# Python API reference -mkdocstrings[python]>=0.25.0 -mkdocs-autorefs>=1.0.0 - -# Jupyter notebook rendering -mkdocs-jupyter>=0.24.0 -jupyter>=1.0.0 -nbconvert>=7.0.0 - -# Macros / substitutions -mkdocs-macros-plugin>=1.0.0 - -# Redirects -mkdocs-redirects>=1.2.0 - -# Versioning (replaces pydata version switcher) -mike>=2.0.0 - -# OpenAPI / Swagger -mkdocs-swagger-ui-tag>=0.6.0 - -# Live reload -watchdog>=4.0.0 - -# Code block formatting -ruff==0.15.7 - -# SDK dependencies (needed for mkdocstrings to import the SDK) -httpx>=0.23.0 -pydantic>=1.9.0,<3 -typing-extensions>=4.10 -anyio>=3.5.0 -distro>=1.7.0 -sniffio diff --git a/docs/requirements.mdx b/docs/requirements.mdx index f12d2c1c83..68778116c2 100644 --- a/docs/requirements.mdx +++ b/docs/requirements.mdx @@ -1,16 +1,18 @@ -# 
Hardware and Software Requirements for {{platform_name}} +--- +title: "System Requirements" +description: "" +--- +This page lists the requirements for the OSS local-install path for NeMo Platform 0.1.0. For the full compatibility table, see the [Support Matrix](/reference/support-matrix). -This page lists the requirements for the OSS local-install path for {{platform_name}} {{ release }}. For the full compatibility table, see the [Support Matrix](support-matrix.md). - -The OSS {{ release }} documentation is scoped to local setup with the Python package and `nemo setup`. Docker Compose, Helm, Kubernetes, and OpenShift deployment guides are not part of this release scope. +The OSS 0.1.0 documentation is scoped to local setup with the Python package and `nemo setup`. Docker Compose, Helm, Kubernetes, and OpenShift deployment guides are not part of this release scope. ## Local Setup Requirements | Component | Requirement | Notes | |-----------|-------------|-------| -| Python | 3.11, 3.12, or 3.13 (`>=3.11,<3.14`) | Use an isolated virtual environment. Python 3.14 and later are not part of the OSS local-install support range. | +| Python | 3.11, 3.12, or 3.13 (`>=3.11,<3.14`) | Use an isolated virtual environment. Python 3.14 and later are not part of the OSS local-install support range. | | Package installer | `uv` recommended; `pip` supported | The one-line installer sets up `uv` automatically when needed. | -| Operating system | Recent Linux or macOS release | See the [Support Matrix](support-matrix.md) for the supported OS list. | +| Operating system | Recent Linux or macOS release | See the [Support Matrix](/reference/support-matrix) for the supported OS list. | | Memory | 8 GB RAM minimum | 16 GB or more is recommended for larger local workflows. | | Disk space | 16 GB free disk space minimum | Additional space is needed for datasets, job outputs, and local model artifacts. 
| | Network | Outbound HTTPS access | Required for package installation and hosted model-provider APIs. | @@ -36,4 +38,4 @@ nemo --help curl -s http://localhost:8080/health/ready ``` -For setup instructions, see [Setup](get-started/setup.md). +For setup instructions, see [Setup](/get-started/setup). diff --git a/docs/run-inference/about.mdx b/docs/run-inference/about.mdx index 45332831c0..ea4c6c6a05 100644 --- a/docs/run-inference/about.mdx +++ b/docs/run-inference/about.mdx @@ -1,9 +1,13 @@ - - +--- +title: "About" +description: "" +--- +{/* @nemo-nb: process */} +{/* @nemo-nb: skip-test */} # About Models and Inference -The {{platform_name}} provides APIs for registering external model providers and routing inference requests through a unified gateway. +The NeMo Platform provides APIs for registering external model providers and routing inference requests through a unified gateway. ```mermaid flowchart LR @@ -30,7 +34,7 @@ Models service manages model entities and model providers. Model providers connect the platform to external inference APIs such as [NVIDIA Build](https://build.nvidia.com/) or OpenAI. The workflow is: -1. **Store the API key** as a [secret](../get-started/concepts/manage-secrets.md) in the platform +1. **Store the API key** as a [secret](/get-started/core-concepts/manage-secrets) in the platform 2. **Create a model provider** pointing to the external API with the secret reference 3. **Route inference** through the gateway using provider or model entity routing @@ -42,7 +46,15 @@ flowchart LR GW --> Client ``` ---8<-- "_snippets/nvidia-build-model-provider.md" + +The platform pre-configures a `system/nvidia-build` model provider during startup. +This provider routes inference requests to models hosted on `build.nvidia.com` using the API base URL `https://integrate.api.nvidia.com` +and the NGC API key with `Public API Endpoints` permissions provided during deployment (automatically saved as the built-in `system/ngc-api-key` secret). 
+ +You can verify this provider exists by running `nemo inference providers list --workspace system`. + +The tutorials in these docs use this provider for inference, but you can alternatively create your own and use it instead. + @@ -57,138 +69,138 @@ The example below demonstrates how to recreate it in your own workspace. For disambiguation purposes, this example names the manually-created version `my-nvidia-build`. -=== "CLI" - - ```bash - # Store API key - echo "$NVIDIA_API_KEY" | nemo secrets create "nvidia-api-key" --from-file - - - # Create provider - nemo inference providers create "my-nvidia-build" \ - --host-url "https://integrate.api.nvidia.com" \ - --api-key-secret-name "nvidia-api-key" - - nemo wait inference provider my-nvidia-build - - # Test using interactive chat - nemo chat nvidia/llama-3.3-nemotron-super-49b-v1 'Hello!' \ - --provider my-nvidia-build - ``` - -=== "Python SDK" - - ```python - # Store API key - client.secrets.create(name="nvidia-api-key", data=os.environ["NVIDIA_API_KEY"]) - - # Create provider - provider = client.inference.providers.create( - name="my-nvidia-build", - host_url="https://integrate.api.nvidia.com", - api_key_secret_name="nvidia-api-key", - ) + + +```bash +# Store API key +echo "$NVIDIA_API_KEY" | nemo secrets create "nvidia-api-key" --from-file - - client.models.wait_for_provider("my-nvidia-build") +# Create provider +nemo inference providers create "my-nvidia-build" \ + --host-url "https://integrate.api.nvidia.com" \ + --api-key-secret-name "nvidia-api-key" - # Use provider routing - response = client.inference.gateway.provider.post( - "v1/chat/completions", - name="my-nvidia-build", - body={ - "model": "meta/llama-3.1-8b-instruct", - "messages": [{"role": "user", "content": "Hello!"}], - "max_tokens": 100, - }, - ) - ``` +nemo wait inference provider my-nvidia-build +# Test using interactive chat +nemo chat nvidia/llama-3.3-nemotron-super-49b-v1 'Hello!' 
\ + --provider my-nvidia-build +``` + + +```python +# Store API key +client.secrets.create(name="nvidia-api-key", data=os.environ["NVIDIA_API_KEY"]) + +# Create provider +provider = client.inference.providers.create( + name="my-nvidia-build", + host_url="https://integrate.api.nvidia.com", + api_key_secret_name="nvidia-api-key", +) + +client.models.wait_for_provider("my-nvidia-build") + +# Use provider routing +response = client.inference.gateway.provider.post( + "v1/chat/completions", + name="my-nvidia-build", + body={ + "model": "meta/llama-3.1-8b-instruct", + "messages": [{"role": "user", "content": "Hello!"}], + "max_tokens": 100, + }, +) +``` + + #### OpenAI -=== "CLI" - - ```bash - # Store API key - echo "$OPENAI_API_KEY" | nemo secrets create "openai-api-key" --from-file - - - # Create provider with enabled models - nemo inference providers create "openai" \ - --host-url "https://api.openai.com/v1" \ - --api-key-secret-name "openai-api-key" \ - --enabled-models "gpt-4" \ - --enabled-models "gpt-3.5-turbo" - - nemo wait inference provider openai - - # Test using interactive chat - nemo chat gpt-4 'Hello!' 
\ - --provider openai - ``` - -=== "Python SDK" - - ```python - client.secrets.create(name="openai-api-key", data=os.environ["OPENAI_API_KEY"]) - - provider = client.inference.providers.create( - name="openai", - host_url="https://api.openai.com/v1", - api_key_secret_name="openai-api-key", - enabled_models=["gpt-4", "gpt-3.5-turbo"], - ) + + +```bash +# Store API key +echo "$OPENAI_API_KEY" | nemo secrets create "openai-api-key" --from-file - - client.models.wait_for_provider("openai") +# Create provider with enabled models +nemo inference providers create "openai" \ + --host-url "https://api.openai.com/v1" \ + --api-key-secret-name "openai-api-key" \ + --enabled-models "gpt-4" \ + --enabled-models "gpt-3.5-turbo" - # Use provider routing - response = client.inference.gateway.provider.post( - "v1/chat/completions", - name="openai", - body={ - "model": "gpt-4", - "messages": [{"role": "user", "content": "Hello!"}], - "max_tokens": 100, - }, - ) - ``` +nemo wait inference provider openai +# Test using interactive chat +nemo chat gpt-4 'Hello!' \ + --provider openai +``` + + +```python +client.secrets.create(name="openai-api-key", data=os.environ["OPENAI_API_KEY"]) + +provider = client.inference.providers.create( + name="openai", + host_url="https://api.openai.com/v1", + api_key_secret_name="openai-api-key", + enabled_models=["gpt-4", "gpt-3.5-turbo"], +) + +client.models.wait_for_provider("openai") + +# Use provider routing +response = client.inference.gateway.provider.post( + "v1/chat/completions", + name="openai", + body={ + "model": "gpt-4", + "messages": [{"role": "user", "content": "Hello!"}], + "max_tokens": 100, + }, +) +``` + + #### Anthropic Anthropic's `/v1/messages` API expects the API key in an `X-Api-Key:` header (not `Authorization: Bearer`) and requires an `anthropic-version` header on every request. 
Use `--auth-header-format` (Jinja2 template, must contain exactly one `{{ auth_secret }}` variable) to override the default `Authorization: Bearer {{ auth_secret }}` and pass the API-version pin via `--default-extra-headers`. Without these, Anthropic rejects every request with 401. -=== "CLI" - - ```bash - # Store API key - echo "$ANTHROPIC_API_KEY" | nemo secrets create "anthropic-api-key" --from-file - - - # Create provider — override the default Bearer header and pin the API version - nemo inference providers create "anthropic" \ - --host-url "https://api.anthropic.com" \ - --api-key-secret-name "anthropic-api-key" \ - --auth-header-format "X-Api-Key: {{ auth_secret }}" \ - --default-extra-headers '{"anthropic-version": "2023-06-01"}' - - nemo wait inference provider anthropic - ``` - -=== "Python SDK" - - ```python - # Store API key - client.secrets.create(name="anthropic-api-key", data=os.environ["ANTHROPIC_API_KEY"]) - - # Create provider — override the default Bearer header and pin the API version - provider = client.inference.providers.create( - name="anthropic", - host_url="https://api.anthropic.com", - api_key_secret_name="anthropic-api-key", - auth_header_format="X-Api-Key: {{ auth_secret }}", - default_extra_headers={"anthropic-version": "2023-06-01"}, - ) + + +```bash +# Store API key +echo "$ANTHROPIC_API_KEY" | nemo secrets create "anthropic-api-key" --from-file - - client.models.wait_for_provider("anthropic") - ``` +# Create provider — override the default Bearer header and pin the API version +nemo inference providers create "anthropic" \ + --host-url "https://api.anthropic.com" \ + --api-key-secret-name "anthropic-api-key" \ + --auth-header-format "X-Api-Key: {{ auth_secret }}" \ + --default-extra-headers '{"anthropic-version": "2023-06-01"}' +nemo wait inference provider anthropic +``` + + +```python +# Store API key +client.secrets.create(name="anthropic-api-key", data=os.environ["ANTHROPIC_API_KEY"]) + +# Create provider — override the default 
Bearer header and pin the API version +provider = client.inference.providers.create( + name="anthropic", + host_url="https://api.anthropic.com", + api_key_secret_name="anthropic-api-key", + auth_header_format="X-Api-Key: {{ auth_secret }}", + default_extra_headers={"anthropic-version": "2023-06-01"}, +) + +client.models.wait_for_provider("anthropic") +``` + + `{{ auth_secret }}` is substituted with the resolved secret value at request time. --- @@ -221,15 +233,33 @@ All patterns use `/-/` as a separator. Everything after `/-/` is forwarded to th ``` Use `nemo inference get-url` to print the correct base URL for your workspace -without hand-assembling the path. Add `--provider ` or -`--virtual-model ` to get the URL for the corresponding proxy route. +without hand-assembling the path. Add `--provider <name>` or +`--virtual-model <name>` to get the URL for the corresponding proxy route. ### SDK Helper Methods Set up the CLI or Python SDK first: ---8<-- "_snippets/tutorials/cli-sdk-setup.md" + + +```bash +# Configure CLI (if not already done) +nemo config set --base-url "$NMP_BASE_URL" --workspace default +``` + + +```python +import os +from nemo_platform import NeMoPlatform + +client = NeMoPlatform( + base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), + workspace="default", +) +``` + + The SDK provides convenience methods for OpenAI compatibility: ```python @@ -277,4 +307,4 @@ working as designed — use `/health/ready` instead. ## API Reference -For complete API details, refer to the [Inference Gateway API Reference](../api/index.md#tag-inference-gateway) and [SDK Reference](../pysdk/index.md). +For complete API details, refer to the [Inference Gateway API Reference](/reference/api-reference#tag-inference-gateway) and [SDK Reference](/reference/python-sdk/overview). 
diff --git a/docs/run-inference/tutorials/deploy-models.mdx b/docs/run-inference/tutorials/deploy-models.mdx index ba48667081..8b72c03de9 100644 --- a/docs/run-inference/tutorials/deploy-models.mdx +++ b/docs/run-inference/tutorials/deploy-models.mdx @@ -1,15 +1,39 @@ - - +--- +title: "Deploy Models" +description: "" +--- +{/* @nemo-nb: process */} +{/* @nemo-nb: download split */} # Deploy Models Deploy models from NGC or HuggingFace. Register external providers like OpenAI or NVIDIA Build. -!!! note "Resource names for deployments, deployment configs, and providers must contain only letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots. For example: `llama-3.1-8b`, `my-custom-model`, `qwen-fs-config`." + +Resource names for deployments, deployment configs, and providers must contain only letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots. For example: `llama-3.1-8b`, `my-custom-model`, `qwen-fs-config`. + ---8<-- "_snippets/tutorials/cli-sdk-setup.md" + + +```bash +# Configure CLI (if not already done) +nemo config set --base-url "$NMP_BASE_URL" --workspace default +``` + + +```python +import os +from nemo_platform import NeMoPlatform + +client = NeMoPlatform( + base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), + workspace="default", +) +``` + + --- ## Add External Providers @@ -23,138 +47,138 @@ The example below demonstrates how to recreate it in your own workspace. For disambiguation purposes, this example names the manually-created version `my-nvidia-build`. 
-=== "CLI" - - ```bash - # Store API key - echo "$NVIDIA_API_KEY" | nemo secrets create "nvidia-api-key" --from-file - - - # Create provider - nemo inference providers create "my-nvidia-build" \ - --host-url "https://integrate.api.nvidia.com" \ - --api-key-secret-name "nvidia-api-key" - - nemo wait inference provider my-nvidia-build + + +```bash +# Store API key +echo "$NVIDIA_API_KEY" | nemo secrets create "nvidia-api-key" --from-file - - # Test using interactive chat - nemo chat nvidia/llama-3.3-nemotron-super-49b-v1 'Hello!' \ - --provider my-nvidia-build - ``` +# Create provider +nemo inference providers create "my-nvidia-build" \ +--host-url "https://integrate.api.nvidia.com" \ +--api-key-secret-name "nvidia-api-key" -=== "Python SDK" - - ```python - # Store API key - client.secrets.create(name="nvidia-api-key", value=os.environ["NVIDIA_API_KEY"]) - - # Create provider - provider = client.inference.providers.create( - name="my-nvidia-build", - host_url="https://integrate.api.nvidia.com", - api_key_secret_name="nvidia-api-key", - ) - - client.models.wait_for_provider("my-nvidia-build") - - # Use provider routing - response = client.inference.gateway.provider.post( - "v1/chat/completions", - name="my-nvidia-build", - body={ - "model": "meta/llama-3.1-8b-instruct", - "messages": [{"role": "user", "content": "Hello!"}], - "max_tokens": 100, - }, - ) - ``` +nemo wait inference provider my-nvidia-build +# Test using interactive chat +nemo chat nvidia/llama-3.3-nemotron-super-49b-v1 'Hello!' 
\ +--provider my-nvidia-build +``` + + +```python +# Store API key +client.secrets.create(name="nvidia-api-key", value=os.environ["NVIDIA_API_KEY"]) + +# Create provider +provider = client.inference.providers.create( + name="my-nvidia-build", + host_url="https://integrate.api.nvidia.com", + api_key_secret_name="nvidia-api-key", +) + +client.models.wait_for_provider("my-nvidia-build") + +# Use provider routing +response = client.inference.gateway.provider.post( + "v1/chat/completions", + name="my-nvidia-build", + body={ + "model": "meta/llama-3.1-8b-instruct", + "messages": [{"role": "user", "content": "Hello!"}], + "max_tokens": 100, + }, +) +``` + + ### OpenAI -=== "CLI" - - ```bash - # Store API key - echo "$OPENAI_API_KEY" | nemo secrets create "openai-api-key" --from-file - - - # Create provider with enabled models - nemo inference providers create "openai" \ - --host-url "https://api.openai.com/v1" \ - --api-key-secret-name "openai-api-key" \ - --enabled-models "gpt-4" \ - --enabled-models "gpt-3.5-turbo" - - nemo wait inference provider openai - - # Test using interactive chat - nemo chat gpt-4 'Hello!' 
\ - --provider openai - ``` - -=== "Python SDK" - - ```python - client.secrets.create(name="openai-api-key", value=os.environ["OPENAI_API_KEY"]) + + +```bash +# Store API key +echo "$OPENAI_API_KEY" | nemo secrets create "openai-api-key" --from-file - - provider = client.inference.providers.create( - name="openai", - host_url="https://api.openai.com/v1", - api_key_secret_name="openai-api-key", - enabled_models=["gpt-4", "gpt-3.5-turbo"], - ) +# Create provider with enabled models +nemo inference providers create "openai" \ +--host-url "https://api.openai.com/v1" \ +--api-key-secret-name "openai-api-key" \ +--enabled-models "gpt-4" \ +--enabled-models "gpt-3.5-turbo" - client.models.wait_for_provider("openai") - - # Use provider routing - response = client.inference.gateway.provider.post( - "v1/chat/completions", - name="openai", - body={ - "model": "gpt-4", - "messages": [{"role": "user", "content": "Hello!"}], - "max_tokens": 100, - }, - ) - ``` +nemo wait inference provider openai +# Test using interactive chat +nemo chat gpt-4 'Hello!' \ +--provider openai +``` + + +```python +client.secrets.create(name="openai-api-key", value=os.environ["OPENAI_API_KEY"]) + +provider = client.inference.providers.create( + name="openai", + host_url="https://api.openai.com/v1", + api_key_secret_name="openai-api-key", + enabled_models=["gpt-4", "gpt-3.5-turbo"], +) + +client.models.wait_for_provider("openai") + +# Use provider routing +response = client.inference.gateway.provider.post( + "v1/chat/completions", + name="openai", + body={ + "model": "gpt-4", + "messages": [{"role": "user", "content": "Hello!"}], + "max_tokens": 100, + }, +) +``` + + ### Anthropic Anthropic's `/v1/messages` API expects the API key in an `X-Api-Key:` header (not `Authorization: Bearer`) and requires an `anthropic-version` header on every request. 
Use `--auth-header-format` (Jinja2 template, must contain exactly one `{{ auth_secret }}` variable) to override the default `Authorization: Bearer {{ auth_secret }}` and pass the API-version pin via `--default-extra-headers`. Without these, Anthropic rejects every request with 401. -=== "CLI" - - ```bash - # Store API key - echo "$ANTHROPIC_API_KEY" | nemo secrets create "anthropic-api-key" --from-file - - - # Create provider — override the default Bearer header and pin the API version - nemo inference providers create "anthropic" \ - --host-url "https://api.anthropic.com" \ - --api-key-secret-name "anthropic-api-key" \ - --auth-header-format "X-Api-Key: {{ auth_secret }}" \ - --default-extra-headers '{"anthropic-version": "2023-06-01"}' + + +```bash +# Store API key +echo "$ANTHROPIC_API_KEY" | nemo secrets create "anthropic-api-key" --from-file - - nemo wait inference provider anthropic - ``` - -=== "Python SDK" - - ```python - # Store API key - client.secrets.create(name="anthropic-api-key", data=os.environ["ANTHROPIC_API_KEY"]) - - # Create provider — override the default Bearer header and pin the API version - provider = client.inference.providers.create( - name="anthropic", - host_url="https://api.anthropic.com", - api_key_secret_name="anthropic-api-key", - auth_header_format="X-Api-Key: {{ auth_secret }}", - default_extra_headers={"anthropic-version": "2023-06-01"}, - ) - - client.models.wait_for_provider("anthropic") - ``` +# Create provider — override the default Bearer header and pin the API version +nemo inference providers create "anthropic" \ +--host-url "https://api.anthropic.com" \ +--api-key-secret-name "anthropic-api-key" \ +--auth-header-format "X-Api-Key: {{ auth_secret }}" \ +--default-extra-headers '{"anthropic-version": "2023-06-01"}' +nemo wait inference provider anthropic +``` + + +```python +# Store API key +client.secrets.create(name="anthropic-api-key", data=os.environ["ANTHROPIC_API_KEY"]) + +# Create provider — override the default 
Bearer header and pin the API version +provider = client.inference.providers.create( + name="anthropic", + host_url="https://api.anthropic.com", + api_key_secret_name="anthropic-api-key", + auth_header_format="X-Api-Key: {{ auth_secret }}", + default_extra_headers={"anthropic-version": "2023-06-01"}, +) + +client.models.wait_for_provider("anthropic") +``` + + `{{ auth_secret }}` is substituted with the resolved secret value at request time. --- @@ -166,218 +190,220 @@ Deploy pre-built NIM containers from NGC. ### Deploy Llama 3.2 1B -=== "CLI" - - ```bash - nemo inference deployment-configs create \ - --name "llama-3-2-1b-config" \ - --nim-deployment '{ - "gpu": 1, - "image_name": "nvcr.io/nim/meta/llama-3.2-1b-instruct", - "image_tag": "1.8.6", - "model_name": "meta/llama-3.2-1b-instruct" - }' + + +```bash +nemo inference deployment-configs create \ +--name "llama-3-2-1b-config" \ +--nim-deployment '{ +"gpu": 1, +"image_name": "nvcr.io/nim/meta/llama-3.2-1b-instruct", +"image_tag": "1.8.6", +"model_name": "meta/llama-3.2-1b-instruct" +}' - nemo inference deployments create \ - --name "llama-3-2-1b-deployment" \ - --config "llama-3-2-1b-config" +nemo inference deployments create \ +--name "llama-3-2-1b-deployment" \ +--config "llama-3-2-1b-config" - nemo wait inference deployment llama-3-2-1b-deployment - - nemo chat meta/llama-3.2-1b-instruct 'Hello!' 
\ - --provider llama-3-2-1b-deployment \ - --max-tokens 100 - ``` - -=== "Python SDK" - - ```python - config = client.inference.deployment_configs.create( - name="llama-3-2-1b-config", - nim_deployment={ - "gpu": 1, - "image_name": "nvcr.io/nim/meta/llama-3.2-1b-instruct", - "image_tag": "1.8.6", - "model_name": "meta/llama-3.2-1b-instruct", - }, - ) - - deployment = client.inference.deployments.create( - name="llama-3-2-1b-deployment", config="llama-3-2-1b-config" - ) - - client.models.wait_for_status( - deployment_name="llama-3-2-1b-deployment", desired_status="READY" - ) - - response = client.inference.gateway.provider.post( - "v1/chat/completions", - name="llama-3-2-1b-deployment", - body={ - "model": "meta/llama-3.2-1b-instruct", - "messages": [{"role": "user", "content": "Hello!"}], - "max_tokens": 100, - }, - ) - ``` +nemo wait inference deployment llama-3-2-1b-deployment +nemo chat meta/llama-3.2-1b-instruct 'Hello!' \ +--provider llama-3-2-1b-deployment \ +--max-tokens 100 +``` + + +```python +config = client.inference.deployment_configs.create( + name="llama-3-2-1b-config", + nim_deployment={ + "gpu": 1, + "image_name": "nvcr.io/nim/meta/llama-3.2-1b-instruct", + "image_tag": "1.8.6", + "model_name": "meta/llama-3.2-1b-instruct", + }, +) + +deployment = client.inference.deployments.create( + name="llama-3-2-1b-deployment", config="llama-3-2-1b-config" +) + +client.models.wait_for_status( + deployment_name="llama-3-2-1b-deployment", desired_status="READY" +) + +response = client.inference.gateway.provider.post( + "v1/chat/completions", + name="llama-3-2-1b-deployment", + body={ + "model": "meta/llama-3.2-1b-instruct", + "messages": [{"role": "user", "content": "Hello!"}], + "max_tokens": 100, + }, +) +``` + + ### Deploy NeMo Guard Jailbreak Detection Deploy classification NIMs like NeMoGuard for content safety. Uses the `/v1/classify` endpoint instead of chat completions. 
-=== "CLI" - - ```bash - nemo inference deployment-configs create \ - --name "nemoguard-jailbreak-config" \ - --nim-deployment '{ - "gpu": 1, - "image_name": "nvcr.io/nim/nvidia/nemoguard-jailbreak-detect", - "image_tag": "1.10.1" - }' - - nemo inference deployments create \ - --name "nemoguard-jailbreak-deployment" \ - --config "nemoguard-jailbreak-config" - - nemo wait inference deployment nemoguard-jailbreak-deployment + + +```bash +nemo inference deployment-configs create \ +--name "nemoguard-jailbreak-config" \ +--nim-deployment '{ +"gpu": 1, +"image_name": "nvcr.io/nim/nvidia/nemoguard-jailbreak-detect", +"image_tag": "1.10.1" +}' - nemo inference gateway provider post v1/classify \ - --name "nemoguard-jailbreak-deployment" \ - --body '{"input": "Tell me about vacation spots in Hawaii."}' - ``` - -=== "Python SDK" - - ```python - config = client.inference.deployment_configs.create( - name="nemoguard-jailbreak-config", - nim_deployment={ - "gpu": 1, - "image_name": "nvcr.io/nim/nvidia/nemoguard-jailbreak-detect", - "image_tag": "1.10.1", - }, - ) +nemo inference deployments create \ +--name "nemoguard-jailbreak-deployment" \ +--config "nemoguard-jailbreak-config" - deployment = client.inference.deployments.create( - name="nemoguard-jailbreak-deployment", config="nemoguard-jailbreak-config" - ) - - client.models.wait_for_status( - deployment_name="nemoguard-jailbreak-deployment", desired_status="READY" - ) - - response = client.inference.gateway.provider.post( - "v1/classify", - name="nemoguard-jailbreak-deployment", - body={"input": "Tell me about vacation spots in Hawaii."}, - ) - ``` +nemo wait inference deployment nemoguard-jailbreak-deployment +nemo inference gateway provider post v1/classify \ +--name "nemoguard-jailbreak-deployment" \ +--body '{"input": "Tell me about vacation spots in Hawaii."}' +``` + + +```python +config = client.inference.deployment_configs.create( + name="nemoguard-jailbreak-config", + nim_deployment={ + "gpu": 1, + "image_name": 
"nvcr.io/nim/nvidia/nemoguard-jailbreak-detect", + "image_tag": "1.10.1", + }, +) + +deployment = client.inference.deployments.create( + name="nemoguard-jailbreak-deployment", config="nemoguard-jailbreak-config" +) + +client.models.wait_for_status( + deployment_name="nemoguard-jailbreak-deployment", desired_status="READY" +) + +response = client.inference.gateway.provider.post( + "v1/classify", + name="nemoguard-jailbreak-deployment", + body={"input": "Tell me about vacation spots in Hawaii."}, +) +``` + + --- ## Deploy from HuggingFace -!!! warning "HuggingFace deployments use the Multi-LLM NIM (`nvcr.io/nim/nvidia/llm-nim:1.13.1`) by default, which only supports specific model architectures. Check the [supported architectures list](https://docs.nvidia.com/nim/large-language-models/1.13.0/supported-architectures.html) before deploying. If your model architecture is not listed, you will need a model-specific NIM image — see [Deploy from NGC](#deploy-from-ngc) for that approach." + +HuggingFace deployments use the Multi-LLM NIM (`nvcr.io/nim/nvidia/llm-nim:1.13.1`) by default, which only supports specific model architectures. Check the [supported architectures list](https://docs.nvidia.com/nim/large-language-models/1.13.0/supported-architectures.html) before deploying. If your model architecture is not listed, you will need a model-specific NIM image — see [Deploy from NGC](#deploy-from-ngc) for that approach. + You can register a HuggingFace model through the Files service. This creates a fileset that acts as a proxy. The Files service handles authentication and caches the weights on first download, so subsequent deployments start faster. -=== "CLI" - - ```bash - # (Optional) Create a HuggingFace token secret for private models. - # Public models like Qwen do not require a token. - echo "$HF_TOKEN" | nemo secrets create "hf-token-secret" --from-file - - - # Create a fileset pointing to the HuggingFace model. 
- # "token_secret" is optional — only needed for private/gated models. - nemo files filesets create "qwen-2-5-1-5b" \ - --storage '{ - "type": "huggingface", - "repo_id": "Qwen/Qwen2.5-1.5B-Instruct", - "repo_type": "model", - "token_secret": "hf-token-secret" - }' - - # Register a model entity referencing the fileset - nemo models create "qwen-2-5-1-5b" \ - --fileset "default/qwen-2-5-1-5b" - - # Create deployment config pointing to the model entity - nemo inference deployment-configs create "qwen-fs-config" \ - --nim-deployment '{ - "model_namespace": "default", - "model_name": "qwen-2-5-1-5b", - "gpu": 1 - }' - - nemo inference deployments create "qwen-fs-deployment" \ - --config "qwen-fs-config" - - nemo wait inference deployment qwen-fs-deployment - - nemo chat default/qwen-2-5-1-5b 'Hello!' \ - --provider qwen-fs-deployment \ - --max-tokens 100 - ``` - -=== "Python SDK" - - ```python - # (Optional) Create a HuggingFace token secret for private models. - # Public models like Qwen do not require a token. - client.secrets.create(name="hf-token-secret", value=os.environ["HF_TOKEN"]) - - # Create a fileset pointing to the HuggingFace model. - # "token_secret" is optional — only needed for private/gated models. 
- client.files.filesets.create( - name="qwen-2-5-1-5b", - storage={ - "type": "huggingface", - "repo_id": "Qwen/Qwen2.5-1.5B-Instruct", - "repo_type": "model", - "token_secret": "hf-token-secret", - }, - ) - - # Register a model entity referencing the fileset - client.models.create(name="qwen-2-5-1-5b", fileset="default/qwen-2-5-1-5b") - - # Create deployment config pointing to the model entity - config = client.inference.deployment_configs.create( - name="qwen-fs-config", - nim_deployment={ - "model_namespace": "default", - "model_name": "qwen-2-5-1-5b", - "gpu": 1, - }, - ) - - # Deploy — no hf_token_secret_name needed - deployment = client.inference.deployments.create( - name="qwen-fs-deployment", config="qwen-fs-config" - ) - - client.models.wait_for_status( - deployment_name="qwen-fs-deployment", desired_status="READY" - ) - - response = client.inference.gateway.provider.post( - "v1/chat/completions", - name="qwen-fs-deployment", - body={ - "model": "default/qwen-2-5-1-5b", - "messages": [{"role": "user", "content": "Hello!"}], - "max_tokens": 100, - }, - ) - ``` - -!!! tip - - The `fileset` format is `/`. This tells the deployment system to pull weights from the Files service, which proxies the download from HuggingFace using the fileset `token_secret`. For public models like Qwen, the `token_secret` on the fileset is optional. + + +```bash +# (Optional) Create a HuggingFace token secret for private models. +# Public models like Qwen do not require a token. +echo "$HF_TOKEN" | nemo secrets create "hf-token-secret" --from-file - + +# Create a fileset pointing to the HuggingFace model. +# "token_secret" is optional — only needed for private/gated models. 
+nemo files filesets create "qwen-2-5-1-5b" \ +--storage '{ +"type": "huggingface", +"repo_id": "Qwen/Qwen2.5-1.5B-Instruct", +"repo_type": "model", +"token_secret": "hf-token-secret" +}' + +# Register a model entity referencing the fileset +nemo models create "qwen-2-5-1-5b" \ +--fileset "default/qwen-2-5-1-5b" + +# Create deployment config pointing to the model entity +nemo inference deployment-configs create "qwen-fs-config" \ +--nim-deployment '{ +"model_namespace": "default", +"model_name": "qwen-2-5-1-5b", +"gpu": 1 +}' + +nemo inference deployments create "qwen-fs-deployment" \ +--config "qwen-fs-config" + +nemo wait inference deployment qwen-fs-deployment + +nemo chat default/qwen-2-5-1-5b 'Hello!' \ +--provider qwen-fs-deployment \ +--max-tokens 100 +``` + + +```python +# (Optional) Create a HuggingFace token secret for private models. +# Public models like Qwen do not require a token. +client.secrets.create(name="hf-token-secret", value=os.environ["HF_TOKEN"]) + +# Create a fileset pointing to the HuggingFace model. +# "token_secret" is optional — only needed for private/gated models. 
+client.files.filesets.create( + name="qwen-2-5-1-5b", + storage={ + "type": "huggingface", + "repo_id": "Qwen/Qwen2.5-1.5B-Instruct", + "repo_type": "model", + "token_secret": "hf-token-secret", + }, +) + +# Register a model entity referencing the fileset +client.models.create(name="qwen-2-5-1-5b", fileset="default/qwen-2-5-1-5b") + +# Create deployment config pointing to the model entity +config = client.inference.deployment_configs.create( + name="qwen-fs-config", + nim_deployment={ + "model_namespace": "default", + "model_name": "qwen-2-5-1-5b", + "gpu": 1, + }, +) + +# Deploy — no hf_token_secret_name needed +deployment = client.inference.deployments.create( + name="qwen-fs-deployment", config="qwen-fs-config" +) + +client.models.wait_for_status( + deployment_name="qwen-fs-deployment", desired_status="READY" +) + +response = client.inference.gateway.provider.post( + "v1/chat/completions", + name="qwen-fs-deployment", + body={ + "model": "default/qwen-2-5-1-5b", + "messages": [{"role": "user", "content": "Hello!"}], + "max_tokens": 100, + }, +) +``` + + + +The `fileset` format is `<workspace>/<fileset-name>`. This tells the deployment system to pull weights from the Files service, which proxies the download from HuggingFace using the fileset `token_secret`. For public models like Qwen, the `token_secret` on the fileset is optional. + --- @@ -385,34 +411,34 @@ You can register a HuggingFace model through the Files service. 
This creates a f ## Deployment Cleanup -=== "CLI" - - ```bash - # Note: Deleting the deployment will free up its GPU(s) when complete - nemo inference deployments delete - nemo wait inference deployment --status DELETED - nemo inference deployment-configs delete - - # For external providers - nemo inference providers delete - nemo secrets delete - ``` - -=== "Python SDK" - - ```python - # Note: Deleting the deployment will free up its GPU(s) when complete - client.inference.deployments.delete(name="") - client.models.wait_for_status( - deployment_name="", desired_status="DELETED" - ) - client.inference.deployment_configs.delete(name="") - - # For external providers - client.inference.providers.delete(name="") - client.secrets.delete(name="") - ``` + + +```bash +# Note: Deleting the deployment will free up its GPU(s) when complete +nemo inference deployments delete +nemo wait inference deployment --status DELETED +nemo inference deployment-configs delete +# For external providers +nemo inference providers delete +nemo secrets delete +``` + + +```python +# Note: Deleting the deployment will free up its GPU(s) when complete +client.inference.deployments.delete(name="") +client.models.wait_for_status( + deployment_name="", desired_status="DELETED" +) +client.inference.deployment_configs.delete(name="") + +# For external providers +client.inference.providers.delete(name="") +client.secrets.delete(name="") +``` + + --- ## Multi-GPU Deployments @@ -436,131 +462,132 @@ The multi-LLM NIM (`nvcr.io/nim/nvidia/llm-nim:1.13.1`) requires explicit parall This example deploys [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) across 2 GPUs using tensor parallelism. 
-=== "CLI" - - ```bash - # Create a fileset pointing to the HuggingFace model - # Qwen models are public — token_secret is optional - nemo files filesets create \ - --name "qwen-2-5-14b" \ - --storage '{ - "type": "huggingface", - "repo_id": "Qwen/Qwen2.5-14B-Instruct", - "repo_type": "model" - }' - - # Register a model entity referencing the fileset - nemo models create \ - --name "qwen-2-5-14b" \ - --fileset "default/qwen-2-5-14b" - - # Create deployment config with 2 GPUs (TP=2 by default) - nemo inference deployment-configs create \ - --name "qwen-14b-config" \ - --nim-deployment '{ - "model_name": "default/qwen-2-5-14b", - "gpu": 2 - }' - - # Deploy - nemo inference deployments create \ - --name "qwen-14b-deployment" \ - --config "qwen-14b-config" - - nemo wait inference deployment qwen-14b-deployment - - nemo chat default/qwen-2-5-14b 'Hello!' \ - --max-tokens 100 - ``` - -=== "Python SDK" - - ```python - # Create a fileset pointing to the HuggingFace model - # Qwen models are public — token_secret is optional - client.files.filesets.create( - name="qwen-2-5-14b", - storage={ - "type": "huggingface", - "repo_id": "Qwen/Qwen2.5-14B-Instruct", - "repo_type": "model", - }, - ) - - # Register a model entity referencing the fileset - client.models.create(name="qwen-2-5-14b", fileset="default/qwen-2-5-14b") - - # Create deployment config with 2 GPUs (TP=2 by default) - config = client.inference.deployment_configs.create( - name="qwen-14b-config", - nim_deployment={"model_name": "default/qwen-2-5-14b", "gpu": 2}, - ) - - # Deploy - deployment = client.inference.deployments.create( - name="qwen-14b-deployment", config="qwen-14b-config" - ) - - client.models.wait_for_status( - deployment_name="qwen-14b-deployment", desired_status="READY" - ) - - response = client.inference.gateway.model.post( - "v1/chat/completions", - name="qwen-2-5-14b", - body={ - "model": "default/qwen-2-5-14b", - "messages": [{"role": "user", "content": "Hello!"}], - "max_tokens": 100, - }, - ) - 
# NIM sets NIM_TENSOR_PARALLEL_SIZE=2 automatically - ``` - + + +```bash +# Create a fileset pointing to the HuggingFace model +# Qwen models are public — token_secret is optional +nemo files filesets create \ +--name "qwen-2-5-14b" \ +--storage '{ +"type": "huggingface", +"repo_id": "Qwen/Qwen2.5-14B-Instruct", +"repo_type": "model" +}' + +# Register a model entity referencing the fileset +nemo models create \ +--name "qwen-2-5-14b" \ +--fileset "default/qwen-2-5-14b" + +# Create deployment config with 2 GPUs (TP=2 by default) +nemo inference deployment-configs create \ +--name "qwen-14b-config" \ +--nim-deployment '{ +"model_name": "default/qwen-2-5-14b", +"gpu": 2 +}' + +# Deploy +nemo inference deployments create \ +--name "qwen-14b-deployment" \ +--config "qwen-14b-config" + +nemo wait inference deployment qwen-14b-deployment + +nemo chat default/qwen-2-5-14b 'Hello!' \ +--max-tokens 100 +``` + + +```python +# Create a fileset pointing to the HuggingFace model +# Qwen models are public — token_secret is optional +client.files.filesets.create( + name="qwen-2-5-14b", + storage={ + "type": "huggingface", + "repo_id": "Qwen/Qwen2.5-14B-Instruct", + "repo_type": "model", + }, +) + +# Register a model entity referencing the fileset +client.models.create(name="qwen-2-5-14b", fileset="default/qwen-2-5-14b") + +# Create deployment config with 2 GPUs (TP=2 by default) +config = client.inference.deployment_configs.create( + name="qwen-14b-config", + nim_deployment={"model_name": "default/qwen-2-5-14b", "gpu": 2}, +) + +# Deploy +deployment = client.inference.deployments.create( + name="qwen-14b-deployment", config="qwen-14b-config" +) + +client.models.wait_for_status( + deployment_name="qwen-14b-deployment", desired_status="READY" +) + +response = client.inference.gateway.model.post( + "v1/chat/completions", + name="qwen-2-5-14b", + body={ + "model": "default/qwen-2-5-14b", + "messages": [{"role": "user", "content": "Hello!"}], + "max_tokens": 100, + }, +) +# NIM sets 
NIM_TENSOR_PARALLEL_SIZE=2 automatically +``` + + ### Custom Parallelism Configuration For larger models requiring more GPUs, you can configure specific TP/PP splits using `additional_envs`. The formula is: `gpu` = `NIM_TENSOR_PARALLEL_SIZE` × `NIM_PIPELINE_PARALLEL_SIZE`. -=== "CLI" - - ```bash - nemo inference deployment-configs create \ - --name "multi-gpu-custom-config" \ - --nim-deployment '{ - "model_name": "default/qwen-2-5-14b", - "gpu": 4, - "additional_envs": { - "NIM_TENSOR_PARALLEL_SIZE": "2", - "NIM_PIPELINE_PARALLEL_SIZE": "2" - } - }' - ``` - -=== "Python SDK" - - ```python - config = client.inference.deployment_configs.create( - name="multi-gpu-custom-config", - nim_deployment={ - "model_name": "default/qwen-2-5-14b", - "gpu": 4, - "additional_envs": { - "NIM_TENSOR_PARALLEL_SIZE": "2", - "NIM_PIPELINE_PARALLEL_SIZE": "2", - }, + + +```bash +nemo inference deployment-configs create \ +--name "multi-gpu-custom-config" \ +--nim-deployment '{ +"model_name": "default/qwen-2-5-14b", +"gpu": 4, +"additional_envs": { +"NIM_TENSOR_PARALLEL_SIZE": "2", +"NIM_PIPELINE_PARALLEL_SIZE": "2" +} +}' +``` + + +```python +config = client.inference.deployment_configs.create( + name="multi-gpu-custom-config", + nim_deployment={ + "model_name": "default/qwen-2-5-14b", + "gpu": 4, + "additional_envs": { + "NIM_TENSOR_PARALLEL_SIZE": "2", + "NIM_PIPELINE_PARALLEL_SIZE": "2", }, - ) - ``` - -!!! tip - **Choosing Parallelism Strategy** + }, +) +``` + + + +**Choosing Parallelism Strategy** - - **TP=8, PP=1** (default): Lowest latency, best for real-time applications - - **TP=4, PP=2**: Balanced latency and throughput - - **TP=2, PP=4**: Highest throughput, best for batch processing +- **TP=8, PP=1** (default): Lowest latency, best for real-time applications +- **TP=4, PP=2**: Balanced latency and throughput +- **TP=2, PP=4**: Highest throughput, best for batch processing - For custom models, match deployment parallelism to training parallelism for optimal performance. 
+For custom models, match deployment parallelism to training parallelism for optimal performance. + --- @@ -571,8 +598,9 @@ Configure custom chat templates and tool calling for NIM deployments. These sett For more information on chat templates, see the [Hugging Face chat templating guide](https://huggingface.co/docs/transformers/en/chat_templating) and the [NeMo chat templates documentation](https://docs.nvidia.com/nemo/rl/latest/design-docs/chat-datasets.html#chat-templates). -!!! warning - **Security consideration:** Chat templates are Jinja2 programs that execute on every inference call. While NIM uses a sandboxed Jinja2 environment (mitigating arbitrary code execution), a malicious or misconfigured template can still alter model behavior — for example, by injecting hidden instructions, rewriting messages, or degrading output quality. Grant chat template permissions only to trusted users, and review templates before deploying to production. See [Inference-Time Backdoors via Chat Templates (IEEE S&P 2026)](https://arxiv.org/abs/2602.04653) for further background. + +**Security consideration:** Chat templates are Jinja2 programs that execute on every inference call. While NIM uses a sandboxed Jinja2 environment (mitigating arbitrary code execution), a malicious or misconfigured template can still alter model behavior — for example, by injecting hidden instructions, rewriting messages, or degrading output quality. Grant chat template permissions only to trusted users, and review templates before deploying to production. See [Inference-Time Backdoors via Chat Templates (IEEE S&P 2026)](https://arxiv.org/abs/2602.04653) for further background. + ### Configuration Sources @@ -603,10 +631,10 @@ flowchart LR Set `chat_template` and tool calling configuration with fileset `metadata.model.tool_calling`. The platform automatically propagates these into the model entity spec when the model-spec background task runs. 
-=== "CLI" + + +```bash - ```bash -{% raw %} # Create fileset with chat template and tool calling config via metadata nemo files filesets create \ --name "llama-3-2-1b-tool" \ @@ -643,77 +671,77 @@ Set `chat_template` and tool calling configuration with fileset `metadata.model. --config "llama-tool-config" nemo wait inference deployment llama-tool-deployment -{% endraw %} + ``` + + +```python -=== "Python SDK" - - ```python - {% raw %} - # Create fileset with chat template and tool calling config via metadata - tool_metadata: dict[str, object] = { - "model": { - "tool_calling": { - "chat_template": ( - "{%- for message in messages %}" - "{%- set content = '<{{ start_header_id }}>' + message['role'] + '<{{ end_header_id }}>\n\n'" - " + message['content'] | trim + '<{{ eot_id }}>' %}" - "{%- if loop.index0 == 0 %}{%- set content = '<{{ begin_of_text }}>' + content %}{%- endif %}" - "{{ content }}{%- endfor %}" - "{%- if add_generation_prompt %}{{ '<{{ start_header_id }}>assistant<{{ end_header_id }}>\n\n' }}{%- endif %}" - ), - "tool_call_parser": "llama3_json", - "auto_tool_choice": True, - } +# Create fileset with chat template and tool calling config via metadata +tool_metadata: dict[str, object] = { + "model": { + "tool_calling": { + "chat_template": ( + "{%- for message in messages %}" + "{%- set content = '<{{ start_header_id }}>' + message['role'] + '<{{ end_header_id }}>\n\n'" + " + message['content'] | trim + '<{{ eot_id }}>' %}" + "{%- if loop.index0 == 0 %}{%- set content = '<{{ begin_of_text }}>' + content %}{%- endif %}" + "{{ content }}{%- endfor %}" + "{%- if add_generation_prompt %}{{ '<{{ start_header_id }}>assistant<{{ end_header_id }}>\n\n' }}{%- endif %}" + ), + "tool_call_parser": "llama3_json", + "auto_tool_choice": True, } } - - client.files.filesets.create( - name="llama-3-2-1b-tool", - storage={ - "type": "huggingface", - "repo_id": "meta-llama/Llama-3.2-1B-Instruct", - "repo_type": "model" - }, - metadata=tool_metadata, - ) - - # Register 
model entity referencing the fileset - client.models.create( - name="llama-3-2-1b-tool", - fileset="default/llama-3-2-1b-tool", - ) - - # Deploy — chat_template and tool_call_config are inherited from the fileset - config = client.inference.deployment_configs.create( - name="llama-tool-config", - nim_deployment={ - "model_name": "default/llama-3-2-1b-tool", - "gpu": 1 - } - ) - - deployment = client.inference.deployments.create( - name="llama-tool-deployment", - config="llama-tool-config" - ) - - client.models.wait_for_status( - deployment_name="llama-tool-deployment", - desired_status="READY" - ) - {% endraw %} - ``` +} + +client.files.filesets.create( + name="llama-3-2-1b-tool", + storage={ + "type": "huggingface", + "repo_id": "meta-llama/Llama-3.2-1B-Instruct", + "repo_type": "model" + }, + metadata=tool_metadata, +) + +# Register model entity referencing the fileset +client.models.create( + name="llama-3-2-1b-tool", + fileset="default/llama-3-2-1b-tool", +) + +# Deploy — chat_template and tool_call_config are inherited from the fileset +config = client.inference.deployment_configs.create( + name="llama-tool-config", + nim_deployment={ + "model_name": "default/llama-3-2-1b-tool", + "gpu": 1 + } +) + +deployment = client.inference.deployments.create( + name="llama-tool-deployment", + config="llama-tool-config" +) + +client.models.wait_for_status( + deployment_name="llama-tool-deployment", + desired_status="READY" +) + ``` + + ### Option 2: Set via Deployment Config (Override) Set `chat_template` and `tool_call_config` directly on the deployment config. These override any values from the fileset. -=== "CLI" + + +```bash - ```bash -{% raw %} nemo inference deployment-configs create \ --name "llama-tool-override-config" \ --nim-deployment '{ @@ -725,154 +753,155 @@ Set `chat_template` and `tool_call_config` directly on the deployment config. 
Th "auto_tool_choice": false } }' -{% endraw %} - ``` -=== "Python SDK" - - ```python - {% raw %} - config = client.inference.deployment_configs.create( - name="llama-tool-override-config", - nim_deployment={ - "model_name": "default/llama-3-2-1b-tool", - "gpu": 1, - "chat_template": "{%- for message in messages %}{{ message['content'] }}{%- endfor %}", - "tool_call_config": { - "tool_call_parser": "hermes", - "auto_tool_choice": False, - } - } - ) - {% endraw %} ``` + + +```python + +config = client.inference.deployment_configs.create( + name="llama-tool-override-config", + nim_deployment={ + "model_name": "default/llama-3-2-1b-tool", + "gpu": 1, + "chat_template": "{%- for message in messages %}{{ message['content'] }}{%- endfor %}", + "tool_call_config": { + "tool_call_parser": "hermes", + "auto_tool_choice": False, + } + } +) + ``` + + ### Change Tool Calling Config for an Existing Model Updating a fileset's `metadata.model.tool_calling` does **not** propagate changes to an existing model entity. The model entity's `spec` is populated from the fileset only at creation time. To change the tool calling configuration, create a new fileset with the updated config and a new model entity that references it. 
-=== "CLI" - - ```bash - # Create a new fileset with the updated tool calling config - nemo files filesets create \ - --name "llama-3-2-1b-tool-v2" \ - --storage '{ - "type": "huggingface", - "repo_id": "meta-llama/Llama-3.2-1B-Instruct", - "repo_type": "model" - }' \ - --metadata '{ + + +```bash +# Create a new fileset with the updated tool calling config +nemo files filesets create \ +--name "llama-3-2-1b-tool-v2" \ +--storage '{ +"type": "huggingface", +"repo_id": "meta-llama/Llama-3.2-1B-Instruct", +"repo_type": "model" +}' \ +--metadata '{ +"model": { +"tool_calling": { +"tool_call_parser": "mistral", +"auto_tool_choice": true +} +} +}' + +# Create a new model entity referencing the new fileset +nemo models create \ +--name "llama-3-2-1b-tool-v2" \ +--fileset "default/llama-3-2-1b-tool-v2" + +# Update deployment config to use the new model, or create a new one +nemo inference deployment-configs create \ +--name "llama-tool-config-v2" \ +--nim-deployment '{ +"model_name": "default/llama-3-2-1b-tool-v2", +"gpu": 1 +}' + +nemo inference deployments create \ +--name "llama-tool-deployment-v2" \ +--config "llama-tool-config-v2" + +nemo wait inference deployment llama-tool-deployment-v2 +``` + + +```python +# Create a new fileset with the updated tool calling config via metadata +tool_metadata_v2: dict[str, object] = { "model": { - "tool_calling": { - "tool_call_parser": "mistral", - "auto_tool_choice": true - } - } - }' - - # Create a new model entity referencing the new fileset - nemo models create \ - --name "llama-3-2-1b-tool-v2" \ - --fileset "default/llama-3-2-1b-tool-v2" - - # Update deployment config to use the new model, or create a new one - nemo inference deployment-configs create \ - --name "llama-tool-config-v2" \ - --nim-deployment '{ - "model_name": "default/llama-3-2-1b-tool-v2", - "gpu": 1 - }' - - nemo inference deployments create \ - --name "llama-tool-deployment-v2" \ - --config "llama-tool-config-v2" - - nemo wait inference deployment 
llama-tool-deployment-v2 - ``` - -=== "Python SDK" - - ```python - # Create a new fileset with the updated tool calling config via metadata - tool_metadata_v2: dict[str, object] = { - "model": { - "tool_calling": { - "tool_call_parser": "mistral", - "auto_tool_choice": True, - } + "tool_calling": { + "tool_call_parser": "mistral", + "auto_tool_choice": True, } } - - client.files.filesets.create( - name="llama-3-2-1b-tool-v2", - storage={ - "type": "huggingface", - "repo_id": "meta-llama/Llama-3.2-1B-Instruct", - "repo_type": "model", - }, - metadata=tool_metadata_v2, - ) - - # Create a new model entity referencing the new fileset - client.models.create( - name="llama-3-2-1b-tool-v2", - fileset="default/llama-3-2-1b-tool-v2", - ) - - # Create a new deployment config and deployment - config = client.inference.deployment_configs.create( - name="llama-tool-config-v2", - nim_deployment={"model_name": "default/llama-3-2-1b-tool-v2", "gpu": 1}, - ) - - deployment = client.inference.deployments.create( - name="llama-tool-deployment-v2", config="llama-tool-config-v2" - ) - - client.models.wait_for_status( - deployment_name="llama-tool-deployment-v2", desired_status="READY" - ) - ``` - +} + +client.files.filesets.create( + name="llama-3-2-1b-tool-v2", + storage={ + "type": "huggingface", + "repo_id": "meta-llama/Llama-3.2-1B-Instruct", + "repo_type": "model", + }, + metadata=tool_metadata_v2, +) + +# Create a new model entity referencing the new fileset +client.models.create( + name="llama-3-2-1b-tool-v2", + fileset="default/llama-3-2-1b-tool-v2", +) + +# Create a new deployment config and deployment +config = client.inference.deployment_configs.create( + name="llama-tool-config-v2", + nim_deployment={"model_name": "default/llama-3-2-1b-tool-v2", "gpu": 1}, +) + +deployment = client.inference.deployments.create( + name="llama-tool-deployment-v2", config="llama-tool-config-v2" +) + +client.models.wait_for_status( + deployment_name="llama-tool-deployment-v2", 
desired_status="READY" +) +``` + + ### Custom Tool Call Plugin For custom tool calling parsers, store the plugin Python file in a separate fileset and reference it via `tool_call_plugin`. -!!! info - Because plugins execute arbitrary Python code inside the NIM container, `tool_call_plugin` is **disabled by default** at the platform level. - To enable it, set `models.tool_call_plugin.enabled: true` in the platform configuration and ensure the user has the `models.tool-call-plugin.set` permission (granted to Admin and PlatformAdmin roles by default). - - -=== "Python SDK" - - ```python - # 1. Create a fileset for the plugin file - client.files.filesets.create(name="my-tool-plugin") - client.files.upload( - fileset="my-tool-plugin", - local_path="my_parser.py", - remote_path="my_parser.py", - ) - - # 2. Reference the plugin fileset in the model's fileset metadata - plugin_tool_metadata: dict[str, object] = { - "model": { - "tool_calling": { - "tool_call_parser": "custom_parser", - "tool_call_plugin": "default/my-tool-plugin", - "auto_tool_choice": True, - } + +Because plugins execute arbitrary Python code inside the NIM container, `tool_call_plugin` is **disabled by default** at the platform level. +To enable it, set `models.tool_call_plugin.enabled: true` in the platform configuration and ensure the user has the `models.tool-call-plugin.set` permission (granted to Admin and PlatformAdmin roles by default). + + + + +```python +# 1. Create a fileset for the plugin file +client.files.filesets.create(name="my-tool-plugin") +client.files.upload( + fileset="my-tool-plugin", + local_path="my_parser.py", + remote_path="my_parser.py", +) + +# 2. 
Reference the plugin fileset in the model's fileset metadata +plugin_tool_metadata: dict[str, object] = { + "model": { + "tool_calling": { + "tool_call_parser": "custom_parser", + "tool_call_plugin": "default/my-tool-plugin", + "auto_tool_choice": True, } } +} - client.files.filesets.update( - "llama-3-2-1b-tool", - metadata=plugin_tool_metadata, - ) - ``` - +client.files.filesets.update( + "llama-3-2-1b-tool", + metadata=plugin_tool_metadata, +) +``` + + The platform downloads the plugin fileset at deployment time and passes the `.py` file path to NIM via the `NIM_TOOL_PARSER_PLUGIN` environment variable. ### Tool Call Config Reference @@ -901,33 +930,35 @@ The platform translates tool calling configuration into NIM environment variable Some models (for example, `nvidia/nemotron-3-nano-30b-a3b`) enable reasoning/thinking by default. (From the model card: "[nemotron-3] responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response") You can disable it on a per-request basis by passing `chat_template_kwargs` directly in the request body: -=== "CLI" - - ```bash - nemo inference gateway provider post v1/chat/completions \ - --name "my-deployment" \ - --body '{ - "model": "default/my-model", - "messages": [{"role": "user", "content": "Hello!"}], - "max_tokens": 100, - "chat_template_kwargs": {"thinking": false} - }' - ``` - -=== "Python SDK" - - ```python - response = client.inference.gateway.provider.post( - "v1/chat/completions", - name="my-deployment", - body={ - "model": "default/my-model", - "messages": [{"role": "user", "content": "Hello!"}], - "max_tokens": 100, - "chat_template_kwargs": {"thinking": False}, - }, - ) - ``` - -!!! note "`chat_template_kwargs` is passed directly in the request body, not nested under `extra_body`. For more details on request-level reasoning overrides, see the [vLLM documentation](https://docs.vllm.ai/en/latest/features/reasoning_outputs/#request-level-override)." 
- This parameter only applies to models which use vLLM under the hood. For non-vLLM providers such as OpenAI or NVIDIA Build, the parameter that controls reasoning differs. Consult the provider's documentation for the specific parameter. + + +```bash +nemo inference gateway provider post v1/chat/completions \ +--name "my-deployment" \ +--body '{ +"model": "default/my-model", +"messages": [{"role": "user", "content": "Hello!"}], +"max_tokens": 100, +"chat_template_kwargs": {"thinking": false} +}' +``` + + +```python +response = client.inference.gateway.provider.post( + "v1/chat/completions", + name="my-deployment", + body={ + "model": "default/my-model", + "messages": [{"role": "user", "content": "Hello!"}], + "max_tokens": 100, + "chat_template_kwargs": {"thinking": False}, + }, +) +``` + + + +`chat_template_kwargs` is passed directly in the request body, not nested under `extra_body`. For more details on request-level reasoning overrides, see the [vLLM documentation](https://docs.vllm.ai/en/latest/features/reasoning_outputs/#request-level-override). +This parameter only applies to models which use vLLM under the hood. For non-vLLM providers such as OpenAI or NVIDIA Build, the parameter that controls reasoning differs. Consult the provider's documentation for the specific parameter. + diff --git a/docs/run-inference/tutorials/index.mdx b/docs/run-inference/tutorials/index.mdx index bbd61d8d7d..768fff4642 100644 --- a/docs/run-inference/tutorials/index.mdx +++ b/docs/run-inference/tutorials/index.mdx @@ -1,7 +1,11 @@ +--- +title: "Overview" +description: "" +--- # Tutorials -Learn how to run inference through the {{platform_name}}. +Learn how to run inference through the NeMo Platform. ## Prerequisites @@ -10,4 +14,4 @@ Learn how to run inference through the {{platform_name}}. 
## Guides -- [Run Inference](run-inference.md) — Route requests via model entity, provider, or OpenAI routing +- [Run Inference](/models-and-inference/tutorials/run-inference) — Route requests via model entity, provider, or OpenAI routing diff --git a/docs/run-inference/tutorials/run-inference.mdx b/docs/run-inference/tutorials/run-inference.mdx index 2ce4cabc5b..9975e9aa8f 100644 --- a/docs/run-inference/tutorials/run-inference.mdx +++ b/docs/run-inference/tutorials/run-inference.mdx @@ -1,15 +1,38 @@ - - +--- +title: "Run Inference" +description: "" +--- +{/* @nemo-nb: process */} +{/* @nemo-nb: download */} # Run Inference Route inference requests through the gateway using model entity routing, provider routing, or OpenAI-compatible routing. -!!! note - This tutorial assumes you have an external provider registered. The platform pre-configures a `system/nvidia-build` provider on startup, which most of the examples below use. To register your own provider, see [About Models and Inference](../about.md#model-providers). + +This tutorial assumes you have an external provider registered. The platform pre-configures a `system/nvidia-build` provider on startup, which most of the examples below use. To register your own provider, see [About Models and Inference](/models-and-inference/about#model-providers). + + ---8<-- "_snippets/tutorials/cli-sdk-setup.md" + + +```bash +# Configure CLI (if not already done) +nemo config set --base-url "$NMP_BASE_URL" --workspace default +``` + + +```python +import os +from nemo_platform import NeMoPlatform +client = NeMoPlatform( + base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), + workspace="default", +) +``` + + --- ## Listing Models @@ -24,29 +47,29 @@ There are two distinct ways to list models, depending on what you need: Use `nemo models list` to manage model configurations. Use `nemo inference models list` to discover what's ready to serve requests right now. 
-=== "CLI" - - ```bash - # All registered model entities (may include models without active deployments) - nemo models list - - # Models available for inference via the gateway (OpenAI-compatible IDs) - nemo inference models list - ``` - -=== "Python SDK" - - ```python - # All registered model entities - for model in client.models.list(): - print(model.name) - - # Models available for inference via the gateway - models = client.inference.models.list() - for model in models.data: - print(model.id) # format: workspace/model_entity_name - ``` + + +```bash +# All registered model entities (may include models without active deployments) +nemo models list +# Models available for inference via the gateway (OpenAI-compatible IDs) +nemo inference models list +``` + + +```python +# All registered model entities +for model in client.models.list(): + print(model.name) + +# Models available for inference via the gateway +models = client.inference.models.list() +for model in models.data: + print(model.id) # format: workspace/model_entity_name +``` + + --- ## Route by Model Entity @@ -56,45 +79,45 @@ Route requests using the model entity name. The gateway selects an available pro Find available model entities: -=== "CLI" - - ```bash - nemo models list - ``` - -=== "Python SDK" - - ```python - for model in client.models.list(): - print(model.name) - ``` - + + +```bash +nemo models list +``` + + +```python +for model in client.models.list(): + print(model.name) +``` + + Run inference by passing the Model Entity. -=== "CLI" - - ```bash - # Model entities are auto-discovered from deployments - # Use the model entity name from 'nemo models list' - nemo chat meta-llama-3-2-1b-instruct 'Hello!' 
--max-tokens 100 - ``` - -=== "Python SDK" - - ```python - # Model entities are auto-discovered from deployments - response = client.inference.gateway.model.post( - "v1/chat/completions", - name="meta-llama-3-2-1b-instruct", # Model entity name - body={ - "model": "meta/llama-3.2-1b-instruct", - "messages": [{"role": "user", "content": "Hello!"}], - "max_tokens": 100, - }, - ) - ``` - + + +```bash +# Model entities are auto-discovered from deployments +# Use the model entity name from 'nemo models list' +nemo chat meta-llama-3-2-1b-instruct 'Hello!' --max-tokens 100 +``` + + +```python +# Model entities are auto-discovered from deployments +response = client.inference.gateway.model.post( + "v1/chat/completions", + name="meta-llama-3-2-1b-instruct", # Model entity name + body={ + "model": "meta/llama-3.2-1b-instruct", + "messages": [{"role": "user", "content": "Hello!"}], + "max_tokens": 100, + }, +) +``` + + --- ## Route by Provider @@ -104,93 +127,95 @@ Route to a specific provider instance. 
Use for A/B testing or targeting specific Find available providers: -=== "CLI" - - ```bash - nemo inference providers list - - # List models available on a provider if its API is OpenAI compliant - nemo inference gateway provider get v1/models --name llama-3-2-1b-deployment - ``` - -=== "Python SDK" - - ```python - for provider in client.inference.providers.list(): - print(f"{provider.name}: {provider.status}") + + +```bash +nemo inference providers list - # List models available on a provider if its API is OpenAI compliant - models = client.inference.gateway.provider.get( - "v1/models", - name="llama-3-2-1b-deployment", - ) - print(models) - ``` +# List models available on a provider if its API is OpenAI compliant +nemo inference gateway provider get v1/models --name llama-3-2-1b-deployment +``` + + +```python +for provider in client.inference.providers.list(): + print(f"{provider.name}: {provider.status}") +# List models available on a provider if its API is OpenAI compliant +models = client.inference.gateway.provider.get( + "v1/models", + name="llama-3-2-1b-deployment", +) +print(models) +``` + + Pass inference request using provider routing. -=== "CLI" - - ```bash - # Provider name matches deployment name for auto-created providers - nemo chat meta/llama-3.2-1b-instruct 'Hello!' \ - --provider llama-3-2-1b-deployment \ - --max-tokens 100 - ``` - -=== "Python SDK" - - ```python - # Provider name matches deployment name for auto-created providers - response = client.inference.gateway.provider.post( - "v1/chat/completions", - name="llama-3-2-1b-deployment", # Provider name - body={ - "model": "meta/llama-3.2-1b-instruct", - "messages": [{"role": "user", "content": "Hello!"}], - "max_tokens": 100, - }, - ) - ``` - + + +```bash +# Provider name matches deployment name for auto-created providers +nemo chat meta/llama-3.2-1b-instruct 'Hello!' 
\ +--provider llama-3-2-1b-deployment \ +--max-tokens 100 +``` + + +```python +# Provider name matches deployment name for auto-created providers +response = client.inference.gateway.provider.post( + "v1/chat/completions", + name="llama-3-2-1b-deployment", # Provider name + body={ + "model": "meta/llama-3.2-1b-instruct", + "messages": [{"role": "user", "content": "Hello!"}], + "max_tokens": 100, + }, +) +``` + + --- ## Route using OpenAI SDK Use the OpenAI-compatible endpoint for drop-in SDK replacement. The model field uses the format `{workspace}/{model_entity}`. -!!! note "Model entity naming" - The `{model_entity}` segment is validated against a strict regex - (lowercase letters, digits, and hyphens; 2–63 characters; **no slashes**). - Vendor-style IDs such as `meta/llama-3.3-70b-instruct` are rejected with - HTTP 422 — register them as a hyphen-only entity (for example - `meta-llama-3-3-70b-instruct`) and reference them as - `{workspace}/meta-llama-3-3-70b-instruct` in the request body. Run - `nemo inference models list` to see the IDs the gateway accepts. + +Model entity naming +The `{model_entity}` segment is validated against a strict regex +(lowercase letters, digits, and hyphens; 2–63 characters; **no slashes**). +Vendor-style IDs such as `meta/llama-3.3-70b-instruct` are rejected with +HTTP 422 — register them as a hyphen-only entity (for example +`meta-llama-3-3-70b-instruct`) and reference them as +`{workspace}/meta-llama-3-3-70b-instruct` in the request body. Run +`nemo inference models list` to see the IDs the gateway accepts. + ### List Available Models List models currently reachable via the Inference Gateway. Returns results in OpenAI-compatible format; each `id` is `workspace/model_entity_name`. 
-=== "CLI" - - ```bash - nemo inference models list - ``` - -=== "Python SDK" - - ```python - models = client.inference.models.list() - for model in models.data: - print(model.id) # format: workspace/model_entity_name - ``` - + + +```bash +nemo inference models list +``` + + +```python +models = client.inference.models.list() +for model in models.data: + print(model.id) # format: workspace/model_entity_name +``` + + ### Using SDK -Make requests to the OpenAI-compatible Inference Gateway route with the {{platform_name}} SDK. +Make requests to the OpenAI-compatible Inference Gateway route with the NeMo Platform SDK. ```python response = client.inference.gateway.openai.post( @@ -205,7 +230,7 @@ response = client.inference.gateway.openai.post( ### Using OpenAI Python SDK -Models service provides a convenient helper method `client.models.get_openai_client()` that can generate an OpenAI SDK client for the configured workspace. The SDK also offers helper methods for generating OpenAI-compatible URL strings for Inference Gateway. Refer to [SDK Helper Methods](../about.md#sdk-helper-methods) for more info. +Models service provides a convenient helper method `client.models.get_openai_client()` that can generate an OpenAI SDK client for the configured workspace. The SDK also offers helper methods for generating OpenAI-compatible URL strings for Inference Gateway. Refer to [SDK Helper Methods](/models-and-inference/about#sdk-helper-methods) for more info. ```python # Get pre-configured OpenAI client @@ -223,50 +248,49 @@ response = oai_client.chat.completions.create( If you construct third-party clients manually, pass the headers returned by `client.models.get_client_default_headers()` so auth and identity headers are preserved when authorization is enabled. 
-=== "OpenAI (manual client)" - - ```python - from openai import OpenAI - - base_url = client.models.get_openai_route_base_url() - headers = client.models.get_client_default_headers() - - oai_client = OpenAI( - base_url=base_url, - api_key="not-needed", - default_headers=headers, - ) - - response = oai_client.chat.completions.create( - model="default/meta-llama-3-2-1b-instruct", - messages=[{"role": "user", "content": "Hello!"}], - ) - ``` - -=== "LiteLLM" + + +```python +from openai import OpenAI +base_url = client.models.get_openai_route_base_url() +headers = client.models.get_client_default_headers() - ```bash - # make sure litellm is installed - pip install litellm - ``` +oai_client = OpenAI( + base_url=base_url, + api_key="not-needed", + default_headers=headers, +) - - ```python - from litellm import completion +response = oai_client.chat.completions.create( + model="default/meta-llama-3-2-1b-instruct", + messages=[{"role": "user", "content": "Hello!"}], +) +``` + + +```bash +# make sure litellm is installed +pip install litellm +``` - base_url = client.models.get_openai_route_base_url() - headers = client.models.get_client_default_headers() +{/* @nemo-nb: skip-type-check */} +```python +from litellm import completion - response = completion( - model="openai/meta-llama-3-2-1b-instruct", - messages=[{"role": "user", "content": "Hello!"}], - api_base=base_url, - api_key="not-needed", - extra_headers=headers, - ) - ``` +base_url = client.models.get_openai_route_base_url() +headers = client.models.get_client_default_headers() +response = completion( + model="openai/meta-llama-3-2-1b-instruct", + messages=[{"role": "user", "content": "Hello!"}], + api_base=base_url, + api_key="not-needed", + extra_headers=headers, +) +``` + + ### Using curl The OpenAI-compatible chat-completions endpoint is reachable directly over HTTP. 
diff --git a/docs/safe-synthesizer/about/data-synthesis.mdx b/docs/safe-synthesizer/about/data-synthesis.mdx index 395e350cab..d108055a32 100644 --- a/docs/safe-synthesizer/about/data-synthesis.mdx +++ b/docs/safe-synthesizer/about/data-synthesis.mdx @@ -1,11 +1,15 @@ +--- +title: "Data Synthesis" +description: "" +--- # Data Synthesis -The synthesizer component is the main component of the {{nss_short_name}} product. It uses LLM-based fine-tuning to generate realistic synthetic data that maintains the utility of your original dataset while providing privacy protection. +The synthesizer component is the main component of the NeMo Safe Synthesizer product. It uses LLM-based fine-tuning to generate realistic synthetic data that maintains the utility of your original dataset while providing privacy protection. ## How It Works -{{nss_short_name}} employs a novel approach to synthetic data generation: +NeMo Safe Synthesizer employs a novel approach to synthetic data generation: 1. **Tabular Fine-Tuning**: Fine-tunes a language model on your tabular data to learn patterns, correlations, and statistical properties 2. 
**Generation**: Uses the fine-tuned model to generate new synthetic records that maintain data utility @@ -17,7 +21,7 @@ Creating synthetic versions of private data allows you to unlock insights withou ### LLM-Based Fine-Tuning -{{nss_short_name}} adapts language models to understand and generate tabular data: +NeMo Safe Synthesizer adapts language models to understand and generate tabular data: - Converts tabular data into text sequences suitable for LLM training - Fine-tunes on your dataset to capture patterns and correlations @@ -26,7 +30,7 @@ Creating synthetic versions of private data allows you to unlock insights withou ### Supported Data Types -{{nss_short_name}} supports diverse tabular data: +NeMo Safe Synthesizer supports diverse tabular data: - **Numeric**: Continuous and discrete numerical values - **Categorical**: Text labels and categories @@ -37,7 +41,7 @@ Creating synthetic versions of private data allows you to unlock insights withou A high level of privacy protection is achieved simply through the process of generating synthetic data, and is often a sufficient balance between privacy and utility. For use cases that require maximum privacy, you can fine-tune with differential privacy. -Differential privacy (DP) is the gold standard for privacy protection, providing mathematical guarantees that individual records cannot be identified. When you enable DP, {{nss_short_name}} introduces calibrated noise during model training to ensure that the synthetic data generation process is provably private. +Differential privacy (DP) is the gold standard for privacy protection, providing mathematical guarantees that individual records cannot be identified. When you enable DP, NeMo Safe Synthesizer introduces calibrated noise during model training to ensure that the synthetic data generation process is provably private. 
#### Mathematical Guarantee @@ -57,7 +61,7 @@ Where: #### DP-SGD Implementation -{{nss_short_name}} uses Differentially Private Stochastic Gradient Descent (DP-SGD) to add privacy guarantees during model training: +NeMo Safe Synthesizer uses Differentially Private Stochastic Gradient Descent (DP-SGD) to add privacy guarantees during model training: 1. **Per-sample gradient computation** - Calculate gradients for each training example individually 2. **Gradient clipping** - Clip L2 norm to `per_sample_max_grad_norm` to bound sensitivity @@ -92,7 +96,7 @@ Enabling DP provides strong privacy guarantees but affects synthetic data qualit **Data size:** DP performs best with 10,000+ training records. Smaller datasets may experience significant quality degradation due to the noise required for privacy guarantees. -For hands-on guidance, refer to [differential-privacy](../tutorials/differential-privacy.md). For complete parameter documentation, refer to [reference](reference.md). +For hands-on guidance, refer to [differential-privacy](/synthesize-safe-data/tutorials/differential-privacy). For complete parameter documentation, refer to [reference](/synthesize-safe-data/about/parameters-reference). ## Configuration @@ -102,10 +106,10 @@ Synthesis behavior is controlled through configuration parameters: - **Generation**: Number of records, temperature, sampling strategies - **Privacy**: Differential privacy parameters (epsilon, delta, clipping) -For a complete list of all available parameters and their defaults, refer to [reference](reference.md). +For a complete list of all available parameters and their defaults, refer to [reference](/synthesize-safe-data/about/parameters-reference). 
## Related Topics -- [reference](reference.md): Complete parameter reference -- [differential-privacy](../tutorials/differential-privacy.md): Learn about differential privacy in practice -- [index](../tutorials/index.md): More tutorials +- [reference](/synthesize-safe-data/about/parameters-reference): Complete parameter reference +- [differential-privacy](/synthesize-safe-data/tutorials/differential-privacy): Learn about differential privacy in practice +- [index](/synthesize-safe-data/tutorials/overview): More tutorials diff --git a/docs/safe-synthesizer/about/evaluation.mdx b/docs/safe-synthesizer/about/evaluation.mdx index 8ba95c4f90..d821cda57a 100644 --- a/docs/safe-synthesizer/about/evaluation.mdx +++ b/docs/safe-synthesizer/about/evaluation.mdx @@ -1,8 +1,12 @@ - +--- +title: "Evaluation" +description: "" +--- +{/* @nemo-nb: process */} # Evaluation -Evaluation is a critical component of {{nss_short_name}} that helps you understand both the utility and privacy of your synthetic data. The evaluation step is enabled by default and provides comprehensive reports comparing your training and synthetic datasets across multiple dimensions. +Evaluation is a critical component of NeMo Safe Synthesizer that helps you understand both the utility and privacy of your synthetic data. The evaluation step is enabled by default and provides comprehensive reports comparing your training and synthetic datasets across multiple dimensions. ## How It Works @@ -85,11 +89,11 @@ You should expect some PII replay, and it is often not a cause for concern. We t - Lower epsilon (stronger privacy) generally yields higher DPS scores - Enabling DP can reduce **SQS** due to privacy-utility tradeoff (noise affects quality) -For differential privacy configuration and privacy-utility tradeoffs, refer to [data-synthesis](data-synthesis.md) and [differential-privacy](../tutorials/differential-privacy.md). 
+For differential privacy configuration and privacy-utility tradeoffs, refer to [data-synthesis](/synthesize-safe-data/about/data-synthesis) and [differential-privacy](/synthesize-safe-data/tutorials/differential-privacy). ## Evaluation Reports -Every {{nss_short_name}} job automatically generates an HTML evaluation report containing: +Every NeMo Safe Synthesizer job automatically generates an HTML evaluation report containing: - Overall SQS and DPS scores - Detailed subscores for each metric @@ -121,6 +125,6 @@ We enable evaluation by default. ## Related Topics -- [safe-synthesizer-101](../tutorials/safe-synthesizer-101.md): Get started with evaluation -- [differential-privacy](../tutorials/differential-privacy.md): Learn about privacy metrics -- [index](../tutorials/index.md): More tutorials +- [safe-synthesizer-101](/synthesize-safe-data/tutorials/safe-synthesizer-101): Get started with evaluation +- [differential-privacy](/synthesize-safe-data/tutorials/differential-privacy): Learn about privacy metrics +- [index](/synthesize-safe-data/tutorials/overview): More tutorials diff --git a/docs/safe-synthesizer/about/host-local-development.mdx b/docs/safe-synthesizer/about/host-local-development.mdx index 701f16e869..67dd5557db 100644 --- a/docs/safe-synthesizer/about/host-local-development.mdx +++ b/docs/safe-synthesizer/about/host-local-development.mdx @@ -1,13 +1,17 @@ - - +--- +title: "Host-Local Development and Testing" +description: "" +--- +{/* @nemo-nb: process */} +{/* @nemo-nb: skip-test */} # Host-Local Development and Testing -Run {{nss_short_name}} on your machine's GPU with `nemo safe-synthesizer run-local`. This page covers the plugin CLI only (`run-local` and `runtime`). It does not cover platform job submission or `nemo safe-synthesizer jobs …` commands (not exposed in the CLI today). +Run NeMo Safe Synthesizer on your machine's GPU with `nemo safe-synthesizer run-local`. This page covers the plugin CLI only (`run-local` and `runtime`). 
It does not cover platform job submission or `nemo safe-synthesizer jobs …` commands (not exposed in the CLI today). ## Prerequisites -- CUDA-capable NVIDIA GPU on the host (80GB+ VRAM recommended; check with `nvidia-smi`). See [Getting Started](../getting-started.md). +- CUDA-capable NVIDIA GPU on the host (80GB+ VRAM recommended; check with `nvidia-smi`). See [Getting Started](/synthesize-safe-data/getting-started). - NeMo Platform repository checkout with the Safe Synthesizer plugin installed. - **No running platform required** for a typical local run when you pass `--data-source` — the NSS runtime can download base models from Hugging Face directly. @@ -161,7 +165,7 @@ Requires `RUN_NSS_LOCAL_E2E=1`, CUDA, and `nemo safe-synthesizer runtime setup`. ## Related topics -- [Parameters Reference](reference.md) — spec and `config` fields -- [Getting Started](../getting-started.md) — GPU and platform context -- [Jobs](jobs.md) — platform job lifecycle (separate from this run-local guide) +- [Parameters Reference](/synthesize-safe-data/about/parameters-reference) — spec and `config` fields +- [Getting Started](/synthesize-safe-data/getting-started) — GPU and platform context +- [Jobs](/synthesize-safe-data/about/jobs) — platform job lifecycle (separate from this run-local guide) - Plugin README: `plugins/nemo-safe-synthesizer/README.md` diff --git a/docs/safe-synthesizer/about/index.mdx b/docs/safe-synthesizer/about/index.mdx index bc3bda32f6..64baf3a421 100644 --- a/docs/safe-synthesizer/about/index.mdx +++ b/docs/safe-synthesizer/about/index.mdx @@ -1,58 +1,61 @@ - +--- +title: "About Generating Safe Synthetic Data" +description: "" +--- # About Generating Safe Synthetic Data -{{nss_long_name}} enables you to create private versions of sensitive tabular datasets. The resulting data is entirely synthetic, with no one-to-one mapping to your original records. 
{{nss_short_name}} is purpose-built for privacy compliance and data protection while preserving data utility for downstream AI tasks. +NVIDIA NeMo Safe Synthesizer enables you to create private versions of sensitive tabular datasets. The resulting data is entirely synthetic, with no one-to-one mapping to your original records. NeMo Safe Synthesizer is purpose-built for privacy compliance and data protection while preserving data utility for downstream AI tasks. -[**Quickstart**](../getting-started.md){ .md-button .md-button--primary } -[**Tutorials**](../tutorials/index.md){ .md-button } +[**Quickstart**](/synthesize-safe-data/getting-started) +[**Tutorials**](/synthesize-safe-data/tutorials/overview) --- -{{nss_short_name}} allows you to generate synthetic data that maintains the statistical properties of your original dataset without exposing sensitive information about individual records. +NeMo Safe Synthesizer allows you to generate synthetic data that maintains the statistical properties of your original dataset without exposing sensitive information about individual records. -{{nss_short_name}} is best when you have the data you need, but because it is private or sensitive in nature you cannot use it as-is. {{nss_short_name}} interpolates from existing data to generate a private, synthetic version, where new records have no one-to-one mapping to original records. If you do not have any data or want to extrapolate based on a very small set of examples, refer to [index](../../data-designer/index.md). NeMo Data Designer supports synthetic data creation from scratch or small seed for AI training and development use cases. +NeMo Safe Synthesizer is best when you have the data you need, but because it is private or sensitive in nature you cannot use it as-is. NeMo Safe Synthesizer interpolates from existing data to generate a private, synthetic version, where new records have no one-to-one mapping to original records. 
If you do not have any data or want to extrapolate based on a very small set of examples, refer to [index](/design-synthetic-data/about). NeMo Data Designer supports synthetic data creation from scratch or small seed for AI training and development use cases. -## {{nss_short_name}} Job +## NeMo Safe Synthesizer Job -A complete {{nss_short_name}} job consists of the following steps: +A complete NeMo Safe Synthesizer job consists of the following steps: -1. [Upload Data](../../get-started/concepts/manage-files.md): Add your tabular data to the Files API +1. [Upload Data](/get-started/core-concepts/manage-files): Add your tabular data to the Files API 2. Prepare Data: - - [Configure PII Replacement](pii-replacement.md): Set up detection and replacement of sensitive information (recommended prior to the Synthesis step to ensure the model has no chance of learning the most sensitive information like names and addresses) + - [Configure PII Replacement](/synthesize-safe-data/about/pii-replacement): Set up detection and replacement of sensitive information (recommended prior to the Synthesis step to ensure the model has no chance of learning the most sensitive information like names and addresses) - Configure training data organization and holdout splits 3. Configure Synthesis: - - [Training](data-synthesis.md): Set model selection and training parameters including differential privacy + - [Training](/synthesize-safe-data/about/data-synthesis): Set model selection and training parameters including differential privacy - Generate synthetic records - - [Evaluation](evaluation.md): Assess quality and privacy + - [Evaluation](/synthesize-safe-data/about/evaluation): Assess quality and privacy 4. 
Execute and Review: - - [Run and Monitor Job](jobs.md): Execute the job and track progress + - [Run and Monitor Job](/synthesize-safe-data/about/jobs): Execute the job and track progress - Download synthetic data and evaluation reports -Find all Safe Synthesizer configuration parameters in [Parameters Reference](reference.md). +Find all Safe Synthesizer configuration parameters in [Parameters Reference](/synthesize-safe-data/about/parameters-reference). --- ## Installation Options -Try out this early access API using Docker Compose or deploying the {{platform_name}} Helm chart. +Try out this early access API using Docker Compose or deploying the NeMo Platform Helm chart.
-- **[Quickstart](../getting-started.md)** +- **[Quickstart](/synthesize-safe-data/getting-started)** --- - Get started with the {{nss_short_name}} microservice locally using the {{platform_name}} CLI. Easiest for local testing. + Get started with the NeMo Safe Synthesizer microservice locally using the NeMo Platform CLI. Easiest for local testing. standalone -- **[Helm Chart](../../set-up/helm/index.md)** +- **[Helm Chart](/platform/deploying-on-kubernetes/overview)** --- - Deploy the {{platform_name}} Helm Chart, which includes {{nss_short_name}}. + Deploy the NeMo Platform Helm Chart, which includes NeMo Safe Synthesizer. helm-chart @@ -66,11 +69,11 @@ Get hands-on experience with Safe Synthesizer through step-by-step tutorials.
-- **[Tutorials](../tutorials/index.md)** +- **[Tutorials](/synthesize-safe-data/tutorials/overview)** --- - Learn how to use {{nss_short_name}} with hands-on tutorials covering basics to advanced topics. + Learn how to use NeMo Safe Synthesizer with hands-on tutorials covering basics to advanced topics. beginner intermediate @@ -82,37 +85,37 @@ Get hands-on experience with Safe Synthesizer through step-by-step tutorials.
-- **[Data Synthesis](data-synthesis.md)** +- **[Data Synthesis](/synthesize-safe-data/about/data-synthesis)** --- Learn about LLM-based synthesis, differential privacy, and tabular fine-tuning for generating synthetic data. -- **[PII Replacement](pii-replacement.md)** +- **[PII Replacement](/synthesize-safe-data/about/pii-replacement)** --- Understand how PII detection and replacement works to protect sensitive information before synthesis. -- **[Evaluation](evaluation.md)** +- **[Evaluation](/synthesize-safe-data/about/evaluation)** --- Learn about quality and privacy metrics used to assess synthetic data including SQS and DPS scores. -- **[Jobs](jobs.md)** +- **[Jobs](/synthesize-safe-data/about/jobs)** --- Understand the job lifecycle, configuration, and execution for Safe Synthesizer pipelines. -- **[Host-Local Development](host-local-development.md)** +- **[Host-Local Development](/synthesize-safe-data/about/host-local-development)** --- Run on a host GPU with `nemo safe-synthesizer run-local` and `runtime` commands; reuse local adapters and run plugin tests. -- **[Parameters Reference](reference.md)** +- **[Parameters Reference](/synthesize-safe-data/about/parameters-reference)** --- diff --git a/docs/safe-synthesizer/about/jobs.mdx b/docs/safe-synthesizer/about/jobs.mdx index 187d9e0a76..fd7f004c3b 100644 --- a/docs/safe-synthesizer/about/jobs.mdx +++ b/docs/safe-synthesizer/about/jobs.mdx @@ -1,15 +1,19 @@ - - +--- +title: "Safe Synthesizer Jobs" +description: "" +--- +{/* @nemo-nb: process */} +{/* @nemo-nb: skip-test */} # Safe Synthesizer Jobs -{{nss_short_name}} jobs orchestrate the complete pipeline from data preparation through synthesis to evaluation. Understanding the job lifecycle and configuration options is essential for effective use of the platform. +NeMo Safe Synthesizer jobs orchestrate the complete pipeline from data preparation through synthesis to evaluation. 
Understanding the job lifecycle and configuration options is essential for effective use of the platform. - +{/* TODO: Link to generic NeMo Platform jobs documentation when available for common job management concepts */} ## Job Lifecycle -A {{nss_short_name}} job progresses through several states: +A NeMo Safe Synthesizer job progresses through several states: ### Job States @@ -109,7 +113,7 @@ When the job completes, access: For **platform jobs**, set `pretrained_model_job` in the job spec to a completed job that has an **`adapter`** result in Files. Reuse is generation-only (no retraining). Use either `pretrained_model_job` or `config.training.pretrained_model`, not both. -For **host-local** development (`nemo safe-synthesizer run-local`), set `config.training.pretrained_model` to a local adapter or work directory from an earlier run. See [Host-Local Development and Testing](host-local-development.md). +For **host-local** development (`nemo safe-synthesizer run-local`), set `config.training.pretrained_model` to a local adapter or work directory from an earlier run. See [Host-Local Development and Testing](/synthesize-safe-data/about/host-local-development). 
## Job Builder API @@ -193,7 +197,7 @@ for log in job.fetch_logs(): ### Docker Compose Deployments -When running {{nss_short_name}} using Docker Compose, view container logs directly: +When running NeMo Safe Synthesizer using Docker Compose, view container logs directly: ```bash # View safe-synthesizer service logs @@ -309,7 +313,7 @@ kubectl get events -n --sort-by='.lastTimestamp' ## Related Topics -- [Host-Local Development and Testing](host-local-development.md): `run-local`, adapter reuse, and plugin tests -- [safe-synthesizer-101](../tutorials/safe-synthesizer-101.md): Get started with {{nss_short_name}} jobs -- [index](../tutorials/index.md): More hands-on tutorials -- [reference](reference.md): Full parameter reference +- [Host-Local Development and Testing](/synthesize-safe-data/about/host-local-development): `run-local`, adapter reuse, and plugin tests +- [safe-synthesizer-101](/synthesize-safe-data/tutorials/safe-synthesizer-101): Get started with NeMo Safe Synthesizer jobs +- [index](/synthesize-safe-data/tutorials/overview): More hands-on tutorials +- [reference](/synthesize-safe-data/about/parameters-reference): Full parameter reference diff --git a/docs/safe-synthesizer/about/pii-replacement.mdx b/docs/safe-synthesizer/about/pii-replacement.mdx index 21601831a5..c811178a9b 100644 --- a/docs/safe-synthesizer/about/pii-replacement.mdx +++ b/docs/safe-synthesizer/about/pii-replacement.mdx @@ -1,5 +1,9 @@ - - +--- +title: "PII Replacement" +description: "" +--- +{/* @nemo-nb: process */} +{/* @nemo-nb: skip-test */} # PII Replacement @@ -16,7 +20,7 @@ The PII replacement pipeline operates in multiple stages: ## Detection Methods -{{nss_short_name}} supports multiple PII detection approaches: +NeMo Safe Synthesizer supports multiple PII detection approaches: ### Nemotron PII Detection @@ -172,7 +176,7 @@ Beyond these built-in types, you can define custom entities using: ## Configuration -PII replacement is configured through the `replace_pii` section. 
For the full schema, refer to [reference](reference.md). +PII replacement is configured through the `replace_pii` section. For the full schema, refer to [reference](/synthesize-safe-data/about/parameters-reference). ```json { @@ -214,5 +218,5 @@ PII replacement is always recommended as a preprocessing step before synthesis. ## Related Topics -- [safe-synthesizer-101](../tutorials/safe-synthesizer-101.md): Getting started tutorial with PII replacement -- [index](../tutorials/index.md): More tutorials +- [safe-synthesizer-101](/synthesize-safe-data/tutorials/safe-synthesizer-101): Getting started tutorial with PII replacement +- [index](/synthesize-safe-data/tutorials/overview): More tutorials diff --git a/docs/safe-synthesizer/about/reference.mdx b/docs/safe-synthesizer/about/reference.mdx index 543897d7f8..48e5e01e7e 100644 --- a/docs/safe-synthesizer/about/reference.mdx +++ b/docs/safe-synthesizer/about/reference.mdx @@ -1,11 +1,15 @@ - - +--- +title: "Parameters Reference" +description: "" +--- +{/* @nemo-nb: process */} +{/* @nemo-nb: skip-test */} # Parameters Reference This page summarizes the main configuration groups available when creating -{{nss_short_name}} jobs. For generated REST API schema details, see the -[Safe Synthesizer API Reference](../../api/index.md#tag-safe-synthesizer). +NeMo Safe Synthesizer jobs. For generated REST API schema details, see the +[Safe Synthesizer API Reference](/reference/api-reference#tag-safe-synthesizer). ## Job Spec (Plugin / REST) @@ -14,10 +18,10 @@ Top-level fields on the Safe Synthesizer job spec (alongside `config`): | Field | Description | |-------|-------------| | `data_source` | Input data as a platform fileset URL (`workspace/fileset#path`). With `run-local`, override via `--data-source` and use any placeholder in the spec. | -| `pretrained_model_job` | Prior completed job whose **`adapter`** result in Files is reused for **generation-only** synthesis. Format: `` or `/`. 
Mutually exclusive with `config.training.pretrained_model`. | +| `pretrained_model_job` | Prior completed job whose **`adapter`** result in Files is reused for **generation-only** synthesis. Format: `<job>` or `<workspace>/<job>`. Mutually exclusive with `config.training.pretrained_model`. | | `hf_token_secret` | Platform secret name for Hugging Face token during model initialization | -For host-local runs, see [Host-Local Development and Testing](host-local-development.md). Reuse a local adapter with `config.training.pretrained_model`, not `pretrained_model_job`. +For host-local runs, see [Host-Local Development and Testing](/synthesize-safe-data/about/host-local-development). Reuse a local adapter with `config.training.pretrained_model`, not `pretrained_model_job`. ## Top-Level Configuration @@ -26,7 +30,7 @@ The `SafeSynthesizerParameters` schema defines the main configuration structure ### SafeSynthesizerParameters All fields are optional at the top level. For nested field constraints, see the -[Safe Synthesizer API Reference](../../api/index.md#tag-safe-synthesizer) and +[Safe Synthesizer API Reference](/reference/api-reference#tag-safe-synthesizer) and search for the schema name in the **Type** column. | Field | Type | Constraints / description | @@ -76,7 +80,7 @@ Configuration for synthetic data quality and privacy assessment, including MIA, ## PII Replacement Configuration -Configuration for PII detection and replacement. See [pii-replacement](../about/pii-replacement.md) for conceptual documentation. +Configuration for PII detection and replacement. See [pii-replacement](/synthesize-safe-data/about/pii-replacement) for conceptual documentation. 
### Column Classification Config (`replace_pii.globals.classify`) @@ -128,6 +132,6 @@ job = builder.create_job(name="my-job", project="my-project") ## Related Topics -- [data-synthesis](../about/data-synthesis.md) - Learn about synthesis concepts -- [evaluation](../about/evaluation.md) - Learn about evaluation metrics -- [index](../tutorials/index.md) - Hands-on tutorials +- [data-synthesis](/synthesize-safe-data/about/data-synthesis) - Learn about synthesis concepts +- [evaluation](/synthesize-safe-data/about/evaluation) - Learn about evaluation metrics +- [index](/synthesize-safe-data/tutorials/overview) - Hands-on tutorials diff --git a/docs/safe-synthesizer/getting-started.mdx b/docs/safe-synthesizer/getting-started.mdx index 26eb896567..886ade59a2 100644 --- a/docs/safe-synthesizer/getting-started.mdx +++ b/docs/safe-synthesizer/getting-started.mdx @@ -1,26 +1,38 @@ +--- +title: "Getting Started with NeMo Safe Synthesizer" +description: "" +--- -# Getting Started with {{nss_short_name}} +# Getting Started with NeMo Safe Synthesizer -Get started with {{nss_short_name}} for generating private synthetic versions of sensitive tabular datasets. +Get started with NeMo Safe Synthesizer for generating private synthetic versions of sensitive tabular datasets. ## Prerequisites -Before using {{nss_short_name}}, complete the [{{platform_name}} Quickstart](../get-started/quickstart.md) to install the CLI/SDK and deploy the platform. +Before using NeMo Safe Synthesizer, complete the NeMo Platform Quickstart to install the CLI/SDK and deploy the platform. -{{nss_short_name}} has the following additional requirements: +NeMo Safe Synthesizer has the following additional requirements: - An NVIDIA GPU **on the host machine** with 80GB+ VRAM (check with `nvidia-smi`). This is separate from any GPU inside a NIM container — Safe Synthesizer training runs directly on the host. 
- Sufficient disk space for generated datasets (50GB+ recommended) -For general platform troubleshooting (port conflicts, health checks, and so on), refer to the [main quickstart guide](../get-started/quickstart.md). +For general platform troubleshooting (port conflicts, health checks, and so on), refer to the main quickstart guide. + + +The platform pre-configures a `system/nvidia-build` model provider during startup. +This provider routes inference requests to models hosted on `build.nvidia.com` using the API base URL `https://integrate.api.nvidia.com` +and the NGC API key with `Public API Endpoints` permissions provided during deployment (automatically saved as the built-in `system/ngc-api-key` secret). + +You can verify this provider exists by running `nemo inference providers list --workspace system`. ---8<-- "_snippets/nvidia-build-model-provider.md" +The tutorials in these docs use this provider for inference, but you can alternatively create your own and use it instead. + --- ## Host-local CLI -For GPU development on your machine, install the Safe Synthesizer plugin from this repository and use `nemo safe-synthesizer run-local` (see [Host-Local Development and Testing](about/host-local-development.md)): +For GPU development on your machine, install the Safe Synthesizer plugin from this repository and use `nemo safe-synthesizer run-local` (see [Host-Local Development and Testing](/synthesize-safe-data/about/host-local-development)): ```shell BOOTSTRAP_LOCAL_PLUGIN_DIRS=plugins/nemo-safe-synthesizer make bootstrap-python @@ -31,15 +43,15 @@ uv run nemo safe-synthesizer run-local \ --output-dir ./nss-output ``` -Platform job submission (Jobs API, Studio, tutorials) is documented separately in [Jobs](about/jobs.md) and the [tutorials](tutorials/index.md). The `nemo safe-synthesizer` CLI today exposes **run-local** and **runtime** only. 
+Platform job submission (Jobs API, Studio, tutorials) is documented separately in [Jobs](/synthesize-safe-data/about/jobs) and the [tutorials](/synthesize-safe-data/tutorials/overview). The `nemo safe-synthesizer` CLI today exposes **run-local** and **runtime** only. --- ## Next Steps -Run one of the [tutorials](tutorials/index.md) to create your first synthetic dataset: +Run one of the [tutorials](/synthesize-safe-data/tutorials/overview) to create your first synthetic dataset: -- [Safe Synthesizer 101 Tutorial](tutorials/safe-synthesizer-101.md) - A beginner-friendly introduction -- [Differential Privacy Tutorial](tutorials/differential-privacy.md) - Generate differentially-private synthetic data +- [Safe Synthesizer 101 Tutorial](/synthesize-safe-data/tutorials/safe-synthesizer-101) - A beginner-friendly introduction +- [Differential Privacy Tutorial](/synthesize-safe-data/tutorials/differential-privacy) - Generate differentially-private synthetic data --- diff --git a/docs/safe-synthesizer/tutorials/differential-privacy.mdx b/docs/safe-synthesizer/tutorials/differential-privacy.mdx index 43c7870a41..cb48a8e47b 100644 --- a/docs/safe-synthesizer/tutorials/differential-privacy.mdx +++ b/docs/safe-synthesizer/tutorials/differential-privacy.mdx @@ -1,15 +1,19 @@ - - +--- +title: "Differential Privacy Tutorial" +description: "" +--- +{/* @nemo-nb: process */} +{/* @nemo-nb: download */} # Differential Privacy Tutorial Learn how to apply differential privacy to achieve the maximum level of privacy with mathematical guarantees. This tutorial explores the privacy-utility tradeoff and demonstrates how to configure differential privacy parameters for optimal results. -If you have not yet completed the [Safe Synthesizer 101](safe-synthesizer-101.md) tutorial, consider starting there first. +If you have not yet completed the [Safe Synthesizer 101](/synthesize-safe-data/tutorials/safe-synthesizer-101) tutorial, consider starting there first. 
## Prerequisites -- Understanding of [differential privacy](../about/data-synthesis.md) +- Understanding of [differential privacy](/synthesize-safe-data/about/data-synthesis) - Safe Synthesizer deployment with GPU resources --- @@ -32,7 +36,7 @@ Differential privacy (DP) provides mathematical guarantees that synthetic data d - **Epsilon (ε)**: Privacy budget - lower values mean stronger privacy - ε = 1: Very strong privacy - ε = 6-10: Moderate privacy - - ε > 10: Weak privacy + - ε > 10: Weak privacy - **Recommended starting range: ε ∈ [8, 12]** - adjust downward based on privacy needs - **Delta (δ)**: Probability of privacy breach @@ -46,7 +50,7 @@ Differential privacy (DP) provides mathematical guarantees that synthetic data d ### Record-Level vs Group-Level Privacy -By default, {{nss_short_name}} uses **record-level** differential privacy, which protects individual records. For datasets where multiple records belong to the same entity (e.g., a patient with multiple visits), you can use **group-level** privacy by setting `group_training_examples_by` to the column that identifies each entity. See [Group-Level Privacy](#group-level-privacy) in the Advanced Configuration section for a code example. +By default, NeMo Safe Synthesizer uses **record-level** differential privacy, which protects individual records. For datasets where multiple records belong to the same entity (e.g., a patient with multiple visits), you can use **group-level** privacy by setting `group_training_examples_by` to the column that identifies each entity. See [Group-Level Privacy](#group-level-privacy) in the Advanced Configuration section for a code example. 
**When to use group-level privacy:** - Multiple records per person/entity in your dataset @@ -57,7 +61,7 @@ By default, {{nss_short_name}} uses **record-level** differential privacy, which ## Setup -Install the {{platform_name}} SDK with Safe Synthesizer support: +Install the NeMo Platform SDK with Safe Synthesizer support: ```shell if command -v uv &> /dev/null; then diff --git a/docs/safe-synthesizer/tutorials/index.mdx b/docs/safe-synthesizer/tutorials/index.mdx index ab43168d4e..292df26d89 100644 --- a/docs/safe-synthesizer/tutorials/index.mdx +++ b/docs/safe-synthesizer/tutorials/index.mdx @@ -1,13 +1,17 @@ +--- +title: "Tutorials" +description: "" +--- # Tutorials -Learn how to run {{nss_short_name}} jobs through hands-on tutorials to generate private synthetic versions of sensitive tabular datasets. Each tutorial provides step-by-step guidance with executable code examples. +Learn how to run NeMo Safe Synthesizer jobs through hands-on tutorials to generate private synthetic versions of sensitive tabular datasets. Each tutorial provides step-by-step guidance with executable code examples. ## Prerequisites Before starting any tutorial, ensure you have: -- [{{nss_short_name}} deployed](../getting-started.md) using Docker Compose or Helm +- [NeMo Safe Synthesizer deployed](/synthesize-safe-data/getting-started) using Docker Compose or Helm - Python environment with `nemo-platform` SDK installed: ```bash pip install nemo-platform[all] @@ -20,7 +24,7 @@ Before starting any tutorial, ensure you have:
-- **[Safe Synthesizer 101](safe-synthesizer-101.md)** +- **[Safe Synthesizer 101](/synthesize-safe-data/tutorials/safe-synthesizer-101)** --- @@ -43,7 +47,7 @@ Before starting any tutorial, ensure you have:
-- **[Differential Privacy Tutorial](differential-privacy.md)** +- **[Differential Privacy Tutorial](/synthesize-safe-data/tutorials/differential-privacy)** --- @@ -66,11 +70,11 @@ Before starting any tutorial, ensure you have: After completing these tutorials, explore: -- **[index](../about/index.md)**: Understand core concepts and components +- **[index](/synthesize-safe-data/about/overview)**: Understand core concepts and components --- ## Need Help? - Check the [GitHub Issues](https://github.com/NVIDIA/GenerativeAIExamples/issues) for known issues -- Review the [jobs](../about/jobs.md) guide for job management +- Review the [jobs](/synthesize-safe-data/about/jobs) guide for job management diff --git a/docs/safe-synthesizer/tutorials/safe-synthesizer-101.mdx b/docs/safe-synthesizer/tutorials/safe-synthesizer-101.mdx index 7521fb348c..0a446092fa 100644 --- a/docs/safe-synthesizer/tutorials/safe-synthesizer-101.mdx +++ b/docs/safe-synthesizer/tutorials/safe-synthesizer-101.mdx @@ -1,15 +1,19 @@ - - +--- +title: "Safe Synthesizer 101" +description: "" +--- +{/* @nemo-nb: process */} +{/* @nemo-nb: download */} # Safe Synthesizer 101 -Learn the fundamentals of {{nss_short_name}} by creating your first Safe Synthesizer job using provided defaults. In this tutorial, you'll upload sample customer data, replace personally identifiable information, fine-tune a model, generate synthetic records, and review the evaluation report. +Learn the fundamentals of NeMo Safe Synthesizer by creating your first Safe Synthesizer job using provided defaults. In this tutorial, you'll upload sample customer data, replace personally identifiable information, fine-tune a model, generate synthetic records, and review the evaluation report. 
## Prerequisites Before you begin, make sure that you have: -- Access to a deployment of {{nss_short_name}} (see [getting-started](../getting-started.md)) +- Access to a deployment of NeMo Safe Synthesizer (see [getting-started](/synthesize-safe-data/getting-started)) - **An NVIDIA GPU with 80 GB+ VRAM** — Safe Synthesizer requires GPU access for model training, even when using remote inference for other services. Verify with `nvidia-smi`. - Python environment with `nemo-platform` SDK installed - Basic understanding of Python and pandas @@ -29,7 +33,7 @@ By the end of this tutorial, you'll understand how to: ## Step 1: Install the SDK -Install the {{platform_name}} SDK with Safe Synthesizer support. Run the following command in a **terminal (shell)**: +Install the NeMo Platform SDK with Safe Synthesizer support. Run the following command in a **terminal (shell)**: ```shell if command -v uv &> /dev/null; then @@ -108,10 +112,19 @@ print(df.head()) Before running jobs, set up column classification for accurate PII detection. -!!! tip - Column classification uses an LLM to automatically detect column types and improve PII detection accuracy. Without this setup, you may see classification errors and reduced detection quality. + +Column classification uses an LLM to automatically detect column types and improve PII detection accuracy. Without this setup, you may see classification errors and reduced detection quality. + + + +The platform pre-configures a `system/nvidia-build` model provider during startup. +This provider routes inference requests to models hosted on `build.nvidia.com` using the API base URL `https://integrate.api.nvidia.com` +and the NGC API key with `Public API Endpoints` permissions provided during deployment (automatically saved as the built-in `system/ngc-api-key` secret). + +You can verify this provider exists by running `nemo inference providers list --workspace system`. 
---8<-- "_snippets/nvidia-build-model-provider.md" +The tutorials in these docs use this provider for inference, but you can alternatively create your own and use it instead. + ```python # Use the pre-configured NVIDIA Build model provider @@ -120,8 +133,9 @@ provider_name = "system/nvidia-build" print(f"✅ Using model provider: {provider_name}") ``` -!!! note - If you prefer not to send column data to `build.nvidia.com`, you can [deploy your own LLM](../../run-inference/tutorials/deploy-models.md) and create a custom model provider. Pass the fully-qualified provider name (`workspace/provider-name`) to `.with_classify_model_provider()` instead. + +If you prefer not to send column data to `build.nvidia.com`, you can deploy your own LLM and create a custom model provider. Pass the fully-qualified provider name (`workspace/provider-name`) to `.with_classify_model_provider()` instead. + --- @@ -305,7 +319,7 @@ job.display_report_in_notebook() ### Interpreting Scores -The evaluation report contains two high-level scores: Synthetic Quality Score (SQS) and Data Privacy Score (DPS). Both are measured out of 10, and higher is better. To learn more about how to interpret the scores, refer to the [evaluation guide](../about/evaluation.md). +The evaluation report contains two high-level scores: Synthetic Quality Score (SQS) and Data Privacy Score (DPS). Both are measured out of 10, and higher is better. To learn more about how to interpret the scores, refer to the [evaluation guide](/synthesize-safe-data/about/evaluation). 
--- @@ -315,11 +329,11 @@ Now that you've completed your first Safe Synthesizer job, explore more advanced ### Advanced Tutorials -- [Differential Privacy Tutorial](differential-privacy.md) - Apply mathematical privacy guarantees +- [Differential Privacy Tutorial](/synthesize-safe-data/tutorials/differential-privacy) - Apply mathematical privacy guarantees ### Documentation -- [index](../about/index.md) - Understand core concepts +- [index](/synthesize-safe-data/about/overview) - Understand core concepts ### Try These Next @@ -368,7 +382,7 @@ print(f"Total jobs: {len(all_jobs.data)}") - Use smaller model (adjust `training.pretrained_model`) - Check GPU availability -For more help, see [jobs](../about/jobs.md). +For more help, see [jobs](/synthesize-safe-data/about/jobs). **Error: "Dataset must have at least 200 records to use holdout."** @@ -387,6 +401,7 @@ builder = ( ) ``` -!!! warning - Disabling holdout means you won't get quality metrics like privacy scores and synthetic data quality - scores. For production use, ensure your dataset has at least 200 records. + +Disabling holdout means you won't get quality metrics like privacy scores and synthetic data quality +scores. For production use, ensure your dataset has at least 200 records. 
+ diff --git a/docs/set-up/config-reference.mdx b/docs/set-up/config-reference.mdx index 93f2e2f6a8..3cc2953d46 100644 --- a/docs/set-up/config-reference.mdx +++ b/docs/set-up/config-reference.mdx @@ -1,4 +1,8 @@ -(platform-config-reference)= +--- +title: "NeMo Platform configuration reference" +description: "" +--- + # NeMo Platform configuration reference diff --git a/docs/set-up/helm/backup-and-restore.mdx b/docs/set-up/helm/backup-and-restore.mdx index da35f6be22..8496004292 100644 --- a/docs/set-up/helm/backup-and-restore.mdx +++ b/docs/set-up/helm/backup-and-restore.mdx @@ -1,6 +1,8 @@ -# Backup and Restore - -{{platform_name}} use several storage systems that require backup consideration: +--- +title: "Backup and Restore" +description: "" +--- +NeMo Platform uses several storage systems that require backup consideration: 1. **PostgreSQL database**: stores entity metadata and relationships 2. **Object storage**: stores model files, datasets, and other artifacts diff --git a/docs/set-up/helm/database-setup.mdx b/docs/set-up/helm/database-setup.mdx index 92e231783f..e190682cf7 100644 --- a/docs/set-up/helm/database-setup.mdx +++ b/docs/set-up/helm/database-setup.mdx @@ -1,10 +1,14 @@ +--- +title: "Database Setup" +description: "" +--- # Database Setup -The {{platform_name}} uses a SQL-based database to store entities such as workspaces, jobs, and other records. +The NeMo Platform uses a SQL-based database to store entities such as workspaces, jobs, and other records. -By default, the {{platform_name}} Helm chart deploys an embedded PostgreSQL instance (a StatefulSet using the official Postgres image) with simplified authentication to enable quick installation. This should not be used for production use, and an external PostgreSQL instance is recommended. +By default, the NeMo Platform Helm chart deploys an embedded PostgreSQL instance (a StatefulSet using the official Postgres image) with simplified authentication to enable quick installation. 
Do not use this embedded instance in production; an external PostgreSQL instance is recommended. ## External PostgreSQL Database diff --git a/docs/set-up/helm/file-storage.mdx b/docs/set-up/helm/file-storage.mdx index 4849fb0d4f..50fba37623 100644 --- a/docs/set-up/helm/file-storage.mdx +++ b/docs/set-up/helm/file-storage.mdx @@ -1,7 +1,11 @@ +--- +title: "File Storage" +description: "" +--- # File Storage -The {{platform_name}} Files service stores uploaded files and datasets. By default, it uses filesystem storage backed by a Kubernetes ReadWriteMany PersistentVolumeClaim (PVC). Alternatively, you can configure S3-compatible object storage. +The NeMo Platform Files service stores uploaded files and datasets. By default, it uses filesystem storage backed by a Kubernetes ReadWriteMany PersistentVolumeClaim (PVC). Alternatively, you can configure S3-compatible object storage. ## Prerequisites @@ -16,7 +20,7 @@ If using S3 storage, you must provision and manage your own bucket and credentia ## Local Storage (Default) -By default, the Files service uses local filesystem storage. Files are stored on the shared PVC configured in [Persistent Volumes](./persistent-volumes.md). +By default, the Files service uses local filesystem storage. Files are stored on the shared PVC configured in [Persistent Volumes](/platform/deploying-on-kubernetes/persistent-volumes). The default configuration is equivalent to: @@ -28,7 +32,7 @@ platformConfig: path: /vol/files ``` -You do not need to add this to your `values.yaml`. Once the PVC is set up as described in [Persistent Volumes](./persistent-volumes.md), local storage works out of the box. +You do not need to add this to your `values.yaml`. Once the PVC is set up as described in [Persistent Volumes](/platform/deploying-on-kubernetes/persistent-volumes), local storage works out of the box. ## S3 Object Storage @@ -127,8 +131,9 @@ platformConfig: use_sdk_auth: true ``` -!!! 
note - Some older S3-compatible systems may require `signature_version: s3` instead of the default `s3v4`. Only change this if you encounter signature-related errors. + +Some older S3-compatible systems may require `signature_version: s3` instead of the default `s3v4`. Only change this if you encounter signature-related errors. + ### Configuration Reference diff --git a/docs/set-up/helm/index.mdx b/docs/set-up/helm/index.mdx index 919d31b724..15d51d409c 100644 --- a/docs/set-up/helm/index.mdx +++ b/docs/set-up/helm/index.mdx @@ -1,51 +1,55 @@ +--- +title: "Install NeMo Platform with Helm" +description: "" +--- -# Install {{platform_name}} with Helm +# Install NeMo Platform with Helm -The {{platform_name}} is bundled in an all-in-one Helm chart that supports end-to-end microservice deployment. Use the following guides to build and manage a complete data flywheel with the {{platform_name}} on your Kubernetes either on-prem or in the cloud. +The NeMo Platform is bundled in an all-in-one Helm chart that supports end-to-end microservice deployment. Use the following guides to build and manage a complete data flywheel with the NeMo Platform on your Kubernetes cluster, either on-prem or in the cloud.
-- **[Prerequisites](prerequisites.md)** +- **[Prerequisites](/platform/deploying-on-kubernetes/prerequisites)** --- - Review the prerequisites for installing the {{helm_chart_short_name}}. + Review the prerequisites for installing the NeMo Platform Helm Chart. cluster-admin -- **[Install }](install.md)** +- **[Install](/platform/deploying-on-kubernetes/install)** --- - Install the {{platform_name}} using the Helm chart. + Install the NeMo Platform using the Helm chart. cluster-admin on-prem cloud -- **[Database Setup](database-setup.md)** +- **[Database Setup](/platform/deploying-on-kubernetes/database-setup)** --- - Set up an external database for the {{platform_name}}. + Set up an external database for the NeMo Platform. cluster-admin on-prem cloud -- **[Ingress](ingress.md)** +- **[Ingress](/platform/deploying-on-kubernetes/ingress)** --- - Set up Ingress for the {{platform_name}}. + Set up Ingress for the NeMo Platform. cluster-admin on-prem cloud -- **[Persistent Volumes](persistent-volumes.md)** +- **[Persistent Volumes](/platform/deploying-on-kubernetes/persistent-volumes)** --- - Set up persistent volumes for the {{platform_name}}. + Set up persistent volumes for the NeMo Platform. 
cluster-admin on-prem cloud -- **[File Storage](file-storage.md)** +- **[File Storage](/platform/deploying-on-kubernetes/file-storage)** --- @@ -53,7 +57,7 @@ The {{platform_name}} is bundled in an all-in-one Helm chart that supports end-t cluster-admin on-prem cloud -- **[Multinode Networking](multinode-networking.md)** +- **[Multinode Networking](/platform/deploying-on-kubernetes/multinode-networking)** --- @@ -61,7 +65,7 @@ The {{platform_name}} is bundled in an all-in-one Helm chart that supports end-t cluster-admin cloud -- **[OpenShift](openshift.md)** +- **[OpenShift](/platform/deploying-on-kubernetes/openshift)** --- @@ -69,11 +73,11 @@ The {{platform_name}} is bundled in an all-in-one Helm chart that supports end-t cluster-admin openshift -- **[Backup and Restore](backup-and-restore.md)** +- **[Backup and Restore](/platform/deploying-on-kubernetes/backup-and-restore)** --- - Set up backup and restore configurations for the {{platform_name}}. + Set up backup and restore configurations for the NeMo Platform. cluster-admin on-prem cloud diff --git a/docs/set-up/helm/ingress.mdx b/docs/set-up/helm/ingress.mdx index 3e3589a41c..2d214038f2 100644 --- a/docs/set-up/helm/ingress.mdx +++ b/docs/set-up/helm/ingress.mdx @@ -1,9 +1,13 @@ +--- +title: "Ingress" +description: "" +--- # Ingress -The {{platform_name}} Helm chart can expose the API service externally using **Kubernetes Ingress**, the **Gateway API HTTPRoute**, or on OpenShift an **OpenShift Route**. Choose one based on your cluster. +The NeMo Platform Helm chart can expose the API service externally using **Kubernetes Ingress**, the **Gateway API HTTPRoute**, or on OpenShift an **OpenShift Route**. Choose one based on your cluster. -**Prerequisites:** Complete [Prerequisites](./prerequisites.md) and [Install](./install.md) (or [OpenShift](./openshift.md)) so the platform is installed. Ensure your cluster has an [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) controller (e.g. 
Traefik, OpenShift IngressController) or a [Gateway API](https://gateway-api.sigs.k8s.io/) Gateway configured. +**Prerequisites:** Complete [Prerequisites](/platform/deploying-on-kubernetes/prerequisites) and [Install](/platform/deploying-on-kubernetes/install) (or [OpenShift](/platform/deploying-on-kubernetes/openshift)) so the platform is installed. Ensure your cluster has an [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) controller (e.g. Traefik, OpenShift IngressController) or a [Gateway API](https://gateway-api.sigs.k8s.io/) Gateway configured. ## Kubernetes Ingress @@ -46,7 +50,7 @@ To use standard Kubernetes [Ingress](https://kubernetes.io/docs/concepts/service Install or upgrade with your values file so the Ingress resource is created. After installation, use the URL shown in the Helm notes (`helm status nemo-platform`) or the host you configured. -**Advanced:** For multiple hostnames or different paths per host, leave `defaultHost` unset and use `ingress.hosts` (each entry has `name` and `paths`). See the [{{helm_chart_short_name}} reference](../../helm/index.md) for the full structure. +**Advanced:** For multiple hostnames or different paths per host, leave `defaultHost` unset and use `ingress.hosts` (each entry has `name` and `paths`). See the [NeMo Platform Helm Chart reference](/reference/helm-reference) for the full structure. ## Gateway API HTTPRoute @@ -66,7 +70,7 @@ On clusters that use the [Gateway API](https://gateway-api.sigs.k8s.io/), you ca 2. Install or upgrade with this values file. The HTTPRoute will be created and bound to the specified Gateway(s). -For full option details (e.g. `filters`, `labels`, `annotations`), see the [{{helm_chart_short_name}} reference](../../helm/index.md). +For full option details (e.g. `filters`, `labels`, `annotations`), see the [NeMo Platform Helm Chart reference](/reference/helm-reference). 
## OpenShift Route @@ -86,12 +90,12 @@ On Red Hat OpenShift you can expose the API using an [OpenShift Route](https://d 3. Install or upgrade with your values file. After installation, use the URL from the Helm notes (`helm status nemo-platform`) or the host you configured. -For all options (`targetPort`, `annotations`, `labels`), see the [{{helm_chart_short_name}} reference](../../helm/index.md). +For all options (`targetPort`, `annotations`, `labels`), see the [NeMo Platform Helm Chart reference](/reference/helm-reference). ## Related -- [Install](./install.md) — Install steps and upgrade commands -- [OpenShift](./openshift.md) — OpenShift: security context, and optional Route (`openshiftRoute.enabled`), Ingress, or HTTPRoute +- [Install](/platform/deploying-on-kubernetes/install) — Install steps and upgrade commands +- [OpenShift](/platform/deploying-on-kubernetes/openshift) — OpenShift: security context, and optional Route (`openshiftRoute.enabled`), Ingress, or HTTPRoute ## Cloud Provider Specific Ingress @@ -135,7 +139,7 @@ On EKS, use the [AWS Load Balancer Controller](https://kubernetes-sigs.github.io alb.ingress.kubernetes.io/target-type: ip ``` -3. Install or upgrade with your values file. To verify, check that an ALB was provisioned with `kubectl get ingress -n `. +3. Install or upgrade with your values file. To verify, check that an ALB was provisioned with `kubectl get ingress -n <namespace>`. ### GKE Managed Ingress (GCE Ingress Controller) @@ -152,5 +156,5 @@ On GKE, use the built-in [GCE Ingress controller](https://cloud.google.com/kuber defaultHost: "nmp.example.com" # optional ``` -3. Install or upgrade with your values file. To verify, check that a load balancer was provisioned with `kubectl get ingress -n `. +3. Install or upgrade with your values file. To verify, check that a load balancer was provisioned with `kubectl get ingress -n <namespace>`. 
diff --git a/docs/set-up/helm/install.mdx b/docs/set-up/helm/install.mdx index f82e878acf..be98c170b8 100644 --- a/docs/set-up/helm/install.mdx +++ b/docs/set-up/helm/install.mdx @@ -1,11 +1,17 @@ +--- +title: "Install NeMo Platform Helm Chart" +description: "" +--- -# Install {{helm_chart_short_name}} +# Install NeMo Platform Helm Chart -!!! tip "Note: This setup is the full enterprise platform, meant for advanced use. If you're just getting started, check out [setting up a local instance of the platform](../../get-started/setup.md) instead — it's faster and easier to explore the basics." + +Note: This setup is the full enterprise platform, meant for advanced use. If you're just getting started, check out [setting up a local instance of the platform](/get-started/setup) instead — it's faster and easier to explore the basics. + -To deploy the {{platform_name}}, follow these steps after completing the [Prerequisites](./prerequisites.md). +To deploy the NeMo Platform, follow these steps after completing the [Prerequisites](/platform/deploying-on-kubernetes/prerequisites). -1. Add the {{helm_chart_short_name}} to your local Helm repositories. +1. Add the NeMo Platform Helm Chart to your local Helm repositories. ```sh helm repo add nmp https://helm.ngc.nvidia.com/nvidia/nemo-microservices \ @@ -16,24 +22,24 @@ To deploy the {{platform_name}}, follow these steps after completing the [Prereq helm repo update ``` -2. Review the default values in the [{{helm_chart_short_name}} reference](../../helm/index.md). To override the default values, create a custom values file. Review the following while creating your custom values file. +2. Review the default values in the [NeMo Platform Helm Chart reference](/reference/helm-reference). To override the default values, create a custom values file. Review the following while creating your custom values file. - - To configure an external database, see [Database Setup](./database-setup.md). 
- - To configure persistent volumes for jobs and files storage, see [Persistent Volumes](./persistent-volumes.md). - - To configure file storage options, see [File Storage](./file-storage.md). - - To configure ingress, see [Ingress](./ingress.md). - - To configure multi-node networking, see [Multi-Node Networking](./multinode-networking.md). - - To configure OpenShift-compatible security context overrides, see [OpenShift](./openshift.md). + - To configure an external database, see [Database Setup](/platform/deploying-on-kubernetes/database-setup). + - To configure persistent volumes for jobs and files storage, see [Persistent Volumes](/platform/deploying-on-kubernetes/persistent-volumes). + - To configure file storage options, see [File Storage](/platform/deploying-on-kubernetes/file-storage). + - To configure ingress, see [Ingress](/platform/deploying-on-kubernetes/ingress). + - To configure multi-node networking, see [Multi-Node Networking](/platform/deploying-on-kubernetes/multinode-networking). + - To configure OpenShift-compatible security context overrides, see [OpenShift](/platform/deploying-on-kubernetes/openshift). 3. Install the Volcano scheduler before installing the chart. This is required for customization jobs that leverage multiple nodes. ```sh -{% raw %} + kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/v{{volcano_version}}/installer/volcano-development.yaml -{% endraw %} + ``` - After applying, wait for the Volcano admission webhook to finish initializing before proceeding. The webhook registers immediately with `failurePolicy: Fail`, but TLS certificate generation runs asynchronously. If you proceed before the webhook is ready, all pod creation — including the {{platform_name}} Helm install — will fail with a certificate error. + After applying, wait for the Volcano admission webhook to finish initializing before proceeding. 
The webhook registers immediately with `failurePolicy: Fail`, but TLS certificate generation runs asynchronously. If you proceed before the webhook is ready, all pod creation — including the NeMo Platform Helm install — will fail with a certificate error. ```sh kubectl wait --for=condition=complete job/volcano-admission-init -n volcano-system --timeout=120s @@ -56,7 +62,7 @@ To deploy the {{platform_name}}, follow these steps after completing the [Prereq kubectl get pods ``` -For Red Hat OpenShift, use OpenShift-compatible security context overrides so pods satisfy the restricted SCC. See [OpenShift](./openshift.md). +For Red Hat OpenShift, use OpenShift-compatible security context overrides so pods satisfy the restricted SCC. See [OpenShift](/platform/deploying-on-kubernetes/openshift). To upgrade the deployment with new configurations, use the following command: @@ -72,10 +78,13 @@ To uninstall the deployment, use the following command: helm uninstall nemo-platform ``` -!!! note "`helm uninstall` intentionally does **not** remove all resources:" - - **PVCs** are preserved to prevent accidental data loss. Delete them manually if no longer needed. - - **CRDs** are not removed by Helm design ([upstream issue](https://github.com/helm/helm/issues/4840)) to avoid destroying custom resources across the cluster. - - **Completed jobs, secrets, and the namespace** may also remain. + +`helm uninstall` intentionally does **not** remove all resources: +- **PVCs** are preserved to prevent accidental data loss. Delete them manually if no longer needed. +- **CRDs** are not removed by Helm design ([upstream issue](https://github.com/helm/helm/issues/4840)) to avoid destroying custom resources across the cluster. +- **Completed jobs, secrets, and the namespace** may also remain. 
+ + If you need a complete teardown (e.g., for CI/CD pipelines or reinstalling in the same namespace), run the following after `helm uninstall`: ```sh @@ -86,7 +95,9 @@ kubectl delete namespace kubectl delete crd ``` -!!! warning "Deleting CRDs removes all custom resources of those types cluster-wide. Only do this if no other workloads depend on them." + +Deleting CRDs removes all custom resources of those types cluster-wide. Only do this if no other workloads depend on them. + ## Troubleshooting diff --git a/docs/set-up/helm/multinode-networking.mdx b/docs/set-up/helm/multinode-networking.mdx index 097354de0a..84e3b87f9e 100644 --- a/docs/set-up/helm/multinode-networking.mdx +++ b/docs/set-up/helm/multinode-networking.mdx @@ -1,3 +1,7 @@ +--- +title: "Multinode Networking" +description: "" +--- # Multinode Networking @@ -10,8 +14,9 @@ Multi-node GPU training requires high-bandwidth, low-latency east-west networkin - Nodes with high-performance networking hardware (EFA, InfiniBand, etc.) - The cloud provider's device plugin or network operator deployed so that networking resources are visible to Kubernetes -!!! note - The {{platform_name}} does not provision or manage the underlying cloud networking infrastructure. Each cloud provider section below lists the cluster-level prerequisites your administrator must configure. Refer to your cloud provider's documentation for setup instructions. + +The NeMo Platform does not provision or manage the underlying cloud networking infrastructure. Each cloud provider section below lists the cluster-level prerequisites your administrator must configure. Refer to your cloud provider's documentation for setup instructions. + ## How It Works @@ -21,8 +26,9 @@ When a job requests more than one node, the jobs controller annotates the pod wi Enable **exactly one** cloud provider in the `multinodeNetworking` section of your `values.yaml`. Enabling more than one causes a Helm install error. -!!! 
info - Device-count parameters (`efaDevicesPerGPU`, `rdmaDevicesPerGPU`) must match the hardware ratio of your instance type. A mismatch causes jobs to either fail scheduling or silently run without high-speed networking. + +Device-count parameters (`efaDevicesPerGPU`, `rdmaDevicesPerGPU`) must match the hardware ratio of your instance type. A mismatch causes jobs to either fail scheduling or silently run without high-speed networking. + ### AWS (EFA) diff --git a/docs/set-up/helm/openshift.mdx b/docs/set-up/helm/openshift.mdx index 727ef25034..b496c8383c 100644 --- a/docs/set-up/helm/openshift.mdx +++ b/docs/set-up/helm/openshift.mdx @@ -1,11 +1,15 @@ +--- +title: "OpenShift" +description: "" +--- # OpenShift -The {{platform_name}} chart works on Red Hat OpenShift when security context overrides are applied. OpenShift’s **restricted** or **restricted-v2** Security Context Constraint (SCC) requires pods to run as non-root with explicit `runAsUser` and `runAsNonRoot`. +The NeMo Platform chart works on Red Hat OpenShift when security context overrides are applied. OpenShift’s **restricted** or **restricted-v2** Security Context Constraint (SCC) requires pods to run as non-root with explicit `runAsUser` and `runAsNonRoot`. ## Values -You can override the default values for the {{platform_name}} chart to make it compatible with OpenShift. +You can override the default values for the NeMo Platform chart to make it compatible with OpenShift. 1. **Create the OpenShift values file.** Save the following as `openshift-values.yaml`: @@ -34,7 +38,7 @@ You can override the default values for the {{platform_name}} chart to make it c fsGroup: 999 ``` -2. **Install** with your custom values and the OpenShift overrides (order matters; later files override earlier). Complete the [Prerequisites](./prerequisites.md) and follow [Install](./install.md), using the provided OpenShift values file. +2. 
**Install** with your custom values and the OpenShift overrides (order matters; later files override earlier). Complete the [Prerequisites](/platform/deploying-on-kubernetes/prerequisites) and follow [Install](/platform/deploying-on-kubernetes/install), using the provided OpenShift values file. ```sh helm upgrade --install --namespace \ @@ -47,9 +51,9 @@ You can override the default values for the {{platform_name}} chart to make it c You can expose the API using the following methods: -- **Kubernetes Ingress** — supported by OpenShift’s default IngressController; set `ingress.enabled: true` and `ingress.defaultHost` as in [Ingress](./ingress.md). -- **Gateway API HTTPRoute** — optional; configure `httpRoute` in values. See [Gateway API HTTPRoute](./ingress.md#gateway-api-httproute). -- **OpenShift Route** — enable the chart’s Route and set an optional hostname. See [OpenShift Route](./ingress.md#openshift-route) for steps. +- **Kubernetes Ingress** — supported by OpenShift’s default IngressController; set `ingress.enabled: true` and `ingress.defaultHost` as in [Ingress](/platform/deploying-on-kubernetes/ingress). +- **Gateway API HTTPRoute** — optional; configure `httpRoute` in values. See [Gateway API HTTPRoute](/platform/deploying-on-kubernetes/ingress#gateway-api-httproute). +- **OpenShift Route** — enable the chart’s Route and set an optional hostname. See [OpenShift Route](/platform/deploying-on-kubernetes/ingress#openshift-route) for steps. ## Troubleshooting diff --git a/docs/set-up/helm/persistent-volumes.mdx b/docs/set-up/helm/persistent-volumes.mdx index 3baad49844..41cf731762 100644 --- a/docs/set-up/helm/persistent-volumes.mdx +++ b/docs/set-up/helm/persistent-volumes.mdx @@ -1,7 +1,11 @@ +--- +title: "Persistent Volumes" +description: "" +--- # Persistent Volumes -The {{platform_name}} uses persistent volume claims (PVCs) for jobs and files storage that can be mounted on multiple pods and nodes in read-write mode. 
This access mode is called ReadWriteMany (RWX) in Kubernetes. Using a ReadWriteMany-capable StorageClass is required for both jobs and files storage. +The NeMo Platform uses persistent volume claims (PVCs) for jobs and files storage that can be mounted on multiple pods and nodes in read-write mode. This access mode is called ReadWriteMany (RWX) in Kubernetes. Using a ReadWriteMany-capable StorageClass is required for both jobs and files storage. NVIDIA NIM microservices also scale, upgrade, and deploy more smoothly with an RWX-backed storage class. @@ -9,10 +13,11 @@ The platform does not manage storage classes. You must install an appropriate st ## Jobs and Files storage -The {{platform_name}} chart creates a single shared PVC for jobs and files storage, configured under `core.storage` in `values.yaml`. +The NeMo Platform chart creates a single shared PVC for jobs and files storage, configured under `core.storage` in `values.yaml`. -!!! note - As an alternative to PVC-based file storage, you can configure the Files service to use S3 object storage. See [File Storage](./file-storage.md) for S3 configuration options. When using S3 for files, the shared PVC is still required for jobs storage. + +As an alternative to PVC-based file storage, you can configure the Files service to use S3 object storage. See [File Storage](/platform/deploying-on-kubernetes/file-storage) for S3 configuration options. When using S3 for files, the shared PVC is still required for jobs storage. + ### Option 1: Create a new PVC (default) @@ -49,7 +54,7 @@ When set, the chart does not create a new PVC; pods mount the named volume. ## NIM storage class -For NIM deployments launched via the {{platform_name}}, you can set the default StorageClass used by NIM PVCs via platform config. In `values.yaml`, under `platformConfig`: +For NIM deployments launched via the NeMo Platform, you can set the default StorageClass used by NIM PVCs via platform config. 
In `values.yaml`, under `platformConfig`: ```yaml platformConfig: @@ -63,7 +68,7 @@ platformConfig: Replace `"nfs"` with your StorageClass name (e.g. `oci-nfs`, `gp3`). For NIM scaling and multi-node deployments, use a ReadWriteMany-capable StorageClass. -Refer to the [platform configuration documentation](../config-reference.md) for the full config reference. +Refer to the [platform configuration documentation](/reference/config-reference) for the full config reference. ## Persistent volume options @@ -91,7 +96,7 @@ For high-performance workloads: ## Azure persistent volumes -On AKS, Azure Disk (`managed-csi`) only supports ReadWriteOnce (RWO). Use **Azure Files** for the shared RWX volumes required by {{platform_name}}. PostgreSQL must use `managed-csi` because Azure Files does not support the POSIX permissions it requires. +On AKS, Azure Disk (`managed-csi`) only supports ReadWriteOnce (RWO). Use **Azure Files** for the shared RWX volumes required by NeMo Platform. PostgreSQL must use `managed-csi` because Azure Files does not support the POSIX permissions it requires. 1. Enable the [Azure Files CSI driver](https://learn.microsoft.com/en-us/azure/aks/azure-files-csi) if not already installed in your cluster. 2. Set `core.storage.storageClass` to `azurefile` (or `azurefile-csi` on AKS 1.29+) and `postgresql.persistence.storageClass` to `managed-csi`. diff --git a/docs/set-up/helm/prerequisites.mdx b/docs/set-up/helm/prerequisites.mdx index 360f47efa1..9b3a8cc1b2 100644 --- a/docs/set-up/helm/prerequisites.mdx +++ b/docs/set-up/helm/prerequisites.mdx @@ -1,22 +1,26 @@ +--- +title: "Prerequisites" +description: "" +--- # Prerequisites -Before installing the {{helm_chart_short_name}}, review requirements and create necessary secrets using your NGC API key. +Before installing the NeMo Platform Helm Chart, review requirements and create necessary secrets using your NGC API key. 
--- ## Review Requirements -To check the hardware and software specifications required for installing the {{helm_chart_short_name}}, review [](../../requirements.md). +To check the hardware and software specifications required for installing the NeMo Platform Helm Chart, review [](/reference/system-requirements). -For storage and persistent volume configuration (including ReadWriteMany and StorageClass configuration), see [](persistent-volumes.md). +For storage and persistent volume configuration (including ReadWriteMany and StorageClass configuration), see [](/platform/deploying-on-kubernetes/persistent-volumes). --- ## Create NGC API Key and Secrets -You need to have an NGC account, create an NGC API key, and create secrets using the key to access and download the {{helm_chart_short_name}} and Docker images. +You need to have an NGC account, create an NGC API key, and create secrets using the key to access and download the NeMo Platform Helm Chart and Docker images. To learn more about using Helm charts in the NGC Catalog console in general, refer to the [Helm Charts](https://docs.nvidia.com/ngc/gpu-cloud/ngc-catalog-user-guide/index.html#helm-charts) section in the _NGC Catalog User Guide_. @@ -62,4 +66,4 @@ To learn more about using Helm charts in the NGC Catalog console in general, ref --namespace ``` -For more information about various secrets you might need to create and set up through the {{helm_chart_short_name}} depending on your use case, refer to [](../../get-started/concepts/manage-secrets.md). +For more information about various secrets you might need to create and set up through the NeMo Platform Helm Chart depending on your use case, refer to [](/get-started/core-concepts/manage-secrets). 
diff --git a/docs/set-up/index.mdx b/docs/set-up/index.mdx index 8d408d0984..c2a4758594 100644 --- a/docs/set-up/index.mdx +++ b/docs/set-up/index.mdx @@ -1,37 +1,41 @@ +--- +title: "About Platform Setup" +description: "" +--- # About Platform Setup -This section describes how to set up the {{platform_name}} on your Kubernetes cluster using the {{helm_chart_short_name}}. -With this chart, you can deploy the {{platform_name}} as a full deployment or a subset of the APIs as you need. +This section describes how to set up the NeMo Platform on your Kubernetes cluster using the NeMo Platform Helm Chart. +With this chart, you can deploy the NeMo Platform as a full deployment or a subset of the APIs as you need. This Platform Setup chapter is for the following personas. -- **Cloud administrators**: Manage Kubernetes clusters and compute/storage resources. Deploy {{platform_name}} to the Kubernetes clusters on premises or cloud. +- **Cloud administrators**: Manage Kubernetes clusters and compute/storage resources. Deploy NeMo Platform to the Kubernetes clusters on premises or cloud. --- -## {{helm_chart_short_name}} +## NeMo Platform Helm Chart -The [{{helm_chart_short_name}}](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/helm-charts/nemo-platform) is an all-in-one Helm chart that bundles the complete {{platform_name}} ecosystem and all required dependencies for full platform deployment. +The [NeMo Platform Helm Chart](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/helm-charts/nemo-platform) is an all-in-one Helm chart that bundles the complete NeMo Platform ecosystem and all required dependencies for full platform deployment. You can also customize the configuration of your installation by updating the `values.yaml` file. You can also use the pre-configured tags to install only specific microservices that you need. 
-For the chart assets and additional details, refer to the [{{platform_name}} Collection](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/collections/nemo-microservices) page in the NVIDIA NGC Catalog. +For the chart assets and additional details, refer to the [NeMo Platform Collection](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/collections/nemo-microservices) page in the NVIDIA NGC Catalog. --- -## Deploy the {{platform_name}} with Helm +## Deploy the NeMo Platform with Helm -The following sections provide detailed instructions on how to deploy the {{platform_name}} using the {{helm_chart_short_name}}. +The following sections provide detailed instructions on how to deploy the NeMo Platform using the NeMo Platform Helm Chart.
-- **[Install }](helm/index.md)** +- **[Install \}](/platform/deploying-on-kubernetes/overview)** --- - Install the {{platform_name}} using the chart on your Kubernetes cluster. + Install the NeMo Platform using the chart on your Kubernetes cluster. cluster-admin @@ -45,16 +49,16 @@ Review and manage other cluster settings.
-- **[OpenTelemetry](opentelemetry.md)** +- **[OpenTelemetry](/platform/observability)** --- - Review and configure how {{platform_name}} use Open Telemetry for observability. + Review and configure how NeMo Platform uses OpenTelemetry for observability. -- **[Milvus](milvus.md)** +- **[Milvus](/platform/milvus)** --- - Review and configure how {{platform_name}} uses Milvus for vector database storage. + Review and configure how NeMo Platform uses Milvus for vector database storage.
\ No newline at end of file diff --git a/docs/set-up/manage-jobs.mdx b/docs/set-up/manage-jobs.mdx index 45d59c0099..92cd57be44 100644 --- a/docs/set-up/manage-jobs.mdx +++ b/docs/set-up/manage-jobs.mdx @@ -1,7 +1,11 @@ +--- +title: "Manage Jobs" +description: "" +--- # Manage Jobs -This section describes how to configure jobs in {{platform_name}}. The Jobs service is responsible for scheduling batch jobs, collecting telemetry and managing job results. +This section describes how to configure jobs in NeMo Platform. The Jobs service is responsible for scheduling batch jobs, collecting telemetry and managing job results. ## Execution Profiles @@ -13,7 +17,7 @@ An execution profile is defined by the following attributes: - A compute provider (e.g. `cpu` or `gpu`) - An [execution backend](#execution-backends) (e.g. `docker`, `kubernetes_job`, `volcano_job`) -By default, the {{platform_name}} defines a default CPU and default GPU provider to launch CPU and GPU bound jobs. +By default, the NeMo Platform defines a default CPU and default GPU provider to launch CPU and GPU bound jobs. You can configure multiple execution profiles to suit the shape of your compute environment. For example, if you have a compute environment with heterogeneous infrastructure (e.g., two types of GPU hardware such as A100 and H200), you can define a list of execution profiles as follows: @@ -45,16 +49,16 @@ jobs: node-pool-name: h200-pool ``` -For full configuration details, see the [platform configuration reference](config-reference.md). +For full configuration details, see the [platform configuration reference](/reference/config-reference). ### Default Execution Profiles -The {{platform_name}} defines a default execution profile for each execution backend depending on the platform's control plane. The default execution profile is used when no specific execution profile is specified for a job. 
+The NeMo Platform defines a default execution profile for each execution backend depending on the platform's control plane. The default execution profile is used when no specific execution profile is specified for a job. - Default CPU Execution Profile (`cpu`): The default execution profile for CPU based jobs. - Default GPU Execution Profile (`gpu`): The default execution profile for GPU based jobs. -You may configure the default execution profiles by updating the `executor_defaults` section of the `jobs` section of the [platform configuration](config-reference.md). The structure of the `executor_defaults` matches the configuration of the execution backend configuration. +You may configure the default execution profiles by updating the `executor_defaults` section of the `jobs` section of the [platform configuration](/reference/config-reference). The structure of the `executor_defaults` matches the configuration of the execution backend configuration. ```yaml jobs: @@ -87,9 +91,9 @@ jobs: ## Execution Backends -Execution backends are the containerized job execution systems that {{platform_name}} jobs are scheduled to run on. Each execution backend is responsible for launching and managing the job containers, and the Jobs service communicates with the execution backend to schedule and manage jobs. +Execution backends are the containerized job execution systems that NeMo Platform jobs are scheduled to run on. Each execution backend is responsible for launching and managing the job containers, and the Jobs service communicates with the execution backend to schedule and manage jobs. 
-{{platform_name}} currently supports the following execution backends: +NeMo Platform currently supports the following execution backends: - Docker (`docker`) - Kubernetes Jobs (`kubernetes_job`) @@ -97,11 +101,13 @@ Execution backends are the containerized job execution systems that {{platform_n ### Docker -The {{platform_name}} supports Docker as an execution backend for CPU and GPU based jobs. When the shared GPU pool is configured (see below), Docker will use only the configured GPU devices and will not over-schedule jobs if there are not currently enough GPUs available. +The NeMo Platform supports Docker as an execution backend for CPU and GPU based jobs. When the shared GPU pool is configured (see below), Docker will use only the configured GPU devices and will not over-schedule jobs if there are not currently enough GPUs available. -**Note:** The {{platform_name}} supports Docker-based job execution by default when running the {{platform_name}} in quickstart mode, which requires no configuration. +**Note:** The NeMo Platform supports Docker-based job execution by default when running the NeMo Platform in quickstart mode, which requires no configuration. -!!! tip "If you are running GPU jobs with Docker, see [GPU Configuration](../get-started/setup.md) for information on configuring the shared GPU pool. This configuration is shared between the jobs and models services to prevent GPU resource conflicts." + +If you are running GPU jobs with Docker, see [GPU Configuration](/get-started/setup) for information on configuring the shared GPU pool. This configuration is shared between the jobs and models services to prevent GPU resource conflicts. + ```yaml jobs: @@ -125,7 +131,7 @@ jobs: ### Kubernetes Jobs -The {{platform_name}} supports Kubernetes Jobs as an execution backend for CPU and GPU based jobs. +The NeMo Platform supports Kubernetes Jobs as an execution backend for CPU and GPU based jobs. 
```yaml jobs: @@ -204,7 +210,7 @@ See the [KAI Scheduler documentation](https://github.com/kai-scheduler/KAI-Sched ### Volcano Jobs -The {{platform_name}} supports Volcano Jobs as an execution backend for launching distributed GPU jobs. +The NeMo Platform supports Volcano Jobs as an execution backend for launching distributed GPU jobs. Volcano Jobs are configured using the same configuration as Kubernetes Jobs, but with the following additional configuration: @@ -212,7 +218,7 @@ Volcano Jobs are configured using the same configuration as Kubernetes Jobs, but - `scheduler_name`: The Volcano scheduler to use for the job. - `plugins`: The Volcano plugins to use for the job. - `max_retry`: The maximum number of retries for the job. -- `enable_multi_node_networking`: Enable multi-node networking injection. Sets annotations to trigger Kyverno policy mutations. This is only available if the platform is configured to use multi-node networking (see [Multi-Node Networking](helm/multinode-networking.md)). +- `enable_multi_node_networking`: Enable multi-node networking injection. Sets annotations to trigger Kyverno policy mutations. This is only available if the platform is configured to use multi-node networking (see [Multi-Node Networking](/platform/deploying-on-kubernetes/multinode-networking)). ```yaml jobs: diff --git a/docs/set-up/milvus.mdx b/docs/set-up/milvus.mdx index 616a729530..bd483a7e06 100644 --- a/docs/set-up/milvus.mdx +++ b/docs/set-up/milvus.mdx @@ -1,15 +1,19 @@ +--- +title: "Milvus" +description: "" +--- # Milvus -{{nem_short_name}} uses Milvus for vector database storage for [Retrieval evaluations](../evaluator/metrics/retriever.md) and [RAG evaluations](../evaluator/metrics/rag.md). +NeMo Evaluator uses Milvus for vector database storage for Retrieval evaluations and [RAG evaluations](/evaluation/metrics/rag-metrics). 
## Configuration -To configure {{nem_short_name}} to use Milvus, set the `milvus_url` in the [platform configuration](config-reference.md): +To configure NeMo Evaluator to use Milvus, set the `milvus_url` in the [platform configuration](/reference/config-reference): ```yaml evaluator: milvus_url: "milvus-standalone.default.svc.cluster.local:19530" ``` -See the [platform configuration reference](config-reference.md) for the complete {{nem_short_name}} configuration reference. +See the [platform configuration reference](/reference/config-reference) for the complete NeMo Evaluator configuration reference. diff --git a/docs/set-up/opentelemetry.mdx b/docs/set-up/opentelemetry.mdx index 23a50bebc4..2545c8c5af 100644 --- a/docs/set-up/opentelemetry.mdx +++ b/docs/set-up/opentelemetry.mdx @@ -1,15 +1,19 @@ +--- +title: "OpenTelemetry Setup" +description: "" +--- # OpenTelemetry Setup -Set up OpenTelemetry configurations to gain visibility into the operations and performance of the {{platform_name}}. +Set up OpenTelemetry configurations to gain visibility into the operations and performance of the NeMo Platform. ## Configuration -The {{platform_name}} uses OpenTelemetry to collect telemetry data from the platform and services. It leverages common OpenTelemetry SDK [configuration options](https://opentelemetry.io/docs/languages/sdk-configuration/) to configure the platform deployment. +The NeMo Platform uses OpenTelemetry to collect telemetry data from the platform and services. It leverages common OpenTelemetry SDK [configuration options](https://opentelemetry.io/docs/languages/sdk-configuration/) to configure the platform deployment. ## Helm Configuration -The {{helm_chart_short_name}} `values.yaml` exposes OpenTelemetry SDK options to configure the platform deployment. For example, to enable OpenTelemetry for the platform: +The NeMo Platform Helm Chart `values.yaml` exposes OpenTelemetry SDK options to configure the platform deployment. 
For example, to enable OpenTelemetry for the platform: ```yaml telemetry: @@ -19,4 +23,4 @@ telemetry: OTEL_EXPORTER_OTLP_INSECURE: true ``` -For a complete list of the default values, refer to [Helm Configuration](../helm/index.md). +For a complete list of the default values, refer to [Helm Configuration](/reference/helm-reference). diff --git a/docs/set-up/security.mdx b/docs/set-up/security.mdx index fc862f36c8..ee59d640b2 100644 --- a/docs/set-up/security.mdx +++ b/docs/set-up/security.mdx @@ -1,24 +1,29 @@ +--- +title: "Security for NeMo Platform" +description: "" +--- -# Security for {{platform_name}} +# Security for NeMo Platform -This page provides security guidelines and best practices for deploying and managing {{platform_name}} in production environments. +This page provides security guidelines and best practices for deploying and managing NeMo Platform in production environments. ## Security Considerations -- The {{platform_name}} does not impose rate limits. You must implement a rate-limiting strategy to restrict access to your application. -- The {{platform_name}} does not have an internal notion of a user. To restrict authorization to specific endpoints or users, implement an external mechanism such as an Envoy proxy. -- The {{nds_short_name}} microservice does not provide object-class-specific access controls. All items reside within a single access control boundary. -- The {{platform_name}}, by design, can access all content in the {{nds_short_name}} microservice, including LoRA adapters, training data, evaluation data, and evaluation results. +- The NeMo Platform does not impose rate limits. You must implement a rate-limiting strategy to restrict access to your application. +- The NeMo Platform does not have an internal notion of a user. To restrict authorization to specific endpoints or users, implement an external mechanism such as an Envoy proxy. +- The NeMo Data Store microservice does not provide object-class-specific access controls. 
All items reside within a single access control boundary. +- The NeMo Platform, by design, can access all content in the NeMo Data Store microservice, including LoRA adapters, training data, evaluation data, and evaluation results. This access is required for model evaluation. Carefully weigh the risk of data exposure from serving customized models directly to production against the overhead of a separate deployment. -- The {{platform_name}} is not intended to be internet-facing. Deploy them as the logic (middle) tier in a three-tier architecture. +- The NeMo Platform is not intended to be internet-facing. Deploy it as the logic (middle) tier in a three-tier architecture. - You are responsible for securing access to any application using the microservices. This includes: - Implementing an authentication layer between users and your application - Applying required authorization controls - Securing communication between services in your application -!!! note - Refer to the [NVIDIA Product Security](https://www.nvidia.com/en-us/security/psirt-policies/) page for information about subscribing to bulletins and updates, managing vulnerabilities, and reporting vulnerabilities. + +Refer to the [NVIDIA Product Security](https://www.nvidia.com/en-us/security/psirt-policies/) page for information about subscribing to bulletins and updates, managing vulnerabilities, and reporting vulnerabilities.
+ ## Default Network Ports @@ -27,31 +32,31 @@ The following table lists the default network ports for each microservice or def | Network Port | Microservice | | --- | --- | | 443/TCP | NeMo Admission Service API | -| 3000/TCP | {{nds_short_name}} API | -| 7331/TCP | {{nem_short_name}} API | -| 7331/TCP | {{ngm_short_name}} API | -| 8000/TCP | {{nim_short_name}} API | +| 3000/TCP | NeMo Data Store API | +| 7331/TCP | NeMo Evaluator API | +| 7331/TCP | NeMo Guardrails API | +| 8000/TCP | NIM API | | 8000/TCP | NeMo Retriever Text Embedding API | | 8000/TCP | NeMo Retriever Text Reranking API | -| 8000/TCP | {{ncm_short_name}} API | -| 8000/TCP | {{nes_short_name}} API | -| 8443/TCP | {{nop_short_name}} metrics | +| 8000/TCP | NeMo Customizer API | +| 8000/TCP | NeMo Entity Store API | +| 8443/TCP | NeMo Operator metrics | | 8080/TCP | Volcano Scheduler metrics | -| 9009/TCP | {{ncm_short_name}} callback | +| 9009/TCP | NeMo Customizer callback | -By default, the {{helm_chart_short_name}} configures databases with the following network ports. +By default, the NeMo Platform Helm Chart configures databases with the following network ports. Alternatively, you can configure each microservice to use an external database during installation. 
| Network Port | Database | | --- | --- | -| 5432/TCP | {{ncm_short_name}} Database | -| 5432/TCP | {{nds_short_name}} Database | -| 5432/TCP | {{nem_short_name}} Database | -| 5432/TCP | {{nes_short_name}} Database | +| 5432/TCP | NeMo Customizer Database | +| 5432/TCP | NeMo Data Store Database {/* nemo-postgresql */} | +| 5432/TCP | NeMo Evaluator Database | +| 5432/TCP | NeMo Entity Store Database | | 9091/TCP | Milvus metrics | | 19530/TCP | Milvus API | -By default, the {{helm_chart_short_name}} installs an open telemetry collector, which uses the following network ports: +By default, the NeMo Platform Helm Chart installs an open telemetry collector, which uses the following network ports: - 4317/TCP - 4318/TCP diff --git a/docs/studio/agents.mdx b/docs/studio/agents.mdx index c87a32db00..7d58e42f73 100644 --- a/docs/studio/agents.mdx +++ b/docs/studio/agents.mdx @@ -1,6 +1,8 @@ -# {{studio_short_name}} Agents - -Use **Agents** in the {{studio_short_name}} workspace sidebar to review and operate agent workflows managed by {{platform_name}}. +--- +title: "Agents" +description: "" +--- +Use **Agents** in the NeMo Studio workspace sidebar to review and operate agent workflows managed by NeMo Platform. ## Agent List @@ -9,10 +11,10 @@ The Agents table shows the agents in the selected workspace, including their dep | Action | Where | Result | |--------|-------|--------| | Review details | Select an agent row | Opens the agent side panel with name, workspace, description, and config format. | -| Deploy an agent | Row actions > **Deploy** | Creates a deployment for the selected agent. | -| Chat with a deployment | Agent side panel > **Deployments** > **Chat** | Opens the chat playground for a running deployment. | -| Delete a deployment | Agent side panel > **Deployments** > **Delete** | Removes that deployment. | -| Delete an agent | Row actions > **Delete** | Removes the stored agent definition from the workspace. 
| +| Deploy an agent | Row actions > **Deploy** | Creates a deployment for the selected agent. | +| Chat with a deployment | Agent side panel > **Deployments** > **Chat** | Opens the chat playground for a running deployment. | +| Delete a deployment | Agent side panel > **Deployments** > **Delete** | Removes that deployment. | +| Delete an agent | Row actions > **Delete** | Removes the stored agent definition from the workspace. | ## Agent Details @@ -21,10 +23,10 @@ The agent side panel has two primary areas: - **Agent Details** shows the stored agent metadata and config format. - **Deployments** lists active and historical deployments for the agent, including endpoint and status. -Agents are created and updated through the `nemo agents` CLI or Agents API. {{studio_short_name}} reflects the current workspace state after the platform services refresh. +Agents are created and updated through the `nemo agents` CLI or Agents API. NeMo Studio reflects the current workspace state after the platform services refresh. ## Related Topics -- [About Agents](../agents/index.md) -- [Optimize Agents](../agents/optimization.md) -- [Secure Agents](../agents/security.md) +- [About Agents](/agents) +- [Optimize Agents](/agents/optimize-agents) +- [Secure Agents](/agents/secure-agents) diff --git a/docs/studio/index.mdx b/docs/studio/index.mdx index 0d14af405e..40132d36f8 100644 --- a/docs/studio/index.mdx +++ b/docs/studio/index.mdx @@ -1,16 +1,19 @@ +--- +title: "About" +description: "" +--- -# About {{studio_long_name}} +# About NVIDIA NeMo Studio -!!! note - Studio is still in early development. Many features are missing or should be expected to change. +Studio is still in early development. Many features are missing or should be expected to change. -{{studio_short_name}} is the web app for AI development with NVIDIA {{platform_name}}. 
It provides a workspace-oriented UI for managing local platform resources, reviewing agents, running agent optimization workflows, monitoring agent telemetry, and working with datasets, jobs, and secrets. +NeMo Studio is the web app for AI development with NVIDIA NeMo Platform. It provides a workspace-oriented UI for managing local platform resources, reviewing agents, running agent optimization workflows, monitoring agent telemetry, and working with datasets, jobs, and secrets. --- ## Getting Started -{{studio_short_name}} is included with the platform. Follow the [Setup guide](../get-started/setup.md) to start {{platform_name}}, then access Studio at `/studio` on your running server. +NeMo Studio is included with the platform. Follow the [Setup guide](/get-started/setup) to start NeMo Platform, then access Studio at `/studio` on your running server. ## Features @@ -18,19 +21,19 @@ Use the Studio Agents area to review platform-managed NeMo Agent Toolkit workflows, inspect agent details, deploy agents, open a chat session against a running deployment, and clean up deployments that are no longer needed. -For the full workflow, see [Studio Agents](agents.md). +For the full workflow, see [Studio Agents](/studio-alpha/agents). ### Suggestions -Use **Agents > Suggestions** to review optimizer suggestions for deployed agents. Studio groups suggestions by workspace or agent and lets you filter by type, priority, scope, and agent. +Use **Agents > Suggestions** to review optimizer suggestions for deployed agents. Studio groups suggestions by workspace or agent and lets you filter by type, priority, scope, and agent. -For the full workflow, see [Studio Suggestions](suggestions.md). +For the full workflow, see [Studio Suggestions](/studio-alpha/suggestions). ### Monitor -Use **Agents > Monitor** to inspect agent telemetry stored by the platform, including recent inference logs and token usage summaries. 
+Use **Agents > Monitor** to inspect agent telemetry stored by the platform, including recent inference logs and token usage summaries. -For the full workflow, see [Studio Monitor](monitor.md). +For the full workflow, see [Studio Monitor](/studio-alpha/monitor). ### Workspaces @@ -53,7 +56,7 @@ You can upload any file type into a fileset. However, each service supports spec | Service | Supported File Types | | --------------------------------------------------- | ------------------------------------- | -| [Data Designer](../data-designer/index.md) | `.json`, `.jsonl`, `.csv`, `.parquet` | +| [Data Designer](/design-synthetic-data/about) | `.json`, `.jsonl`, `.csv`, `.parquet` | ### Jobs @@ -84,4 +87,4 @@ Navigate to **Jobs** in the workspace sidebar to see all jobs in the current wor ### Secrets -Store API keys and credentials to securely connect with external providers. See [manage-secrets](../get-started/concepts/manage-secrets.md) for details. +Store API keys and credentials to securely connect with external providers. See [manage-secrets](/get-started/core-concepts/manage-secrets) for details. diff --git a/docs/studio/monitor.mdx b/docs/studio/monitor.mdx index 22d014de12..ed4b79e05e 100644 --- a/docs/studio/monitor.mdx +++ b/docs/studio/monitor.mdx @@ -1,10 +1,12 @@ -# {{studio_short_name}} Monitor - -Use {{studio_short_name}} **Agents > Monitor** to inspect recent agent activity for the selected workspace. +--- +title: "Monitor" +description: "" +--- +Use NeMo Studio **Agents > Monitor** to inspect recent agent activity for the selected workspace. ## Telemetry Source -{{studio_short_name}} reads agent telemetry from the `nemo-agent-telemetry` fileset. Each telemetry file contributes run summaries, token usage, and request metadata that {{studio_short_name}} aggregates for the Monitor page. +NeMo Studio reads agent telemetry from the `nemo-agent-telemetry` fileset. 
Each telemetry file contributes run summaries, token usage, and request metadata that NeMo Studio aggregates for the Monitor page. ## Summary Cards @@ -20,5 +22,5 @@ The inference logs table shows recent agent requests from the loaded telemetry f ## Related Topics -- [Optimize Agents](../agents/optimization.md) -- [Secure Agents](../agents/security.md) +- [Optimize Agents](/agents/optimize-agents) +- [Secure Agents](/agents/secure-agents) diff --git a/docs/studio/suggestions.mdx b/docs/studio/suggestions.mdx index a373174b89..e02c790218 100644 --- a/docs/studio/suggestions.mdx +++ b/docs/studio/suggestions.mdx @@ -1,10 +1,12 @@ -# {{studio_short_name}} Suggestions +--- +title: "Suggestions" +description: "" +--- +Use NeMo Studio **Agents > Suggestions** to review optimizer suggestions for agents in the selected workspace. -Use {{studio_short_name}} **Agents > Suggestions** to review optimizer suggestions for agents in the selected workspace. +## What NeMo Studio Shows -## What {{studio_short_name}} Shows - -{{studio_short_name}} loads the latest optimizer snapshot and suggestions from the platform files service. Suggestions can be scoped to a workspace or to a specific agent. +NeMo Studio loads the latest optimizer snapshot and suggestions from the platform files service. Suggestions can be scoped to a workspace or to a specific agent. The Suggestions page summarizes: @@ -26,14 +28,14 @@ Use the table filters to narrow suggestions by: ## Run an Optimization Pass -If there are no suggestions, or if the latest snapshot is stale, {{studio_short_name}} starts a new optimizer pass for the workspace. The optimizer analyzes deployed agents and writes updated suggestions back to the platform files service. +If there are no suggestions, or if the latest snapshot is stale, NeMo Studio starts a new optimizer pass for the workspace. The optimizer analyzes deployed agents and writes updated suggestions back to the platform files service. 
## Apply Suggestions -{{studio_short_name}} can apply supported suggestions from the Suggestions page. Model optimization suggestions may ask you to choose an evaluation config before applying the change. +NeMo Studio can apply supported suggestions from the Suggestions page. Model optimization suggestions may ask you to choose an evaluation config before applying the change. ## Next Steps -- [{{studio_short_name}} Agents](agents.md): review, deploy, chat with, and delete agents from {{studio_short_name}}. -- [{{studio_short_name}} Monitor](monitor.md): inspect recent agent telemetry, token usage, and inference logs. -- [Optimize Agents](../agents/optimization.md): run CLI-driven optimization and review the underlying checks. +- [NeMo Studio Agents](/studio-alpha/agents): review, deploy, chat with, and delete agents from NeMo Studio. +- [NeMo Studio Monitor](/studio-alpha/monitor): inspect recent agent telemetry, token usage, and inference logs. +- [Optimize Agents](/agents/optimize-agents): run CLI-driven optimization and review the underlying checks. diff --git a/docs/stylesheets/nvidia.css b/docs/stylesheets/nvidia.css deleted file mode 100644 index 0085785cff..0000000000 --- a/docs/stylesheets/nvidia.css +++ /dev/null @@ -1,433 +0,0 @@ -/* SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. */ -/* SPDX-License-Identifier: Apache-2.0 */ - -/* - * NVIDIA NeMo Platform – Material for MkDocs theme overrides - * Primary: NVIDIA Green (#76b900) - * Supports light (default) and dark (slate) modes. 
- */ - -/* ── Google Fonts ──────────────────────────────────────────────────────────── */ -@import url('https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap'); - -/* ── Light mode – NVIDIA Green primary ─────────────────────────────────────── */ -[data-md-color-scheme="default"] { - --md-primary-fg-color: #76b900; - --md-primary-fg-color--light: #94d400; - --md-primary-fg-color--dark: #5a8c00; - --md-accent-fg-color: #76b900; - --md-accent-fg-color--transparent: rgba(118, 185, 0, 0.12); - - /* Typescale – slightly tighter */ - --md-typeset-font-size: 0.85rem; - - /* Admonition note → green tint */ - --md-admonition-fg-color: currentColor; -} - -/* ── Dark mode – NVIDIA near-black background ───────────────────────────────── */ -[data-md-color-scheme="slate"] { - /* Green accent */ - --md-primary-fg-color: #76b900; - --md-primary-fg-color--light: #94d400; - --md-primary-fg-color--dark: #5a8c00; - --md-accent-fg-color: #76b900; - --md-accent-fg-color--transparent: rgba(118, 185, 0, 0.12); - - /* Near-black backgrounds instead of the default blue-gray slate */ - --md-default-bg-color: #0e0e0e; - --md-default-bg-color--light: #161616; - --md-default-bg-color--lighter: #1e1e1e; - --md-default-bg-color--lightest: #282828; - - /* Text */ - --md-default-fg-color: hsla(0, 0%, 95%, 1); - --md-default-fg-color--light: hsla(0, 0%, 80%, 0.87); - --md-default-fg-color--lighter: hsla(0, 0%, 65%, 0.54); - --md-default-fg-color--lightest:hsla(0, 0%, 50%, 0.26); - - /* Code */ - --md-code-bg-color: #1a1a1a; - --md-code-fg-color: #d4d4d4; - - /* Typescale */ - --md-typeset-font-size: 0.85rem; -} - -/* ── Header: always dark/NVIDIA on both modes ───────────────────────────────── */ -.md-header { - background-color: #0e0e0e; - color: #ffffff; - box-shadow: 0 1px 0 0 rgba(118, 185, 0, 0.3); -} - -.md-header__button.md-logo { - padding: 0.1rem 0; - margin-right: 0.4rem; -} - -.md-header__button.md-logo img, 
-.md-header__button.md-logo svg { - height: 2rem; - width: auto; - max-width: none; -} - -/* Header links / search text */ -.md-header__title, -.md-header a { - color: #ffffff; -} - -/* Search bar */ -[data-md-color-scheme="slate"] .md-search__input { - background-color: rgba(255, 255, 255, 0.08); - color: #ffffff; -} - -[data-md-color-scheme="slate"] .md-search__input::placeholder { - color: rgba(255, 255, 255, 0.45); -} - -/* ── Navigation tabs removed: using sidebar-only nav ───────────────────────── */ - -/* ── Sidebar ─────────────────────────────────────────────────────────────────── */ -[data-md-color-scheme="slate"] .md-nav__title, -[data-md-color-scheme="slate"] .md-nav__link { - color: var(--md-default-fg-color--light); -} - -[data-md-color-scheme="slate"] .md-nav__link:hover, -[data-md-color-scheme="slate"] .md-nav__link--active { - color: #76b900; -} - -[data-md-color-scheme="slate"] .md-sidebar { - background-color: var(--md-default-bg-color); -} - -/* Active nav item indicator */ -.md-nav__link--active { - color: #76b900 !important; - font-weight: 600; -} - -/* ── Code blocks ─────────────────────────────────────────────────────────────── */ -[data-md-color-scheme="slate"] .md-typeset pre { - background-color: #161616; - border: 1px solid #2a2a2a; - border-radius: 6px; -} - -[data-md-color-scheme="default"] .md-typeset pre { - border: 1px solid #e8e8e8; - border-radius: 6px; -} - -/* Inline code */ -[data-md-color-scheme="slate"] .md-typeset code:not(.highlight code) { - background-color: #1e1e1e; - color: #94d400; - padding: 0.1em 0.35em; - border-radius: 3px; -} - -[data-md-color-scheme="default"] .md-typeset code:not(.highlight code) { - background-color: #f0f7e0; - color: #3d6200; - padding: 0.1em 0.35em; - border-radius: 3px; -} - -/* ── Admonitions ─────────────────────────────────────────────────────────────── */ -[data-md-color-scheme="slate"] .md-typeset .admonition, -[data-md-color-scheme="slate"] .md-typeset details { - border-color: 
rgba(118, 185, 0, 0.3); - background-color: #161616; -} - -/* Note → green */ -.md-typeset .admonition.note, -.md-typeset details.note { - border-left-color: #76b900; -} - -.md-typeset .admonition.note > .admonition-title, -.md-typeset details.note > summary { - background-color: rgba(118, 185, 0, 0.1); - color: #76b900; -} - -/* ── Tables ──────────────────────────────────────────────────────────────────── */ -[data-md-color-scheme="slate"] .md-typeset table:not([class]) { - background-color: #161616; -} - -[data-md-color-scheme="slate"] .md-typeset table:not([class]) th { - background-color: #1e1e1e; - color: #94d400; -} - -[data-md-color-scheme="slate"] .md-typeset table:not([class]) tr:hover { - background-color: #1e1e1e; -} - -/* ── Buttons / links ─────────────────────────────────────────────────────────── */ -.md-typeset a { - color: #76b900; -} - -.md-typeset a:hover { - color: #94d400; -} - -[data-md-color-scheme="slate"] .md-typeset a { - color: #94d400; -} - -[data-md-color-scheme="slate"] .md-typeset a:hover { - color: #b8e050; -} - -/* ── Footer ──────────────────────────────────────────────────────────────────── */ -.md-footer { - background-color: #0a0a0a; - color: rgba(255, 255, 255, 0.6); -} - -.md-footer-meta { - background-color: #080808; -} - -/* ── Content area ────────────────────────────────────────────────────────────── */ -[data-md-color-scheme="slate"] .md-content { - background-color: var(--md-default-bg-color); -} - -/* ── Headings ────────────────────────────────────────────────────────────────── */ -[data-md-color-scheme="slate"] .md-typeset h1 { - color: #ffffff; - font-weight: 700; -} - -[data-md-color-scheme="slate"] .md-typeset h2, -[data-md-color-scheme="slate"] .md-typeset h3 { - color: #e0e0e0; - font-weight: 600; -} - -/* H1 bottom border accent */ -.md-typeset h1 { - border-bottom: 2px solid #76b900; - padding-bottom: 0.3em; - margin-bottom: 1em; -} - -/* ── Top-of-page feedback widget 
─────────────────────────────────────────────── */ -.md-feedback { - border-top: 1px solid rgba(118, 185, 0, 0.2); - padding-top: 1rem; - margin-top: 2rem; -} - -/* ── Version banner ──────────────────────────────────────────────────────────── */ -.md-version__current { - color: #76b900 !important; -} - -/* ── Scrollbar (dark mode) ───────────────────────────────────────────────────── */ -[data-md-color-scheme="slate"] ::-webkit-scrollbar { - width: 6px; - height: 6px; -} - -[data-md-color-scheme="slate"] ::-webkit-scrollbar-track { - background: #161616; -} - -[data-md-color-scheme="slate"] ::-webkit-scrollbar-thumb { - background: #3a3a3a; - border-radius: 3px; -} - -[data-md-color-scheme="slate"] ::-webkit-scrollbar-thumb:hover { - background: #76b900; -} - -/* ── Material tabs ───────────────────────────────────────────────────────────── */ -[data-md-color-scheme="slate"] .md-typeset .tabbed-labels > label { - color: rgba(255, 255, 255, 0.6); - border-bottom-color: transparent; -} - -[data-md-color-scheme="slate"] .md-typeset .tabbed-labels > label:hover, -[data-md-color-scheme="slate"] .md-typeset .tabbed-labels > input:checked + label { - color: #76b900; - border-bottom-color: #76b900; -} - -/* ── Search results ──────────────────────────────────────────────────────────── */ -[data-md-color-scheme="slate"] .md-search-result__article, -[data-md-color-scheme="slate"] .md-search-result__meta { - background-color: #161616; -} - -/* ── GitHub source widget in header ─────────────────────────────────────────── */ -.md-source { - background-color: #1a1a1a; - border-radius: 0.2rem; - color: #ffffff; -} - -.md-source:hover { - opacity: 0.8; -} - -.md-source__icon svg { - fill: #ffffff; -} - -.md-source__repository { - color: #ffffff; - font-weight: 500; -} - -.md-source__facts { - color: rgba(255, 255, 255, 0.7); -} - -/* ── Theme toggle → far right ───────────────────────────────────────────────── */ -.md-header__inner { - display: flex; - align-items: center; -} - 
-/* Push palette toggle after the source/repo widget */ -[data-md-component="palette"] { - order: 2; - margin-left: 0.4rem; -} - -[data-md-component="palette"] .md-header__button { - color: #ffffff; -} - -[data-md-component="palette"] .md-header__button:hover { - color: #76b900; -} - -/* ── Footer: remove social section ──────────────────────────────────────────── */ -.md-footer-meta__inner .md-social { - display: none; -} - -/* ── Grid cards: flex layout for bottom-aligned badges ───────────────────────── */ -.md-typeset .grid.cards > ol > li, -.md-typeset .grid.cards > ul > li { - display: flex; - flex-direction: column; - position: relative; - cursor: pointer; - transition: border-color 0.2s; -} - -.md-typeset .grid.cards > ol > li:hover, -.md-typeset .grid.cards > ul > li:hover { - border-color: #76b900; -} - -/* Stretch the title link to cover the entire card */ -.md-typeset .grid.cards > ol > li > p:first-child a, -.md-typeset .grid.cards > ul > li > p:first-child a { - text-decoration: none; -} - -.md-typeset .grid.cards > ol > li > p:first-child a::after, -.md-typeset .grid.cards > ul > li > p:first-child a::after { - content: ""; - position: absolute; - inset: 0; - z-index: 1; -} - -.md-typeset .grid.cards > ol > li > small:last-child, -.md-typeset .grid.cards > ul > li > small:last-child { - margin-top: auto; - padding-top: 0.6rem; - border-top: 1px solid rgba(118, 185, 0, 0.2); -} - -/* ── Card badge tags ─────────────────────────────────────────────────────────── */ -.md-typeset .md-tag { - display: inline-block; - padding: 0.15em 0.6em; - margin: 0.15em 0.15em; - font-size: 0.65rem; - font-weight: 600; - border-radius: 2px; - background-color: #1a3a00; - color: #94d400; - border: none; - letter-spacing: 0.02em; -} - -[data-md-color-scheme="default"] .md-typeset .md-tag { - background-color: #e8f5cc; - color: #2d5000; - border: none; -} - -/* ── API Reference: service filter chips ──────────────────────────────────── */ -.api-filter-chips { - display: 
flex; - flex-wrap: wrap; - gap: 0.35rem; - margin-bottom: 1.5rem; -} - -.api-chip { - display: inline-flex; - align-items: center; - padding: 0.15em 0.75em; - font-size: 0.7rem; - font-weight: 600; - line-height: 1.6; - border-radius: 999px; - border: 1.5px solid #76b900; - background: transparent; - color: #76b900; - cursor: pointer; - transition: background-color 0.2s, color 0.2s; -} - -.api-chip:hover { - background-color: rgba(118, 185, 0, 0.1); -} - -.api-chip.active { - background-color: #76b900; - color: #fff; -} - -[data-md-color-scheme="default"] .api-chip { - border-color: #76b900; - color: #76b900; -} - -[data-md-color-scheme="default"] .api-chip:hover { - background-color: rgba(118, 185, 0, 0.1); -} - -[data-md-color-scheme="default"] .api-chip.active { - background-color: #76b900; - color: #fff; -} - -/* ── Architecture diagram: clickable with hover effect ──────────────────────── */ -.md-typeset a > img { - transition: opacity 0.2s; -} - -.md-typeset a > img:hover { - opacity: 0.85; -} diff --git a/docs/support-matrix.mdx b/docs/support-matrix.mdx index fe3c333f63..c4842d72e7 100644 --- a/docs/support-matrix.mdx +++ b/docs/support-matrix.mdx @@ -1,19 +1,21 @@ -# Support Matrix +--- +title: "Support Matrix" +description: "" +--- +This matrix defines the supported OSS local-install target for NeMo Platform 0.1.0. It applies to the Python package, CLI, SDK, and local services started by `nemo setup`. -This matrix defines the supported OSS local-install target for {{platform_name}} {{ release }}. It applies to the Python package, CLI, SDK, and local services started by `nemo setup`. - -Docker Compose, Helm, Kubernetes, and OpenShift deployment paths are not part of the OSS {{ release }} documentation scope. +Docker Compose, Helm, Kubernetes, and OpenShift deployment paths are not part of the OSS 0.1.0 documentation scope. 
## Host Platforms | Area | Supported | Notes | |------|-----------|-------| | Deployment mode | `nemo setup` local install | Setup starts local services, registers a model provider, and configures the CLI and SDK. | -| Python | 3.11, 3.12, 3.13 (`>=3.11,<3.14`) | Use an isolated virtual environment. Python 3.11 is the lowest supported Python version and Python 3.13 is the highest supported Python version for the OSS local install path. | +| Python | 3.11, 3.12, 3.13 (`>=3.11,<3.14`) | Use an isolated virtual environment. Python 3.11 is the lowest supported Python version and Python 3.13 is the highest supported Python version for the OSS local install path. | | Linux | Ubuntu 22.04 LTS, Ubuntu 24.04 LTS, RHEL 9, Rocky Linux 9, Debian 12 | Supported for CLI, SDK, local services, and NVIDIA GPU workloads. | | macOS | [macOS Tahoe 26](https://support.apple.com/en-us/122868) and macOS Sequoia 15 | Supported for CLI, SDK, local services, and cloud/provider workflows. Local NVIDIA GPU workloads are not supported on macOS. | | Architecture | x86_64 Linux; Apple Silicon and Intel macOS | NVIDIA GPU workloads require x86_64 Linux. | -| Windows and WSL | Not in the OSS {{ release }} support matrix | Use a supported Linux or macOS host for the local install path. | +| Windows and WSL | Not in the OSS 0.1.0 support matrix | Use a supported Linux or macOS host for the local install path. | ## Local Runtime @@ -53,7 +55,7 @@ Docker Compose, Helm, Kubernetes, and OpenShift deployment paths are not part of ## Out of Scope -The following are not part of the OSS {{ release }} local support matrix: +The following are not part of the OSS 0.1.0 local support matrix: - Native Windows local install. - Docker Compose, Helm, Kubernetes, and OpenShift deployment guides. 
diff --git a/docs/template/EULA.md b/docs/template/EULA.md deleted file mode 100644 index 4e8360e71f..0000000000 --- a/docs/template/EULA.md +++ /dev/null @@ -1,4 +0,0 @@ -# Eula - -By using MICROSERVICE container, you acknowledge that you have read and agreed to the -[license](https://registry.ngc.nvidia.com/orgs/ORG/teams/TEAM/resources/eula) diff --git a/docs/template/acknowledgements.md b/docs/template/acknowledgements.md deleted file mode 100644 index 7661efaecf..0000000000 --- a/docs/template/acknowledgements.md +++ /dev/null @@ -1,7 +0,0 @@ -# Acknowledgements - -## Software - -``` text -copyright notice from them -``` diff --git a/docs/template/getting_started/deploy-docker.md b/docs/template/getting_started/deploy-docker.md deleted file mode 100644 index d20ced6e01..0000000000 --- a/docs/template/getting_started/deploy-docker.md +++ /dev/null @@ -1,38 +0,0 @@ -# Deploying with Docker - - is intended to be run on a system with NVIDIA Datacenter GPUs, -with the exact requirements depending on the specific model and deployment options. - -For full systems hardware and software requirements see [Support -Matrix](./support-matrix). -For information about the models supported by the different containers, and the GPUs needed to run the models, see [Models](./models/models). - -## Pre-requisite Software - -To run LLM NIMs, you'll need a container runtime with support for NVIDIA GPUs. You can set this up with the following steps: - -1. Install [Docker](https://docs.docker.com/engine/install/) -1. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-the-nvidia-container-toolkit) - -## Setting up the Environment - -Set the **NGC_CLI_API_KEY** environment variable to your NGC API key, as -shown in the following example. - -``` bash -export NGC_CLI_API_KEY="key from ngc" -``` - -If you have not set up NGC, see [NGC -Setup](https://catalog.ngc.nvidia.comsetup). 
Don't forget to download and -install the NCG CLI (the download is on that page). - -## Download Container - -## Download a Model - -## Launching the Container - -## Health and Liveness Checks - -## Stopping the Container diff --git a/docs/template/getting_started/deploy-helm.md b/docs/template/getting_started/deploy-helm.md deleted file mode 100644 index 1f9f3bdb1b..0000000000 --- a/docs/template/getting_started/deploy-helm.md +++ /dev/null @@ -1,35 +0,0 @@ -# Deploying with Helm - - is intended to be run on a system with NVIDIA Datacenter GPUs, -with the exact requirements depending on the specific model and deployment options. - -For full systems hardware and software requirements see [Support -Matrix](./support-matrix). -For information about the models supported by the different containers, and the GPUs needed to run the models, see [Models](./models/models). - -Since helm is deploying a container, we recommend that you become familiar with the general information in -[Deploying with Docker](./deploy-docker) before using helm and Kubernetes. - -## Pre-requisite Software - -To run LLM NIMs, you'll need a container runtime with support for NVIDIA GPUs. You can set this up with the following steps: - -1. Install [Docker](https://docs.docker.com/engine/install/) -1. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-the-nvidia-container-toolkit) - -## Setting up the Environment - -Set the **NGC_CLI_API_KEY** environment variable to your NGC API key, as -shown in the following example. - -``` bash -export NGC_CLI_API_KEY="key from ngc" -``` - -If you have not set up NGC, see [NGC -Setup](https://catalog.ngc.nvidia.comsetup). Don't forget to download and -install the NCG CLI (the download is on that page). 
- -## Download a Model - -## Launching in Kubernetes diff --git a/docs/template/models.md b/docs/template/models.md deleted file mode 100644 index c2c908b55c..0000000000 --- a/docs/template/models.md +++ /dev/null @@ -1,20 +0,0 @@ -# Models - -## Optimized Models - -The following models are supported in the optimized CONTAINER container. - - -| Foundation Model | Strengths | Max I/O Tokens | Parameters | Training Data | GPU(s) | -| ---------------- | --------- | -------------- | ---------- | ------------- | ------ | -| **MODEL NAME** -Creator: CREATOR Architecture: Transformer | MODEL NAME is a ... | N | N billion | N billion | trillion tokens (up to MONTH YEAR) | (N) GPU - -## Supported Models - -The following models have been tested and validated for performance and accuracy with the CONTAINER container. Other models supported by vLLM may also be used. - -| Foundation Model | Strengths | Max I/O Tokens | Parameters | Training Data | GPU(s) | -| ---------------- | --------- | -------------- | ---------- | ------------- | ------ | -| **MODEL NAME** -Creator: CREATOR Architecture: Transformer | MODEL NAME is a ... | N | N billion | N billion tokens (up to MONTH YEAR) | (N) GPU diff --git a/docs/template/overview.md b/docs/template/overview.md deleted file mode 100644 index 4d7b9f3407..0000000000 --- a/docs/template/overview.md +++ /dev/null @@ -1,12 +0,0 @@ -# Overview - -() \... - -## Features - -## Applications - -## License - -By using MICROSERVICE, you acknowledge that you have read and agreed to the -[license](https://registry.ngc.nvidia.com/orgs/ORG/teams/TEAM/resources/eula/files). diff --git a/docs/template/playbooks/playbook.md b/docs/template/playbooks/playbook.md deleted file mode 100644 index e436935bbe..0000000000 --- a/docs/template/playbooks/playbook.md +++ /dev/null @@ -1,31 +0,0 @@ -# ??? Playbook - -The container used in this playbook: [](???). - -## Notebook Requirements - -- Access to -- Access to NGC ??? 
-- Docker - -## Getting - -To see the list of all available prebuilt models: - -``` bash -ngc registry model list "${inference_ngc_org_team}/*" -``` - -Once you see the model you want to use, -you can get information about the model, -as shown in the following example: - -```bash -ngc registry model info nvcr.io/ORG/TEAM/MODEL -``` - -And then download the model using the following command: - -``` bash -ngc registry model download-version "ORG/TEAM/MODEL" -``` diff --git a/docs/template/reference/api-reference.md b/docs/template/reference/api-reference.md deleted file mode 100644 index bcc736c789..0000000000 --- a/docs/template/reference/api-reference.md +++ /dev/null @@ -1,81 +0,0 @@ -# API Reference - -Use this page as a starting point for service-specific API reference content. -For a complete REST API page in the current MkDocs stack, render an OpenAPI file -with the `mkdocs-swagger-ui-tag` plugin: - -```html - -``` - -## Examples - -### List Models - -=== "CLI (cURL)" - - ```bash - curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/models" \ - -H "Accept: application/json" - ``` - -=== "Python" - - ```python - import requests - - url = "http://:/v1/models" - headers = {"Accept": "application/json"} - response = requests.get(url, headers=headers, timeout=30) - print(response.text) - ``` - -**Response** - -```json -{ - "object": "list", - "data": [ - { - "id": "NV-Embed-QA", - "created": 0, - "object": "model", - "owned_by": "organization-owner" - } - ] -} -``` - -### Service-Specific Request - -=== "CLI (cURL)" - - ```bash - curl -X "VERB" \ - "http://${HOSTNAME}:${SERVICE_PORT}/v1/ENDPOINT" \ - -H "Accept: application/json" \ - -H "Content-Type: application/json" \ - -d '{ - "arg1": "value1", - "argN": "valueN" - }' - ``` - -=== "Python" - - ```python - import json - - import requests - - url = "http://:/v1/ENDPOINT" - payload = json.dumps( - { - "arg1": "value1", - "argN": "valueN", - } - ) - headers = {"Content-Type": "application/json"} - response = 
requests.request("VERB", url, headers=headers, data=payload, timeout=30) - print(response.text) - ``` diff --git a/docs/template/release-notes.md b/docs/template/release-notes.md deleted file mode 100644 index 95377ce19a..0000000000 --- a/docs/template/release-notes.md +++ /dev/null @@ -1,64 +0,0 @@ -# Release Notes - -## Release YY.MM - -### Summary - -This release ... - -### Key Features - -- Feature -- Feature - -### Language Models - -- Model -- Model - -### API Endpoints - -- OpenAI API endpoints - - `endpoint` - - `endpoint` -- NemoLLM API endpoints - - `endpoint` - - `endpoint` - -### OpenAI API Endpoints - -- endpoint - - Supported Parameters - - `parameter` - - `parameter` - - Unsupported Parameters - - `parameter` - - `parameter` -- endpoint ... - -### NemoLLM API Endpoints - -- endpoint - - Supported Parameters - - `parameter` - - `parameter` - - - Unsupported Parameters - - `parameter` - - `parameter` -- endpoint ... - -### Fixes - -- Fix -- Fix - -### Known Issues - -- Issue -- Issue - -#### Fixed bugs - -- Bug fix -- Bug fix diff --git a/docs/template/support-matrix.md b/docs/template/support-matrix.md deleted file mode 100644 index 04f6ef2162..0000000000 --- a/docs/template/support-matrix.md +++ /dev/null @@ -1,34 +0,0 @@ -# Support Matrix - -## Hardware - -The YY.MM release of MICROSERVICE has been tested on the following NVIDIA GPUs: - -- [NVIDIA A100](https://www.nvidia.com/en-us/data-center/a100/) -- [NVIDIA L4](https://www.nvidia.com/en-us/data-center/l4/) - -## Software - -### NVIDIA Driver - -Release YY.MM is based on [CUDA -12.2.2](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html) -which requires [NVIDIA -Driver](https://www.nvidia.com/Download/index.aspx?lang=en-us) release -535 or later. 
However, if you are running on a [data center -GPU](https://www.nvidia.com/en-us/data-center/products/), you can use -NVIDIA driver release 450.51 (or later R450), 470.57 (or later R470), -510.47 (or later R510), 515.65 (or later R515), or 525.85 (or later -R525), or 535.86 (or later R535). The CUDA driver's compatibility -package only supports particular drivers. Thus, users should upgrade -from all R418, R440, and R460 drivers, which are not forward-compatible -with CUDA 12.2. For a complete list of supported drivers, see [CUDA -Application -Compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html#use-the-right-compat-package). - -### NVIDIA Container Toolkit - -Your Docker environment must support NVIDIA GPUs. See the [NVIDIA -Container -Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) -for more information. diff --git a/docs/template/using-ms.md b/docs/template/using-ms.md deleted file mode 100644 index 367509df94..0000000000 --- a/docs/template/using-ms.md +++ /dev/null @@ -1,33 +0,0 @@ -# Client Examples - -The Python client is uploaded to the NGC Private Registry Resources. -Since the packages are not deployed to a public package registry, -you must download the `.whl` and `pip` files and install them from the following locations: - -- [EA Participants](https://registry.ngc.nvidia.com/orgs/ohlfw0olaadg/teams/ea-participants/resources/nemo-MICROSERVICE-python-client) -- [NVIDIAN/nemo-llm](https://registry.ngc.nvidia.com/orgs/nvidian/teams/nemo-llm/resources/nemo-MICROSERVICE-python-client) - -## MICROSERVICE - -You must modify the `base_url` if you did not start the MICROSERVICE using the default configuration. 
- -``` python -from nemo_MICROSERVICE_client import MICROSERVICEClient -from pprint import pprint - -NAME = MICROSERVICEClient(base_url="http://localhost:1984") - -# GET pipelines: list all pipeline options available -response = NAME.get_pipelines() - -# GET collections: list all created collections -response = NAME.get_collections() - -# CREATE a new collection - specify the pipeline type and name of the collection -response = NAME.create_collection(pipeline="dense_elasticsearch", name="testCollection") -created_collection_id = ( - response.collection.id -) # store ID of the newly created collection - -# other examples -``` diff --git a/docs/troubleshooting/cluster-setup.mdx b/docs/troubleshooting/cluster-setup.mdx index d8fd4a012f..7f49767c89 100644 --- a/docs/troubleshooting/cluster-setup.mdx +++ b/docs/troubleshooting/cluster-setup.mdx @@ -1,14 +1,16 @@ -# Troubleshooting {{platform_name}} Deployment on Kubernetes +--- +title: "Troubleshooting NeMo Platform Deployment on Kubernetes" +description: "" +--- +Use this documentation to troubleshoot issues that can arise while you deploy and run the NeMo Platform on Kubernetes. -Use this documentation to troubleshoot issues that can arise while you deploy and run the {{platform_name}} on Kubernetes. - - +*/} ## Network Issues @@ -72,7 +74,7 @@ Sorry, home directories outside of /home needs configuration. To configure snap to use the correct home directory, follow the instructions at [Home directories outside of /home](https://snapcraft.io/docs/home-outside-home) in the Snap documentation. 
- +*/} diff --git a/docs/troubleshooting/customizer.mdx b/docs/troubleshooting/customizer.mdx index ee4152986e..d208ae775d 100644 --- a/docs/troubleshooting/customizer.mdx +++ b/docs/troubleshooting/customizer.mdx @@ -1,5 +1,7 @@ -# Troubleshooting {{ncm_short_name}} - +--- +title: "Troubleshooting NeMo Customizer" +description: "" +--- **Job fails during model download:** - Verify the HuggingFace token secret is configured correctly - Accept the model's license on the [HuggingFace model page](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) @@ -9,7 +11,7 @@ - The platform's shared persistent volume is likely full. Customization jobs require significant disk space: ~3× model size for full SFT, ~1.5× for LoRA. If you are also deploying the model from a base checkpoint fileset, plan for ~2.5× model size overall. - Clean up completed job artifacts or increase the PVC size (default: 200Gi at `/var/run/scratch/job`). - DPO/GRPO jobs also consume ephemeral node storage under `/tmp` via Ray workers — check node disk in addition to the PVC. -- See [ft-tut-understand-models](../customizer/tutorials/understand-configurations-and-models.md) for full storage requirement details. +- See [ft-tut-understand-models](/fine-tune-models/tutorials/understanding-models-and-training) for full storage requirement details. **Job fails with OOM (Out of Memory):** 1. Reduce `micro_batch_size` to 1 diff --git a/docs/troubleshooting/data-designer.mdx b/docs/troubleshooting/data-designer.mdx index 3256b19683..a1f320fe04 100644 --- a/docs/troubleshooting/data-designer.mdx +++ b/docs/troubleshooting/data-designer.mdx @@ -1,7 +1,11 @@ +--- +title: "Data Designer" +description: "" +--- # Troubleshoot Data Designer -Learn how to troubleshoot common issues with {{ndd_short_name}} in a local {{platform_name}} setup. +Learn how to troubleshoot common issues with NeMo Data Designer in a local NeMo Platform setup. 
## Prerequisites @@ -27,7 +31,7 @@ nemo setup ## Check Service Logs -Local setup writes service output to `~/.local/state/nmp/instances//services.log` in the directory where services were started. +Local setup writes service output to `~/.local/state/nmp/instances/<scope>/services.log` in the directory where services were started. ```bash tail -n 200 ~/.local/state/nmp/instances//services.log @@ -46,8 +50,8 @@ Look for startup errors, provider authentication failures, port conflicts, or Da ## Next steps -- Tutorial: [Data Designer Tutorials](../data-designer/tutorials/index.md) covers common generation workflows. -- How-to: [Set Up {{platform_name}}](../get-started/setup.md) covers local setup and service startup. -- Explanation: [Data Designer Overview](../data-designer/index.md) explains configuration, execution, and service behavior. -- Reference: [API Reference](../api/index.md#tag-data-designer) lists Data Designer endpoints. -- Troubleshooting: [Troubleshooting Overview](index.md) links to other service troubleshooting pages. +- Tutorial: [Data Designer Tutorials](/design-synthetic-data/tutorials/overview) covers common generation workflows. +- How-to: [Set Up NeMo Platform](/get-started/setup) covers local setup and service startup. +- Explanation: [Data Designer Overview](/design-synthetic-data/about) explains configuration, execution, and service behavior. +- Reference: [API Reference](/reference/api-reference#tag-data-designer) lists Data Designer endpoints. +- Troubleshooting: [Troubleshooting Overview](/reference/troubleshooting/overview) links to other service troubleshooting pages. 
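Reviewer note on the Data Designer page above: the "Check Service Logs" step tells readers to look for startup errors, provider authentication failures, and port conflicts in `services.log`. A minimal sketch of that triage follows. The keyword patterns and the `scan_log_lines` helper are assumptions made for illustration; the actual log wording is not shown in this patch, so the patterns will likely need adjusting.

```python
import re
from collections import defaultdict

# Hypothetical keyword patterns for the failure categories the page names.
# The real log wording is not shown in this patch; adjust to match your logs.
PATTERNS = {
    "startup": re.compile(r"failed to start|startup error", re.IGNORECASE),
    "auth": re.compile(r"\b401\b|\b403\b|unauthorized|authentication", re.IGNORECASE),
    "port": re.compile(r"address already in use|port .* in use", re.IGNORECASE),
}


def scan_log_lines(lines):
    """Group log lines by the first failure category whose pattern matches."""
    hits = defaultdict(list)
    for line in lines:
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[category].append(line.rstrip("\n"))
                break  # count each line under one category only
    return dict(hits)
```

To try it, open the file the page points to (`~/.local/state/nmp/instances/<scope>/services.log`, with `<scope>` replaced by your instance) and pass the file object to `scan_log_lines`; categories with no matching lines are simply absent from the result.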
diff --git a/docs/troubleshooting/evaluator.mdx b/docs/troubleshooting/evaluator.mdx index 911734c0ce..af318f5bfa 100644 --- a/docs/troubleshooting/evaluator.mdx +++ b/docs/troubleshooting/evaluator.mdx @@ -1,10 +1,15 @@ +--- +title: "Evaluator" +description: "" +--- -# Troubleshooting {{nem_short_name}} +# Troubleshooting NeMo Evaluator -Use this documentation to troubleshoot issues that can arise when you work with [{{nem_long_name}}](../evaluator/index.md). +Use this documentation to troubleshoot issues that can arise when you work with [NVIDIA NeMo Evaluator](/evaluation/about). -!!! tip - You can [get metric logs](../evaluator/metrics/job-management.md) or [get benchmark logs](../evaluator/benchmarks/job-management.md) for `COMPLETED` or `FAILED` jobs and use them to help troubleshoot. + +You can [get metric logs](/evaluation/metrics/job-management) or [get benchmark logs](/evaluation/benchmarks/job-management) for `COMPLETED` or `FAILED` jobs and use them to help troubleshoot. + --- @@ -35,12 +40,12 @@ Not all models make good judges. If the judge produces inconsistent output and d Incoming request body={'messages': [{'content': 'The output string did not satisfy the constraints given in the prompt. Fix the output string and return it.\nPlease return the output in a JSON format that complies with the following schema as specified in JSON Schema:\n{"properties": {"text": {"title": "Text", "type": "string"}}, "required": ["text"], "title": "StringIO", "type": "object"} ``` -## Dataset {dataset} is not in the expected format; it needs to have the files_url property set +## Dataset \{dataset\} is not in the expected format; it needs to have the files_url property set This means that either the `files_url` is not provided as part of the dataset specification in the config, or that the `files_url` is not provided in the expected format. 
The dataset must be a JSON object with the `files_url` property set, -pointing to the path of the file in the {{nds_short_name}} in the format: `hf://datasets///`. +pointing to the path of the file in the NeMo Data Store in the format: `hf://datasets/<dataset-namespace>/<dataset-namespace>/<file-path>`. ## Error connecting to inference server @@ -49,18 +54,18 @@ This means that for a custom evaluation, the target LLM endpoint is unable to co ## Inference SSL Error -An evaluation job that uses an HTTPS model endpoint can fail if the endpoint certificate or DNS name is not trusted by the local environment. Verify that the model URL is reachable from the host running {{platform_name}} and that the endpoint presents a valid certificate for its hostname. +An evaluation job that uses an HTTPS model endpoint can fail if the endpoint certificate or DNS name is not trusted by the local environment. Verify that the model URL is reachable from the host running NeMo Platform and that the endpoint presents a valid certificate for its hostname. ``` Error: HTTPSConnectionPool(host="", port=443): Max retries exceeded with url: /v1/chat/completions (Caused by SSLError(SSLError)) ``` -## Error occurred while checking the existence of file {file_ref} on {{nds_short_name}} +## Error occurred while checking the existence of file \{file_ref\} on NeMo Data Store -This could mean that the dataset is not specified correctly, or that the {{nds_short_name}} itself is unresponsive. +This could mean that the dataset is not specified correctly, or that the NeMo Data Store itself is unresponsive. -- Verify that the files URL is correct and that the dataset and file exists in the {{nds_short_name}}. -- Verify that the {{nds_short_name}} is responsive and reachable. +- Verify that the files URL is correct and that the dataset and file exists in the NeMo Data Store. +- Verify that the NeMo Data Store is responsive and reachable. 
If the error contains the string `Dataset {file_ref} is not present on datastore`, it means that the datastore is responsive, but the file reference does not exist. @@ -91,7 +96,9 @@ If you changed the platform URL, use the value configured in `NMP_BASE_URL` or i To troubleshoot an evaluation job that has failed, download the evaluation result archive and inspect the job logs. -!!! warning "These are advanced troubleshooting steps that should only be done after all other troubleshooting fails." + +These are advanced troubleshooting steps that should only be done after all other troubleshooting fails. + ### Evaluation Job Logs @@ -114,36 +121,38 @@ Log files can be found in the results folder with the file extension `*.log`. ## Skip validation checks -When you launch an evaluation job, {{nem_short_name}} performs availability checks (for example, checking if the dataset and files exist in {{nds_short_name}}). +When you launch an evaluation job, NeMo Evaluator performs availability checks (for example, checking if the dataset and files exist in NeMo Data Store). To speed up job launch, or due to strict constraints of validation checks, you can pass the query parameter `skip_validation_checks` during job launch. Use the following code to create an evaluation job that skips validation checks. 
-=== "curl" - ```bash - curl -X 'POST' \ - 'https://${EVALUATOR_BASE_URL}/v1/evaluation/jobs?skip_validation_checks=True' \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json' \ - -d '{ + + +```bash +curl -X 'POST' \ +'https://${EVALUATOR_BASE_URL}/v1/evaluation/jobs?skip_validation_checks=True' \ +-H 'accept: application/json' \ +-H 'Content-Type: application/json' \ +-d '{ +"namespace": "my-organization", +"target": "", +"config": "" +}' +``` + + +```python +data = { "namespace": "my-organization", "target": "", - "config": "" - }' - ``` + "config": "", +} -=== "Python" - ```python - data = { - "namespace": "my-organization", - "target": "", - "config": "", - } +endpoint = f"{EVALUATOR_BASE_URL}/v1/evaluation/jobs?skip_validation_checks=True" - endpoint = f"{EVALUATOR_BASE_URL}/v1/evaluation/jobs?skip_validation_checks=True" - - response = requests.post(endpoint, json=data).json() - ``` - - +response = requests.post(endpoint, json=data).json() +``` + + +{/* end of tab set */} diff --git a/docs/troubleshooting/guardrails.mdx b/docs/troubleshooting/guardrails.mdx index 3a75b24ee1..24cf9fae00 100644 --- a/docs/troubleshooting/guardrails.mdx +++ b/docs/troubleshooting/guardrails.mdx @@ -1,10 +1,12 @@ -# Troubleshooting {{ngm_short_name}} - -Use this documentation to troubleshoot issues that can arise when you work with [{{ngm_long_name}}]. +--- +title: "Guardrails" +description: "" +--- +Use this documentation to troubleshoot issues that can arise when you work with [NVIDIA NeMo Guardrails]. ## API Catalog Endpoint Issues -Several sample configurations in the documentation use NIMs with model endpoints hosted at . +Several sample configurations in the documentation use NIMs with model endpoints hosted at [https://integrate.api.nvidia.com/v1](https://integrate.api.nvidia.com/v1). The purpose of using the endpoints is to avoid deploying NIMs locally to reduce the initial effort to get started. 
Perform the following steps to troubleshoot configurations that use model endpoints from the API catalog: @@ -17,14 +19,53 @@ Perform the following steps to troubleshoot configurations that use model endpoi 1. Access a model, such as the Llama 3.1 8B NemoGuard Content Safety, from the model endpoint: - --8<-- "troubleshooting/_snippets/input/guardrails-cmds.sh" + ```bash + # start-content-safety + curl https://integrate.api.nvidia.com/v1/chat/completions \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer ${NVIDIA_API_KEY}" \ + -d '{ + "model": "nvidia/llama-3.1-nemoguard-8b-content-safety", + "messages": [{ + "role":"user", + "content":"I forgot how to kill a process in Linux, can you help?" + }], + "stream": false + }' + # end-content-safety + ``` ??? "Example Output" - --8<-- "troubleshooting/_snippets/output/guardrails-content-safety.json" + ```json + { + "id": "chat-2aaefebc0963475d862f1fd202f172b2", + "object": "chat.completion", + "created": 1743540105, + "model": "nvidia/llama-3.1-nemoguard-8b-content-safety", + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": "{\"User Safety\": \"safe\"} " + }, + "logprobs": null, + "finish_reason": "stop", + "stop_reason": null + } + ], + "usage": { + "prompt_tokens": 408, + "total_tokens": 416, + "completion_tokens": 8 + }, + "prompt_logprobs": null + } + ``` If your request is not successful, such as a 401 or 403 HTTP status code, - go to to generate a new API key. + go to [https://build.nvidia.com/settings/api-keys](https://build.nvidia.com/settings/api-keys) to generate a new API key. 1. Access other models. 
In the preceding sample `curl` command, replace the `model` value with one of the following: diff --git a/docs/troubleshooting/index.mdx b/docs/troubleshooting/index.mdx index 4a1435fb9c..9387d164d9 100644 --- a/docs/troubleshooting/index.mdx +++ b/docs/troubleshooting/index.mdx @@ -1,27 +1,29 @@ -# Troubleshooting Guide - -This section provides troubleshooting information for the {{platform_name}}. +--- +title: "Overview" +description: "" +--- +This section provides troubleshooting information for the NeMo Platform.
-- **[Evaluator Troubleshooting](evaluator.md)** +- **[Evaluator Troubleshooting](/reference/troubleshooting/evaluator)** --- - Learn how to troubleshoot common issues with {{nem_short_name}}. + Learn how to troubleshoot common issues with NeMo Evaluator. -- **[Guardrails Troubleshooting](guardrails.md)** +- **[Guardrails Troubleshooting](/reference/troubleshooting/guardrails)** --- - Learn how to troubleshoot common issues with {{ngm_short_name}}. + Learn how to troubleshoot common issues with NeMo Guardrails. -- **[Data Designer Troubleshooting](data-designer.md)** +- **[Data Designer Troubleshooting](/reference/troubleshooting/data-designer)** --- - Learn how to troubleshoot common issues with {{ndd_short_name}}. + Learn how to troubleshoot common issues with NeMo Data Designer. -- **[Studio Troubleshooting](studio.md)** +- **[Studio Troubleshooting](/reference/troubleshooting/studio)** --- diff --git a/docs/troubleshooting/studio.mdx b/docs/troubleshooting/studio.mdx index 7043b4c579..080c2e7b0a 100644 --- a/docs/troubleshooting/studio.mdx +++ b/docs/troubleshooting/studio.mdx @@ -1,7 +1,11 @@ +--- +title: "Studio" +description: "" +--- # Troubleshooting Studio -Learn how to troubleshoot common issues with {{studio_short_name}}. +Learn how to troubleshoot common issues with NeMo Studio. ## Datasets or Jobs Not Appearing in Studio @@ -37,4 +41,4 @@ curl -X POST http://localhost:8080/v1/datasets \ ``` All resource creation and update APIs support the `project` parameter. -Refer to the [Python SDK](../pysdk/index.md) and [API references](../api/index.md) for more details. +Refer to the [Python SDK](/reference/python-sdk/overview) and [API references](/reference/api-reference) for more details. 
diff --git a/docs/work/guardrails/README.md b/docs/work/guardrails/README.md deleted file mode 100644 index 47ddb7803b..0000000000 --- a/docs/work/guardrails/README.md +++ /dev/null @@ -1,7 +0,0 @@ -docker run --rm -it \ - -v $(pwd)/work/guardrails:/config-store \ - -e CONFIG_STORE_PATH=/config-store \ - -e DB_URI=sqlite:///config-store/config.sqlite \ - --net=host \ - -e NIM_ENDPOINT_API_KEY=$NVIDIA_API_KEY\ - guardrails:latest diff --git a/docs/work/guardrails/content-safety/config.yml b/docs/work/guardrails/content-safety/config.yml deleted file mode 100644 index fb3de46ead..0000000000 --- a/docs/work/guardrails/content-safety/config.yml +++ /dev/null @@ -1,20 +0,0 @@ -models: - - type: main - engine: nim - model: nvdev/meta/llama-3.1-8b-instruct - parameters: - base_url: https://integrate.api.nvidia.com/v1 - - - type: "content_safety" - engine: nim - model: "nvdev/nvidia/llama-3.1-nemoguard-8b-content-safety" - parameters: - base_url: https://integrate.api.nvidia.com/v1 - -rails: - input: - flows: - - content safety check input $model=content_safety - output: - flows: - - content safety check output $model=content_safety diff --git a/mkdocs.yml b/mkdocs.yml deleted file mode 100644 index e509663059..0000000000 --- a/mkdocs.yml +++ /dev/null @@ -1,513 +0,0 @@ -site_name: NeMo Platform -site_description: NVIDIA NeMo Platform Documentation -site_url: !ENV [NMP_DOCS_SITE_URL, "https://nvidia-nemo.github.io/nemo-platform/latest/"] -repo_url: https://github.com/NVIDIA-NeMo/nemo-platform -repo_name: NVIDIA-NeMo/nemo-platform -edit_uri: edit/main/docs/ -copyright: "Copyright © 2026 NVIDIA Corporation" - -docs_dir: docs -site_dir: site -use_directory_urls: true - -exclude_docs: | - _scripts/ - _generated/ - _hooks/ - _build/ - _overrides/ - _snippets/*.md - _snippets/**/*.md - **/_snippets/*.md - **/_snippets/**/*.md - .venv-mkdocs/ - template/ - work/ - notebooks/ - **/*.executed.ipynb - **/*.tmp.ipynb - **/*.py - **/*.pyc - README.md - CONTRIBUTING.md - CHANGELOG.md - 
MIGRATION.md - Makefile - requirements-mkdocs.txt - -# ─── Theme ─────────────────────────────────────────────────────────────────── -theme: - name: material - custom_dir: docs/_overrides - logo: assets/nvidia-logo-white.png - favicon: assets/favicon.ico - icon: - repo: fontawesome/brands/github - - palette: - # Light mode - - media: "(prefers-color-scheme: light)" - scheme: default - primary: custom - accent: custom - toggle: - icon: material/weather-sunny - name: Switch to dark mode - # Dark mode - - media: "(prefers-color-scheme: dark)" - scheme: slate - primary: custom - accent: custom - toggle: - icon: material/weather-night - name: Switch to light mode - - font: - text: Inter - code: JetBrains Mono - - features: - - navigation.indexes - - navigation.top - - navigation.path - - navigation.footer - - search.suggest - - search.highlight - - search.share - - content.code.copy - - content.code.annotate - - content.tabs.link - - content.tooltips - - toc.follow - - announce.dismiss - -# ─── Plugins ───────────────────────────────────────────────────────────────── -plugins: - - search: - lang: en - - - awesome-pages - - - macros: - on_error_fail: false - on_undefined: keep - - - mkdocstrings: - handlers: - python: - paths: [sdk/python/nemo-platform/src] - options: - docstring_style: google - show_source: false - show_root_heading: true - show_root_full_path: false - show_if_no_docstring: false - members_order: source - inherited_members: false - - - mkdocs-jupyter: - execute: false - include_source: true - ignore: - - "_scripts/*.py" - - "_extensions/**" - - "_templates/**" - - - redirects: - redirect_maps: - get-started/platform-prereq.md: get-started/setup.md - get-started/installation.md: get-started/setup.md - get-started/quickstart.md: get-started/setup.md - get-started/getting-started.md: index.md - generate-synthetic-data/index.md: data-designer/index.md - - - swagger-ui-tag: - docExpansion: list - filter: true - syntaxHighlightTheme: monokai - 
tryItOutEnabled: false - - - mike: - alias_type: redirect - canonical_version: !ENV [NMP_DOCS_CANONICAL_VERSION, latest] - -# ─── Hooks ─────────────────────────────────────────────────────────────────── -hooks: - # Reads extra.hidden_docs below to hide temporary pages from nav, output, and API filters. - - docs/_hooks/hide_unready_docs.py - -# ─── Markdown Extensions ───────────────────────────────────────────────────── -markdown_extensions: - - admonition - - pymdownx.details - - pymdownx.superfences: - custom_fences: - - name: mermaid - class: mermaid - format: !!python/name:pymdownx.superfences.fence_code_format - - pymdownx.tabbed: - alternate_style: true - slugify: !!python/object/apply:pymdownx.slugs.slugify - kwds: - case: lower - - pymdownx.highlight: - anchor_linenums: true - line_spans: __span - pygments_lang_class: true - - pymdownx.inlinehilite - - pymdownx.snippets: - base_path: - - docs - check_paths: false - - pymdownx.emoji: - emoji_index: !!python/name:material.extensions.emoji.twemoji - emoji_generator: !!python/name:material.extensions.emoji.to_svg - - attr_list - - md_in_html - - tables - - footnotes - - def_list - - toc: - permalink: true - title: On this page - - pymdownx.keys - - pymdownx.mark - - pymdownx.caret - - pymdownx.tilde - - pymdownx.tasklist: - custom_checkbox: true - -# ─── Extra ─────────────────────────────────────────────────────────────────── -extra: - social: [] - analytics: - feedback: - title: Was this page helpful? - ratings: - - icon: material/thumb-up-outline - name: This page was helpful - data: 1 - note: >- - Thanks for your feedback! - - icon: material/thumb-down-outline - name: This page could be improved - data: 0 - note: >- - Thanks for your feedback! Help us improve by - opening an issue. - - # Temporary gating for docs that are not ready for the 2026-05-12 launch. - # Used by docs/_hooks/hide_unready_docs.py. 
- # Set NMP_HIDE_UNREADY_DOCS=false, or run `make html-with-unready` / `make live-with-unready`, - # to build or serve these pages locally without editing nav. - hidden_docs: - enabled: !ENV [NMP_HIDE_UNREADY_DOCS, true] - sections: - - Platform - - Fine-tune Models - - Customizer - - Synthesize Safe Data - - Safe Synthesizer - - Example Applications - paths: - - auth/** - - customizer/** - - evaluator/benchmarks/** - - evaluator/metrics/job-management.md - - evaluator/metrics/results.md - - evaluator/metrics/retriever.md - - evaluator/tutorials/run-an-evaluation.md - - example-applications/tool-calling.ipynb - - get-started/quickstart.md - - helm/** - - run-inference/tutorials/deploy-models.md - - safe-synthesizer/** - - set-up/** - - troubleshooting/cluster-setup.md - - troubleshooting/customizer.md - api_tags: - - Auditor - - Customizer - - Safe Synthesizer - - # Substitution variables (accessed as {{ release }}, {{ ncm_short_name }}, etc.) - release: "0.1.0" - platform_name: "NeMo Platform" - ncm_long_name: "NVIDIA NeMo Customizer" - ncm_short_name: "NeMo Customizer" - ndd_long_name: "NVIDIA NeMo Data Designer" - ndd_short_name: "NeMo Data Designer" - nds_long_name: "NVIDIA NeMo Data Store" - nds_short_name: "NeMo Data Store" - nes_long_name: "NVIDIA NeMo Entity Store" - nes_short_name: "NeMo Entity Store" - nem_long_name: "NVIDIA NeMo Evaluator" - nem_short_name: "NeMo Evaluator" - ngm_long_name: "NVIDIA NeMo Guardrails" - ngm_short_name: "NeMo Guardrails" - nim_long_name: "NVIDIA NIM" - nim_short_name: "NIM" - nim_llm_long_name_cap: "NVIDIA NIM for Large Language Models (LLMs)" - nim_llm_long_name: "NVIDIA NIM for large language models (LLMs)" - nim_llm_short_name: "NIM for LLMs" - nrm_long_name: "NVIDIA NeMo Retriever" - nrm_short_name: "NeMo Retriever" - nsm_long_name: "NVIDIA NeMo Studio" - nsm_short_name: "NeMo Studio" - nss_long_name: "NVIDIA NeMo Safe Synthesizer" - nss_short_name: "NeMo Safe Synthesizer" - nop_long_name: "NVIDIA NeMo Operator" - 
nop_short_name: "NeMo Operator" - studio_long_name: "NVIDIA NeMo Studio" - studio_short_name: "NeMo Studio" - helm_chart_long_name: "NVIDIA NeMo Platform Helm Chart" - helm_chart_short_name: "NeMo Platform Helm Chart" - parent_helm_chart_latest_version: "1.0.0" - docker_compose_latest_version: "26.3" - build_nvidia_dot_com: "build.nvidia.com" - nemo_pysdk: "NeMo Platform Python SDK" - NEMO_HOST: "nemo.test" - inference_ngc_org: "nvidia" - deployment_management_long_name: "NVIDIA NeMo Deployment Management" - deployment_management_short_name: "NeMo Deployment Management" - proxy_short_name: "NIM Proxy" - __auditor_long_name: "NVIDIA NeMo Auditor" - __auditor_short_name: "NeMo Auditor" - __auditor_img_tag: "26.3" - volcano_version: "1.9.0" - -extra_javascript: - - javascripts/api-filter.js - -extra_css: - - stylesheets/nvidia.css - -# ─── Nav ───────────────────────────────────────────────────────────────────── -nav: - - Home: index.md - - Get Started: - - Setup: get-started/setup.md - - Core Concepts: - - Overview: get-started/concepts/index.md - - Entities: get-started/concepts/entities.md - - Entity References: get-started/concepts/entity-references.md - - Filtering: get-started/concepts/filtering.md - - Manage Files: get-started/concepts/manage-files.md - - Manage Secrets: get-started/concepts/manage-secrets.md - - Projects: get-started/concepts/projects.md - - Workspaces: get-started/concepts/workspaces.md - - Example Applications: - - About: example-applications/about.md - - Tool Calling: example-applications/tool-calling.ipynb - - Models and Inference: - - About: run-inference/about.md - - Tutorials: - - Overview: run-inference/tutorials/index.md - - Run Inference: run-inference/tutorials/run-inference.md - - Agents: - - About: agents/index.md - - Optimize Agents: agents/optimization.md - - Secure Agents: agents/security.md - - Plugins and Skills: agents/plugins.md - - Design Synthetic Data: - - About: data-designer/index.md - - Execution Modes: 
data-designer/execution-modes.md - - CLI: data-designer/cli.md - - Tutorials: - - Overview: data-designer/tutorials/index.md - - The Basics: data-designer/tutorials/basics.md - - Seeding with External Datasets: data-designer/tutorials/seeding.md - - SDK Resources: data-designer/sdk-resources.md - - Migrating from Standalone Library: data-designer/migration.md - - Synthesize Safe Data: - - About: - - Overview: safe-synthesizer/about/index.md - - Data Synthesis: safe-synthesizer/about/data-synthesis.md - - Evaluation: safe-synthesizer/about/evaluation.md - - Jobs: safe-synthesizer/about/jobs.md - - Host-Local Development: safe-synthesizer/about/host-local-development.md - - PII Replacement: safe-synthesizer/about/pii-replacement.md - - Parameters Reference: safe-synthesizer/about/reference.md - - Getting Started: safe-synthesizer/getting-started.md - - Tutorials: - - Overview: safe-synthesizer/tutorials/index.md - - Safe Synthesizer 101: safe-synthesizer/tutorials/safe-synthesizer-101.md - - Differential Privacy: safe-synthesizer/tutorials/differential-privacy.md - - Anonymize Data: - - About: anonymizer/index.md - - Quickstart: anonymizer/quickstart.md - - Tutorials: - - Overview: anonymizer/tutorials/index.md - - Preview a Config: anonymizer/tutorials/preview.md - - Run an Anonymizer Job: anonymizer/tutorials/run.md - - SDK Resources: anonymizer/sdk-resources.md - - CLI Reference: anonymizer/cli.md - - Fine-tune Models: - - About: customizer/index.md - - Customization Concepts: customizer/about.md - - Tutorials: - - Overview: customizer/tutorials/index.md - - Format Training Dataset: customizer/tutorials/format-training-dataset.md - - Import HuggingFace Models: customizer/tutorials/import-hf-model.md - - Job Metrics: customizer/tutorials/metrics.md - - Understanding Models and Training: customizer/tutorials/understand-configurations-and-models.md - - Models: - - Model Catalog: customizer/models/index.md - - Dataset Format: customizer/models/data-format.md - - 
Llama: customizer/models/llama.md - - Llama Nemotron: customizer/models/llama-nemotron.md - - Mistral: customizer/models/mistral.md - - Phi: customizer/models/phi.md - - Qwen: customizer/models/qwen.md - - GPT-OSS: customizer/models/gpt-oss.md - - Embedding: customizer/models/embedding.md - - Manage Model Entities: - - Overview: customizer/manage-model-entities/index.md - - Create a Model FileSet: customizer/manage-model-entities/create-fileset.md - - Create a Model Entity: customizer/manage-model-entities/create-model-entity.md - - Manage Jobs: - - Overview: customizer/manage-customization-jobs/index.md - - Create Job: customizer/manage-customization-jobs/create-job.md - - Customization Job Reference: customizer/manage-customization-jobs/customization-job-reference.md - - Get Job Status: customizer/manage-customization-jobs/get-job-status.md - - List Active Jobs: customizer/manage-customization-jobs/list-active-jobs.md - - Cancel Job: customizer/manage-customization-jobs/cancel-job.md - - Training Configuration: customizer/manage-customization-jobs/hyperparameters.md - - Evaluation: - - About: evaluator/index.md - - Tutorials: - - Overview: evaluator/tutorials/index.md - - Run LLM-as-a-Judge Evaluation: evaluator/tutorials/run-llm-judge-evaluation.md - - SDK Resources: evaluator/sdk-resources.md - - Metrics: - - Overview: evaluator/metrics/index.md - - Manage Metrics: evaluator/metrics/manage-metrics.md - - LLM-as-a-Judge: evaluator/metrics/llm-as-a-judge.md - - RAG Metrics: evaluator/metrics/rag.md - - Similarity Metrics: evaluator/metrics/similarity.md - - Agentic Metrics: evaluator/metrics/agentic.md - - Bring Your Own Metric: evaluator/metrics/remote.md - - Agent Configuration: evaluator/metrics/agent-configuration.md - - Model Configuration: evaluator/metrics/model-configuration.md - - Job Management: evaluator/metrics/job-management.md - - Metric Results: evaluator/metrics/results.md - - Benchmarks: - - Overview: evaluator/benchmarks/index.md - - Industry 
Benchmarks: evaluator/benchmarks/industry.md - - Discover Industry Benchmarks: evaluator/benchmarks/discover-industry-benchmarks.md - - Agentic Benchmarks: evaluator/benchmarks/agentic.md - - Custom Benchmarks: evaluator/benchmarks/custom.md - - Manage Benchmarks: evaluator/benchmarks/manage-benchmarks.md - - HuggingFace Secret: evaluator/benchmarks/hf-secret.md - - Job Management: evaluator/benchmarks/job-management.md - - Benchmark Results: evaluator/benchmarks/results.md - - Vulnerability Scanning: - - About: auditor/index.md - - Tutorials: - - Overview: auditor/tutorials/index.md - - Run an Audit Locally: auditor/tutorials/run-audit-locally.md - - SDK Resources: auditor/sdk-resources.md - - Configurations: - - Overview: auditor/configs/index.md - - Selecting Probes: auditor/configs/probes.md - - Schema: auditor/configs/schema.md - - Targets: - - Overview: auditor/targets/index.md - - Inference Gateway: auditor/targets/inference-gateway.md - - Schema: auditor/targets/schema.md - - Guardrails: - - About: guardrails/index.md - - Core Concepts: - - Overview: guardrails/concepts/index.md - - Architecture: guardrails/concepts/architecture.md - - Configurations: - - Overview: guardrails/concepts/configurations/index.md - - Configuration Structure: guardrails/concepts/configurations/configuration-structure.md - - Default Configurations: guardrails/concepts/configurations/default-configs.md - - Manage Configurations: guardrails/concepts/configurations/manage-configs.md - - Running Inference: guardrails/concepts/inference.md - - Running Checks: guardrails/concepts/checks.md - - Tutorials: - - Overview: guardrails/tutorials/index.md - - Content Safety: guardrails/tutorials/content-safety.md - - Deploy NemoGuard NIMs: guardrails/tutorials/deploy-nemoguard-nims.md - - Injection Detection: guardrails/tutorials/injection-detection.md - - Multimodal Data: guardrails/tutorials/multimodal-data.md - - Parallel Rails: guardrails/tutorials/parallel-rails.md - - Terminology: 
guardrails/terminology.md - - Observability: guardrails/observability.md - - Platform: - - About: set-up/index.md - - Authentication & Authorization: - - Overview: auth/index.md - - Concepts: auth/concepts.md - - Security Model: auth/security-model.md - - Authentication: - - Overview: auth/authentication/index.md - - OIDC Setup: auth/authentication/oidc.md - - Using Authentication: auth/authentication/using-authentication.md - - Providers: - - Overview: auth/authentication/providers/index.md - - Azure AD (Entra ID): auth/authentication/providers/azure-ad.md - - Generic OIDC: auth/authentication/providers/generic.md - - Authorization: - - Overview: auth/authorization/index.md - - Roles and Permissions: auth/authorization/roles-and-permissions.md - - Managing Access: auth/authorization/managing-access.md - - Permissions Reference: auth/authorization/permissions-reference.md - - API Scopes: auth/authorization/api-scopes.md - - Policy Engine: auth/authorization/policy-engine.md - - Deployment: - - Configuration: auth/deployment/configuration.md - - Credential Propagation: auth/deployment/credential-propagation.md - - Gateway Integration: auth/deployment/gateway.md - - Production Hardening: auth/deployment/hardening.md - - Troubleshooting: auth/troubleshooting.md - - Deploying on Kubernetes: - - Overview: set-up/helm/index.md - - Prerequisites: set-up/helm/prerequisites.md - - Install: set-up/helm/install.md - - Ingress: set-up/helm/ingress.md - - Database Setup: set-up/helm/database-setup.md - - File Storage: set-up/helm/file-storage.md - - Persistent Volumes: set-up/helm/persistent-volumes.md - - Multinode Networking: set-up/helm/multinode-networking.md - - OpenShift: set-up/helm/openshift.md - - Backup and Restore: set-up/helm/backup-and-restore.md - - Jobs: set-up/manage-jobs.md - - Observability: set-up/opentelemetry.md - - Security: set-up/security.md - - Milvus: set-up/milvus.md - - Studio (alpha): - - About: studio/index.md - - Agents: studio/agents.md - - 
Suggestions: studio/suggestions.md - - Monitor: studio/monitor.md - - Reference: - - Release Notes: - - Overview: about/release-notes/index.md - - Current Release: about/release-notes/current-release.md - - System Requirements: requirements.md - - Support Matrix: support-matrix.md - - API Reference: api/index.md - - Python SDK: - - Overview: pysdk/index.md - - Client APIs: pysdk/client/index.md - - CLI Reference: - - Overview: cli/index.md - - Configuration: cli/configuration.md - - Working with Resources: cli/working-with-resources.md - - Full CLI Reference: cli/reference.md - - Troubleshooting: cli/troubleshooting.md - - Config Reference: set-up/config-reference.md - - Helm Reference: helm/index.md - - Troubleshooting: - - Overview: troubleshooting/index.md - - Cluster Setup: troubleshooting/cluster-setup.md - - Customizer: troubleshooting/customizer.md - - Data Designer: troubleshooting/data-designer.md - - Evaluator: troubleshooting/evaluator.md - - Guardrails: troubleshooting/guardrails.md - - Studio: troubleshooting/studio.md - - EULA: eula.md - - Acknowledgements: acknowledgements/index.md