
SageMaker AI models and MLflow for agent evaluation with Strands Agents SDK#4871

Merged
mollyheamazon merged 3 commits into aws:default from dhegde-aws:strands-mlflow-sagemaker-models
Feb 2, 2026

@dhegde-aws
Contributor

This PR adds a new notebook demonstrating how to use SageMaker AI endpoints and MLflow
with the Strands Agents SDK for building observable, production-ready AI agents.

What's included

  • Deploy foundation models from SageMaker JumpStart as inference endpoints
  • Configure SageMaker AI endpoints with Strands Agents SDK using SageMakerAIModel
  • Set up SageMaker Managed MLflow for automatic agent tracing and observability
  • Implement A/B testing using SageMaker production variants (Qwen3-4B vs Qwen3-8B)
  • Evaluate agent performance using MLflow GenAI scorers (custom + built-in)
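
As a rough illustration of the A/B testing item above: SageMaker routes requests across production variants in proportion to each variant's weight. The sketch below uses only the standard library to build variant entries in the shape accepted by the `ProductionVariants` field of an endpoint config and to compute the resulting traffic shares. The variant names, model names, instance type, and endpoint name are hypothetical placeholders, not values taken from the notebook.

```python
import json
from typing import Dict, List

# Hypothetical variant definitions; the notebook's real names and instance types may differ.
variants: List[Dict] = [
    {
        "VariantName": "qwen3-4b",
        "ModelName": "qwen3-4b-model",
        "InstanceType": "ml.g5.2xlarge",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 1.0,
    },
    {
        "VariantName": "qwen3-8b",
        "ModelName": "qwen3-8b-model",
        "InstanceType": "ml.g5.2xlarge",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 1.0,
    },
]


def traffic_shares(production_variants: List[Dict]) -> Dict[str, float]:
    """Return each variant's share of traffic: its weight divided by the sum of weights."""
    total = sum(v["InitialVariantWeight"] for v in production_variants)
    if total <= 0:
        raise ValueError("variant weights must sum to a positive number")
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in production_variants}


def pinned_request(endpoint_name: str, variant_name: str, payload: Dict) -> Dict:
    """Build keyword arguments that pin one request to a single variant.

    The keys mirror the SageMaker runtime invoke_endpoint parameters;
    TargetVariant bypasses the weighted split for that request.
    """
    return {
        "EndpointName": endpoint_name,
        "TargetVariant": variant_name,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }


shares = traffic_shares(variants)
request = pinned_request("strands-ab-endpoint", "qwen3-8b", {"inputs": "Hello"})
print(shares)
print(request["TargetVariant"])
```

Equal weights give an even split. Pinning a request to a named variant is what makes a controlled comparison possible, since each evaluation run then hits a known model.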

Why SageMaker AI endpoints

  • Full infrastructure control over compute, networking, and scaling
  • Deploy custom/fine-tuned models or open-source alternatives
  • Cost predictability with reserved instances
  • Native MLflow integration for enterprise MLOps

Key SageMaker + MLflow features demonstrated

  • JumpStartModel for quick model deployment
  • Production variants for traffic splitting
  • target_variant parameter for controlled experiments
  • mlflow.strands.autolog() for automatic trace capture
  • mlflow.genai.evaluate() with Correctness and custom scorers
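
To give a feel for what a custom scorer might compute, here is a minimal plain-Python sketch. It is not the notebook's scorer: the `keyword_coverage` function, the `expected_keywords` field, and the sample records are all hypothetical. In an MLflow setup, a function like this would presumably be wrapped with MLflow's scorer decorator and passed to `mlflow.genai.evaluate()` alongside built-in scorers such as `Correctness`; that wiring is an assumption and is left out so the example runs without MLflow installed.

```python
from typing import Dict, List


def keyword_coverage(outputs: str, expectations: Dict[str, List[str]]) -> float:
    """Fraction of expected keywords that appear in the agent's answer (case-insensitive)."""
    keywords = expectations.get("expected_keywords", [])
    if not keywords:
        return 0.0
    text = outputs.lower()
    hits = sum(1 for keyword in keywords if keyword.lower() in text)
    return hits / len(keywords)


# Hypothetical evaluation records: one agent answer plus the keywords it should mention.
records = [
    {
        "outputs": "Paris is the capital of France.",
        "expectations": {"expected_keywords": ["Paris", "France"]},
    },
    {
        "outputs": "The endpoint runs on a GPU instance.",
        "expectations": {"expected_keywords": ["GPU", "autoscaling"]},
    },
]

scores = [keyword_coverage(r["outputs"], r["expectations"]) for r in records]
mean_score = sum(scores) / len(scores)
print(scores, mean_score)
```

A deterministic check like this is cheap to run over every trace, which makes it a useful complement to LLM-judged scorers when comparing two model variants.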

Testing done

Tested the entire notebook in JupyterLab on SageMaker AI Studio.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • [x] I have verified that my PR does not contain any new notebook/s which demonstrate a SageMaker functionality already showcased by another existing notebook in the repository
  • [x] I have read the CONTRIBUTING doc and adhered to the guidelines regarding folder placement, notebook naming convention and example notebook best practices
  • [ ] I have updated the necessary documentation, including the README of the appropriate folder as well as the index.rst file
  • [x] I have tested my notebook(s) and ensured it runs end-to-end
  • [ ] I have linted my notebook(s) and code using python3 -m black -l 100 {path}/{notebook-name}.ipynb

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…s SDK

with models deployed on SageMaker AI endpoints and MLflow observability.

Covers SageMaker JumpStart model deployment, agent tracing with MLflow,
A/B testing with production variants, and evaluation using MLflow GenAI scorers.
@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@dhegde-aws
Contributor Author

@aviruthen @monamo19 - Could you please review and merge this PR? I created this sample to support a blog post I am writing; it has been approved in tech review.

@mollyheamazon mollyheamazon merged commit f9712cd into aws:default Feb 2, 2026
1 check passed
2 participants