Bring an existing SageMaker Managed MLflow Tracking Server into Amazon SageMaker Unified Studio
TL;DR: You have already created a SageMaker-managed MLflow Tracking Server and now you want to use it with SageMaker Unified Studio. Two CLI commands are all it takes!
What is Amazon SageMaker Managed MLflow?
Amazon SageMaker’s fully managed MLflow capability, which became generally available in June 2024, offers a streamlined solution for managing the complete machine learning lifecycle. The service allows data scientists and ML developers to easily set up and manage MLflow Tracking Servers with minimal effort. The offering consists of three core components:
- an MLflow Tracking Server that serves REST API endpoints for tracking experiments,
- a metadata store for persisting experiment-related information,
- an artifact store using Amazon S3 for secure storage of ML artifacts.
This managed service eliminates the undifferentiated heavy lifting of infrastructure management, provides comprehensive experiment tracking across various environments, supports full MLflow capabilities, and offers enhanced security through AWS IAM integration. The service is designed to scale efficiently, with a “Small” tracking server supporting teams of up to 25 users.
What is SageMaker Unified Studio?
Amazon SageMaker Unified Studio is a comprehensive environment that integrates data and AI tools for complete development workflows. It provides a unified experience for model development, generative AI app development, data processing, and SQL analytics in a single governed environment. For ML workflows specifically, it leverages SageMaker AI to offer fully managed infrastructure, tools, and workflows for each step of the model lifecycle. This includes data preparation, training, governance, MLOps, inference, experimentation, pipelines, and model monitoring and evaluation. Users can access their data stored in various sources like Amazon S3, Amazon Redshift, and other data sources through the Amazon SageMaker Lakehouse. The platform also integrates with Amazon Q Developer to assist with tasks across the development lifecycle, including data discovery, ML model building, and code authoring, making the entire ML workflow more streamlined and efficient.
How to create/connect an MLflow Tracking Server in SageMaker Unified Studio
There are two ways to make an MLflow Tracking Server available in SageMaker Unified Studio. Both methods assume you’ve already created your Project using your preferred project profile. To learn how to do so, check out the SageMaker Unified Studio documentation.
1. Create a new MLflow Tracking Server directly in SageMaker Unified Studio
If you have the right permissions (that is, your project profile allows it), it takes less than 10 seconds to request a new MLflow Tracking Server directly from the SageMaker Unified Studio UI; the server itself then needs a few minutes to be provisioned:
- Open your Project
- Click on Compute
- Select MLflow Tracking Servers
- Click Create MLflow Tracking Server
- Select your configuration and provide a name
- Click on Create MLflow Tracking Server
- Wait ~10–15 minutes
Your MLflow Tracking Server is ready to go! All that is left is to add the right code to your notebook or training job to track your experiments. Before getting into the code, retrieve the MLflow Tracking Server ARN by clicking the Copy ARN button in the Compute tab.
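Since the copied ARN is later passed straight to MLflow, it can help to sanity-check it first. The snippet below is a minimal sketch, not an official validator: the helper name `parse_tracking_server_arn` is hypothetical, and the pattern assumes the standard `aws` partition and the ARN layout `arn:aws:sagemaker:<region>:<account-id>:mlflow-tracking-server/<name>`.

```python
import re

# Assumed ARN layout (standard "aws" partition only):
# arn:aws:sagemaker:<region>:<account-id>:mlflow-tracking-server/<name>
_ARN_PATTERN = re.compile(
    r"^arn:aws:sagemaker:(?P<region>[a-z0-9-]+):(?P<account_id>\d{12}):"
    r"mlflow-tracking-server/(?P<name>[A-Za-z0-9-]+)$"
)


def parse_tracking_server_arn(arn: str) -> dict:
    """Split a tracking server ARN into region, account ID and server name.

    Hypothetical helper: raises ValueError when the string does not look
    like an MLflow Tracking Server ARN (for example, a leftover placeholder).
    """
    match = _ARN_PATTERN.match(arn.strip())
    if match is None:
        raise ValueError(f"Not an MLflow Tracking Server ARN: {arn!r}")
    return match.groupdict()
```

Calling `parse_tracking_server_arn(mlflow_arn)` before `mlflow.set_tracking_uri(mlflow_arn)` catches a forgotten placeholder or a truncated copy early, rather than at the first logging call.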
2. Import an existing MLflow Tracking Server into SageMaker Unified Studio
Let’s assume you’ve created your MLflow Tracking Server elsewhere, and now you want to access it from SageMaker Unified Studio. For example, let’s say you’ve created the MLflow Tracking Server using this CLI command:
aws sagemaker create-mlflow-tracking-server \
--tracking-server-name my-manually-created-tracking-server \
--artifact-store-uri s3://[YOUR-BUCKET]/[YOUR-PATH]/mlflow \
--role-arn arn:aws:iam::[ACCOUNT-ID]:role/[YOUR-ROLE]
To make it visible in SageMaker Unified Studio, you need to tag it with three tags: Environment ID, Project ID and Domain ID. Two of them, Project ID and Domain ID, are available in the Project overview page, as indicated in the image below.
Getting the Environment ID is a bit more involved. First of all, make sure your Project has the MLExperiments blueprint attached to it.
Then, run this command, using the Domain ID and Project ID obtained in the previous step:
aws datazone list-environments \
--domain-identifier [DOMAIN-ID-#3] \
--project-identifier [PROJECT-ID-#2] \
| jq -r '.items[] | select(.name == "MLExperiments") | .id'
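If `jq` is not available, the same lookup can be sketched in Python. This is a minimal sketch under a few assumptions: boto3 is installed and configured with credentials, the DataZone `list_environments` response exposes `items` entries with `id` and `name` plus an optional `nextToken`, and the environment is named `MLExperiments` as in the command above. The function names are hypothetical.

```python
def find_environment_id(items, environment_name="MLExperiments"):
    """Return the ID of the single environment with the given name.

    `items` is the "items" list from a list-environments response.
    Raises LookupError if there is no match or more than one.
    """
    matches = [item["id"] for item in items if item.get("name") == environment_name]
    if len(matches) != 1:
        raise LookupError(
            f"Expected exactly one {environment_name!r} environment, found {len(matches)}"
        )
    return matches[0]


def list_project_environments(domain_id, project_id):
    """Collect all environments of a project, following pagination."""
    import boto3  # imported lazily so the pure helper above works without boto3

    client = boto3.client("datazone")
    items, token = [], None
    while True:
        kwargs = {"domainIdentifier": domain_id, "projectIdentifier": project_id}
        if token:
            kwargs["nextToken"] = token
        page = client.list_environments(**kwargs)
        items.extend(page.get("items", []))
        token = page.get("nextToken")
        if not token:
            return items
```

For example, `find_environment_id(list_project_environments(domain_id, project_id))` yields the same Environment ID that the CLI pipeline prints. Unlike the `jq` filter, this sketch fails loudly when the blueprint is missing or the name is ambiguous.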
Copy the Environment ID returned by the list-environments command and use it, together with the Project ID and Domain ID, in this CLI command:
aws sagemaker add-tags \
--resource-arn arn:aws:sagemaker:[YOUR-REGION]:[YOUR-ACCOUNT-ID]:mlflow-tracking-server/[TRACKING-SERVER-NAME] \
--tags \
Key=AmazonDataZoneEnvironment,Value=[ENVIRONMENT-ID-#1] \
Key=AmazonDataZoneProject,Value=[PROJECT-ID-#2] \
Key=AmazonDataZoneDomain,Value=[DOMAIN-ID-#3]
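The tagging step can also be scripted, which is handy when several tracking servers need to be imported. The following is a minimal sketch, assuming boto3 is installed and the caller is allowed to call `sagemaker:AddTags`; the helper names are hypothetical, while the tag keys and the ARN layout are the ones used in the CLI command above.

```python
def build_tracking_server_arn(region, account_id, server_name):
    """Assemble the tracking server ARN in the layout used by the CLI command."""
    return f"arn:aws:sagemaker:{region}:{account_id}:mlflow-tracking-server/{server_name}"


def build_datazone_tags(environment_id, project_id, domain_id):
    """Return the three tags that link the server to a Unified Studio project."""
    return [
        {"Key": "AmazonDataZoneEnvironment", "Value": environment_id},
        {"Key": "AmazonDataZoneProject", "Value": project_id},
        {"Key": "AmazonDataZoneDomain", "Value": domain_id},
    ]


def tag_tracking_server(region, account_id, server_name,
                        environment_id, project_id, domain_id):
    """Apply the tags to an existing tracking server with boto3."""
    import boto3  # imported lazily so the builders above work without boto3

    client = boto3.client("sagemaker", region_name=region)
    client.add_tags(
        ResourceArn=build_tracking_server_arn(region, account_id, server_name),
        Tags=build_datazone_tags(environment_id, project_id, domain_id),
    )
```

Keeping the ARN and tag construction in pure functions makes it easy to print and review exactly what will be sent before touching the resource.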
That’s it! Now you should be able to see your MLflow Tracking Server in the Compute tab of your Project:
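How to use the MLflow Tracking Server in SageMaker Unified Studio
In your code, make sure you’ve added the lines below. Using the ARN as the tracking URI requires the sagemaker-mlflow plugin to be installed alongside mlflow.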
import mlflow
from mlflow.models import infer_signature

# ARN of the tracking server, copied from the Compute tab
mlflow_arn = "[REPLACE-WITH-THE-ARN-YOU-HAVE-JUST-COPIED]"
mlflow.set_tracking_uri(mlflow_arn)
mlflow.set_experiment("my-experiment-name")

with mlflow.start_run() as run:
    # [.... training code goes here; it should define model, params and X ....]
    signature = infer_signature(X, model.predict(X))
    mlflow.log_params(params)
    mlflow.sklearn.log_model(model, artifact_path="sklearn-model", signature=signature)

# Optional - use to register the model.
# The last URI segment must match the artifact_path passed to log_model.
model_uri = f"runs:/{run.info.run_id}/sklearn-model"
mv = mlflow.register_model(model_uri, "[YOUR-MODEL-NAME]")
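One detail of the optional registration step is easy to get wrong: in a `runs:/` URI, the segment after the run ID is the artifact path under which the model was logged (`sklearn-model` in the `log_model` call), not the experiment name. A minimal sketch of a helper that keeps the two in sync is shown below; `build_model_uri` is a hypothetical name, not part of MLflow.

```python
def build_model_uri(run_id: str, artifact_path: str) -> str:
    """Build a runs:/ URI pointing at a model logged under artifact_path.

    Hypothetical helper: artifact_path must be the same value that was
    passed to log_model for this run.
    """
    cleaned_path = artifact_path.strip("/")
    if not run_id or not cleaned_path:
        raise ValueError("Both run_id and artifact_path are required")
    return f"runs:/{run_id}/{cleaned_path}"
```

With a shared constant such as `ARTIFACT_PATH = "sklearn-model"` used both in `log_model` and in `build_model_uri(run.info.run_id, ARTIFACT_PATH)`, the URI handed to `mlflow.register_model` always points at the model that was actually logged.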
Then, you can open MLflow by clicking on Build in the top menu bar, then MLflow, choosing your tracking server (if you have more than one), and finally selecting the experiment you’ve created. The MLflow UI will open in another tab.
Happy coding! 🚀 If this content has been useful, please leave a clap 👏 or a comment 🗯. This will let us know that our work has been appreciated! 😄