Quick Start

This quick start guide will walk you through creating and deploying your first machine learning service with BentoML.

Prerequisites

  • Python 3.8 or higher
  • BentoML installed (pip install bentoml)
  • Basic Python and ML knowledge
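
To confirm the installation before starting, you can print the installed version from Python (a quick sanity check, nothing more):

# check_install.py (throwaway helper, not part of the project)
import bentoml

print(bentoml.__version__)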

Step 1: Train a Model

Let's start with a simple Iris classification model using scikit-learn:

# train_model.py
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import bentoml

# Load the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Train a Random Forest classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Save the model to BentoML model store
saved_model = bentoml.sklearn.save_model(
    "iris_classifier",
    model,
    labels={
        "framework": "sklearn",
        "task": "classification",
    },
    metadata={
        "accuracy": "0.97",
        "features": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
    },
)

print(f"Model saved: {saved_model.tag}")

Run the training script:

python train_model.py

Expected output:

Model saved: iris_classifier:abcd1234

Step 2: Verify Model Storage

Check that your model is saved:

bentoml models list

You should see your model:

Tag                       Module           Size       Creation Time
iris_classifier:abcd1234  bentoml.sklearn  10.23 KiB  2024-01-15 10:30:45

Get detailed information:

bentoml models get iris_classifier:latest
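
You can also inspect the stored model from Python; a minimal sketch using the model store API (the labels and metadata are the ones set in train_model.py):

# inspect_model.py (optional helper)
import bentoml

model = bentoml.models.get("iris_classifier:latest")
print(model.tag)            # e.g. iris_classifier:abcd1234
print(model.info.labels)    # framework / task labels
print(model.info.metadata)  # accuracy and feature names stored at save time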

Step 3: Create a Service

Create a service file to define the API:

# service.py
import bentoml
import numpy as np
from pydantic import BaseModel

# Define input data structure
class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

# Create BentoML service
@bentoml.service(
    resources={
        "cpu": "2",
        "memory": "512Mi",
    },
    traffic={
        "timeout": 10,
    },
)
class IrisClassifier:

    # Reference the saved model in the BentoML model store
    model_ref = bentoml.models.get("iris_classifier:latest")

    def __init__(self):
        # Load the model once when the service starts
        self.model = bentoml.sklearn.load_model(self.model_ref)

    @bentoml.api
    def classify(self, features: IrisFeatures) -> dict:
        """
        Classify iris flower species based on features
        """
        # Convert input to a numpy array of shape (1, 4)
        input_data = np.array([[
            features.sepal_length,
            features.sepal_width,
            features.petal_length,
            features.petal_width,
        ]])

        # Make prediction
        prediction = self.model.predict(input_data)

        # Map prediction to species name
        species_map = {0: "setosa", 1: "versicolor", 2: "virginica"}
        species = species_map[int(prediction[0])]

        return {
            "species": species,
            "prediction": int(prediction[0]),
        }

    @bentoml.api
    def classify_batch(self, features: list[IrisFeatures]) -> list:
        """
        Classify multiple iris flowers at once
        """
        # Convert inputs to a numpy array of shape (n, 4)
        input_data = np.array([[
            f.sepal_length,
            f.sepal_width,
            f.petal_length,
            f.petal_width,
        ] for f in features])

        # Make batch prediction
        predictions = self.model.predict(input_data)

        # Map predictions to species names
        species_map = {0: "setosa", 1: "versicolor", 2: "virginica"}
        results = [
            {
                "species": species_map[int(pred)],
                "prediction": int(pred),
            }
            for pred in predictions
        ]

        return results

Step 4: Test Locally

Start the development server:

bentoml serve service:IrisClassifier

You should see:

Starting production BentoServer from "service:IrisClassifier" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)

The service is now running at http://localhost:3000!
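
Before calling the prediction endpoints, you can confirm the server is ready. BentoML exposes standard health endpoints on the same port; a quick check with requests (the /readyz path is assumed from recent BentoML releases):

# health_check.py (optional helper)
import requests

# /readyz returns 200 once the service has started and loaded its model
print(requests.get("http://localhost:3000/readyz").status_code)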

Step 5: Test the API

Using cURL

Test single prediction:

curl -X POST http://localhost:3000/classify \
  -H "Content-Type: application/json" \
  -d '{
    "features": {
      "sepal_length": 5.1,
      "sepal_width": 3.5,
      "petal_length": 1.4,
      "petal_width": 0.2
    }
  }'

Expected response:

{
  "species": "setosa",
  "prediction": 0
}

Test batch prediction:

curl -X POST http://localhost:3000/classify_batch \
  -H "Content-Type: application/json" \
  -d '{
    "features": [
      {
        "sepal_length": 5.1,
        "sepal_width": 3.5,
        "petal_length": 1.4,
        "petal_width": 0.2
      },
      {
        "sepal_length": 7.0,
        "sepal_width": 3.2,
        "petal_length": 4.7,
        "petal_width": 1.4
      }
    ]
  }'

Using Python

# test_api.py
import requests

url = "http://localhost:3000/classify"
data = {
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2,
}

# Request body keys must match the API parameter names ("features" here)
response = requests.post(url, json={"features": data})
print(response.json())
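
The batch endpoint can be tested the same way; a short sketch that posts a list of feature objects under the "features" key, mirroring the batch cURL request above:

# test_batch_api.py
import requests

url = "http://localhost:3000/classify_batch"
data = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2},
    {"sepal_length": 7.0, "sepal_width": 3.2, "petal_length": 4.7, "petal_width": 1.4},
]

response = requests.post(url, json={"features": data})
print(response.json())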

Using Swagger UI

Open your browser and navigate to:

  • Swagger UI: http://localhost:3000/docs
  • You can interactively test the API from the browser!
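
Using the BentoML Client (optional)

Recent BentoML releases also ship an HTTP client that exposes each @bentoml.api endpoint as a Python method. A minimal sketch, assuming BentoML 1.2+ and the default port:

# test_client.py
import bentoml

client = bentoml.SyncHTTPClient("http://localhost:3000")
result = client.classify(
    features={
        "sepal_length": 5.1,
        "sepal_width": 3.5,
        "petal_length": 1.4,
        "petal_width": 0.2,
    }
)
print(result)
client.close()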

Step 6: Build a Bento

Create a configuration file:

# bentofile.yaml
service: "service:IrisClassifier"
labels:
  owner: data-science-team
  project: iris-classification
include:
  - "service.py"
  - "train_model.py"
python:
  packages:
    - scikit-learn
    - pydantic
    - numpy
docker:
  python_version: "3.10"

Build the Bento:

bentoml build

Output:

Building BentoML service "iris_classifier:xyz789" from build context "/path/to/project"
Successfully built Bento(tag="iris_classifier:xyz789")

List built Bentos:

bentoml list

Step 7: Containerize

Build a Docker image from your Bento:

bentoml containerize iris_classifier:latest

This creates a Docker image with everything needed to run your service.

Run the container:

docker run -p 3000:3000 iris_classifier:latest

Your service is now running in a Docker container!

Step 8: Make Predictions

Test the containerized service:

curl -X POST http://localhost:3000/classify \
  -H "Content-Type: application/json" \
  -d '{
    "features": {
      "sepal_length": 6.3,
      "sepal_width": 2.8,
      "petal_length": 5.1,
      "petal_width": 1.5
    }
  }'

Understanding the Workflow

Here's what we did:

  1. Train & Save - Trained a model and saved it to BentoML's model store
  2. Define Service - Created a service class with API endpoints
  3. Test Locally - Ran the service locally for development
  4. Build Bento - Packaged everything into a deployable artifact
  5. Containerize - Created a Docker image for deployment
  6. Deploy - Ran the service in a container

Project Structure

Your project should look like this:

iris-classifier/
├── train_model.py      # Model training script
├── service.py          # BentoML service definition
├── bentofile.yaml      # Bento build configuration
├── requirements.txt    # Python dependencies
└── test_api.py         # API testing script

Next Steps

Now that you've created your first BentoML service, explore:

Advanced Topics

  • Model Management: Version control and model registry
  • Adaptive Batching: Optimize throughput with automatic batching (see the sketch after this list)
  • Monitoring: Add metrics and logging
  • Production Deployment: Deploy to Kubernetes, AWS, GCP, or Azure
  • Multi-Model Serving: Serve multiple models in one service
  • Custom Runners: Create custom model runners for specialized needs
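
For example, the Adaptive Batching item above maps to a small change in the service definition. A rough sketch, assuming BentoML's batchable API option (max_batch_size and max_latency_ms are illustrative values; see the batching docs for exact requirements):

# service.py (variation, not part of the quick start files)
@bentoml.api(batchable=True, max_batch_size=32, max_latency_ms=20)
def classify_batch(self, features: list[IrisFeatures]) -> list:
    # With batchable=True, BentoML groups concurrent requests into one batch
    # before calling this method; the body stays the same as in Step 3.
    ...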