Quick Start
This quick start guide will walk you through creating and deploying your first machine learning service with BentoML.
Prerequisites
- Python 3.9 or higher (the service code below uses list[...] type hints)
- BentoML installed (pip install bentoml)
- Basic Python and ML knowledge
Step 1: Train a Model
Let's start with a simple Iris classification model using Scikit-learn:
# train_model.py
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import bentoml
# Load the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target
# Train a Random Forest classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)
# Save the model to BentoML model store
saved_model = bentoml.sklearn.save_model(
"iris_classifier",
model,
labels={
"framework": "sklearn",
"task": "classification"
},
metadata={
"accuracy": "0.97",
"features": ["sepal_length", "sepal_width", "petal_length", "petal_width"]
}
)
print(f"Model saved: {saved_model.tag}")
Run the training script:
python train_model.py
Expected output:
Model saved: iris_classifier:abcd1234
Step 2: Verify Model Storage
Check that your model is saved:
bentoml models list
You should see your model:
Tag Module Size Creation Time
iris_classifier:abcd1234 bentoml.sklearn 10.23 KiB 2024-01-15 10:30:45
Get detailed information:
bentoml models get iris_classifier:latest
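You can also inspect the saved model from Python via the model store API; a minimal sketch (the printed values reflect whatever you saved in Step 1):
# inspect_model.py
import bentoml

# Fetch the latest version of the model from the local model store
model = bentoml.models.get("iris_classifier:latest")

print(model.tag)            # e.g. iris_classifier:abcd1234
print(model.info.labels)    # framework/task labels saved in Step 1
print(model.info.metadata)  # accuracy and feature names saved in Step 1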
Step 3: Create a Service
Create a service file to define the API:
# service.py
import bentoml
import numpy as np
from pydantic import BaseModel
# Define input data structure
class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
# Create BentoML service
@bentoml.service(
resources={
"cpu": "2",
"memory": "512Mi",
},
traffic={
"timeout": 10,
}
)
class IrisClassifier:
    # Declare the model as a dependency so it is packaged into the Bento
    model_ref = bentoml.models.get("iris_classifier:latest")

    def __init__(self):
        # Load the scikit-learn model into memory when the service starts
        self.model = bentoml.sklearn.load_model(self.model_ref)
    @bentoml.api
    def classify(self, features: IrisFeatures) -> dict:
        """
        Classify iris flower species based on the input features
        """
        # Convert input to a 2D numpy array (one row)
        input_data = np.array([[
            features.sepal_length,
            features.sepal_width,
            features.petal_length,
            features.petal_width
        ]])
        # Run inference with the loaded scikit-learn model
        prediction = self.model.predict(input_data)
        # Map the predicted class index to a species name
        species_map = {0: "setosa", 1: "versicolor", 2: "virginica"}
        species = species_map[int(prediction[0])]
        return {
            "species": species,
            "prediction": int(prediction[0])
        }
    @bentoml.api
    def classify_batch(self, features: list[IrisFeatures]) -> list[dict]:
        """
        Classify multiple iris flowers at once
        """
        # Convert inputs to a 2D numpy array (one row per flower)
        input_data = np.array([[
            f.sepal_length,
            f.sepal_width,
            f.petal_length,
            f.petal_width
        ] for f in features])
        # Run batch inference with the loaded scikit-learn model
        predictions = self.model.predict(input_data)
        # Map each predicted class index to a species name
        species_map = {0: "setosa", 1: "versicolor", 2: "virginica"}
        results = [
            {
                "species": species_map[int(pred)],
                "prediction": int(pred)
            }
            for pred in predictions
        ]
        return results
Step 4: Test Locally
Start the development server:
bentoml serve service:IrisClassifier
You should see:
Starting production BentoServer from "service:IrisClassifier" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)
The service is now running at http://localhost:3000!
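During development you can also start the server with the --reload flag, so it restarts automatically whenever service.py changes:
bentoml serve service:IrisClassifier --reload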
Step 5: Test the API
Using cURL
Test single prediction:
curl -X POST http://localhost:3000/classify \
-H "Content-Type: application/json" \
-d '{
"sepal_length": 5.1,
"sepal_width": 3.5,
"petal_length": 1.4,
"petal_width": 0.2
}'
Expected response:
{
"species": "setosa",
"prediction": 0
}
Test batch prediction:
curl -X POST http://localhost:3000/classify_batch \
-H "Content-Type: application/json" \
-d '[
{
"sepal_length": 5.1,
"sepal_width": 3.5,
"petal_length": 1.4,
"petal_width": 0.2
},
{
"sepal_length": 7.0,
"sepal_width": 3.2,
"petal_length": 4.7,
"petal_width": 1.4
}
]'
Using Python
# test_api.py
import requests
url = "http://localhost:3000/classify"
data = {
"sepal_length": 5.1,
"sepal_width": 3.5,
"petal_length": 1.4,
"petal_width": 0.2
}
response = requests.post(url, json=data)
print(response.json())
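BentoML also ships an HTTP client, so you can call the endpoint without crafting requests by hand; a minimal sketch, assuming BentoML 1.2+ and the service running locally:
# test_client.py
import bentoml

# Synchronous client pointed at the local dev server
client = bentoml.SyncHTTPClient("http://localhost:3000")

# The keyword argument matches the API parameter name ("features")
result = client.classify(features={
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2
})
print(result)

client.close()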
Using Swagger UI
Open your browser and navigate to:
- Swagger UI: http://localhost:3000/docs
You can interactively test the API from the browser!
Step 6: Build a Bento
Create a configuration file:
# bentofile.yaml
service: "service:IrisClassifier"
labels:
  owner: data-science-team
  project: iris-classification
include:
  - "service.py"
  - "train_model.py"
python:
  packages:
    - scikit-learn
    - pydantic
    - numpy
docker:
  python_version: "3.10"
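If you prefer to keep dependencies in a requirements.txt (see the project structure later in this guide), the python section can point at that file instead of listing packages inline; a sketch of the alternative:
# bentofile.yaml (alternative dependency declaration)
python:
  requirements_txt: "./requirements.txt"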
Build the Bento:
bentoml build
Output:
Building BentoML service "iris_classifier:xyz789" from build context "/path/to/project"
Successfully built Bento(tag="iris_classifier:xyz789")
List built Bentos:
bentoml list
Step 7: Containerize
Build a Docker image from your Bento:
bentoml containerize iris_classifier:latest
This creates a Docker image with everything needed to run your service.
Run the container, using the image tag printed by the containerize command:
docker run -p 3000:3000 iris_classifier:xyz789
Your service is now running in a Docker container!
Step 8: Make Predictions
Test the containerized service:
curl -X POST http://localhost:3000/classify \
-H "Content-Type: application/json" \
-d '{
"sepal_length": 6.3,
"sepal_width": 2.8,
"petal_length": 5.1,
"petal_width": 1.5
}'
Understanding the Workflow
Here's what we did:
- Train & Save - Trained a model and saved it to BentoML's model store
- Define Service - Created a service class with API endpoints
- Test Locally - Ran the service locally for development
- Build Bento - Packaged everything into a deployable artifact
- Containerize - Created a Docker image for deployment
- Deploy - Ran the service in a container
Project Structure
Your project should look like this:
iris-classifier/
├── train_model.py # Model training script
├── service.py # BentoML service definition
├── bentofile.yaml # Bento build configuration
├── requirements.txt # Python dependencies
└── test_api.py # API testing script
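This guide never creates the requirements.txt shown above; a minimal version covering the packages used here could be:
# requirements.txt
bentoml
scikit-learn
numpy
pydantic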
Next Steps
Now that you've created your first BentoML service, explore:
- Deployment Example - Complete production deployment guide
- Comparison - Compare BentoML with other tools
- Best Practices - Production-ready patterns
Advanced Topics
- Model Management: Version control and model registry
- Adaptive Batching: Optimize throughput with automatic batching (see the sketch after this list)
- Monitoring: Add metrics and logging
- Production Deployment: Deploy to Kubernetes, AWS, GCP, or Azure
- Multi-Model Serving: Serve multiple models in one service
- Custom Runners: Create custom model runners for specialized needs
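As a taste of adaptive batching, the sketch below marks an endpoint as batchable so the server can merge concurrent requests into a single model call; the thresholds (max_batch_size, max_latency_ms) are illustrative, not tuned values:
# service_batched.py (sketch)
import bentoml
import numpy as np

@bentoml.service
class IrisClassifierBatched:
    model_ref = bentoml.models.get("iris_classifier:latest")

    def __init__(self):
        self.model = bentoml.sklearn.load_model(self.model_ref)

    # batchable=True lets BentoML stack concurrent inputs along the batch dimension
    @bentoml.api(batchable=True, max_batch_size=64, max_latency_ms=20)
    def predict(self, input_data: np.ndarray) -> np.ndarray:
        return self.model.predict(input_data)

Each client still sends a single request; batching happens transparently on the server side.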