Quick Start
This quick start guide will walk you through creating and deploying your first machine learning service with BentoML.
Prerequisites
- Python 3.9 or higher (the service code below uses list[...] type hints)
- BentoML installed (pip install bentoml)
- Basic Python and ML knowledge
Step 1: Train a Model
Let's start with a simple Iris classification model using Scikit-learn:
# train_model.py
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import bentoml
# Load the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target
# Train a Random Forest classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)
# Save the model to BentoML model store
saved_model = bentoml.sklearn.save_model(
"iris_classifier",
model,
labels={
"framework": "sklearn",
"task": "classification"
},
metadata={
"accuracy": "0.97",
"features": ["sepal_length", "sepal_width", "petal_length", "petal_width"]
}
)
print(f"Model saved: {saved_model.tag}")
Run the training script:
python train_model.py
Expected output:
Model saved: iris_classifier:abcd1234
Step 2: Verify Model Storage
Check that your model is saved:
bentoml models list
You should see your model:
Tag Module Size Creation Time
iris_classifier:abcd1234 bentoml.sklearn 10.23 KiB 2024-01-15 10:30:45
Get detailed information:
bentoml models get iris_classifier:latest
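You can also inspect the saved model from Python via the model store API; a minimal sketch (the printed values reflect whatever you saved in Step 1):
# inspect_model.py
import bentoml

# Fetch the latest version of the model from the local model store
model = bentoml.models.get("iris_classifier:latest")

print(model.tag)            # e.g. iris_classifier:abcd1234
print(model.info.labels)    # framework/task labels saved in Step 1
print(model.info.metadata)  # accuracy and feature names saved in Step 1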
Step 3: Create a Service
Create a service file to define the API:
# service.py
import bentoml
import numpy as np
from pydantic import BaseModel
# Define input data structure
class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
# Create BentoML service
@bentoml.service(
resources={
"cpu": "2",
"memory": "512Mi",
},
traffic={
"timeout": 10,
}
)
class IrisClassifier:
    # Declare the model as a dependency so it is packaged into the Bento
    model_ref = bentoml.models.get("iris_classifier:latest")

    def __init__(self):
        # Load the scikit-learn model into memory when the service starts
        self.model = bentoml.sklearn.load_model(self.model_ref)
    @bentoml.api
    def classify(self, features: IrisFeatures) -> dict:
        """
        Classify iris flower species based on the input features
        """
        # Convert input to a 2D numpy array (one row)
        input_data = np.array([[
            features.sepal_length,
            features.sepal_width,
            features.petal_length,
            features.petal_width
        ]])
        # Run inference with the loaded scikit-learn model
        prediction = self.model.predict(input_data)
        # Map the predicted class index to a species name
        species_map = {0: "setosa", 1: "versicolor", 2: "virginica"}
        species = species_map[int(prediction[0])]
        return {
            "species": species,
            "prediction": int(prediction[0])
        }
    @bentoml.api
    def classify_batch(self, features: list[IrisFeatures]) -> list[dict]:
        """
        Classify multiple iris flowers at once
        """
        # Convert inputs to a 2D numpy array (one row per flower)
        input_data = np.array([[
            f.sepal_length,
            f.sepal_width,
            f.petal_length,
            f.petal_width
        ] for f in features])
        # Run batch inference with the loaded scikit-learn model
        predictions = self.model.predict(input_data)
        # Map each predicted class index to a species name
        species_map = {0: "setosa", 1: "versicolor", 2: "virginica"}
        results = [
            {
                "species": species_map[int(pred)],
                "prediction": int(pred)
            }
            for pred in predictions
        ]
        return results
Step 4: Test Locally
Start the development server:
bentoml serve service:IrisClassifier
You should see:
Starting production BentoServer from "service:IrisClassifier" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)
The service is now running at http://localhost:3000!
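During development you can also start the server with the --reload flag, so it restarts automatically whenever service.py changes:
bentoml serve service:IrisClassifier --reload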
Step 5: Test the API
Using cURL
Test single prediction:
curl -X POST http://localhost:3000/classify \
-H "Content-Type: application/json" \
-d '{
"sepal_length": 5.1,
"sepal_width": 3.5,
"petal_length": 1.4,
"petal_width": 0.2
}'
Expected response:
{
"species": "setosa",
"prediction": 0
}
Test batch prediction:
curl -X POST http://localhost:3000/classify_batch \
-H "Content-Type: application/json" \
-d '[
{
"sepal_length": 5.1,
"sepal_width": 3.5,
"petal_length": 1.4,
"petal_width": 0.2
},
{
"sepal_length": 7.0,
"sepal_width": 3.2,
"petal_length": 4.7,
"petal_width": 1.4
}
]'
Using Python
# test_api.py
import requests
url = "http://localhost:3000/classify"
data = {
"sepal_length": 5.1,
"sepal_width": 3.5,
"petal_length": 1.4,
"petal_width": 0.2
}
response = requests.post(url, json=data)
print(response.json())
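BentoML also ships an HTTP client, so you can call the endpoint without crafting requests by hand; a minimal sketch, assuming BentoML 1.2+ and the service running locally:
# test_client.py
import bentoml

# Synchronous client pointed at the local dev server
client = bentoml.SyncHTTPClient("http://localhost:3000")

# The keyword argument matches the API parameter name ("features")
result = client.classify(features={
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2
})
print(result)

client.close()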
Using Swagger UI
Open your browser and navigate to:
- Swagger UI: http://localhost:3000/docs
You can interactively test the API from the browser!
Step 6: Build a Bento
Create a configuration file:
# bentofile.yaml
service: "service:IrisClassifier"
labels:
  owner: data-science-team
  project: iris-classification
include:
  - "service.py"
  - "train_model.py"
python:
  packages:
    - scikit-learn
    - pydantic
    - numpy
docker:
  python_version: "3.10"
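If you prefer to keep dependencies in a requirements.txt (see the project structure later in this guide), the python section can point at that file instead of listing packages inline; a sketch of the alternative:
# bentofile.yaml (alternative dependency declaration)
python:
  requirements_txt: "./requirements.txt"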
Build the Bento:
bentoml build
Output:
Building BentoML service "iris_classifier:xyz789" from build context "/path/to/project"
Successfully built Bento(tag="iris_classifier:xyz789")
List built Bentos:
bentoml list
Step 7: Containerize
Build a Docker image from your Bento:
bentoml containerize iris_classifier:latest
This creates a Docker image with everything needed to run your service.
Run the container, using the image tag printed by the containerize command:
docker run -p 3000:3000 iris_classifier:xyz789
Your service is now running in a Docker container!
Step 8: Make Predictions
Test the containerized service:
curl -X POST http://localhost:3000/classify \
-H "Content-Type: application/json" \
-d '{
"sepal_length": 6.3,
"sepal_width": 2.8,
"petal_length": 5.1,
"petal_width": 1.5
}'
Understanding the Workflow
Here's what we did:
- Train & Save - Trained a model and saved it to BentoML's model store
- Define Service - Created a service class with API endpoints
- Test Locally - Ran the service locally for development
- Build Bento - Packaged everything into a deployable artifact
- Containerize - Created a Docker image for deployment
- Deploy - Ran the service in a container
Project Structure
Your project should look like this:
iris-classifier/
├── train_model.py # Model training script
├── service.py # BentoML service definition
├── bentofile.yaml # Bento build configuration
├── requirements.txt # Python dependencies
└── test_api.py # API testing script
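This guide never creates the requirements.txt shown above; a minimal version covering the packages used here could be:
# requirements.txt
bentoml
scikit-learn
numpy
pydantic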
Next Steps
Now that you've created your first BentoML service, explore:
- Deployment Example - Complete production deployment guide
- Comparison - Compare BentoML with other tools
- Best Practices - Production-ready patterns
Advanced Topics
- Model Management: Version control and model registry
- Adaptive Batching: Optimize throughput with automatic batching (see the sketch after this list)
- Monitoring: Add metrics and logging
- Production Deployment: Deploy to Kubernetes, AWS, GCP, or Azure
- Multi-Model Serving: Serve multiple models in one service
- Custom Runners: Create custom model runners for specialized needs
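As a taste of adaptive batching, the sketch below marks an endpoint as batchable so the server can merge concurrent requests into a single model call; the thresholds (max_batch_size, max_latency_ms) are illustrative, not tuned values:
# service_batched.py (sketch)
import bentoml
import numpy as np

@bentoml.service
class IrisClassifierBatched:
    model_ref = bentoml.models.get("iris_classifier:latest")

    def __init__(self):
        self.model = bentoml.sklearn.load_model(self.model_ref)

    # batchable=True lets BentoML stack concurrent inputs along the batch dimension
    @bentoml.api(batchable=True, max_batch_size=64, max_latency_ms=20)
    def predict(self, input_data: np.ndarray) -> np.ndarray:
        return self.model.predict(input_data)

Each client still sends a single request; batching happens transparently on the server side.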