Microservices: Mike's Journey from Monolith to Distributed Architecture
The Breaking Point
Mike stared at his monitor, watching the deployment logs scroll by at 3 AM. Again. The third emergency deployment this week. His hands trembled slightly as he typed the restart command, knowing it would take at least 15 minutes to bring their entire ML platform back online. Fifteen minutes of angry users, lost predictions, and revenue bleeding away.
"There has to be a better way," he muttered, rubbing his tired eyes.
Mike was an MLOps engineer at VisionAI, a rapidly growing startup that provided real-time image classification APIs to e-commerce companies. Six months ago, their monolithic application had been their pride — a single Python application that handled everything: user authentication, image uploads, model inference, billing, and analytics. "Simple and elegant," his tech lead had called it.
But that was before they scaled from 100 to 10,000 requests per minute.
The Monolith's Death Spiral
The problem started subtly. A memory leak in the analytics module would occasionally crash the entire application. A CPU-intensive model update would slow down the authentication service. A database migration required taking everything offline. Every tiny change meant rebuilding and redeploying a 2GB Docker image that took 20 minutes to build.
Mike had tried everything: vertical scaling (throwing more RAM and CPUs at the problem), optimizing queries, adding caching layers. But the fundamental issue remained — everything was coupled together. One failing component brought down the entire house of cards.
During the weekly engineering meeting, his manager gave him a challenge: "Mike, I need you to research how we can make our system more resilient. I keep hearing about 'microservices' from other companies. Can you figure out if it's right for us?"
The First Attempt: Naive Separation
Mike spent the weekend reading about microservices. The concept seemed simple enough: break the monolith into smaller, independent services. Each service would handle one business capability and could be deployed independently.
Excited, Mike opened his IDE on Monday morning and started sketching out the new architecture:
```python
# auth-service/main.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class LoginRequest(BaseModel):
    username: str
    password: str

@app.post("/auth/login")
async def login(request: LoginRequest):
    # Validate credentials (validate_user and generate_jwt_token are
    # the app's own helpers, omitted here for brevity)
    if validate_user(request.username, request.password):
        token = generate_jwt_token(request.username)
        return {"token": token}
    raise HTTPException(status_code=401, detail="Invalid credentials")

@app.get("/auth/validate")
async def validate_token(token: str):
    # Verify the JWT and report which user it belongs to
    return {"valid": verify_token(token), "user_id": extract_user_id(token)}
```
```python
# inference-service/main.py
from fastapi import FastAPI, UploadFile, HTTPException
import httpx

app = FastAPI()

@app.post("/predict")
async def predict(image: UploadFile, token: str):
    # Validate the token by calling the auth service directly:
    # a hard-coded, synchronous dependency on another service
    async with httpx.AsyncClient() as client:
        auth_response = await client.get(
            "http://auth-service:8001/auth/validate",
            params={"token": token},
        )
    if not auth_response.json()["valid"]:
        raise HTTPException(status_code=401, detail="Unauthorized")

    # Process the image (run_ml_model wraps the app's model inference)
    image_data = await image.read()
    prediction = run_ml_model(image_data)

    # Log to the analytics service: yet another blocking network call
    async with httpx.AsyncClient() as client:
        await client.post(
            "http://analytics-service:8002/log",
            json={"event": "prediction", "result": prediction},
        )
    return {"prediction": prediction}
```
He containerized each service, set up a simple docker-compose.yml, and deployed to staging. Initially, it worked! The services started independently, and he could update the ML model without touching authentication.
But within hours, problems emerged. The inference service kept timing out when calling the analytics service. When the auth service restarted, every other service started failing. The logs were scattered across multiple containers, making debugging a nightmare. And the worst part? The network calls between services added 200-300ms of latency to every request.
Mike slumped in his chair. "This is worse than the monolith," he admitted to his mentor during their 1-on-1.
His mentor smiled knowingly. "You've discovered the first rule of microservices: they're not just small services; they're distributed systems with all the complexity that entails."
The Learning Moment
His mentor pulled up a whiteboard and drew out what Mike's architecture was missing:
"Mike, microservices aren't just about splitting code. You need to think about these patterns:
- Service Discovery: Services need to find each other dynamically
- API Gateway: A single entry point that routes requests
- Circuit Breakers: Prevent cascading failures
- Async Communication: Not everything needs immediate responses
- Centralized Logging: Unified view of distributed logs
- Health Checks: Know when services are actually ready"
Mike's eyes widened. "So I need to build all of that?"
"No," his mentor laughed. "You use existing tools. Let me show you."
The Proper Architecture
Over the next two weeks, Mike rebuilt the architecture properly. Here's what he implemented:
1. API Gateway Pattern
Instead of services calling each other directly, he introduced an API Gateway using Kong:
```yaml
# kong.yml
_format_version: "3.0"
services:
  - name: auth-service
    url: http://auth-service:8001
    routes:
      - name: auth-route
        paths:
          - /api/auth
  - name: inference-service
    url: http://inference-service:8000
    routes:
      - name: inference-route
        paths:
          - /api/predict
plugins:
  - name: rate-limiting
    config:
      minute: 100
  - name: jwt
    config:
      key_claim_name: user_id
```
Now clients only talked to one endpoint. The gateway handled routing, rate limiting, and authentication.
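From a client's point of view, the whole platform was now a single host. Here is a minimal sketch of a client call through the gateway, assuming a placeholder gateway address and an already-issued JWT:

```python
import httpx

GATEWAY_URL = "http://api-gateway:8000"  # placeholder address for the gateway

async def classify(image_bytes: bytes, jwt_token: str) -> dict:
    # Every request hits the gateway; Kong routes /api/predict to the
    # inference service and applies the rate-limiting and jwt plugins first.
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{GATEWAY_URL}/api/predict",
            files={"image": ("upload.jpg", image_bytes)},
            headers={"Authorization": f"Bearer {jwt_token}"},
        )
        response.raise_for_status()
        return response.json()
```

If the inference service later moves, splits, or changes ports, clients never notice; only the gateway's routing table changes.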
2. Message Queue for Async Operations
For non-critical operations like analytics, Mike introduced RabbitMQ:
```python
# inference-service/main.py (updated)
import json

import pika
from fastapi import FastAPI, UploadFile

app = FastAPI()

def log_prediction_async(event_data):
    # Don't wait for analytics: just queue the event and move on.
    # (Opening a connection per event keeps the example short; a real
    # service would reuse one long-lived connection.)
    connection = pika.BlockingConnection(
        pika.ConnectionParameters('rabbitmq')
    )
    channel = connection.channel()
    channel.queue_declare(queue='analytics_events')
    channel.basic_publish(
        exchange='',
        routing_key='analytics_events',
        body=json.dumps(event_data),
    )
    connection.close()

@app.post("/predict")
async def predict(image: UploadFile):
    # Auth is now handled by the API Gateway
    image_data = await image.read()
    prediction = run_ml_model(image_data)
    # Fire and forget: the request does not wait for analytics
    log_prediction_async({"event": "prediction", "result": prediction})
    return {"prediction": prediction}
```
```python
# analytics-service/consumer.py
import json

import pika

def callback(ch, method, properties, body):
    # Process analytics events off the request path, at our own pace
    event = json.loads(body)
    store_in_database(event)
    update_metrics(event)

connection = pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
channel = connection.channel()
channel.queue_declare(queue='analytics_events')
# auto_ack=True acknowledges on delivery, so an event is lost if the
# consumer crashes mid-processing: an acceptable trade-off for analytics
channel.basic_consume(queue='analytics_events', on_message_callback=callback, auto_ack=True)
channel.start_consuming()
```
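One caveat with the publisher above: pika's BlockingConnection does exactly what the name says, so the async /predict handler stalls while the event is published. An async-native client such as aio-pika keeps the event loop free; a minimal sketch, assuming RabbitMQ's default guest credentials:

```python
import json

import aio_pika

async def log_prediction_async(event_data: dict) -> None:
    # connect_robust transparently reconnects if RabbitMQ restarts.
    # A production service would open this connection once at startup
    # and reuse it, rather than reconnecting per event.
    connection = await aio_pika.connect_robust("amqp://guest:guest@rabbitmq/")
    async with connection:
        channel = await connection.channel()
        await channel.default_exchange.publish(
            aio_pika.Message(body=json.dumps(event_data).encode()),
            routing_key="analytics_events",
        )
```

Either way, the design choice is the same: the prediction path only pays for a local publish, never for the analytics service's processing time.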
3. Service Health Checks and Circuit Breakers
Mike added health endpoints to each service:
```python
# Standard health check endpoint
@app.get("/health")
async def health():
    return {
        "status": "healthy",
        "service": "inference-service",
        "version": "1.2.0",
        "dependencies": {
            # Service-specific probes for the critical dependencies
            "model_loaded": check_model_loaded(),
            "rabbitmq": check_rabbitmq_connection(),
        },
    }
```
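Health checks tell the orchestrator when a service is ready; circuit breakers protect the calls between services that still have to be synchronous. Once a dependency keeps failing, the breaker fails fast instead of letting requests pile up behind timeouts. A minimal hand-rolled sketch of the pattern (the class, thresholds, and error type are illustrative; a library such as pybreaker offers a production-grade version):

```python
import time

class CircuitBreaker:
    """Fail fast when a dependency keeps erroring, then retry after a cooldown."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # cooldown before a retry
        self.failures = 0
        self.opened_at = None  # set while the circuit is open

    async def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                # Open: don't even attempt the call
                raise RuntimeError("circuit open: dependency unavailable")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

The state machine is the classic closed / open / half-open cycle: normal traffic, then fail-fast, then a single probe call to see whether the dependency has recovered.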
He then configured proper orchestration in docker-compose.yml:
```yaml
version: '3.8'
services:
  inference-service:
    build: ./inference-service
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    depends_on:
      rabbitmq:
        condition: service_healthy
    restart: unless-stopped

  rabbitmq:
    image: rabbitmq:3-management
    healthcheck:
      test: rabbitmq-diagnostics -q ping
      interval: 30s
      timeout: 10s
      retries: 3
```
The Resolution
Three weeks after the refactoring, Mike's microservices architecture was humming smoothly. The benefits became immediately apparent:
- Independent Deployments: Updated the ML model 5 times in one day without touching other services
- Fault Isolation: Analytics service crashed overnight, but predictions kept running
- Scalability: Scaled inference service to 10 instances while keeping auth service at 2
- Developer Velocity: Three team members could work on different services without conflicts
- Deployment Time: From 20 minutes to 3 minutes per service
The platform handled Black Friday traffic (30x normal load) without breaking a sweat. Mike simply scaled the inference service horizontally, and the load balancer distributed requests automatically.
Reflection: What Mike Learned
As Mike documented the new architecture for the team wiki, he reflected on his journey. Here's what he wished he'd known from the start:
Microservices aren't about size — they're about boundaries. Each service should represent a distinct business capability with clear ownership.
Distributed systems are hard. You trade code complexity for operational complexity. Be ready for network failures, eventual consistency, and debugging across services.
Start with the minimum viable architecture. Mike's team needed only 4 services initially: auth, inference, analytics, and billing. More can be added later.
Invest in infrastructure early. API gateways, service meshes, logging, and monitoring aren't optional — they're essential.
Embrace async communication. Not every operation needs an immediate response. Message queues reduce coupling and improve resilience.
What You've Learned
| Concept | Key Takeaway |
|---|---|
| Monolith vs Microservices | Monoliths couple everything; microservices isolate business capabilities for independent scaling and deployment |
| API Gateway | Single entry point for routing, authentication, rate limiting, and protocol translation |
| Service Discovery | Services locate each other dynamically instead of relying on hardcoded URLs |
| Async Communication | Message queues (RabbitMQ, Kafka) decouple services and improve resilience |
| Health Checks | Essential for knowing service state and enabling graceful degradation |
| Circuit Breakers | Prevent cascading failures when dependent services fail |
| Trade-offs | Microservices increase operational complexity but improve scalability, resilience, and team velocity |
Final Wisdom: Microservices are a powerful pattern, but they're not a silver bullet. Start with a modular monolith, and split into microservices only when you have clear scaling or organizational needs. When you do make the transition, invest in the infrastructure to do it right — your 3 AM self will thank you.