Installation
This guide covers different ways to install and run NVIDIA Triton Inference Server in your environment.
Prerequisites
Hardware Requirements
Minimum:
- CPU: 4 cores
- RAM: 8 GB
- Disk: 10 GB free space
Recommended for GPU:
- NVIDIA GPU with Compute Capability 6.0+ (Pascal or newer)
- NVIDIA Driver: 450.80.02+ for CUDA 11.0+
- 16 GB RAM
- 50 GB free disk space
Software Requirements
- Docker 19.03+ (for containerized deployment)
- NVIDIA Container Toolkit (for GPU support)
- Kubernetes 1.19+ (for K8s deployment)
- Python 3.8+ (for client libraries)
Installation Methods
Method 1: Docker (Recommended)
The easiest way to get started with Triton is to use the pre-built Docker images published on NVIDIA NGC.
Pull the Triton Image
For GPU environments:
docker pull nvcr.io/nvidia/tritonserver:24.10-py3
For CPU-only environments, use the same image and simply start the container without the --gpus flag; the server runs on CPU-only hosts.
(The 24.10-py3-min tag is a minimal base image used for assembling custom Triton containers and does not include the server itself.)
Verify Installation
Check the server version:
docker run --rm nvcr.io/nvidia/tritonserver:24.10-py3 tritonserver --version
Expected output:
tritonserver 2.51.0
Method 2: Build from Source
For custom requirements or development, build from source.
Clone the Repository
git clone https://github.com/triton-inference-server/server.git
cd server
Build Using Docker
# For GPU build (--enable-all turns on all standard features, backends, and endpoints)
python3 build.py --enable-all
# For CPU-only build, enable features individually and omit GPU support
# (the backends listed here are examples; add the ones you need)
python3 build.py --enable-logging --enable-stats --enable-metrics \
--endpoint=http --endpoint=grpc \
--backend=onnxruntime --backend=python
This process can take 1-2 hours depending on your system.
Method 3: Kubernetes Deployment
Deploy Triton on Kubernetes using Helm.
Add the Helm Repository
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
Install Triton
helm install triton-inference-server nvidia/triton-inference-server \
--set image.tag=24.10-py3 \
--set modelRepositoryPath=/models
Verify the Deployment
kubectl get pods -l app=triton-inference-server
kubectl logs -f <triton-pod-name>
Method 4: Cloud Platforms
AWS SageMaker
Deploy using the SageMaker Python SDK with the SageMaker Triton serving container (the bucket, role, account, and region values below are placeholders):
from sagemaker.model import Model
from sagemaker.predictor import Predictor

# NOTE: SageMaker uses its own region-specific sagemaker-tritonserver
# image (not the NGC image) and expects the model repository packaged
# as a model.tar.gz in S3.
triton_model = Model(
    model_data="s3://your-bucket/model-repository/model.tar.gz",
    role="your-sagemaker-role",
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:<version>-py3",
    predictor_cls=Predictor,
)

predictor = triton_model.deploy(
    instance_type="ml.g4dn.xlarge",
    initial_instance_count=1,
)
Google Cloud Platform (GCP)
Deploy on GKE:
gcloud container clusters create triton-cluster \
--machine-type n1-standard-4 \
--num-nodes 2 \
--accelerator type=nvidia-tesla-t4,count=1
kubectl apply -f triton-deployment.yaml
Azure
Deploy on AKS with GPU:
az aks create \
--resource-group myResourceGroup \
--name tritonCluster \
--node-count 2 \
--node-vm-size Standard_NC6 \
--generate-ssh-keys
kubectl apply -f triton-deployment.yaml
Install NVIDIA Container Toolkit (for GPU)
Required for running Triton with GPU support on Docker.
Ubuntu/Debian
# Add NVIDIA package repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Restart Docker
sudo systemctl restart docker
CentOS/RHEL
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo yum install -y nvidia-container-toolkit
sudo systemctl restart docker
Verify GPU Access
docker run --rm --gpus all nvcr.io/nvidia/tritonserver:24.10-py3 nvidia-smi
Install Client Libraries
Install the Python client library for sending requests to Triton.
Using pip
pip install 'tritonclient[all]'
Or install specific protocols:
# HTTP only
pip install 'tritonclient[http]'
# GRPC only
pip install 'tritonclient[grpc]'
Verify Client Installation
import tritonclient.http as httpclient
# This should not raise any errors
print("Triton HTTP client installed successfully")
Quick Verification
Test that everything is working correctly.
1. Create a Test Directory
mkdir -p /tmp/triton-test/models
2. Start Triton Server
docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v /tmp/triton-test/models:/models \
nvcr.io/nvidia/tritonserver:24.10-py3 \
tritonserver --model-repository=/models
3. Check Server Status
In another terminal:
curl -v localhost:8000/v2/health/ready
Expected response:
HTTP/1.1 200 OK
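The same check can be made with the Python client installed earlier. A minimal sketch, assuming the server started above is still running and listening on localhost:8000:
import tritonclient.http as httpclient

# Connect to the local Triton server's HTTP endpoint.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Both return True once the server is up and has scanned the
# (currently empty) model repository.
print("live: ", client.is_server_live())
print("ready:", client.is_server_ready())

# Server name and version, as reported by the /v2 metadata endpoint.
print(client.get_server_metadata())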
Environment Variables
Configure Triton using environment variables:
| Variable | Description | Default |
|---|---|---|
| TRITON_MODEL_REPOSITORY | Path to model repository | /models |
| TRITON_LOG_VERBOSE | Enable verbose logging | 0 |
| TRITON_MIN_COMPUTE_CAPABILITY | Minimum GPU compute capability | 6.0 |
| TRITON_SERVER_THREAD_COUNT | Number of server threads | Auto |
| CUDA_VISIBLE_DEVICES | GPUs visible to Triton | All |
Example:
docker run --gpus all -e TRITON_LOG_VERBOSE=1 \
-v /models:/models \
nvcr.io/nvidia/tritonserver:24.10-py3 \
tritonserver --model-repository=/models
Common Installation Issues
Issue: CUDA Driver Version Mismatch
Error:
CUDA driver version is insufficient for CUDA runtime version
Solution: Update your NVIDIA drivers:
# Ubuntu
sudo apt-get update
sudo apt-get install -y nvidia-driver-535
# Verify
nvidia-smi
Issue: Permission Denied on Model Repository
Error:
failed to load model: permission denied
Solution: Fix directory permissions:
chmod -R 755 /path/to/models
Issue: Out of GPU Memory
Error:
out of memory
Solution:
- Reduce the model instance count (instance_group) in the model configuration
- Use smaller maximum batch sizes
- Limit Triton's CUDA memory pool with the --cuda-memory-pool-byte-size option
Next Steps
Now that Triton is installed, you can:
- Quick Start - Deploy your first model
- Model Repository - Learn about model organization
- Deployment - Production deployment strategies