Installation

This guide covers different ways to install and run NVIDIA Triton Inference Server in your environment.

Prerequisites

Hardware Requirements

Minimum:

  • CPU: 4 cores
  • RAM: 8 GB
  • Disk: 10 GB free space

Recommended for GPU:

  • NVIDIA GPU with Compute Capability 6.0+ (Pascal or newer)
  • NVIDIA Driver: 450.80.02+ for CUDA 11.0+
  • 16 GB RAM
  • 50 GB free disk space

Software Requirements

  • Docker 19.03+ (for containerized deployment)
  • NVIDIA Container Toolkit (for GPU support)
  • Kubernetes 1.19+ (for K8s deployment)
  • Python 3.8+ (for client libraries)
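
Before installing, you can quickly confirm that the tooling listed above is present. A minimal check, assuming a typical setup (skip the commands that do not apply to your deployment method):

# Verify prerequisite tooling; kubectl is only needed for Kubernetes,
# and nvidia-smi only for GPU deployments
docker --version
nvidia-smi
python3 --version
kubectl version --client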

Installation Methods

Method 1: Docker (Recommended)

The easiest way to get started with Triton is to use the pre-built Docker images from NVIDIA NGC.

Pull the Triton Image

For GPU environments:

docker pull nvcr.io/nvidia/tritonserver:24.10-py3

For CPU-only environments:

docker pull nvcr.io/nvidia/tritonserver:24.10-py3-min

Verify Installation

Check the server version:

docker run --rm nvcr.io/nvidia/tritonserver:24.10-py3 tritonserver --version

Expected output:

tritonserver 2.50.0

Method 2: Build from Source

For custom requirements or development, build from source.

Clone the Repository

git clone https://github.com/triton-inference-server/server.git
cd server

Build Using Docker

# For a GPU build with all features, backends, and endpoints
python3 build.py --enable-all --backend=all

# For a CPU-only build, do not use --enable-all (it enables GPU support);
# enable only the endpoints and backends you need, for example:
python3 build.py --endpoint=http --endpoint=grpc --backend=onnxruntime

This process can take 1-2 hours depending on your system.
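
When the build finishes, build.py produces a local Docker image, by default tagged tritonserver (an assumption to adjust if your build options name it differently). It can then be run like the pre-built NGC image:

docker images | grep tritonserver
docker run --rm tritonserver tritonserver --version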

Method 3: Kubernetes Deployment

Deploy Triton on Kubernetes using Helm.

Add the Helm Repository

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

Install Triton

helm install triton-inference-server nvidia/triton-inference-server \
--set image.tag=24.10-py3 \
--set modelRepositoryPath=/models

Verify the Deployment

kubectl get pods -l app=triton-inference-server
kubectl logs -f <triton-pod-name>
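
Once the pod is running, you can port-forward the service and hit the readiness endpoint. The service name below is an assumption based on the release name used above; confirm it with kubectl get svc:

kubectl port-forward svc/triton-inference-server 8000:8000 &
curl -v localhost:8000/v2/health/ready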

Method 4: Cloud Platforms

AWS SageMaker

Deploy to a SageMaker endpoint using the SageMaker Python SDK:

from sagemaker.model import Model

# SageMaker pulls images from Amazon ECR, so use a SageMaker-compatible
# Triton image URI for your region rather than the nvcr.io image.
triton_model = Model(
    image_uri="<sagemaker-tritonserver-image-uri>",
    model_data="s3://your-bucket/model-repository/",
    role="your-sagemaker-role",
)

predictor = triton_model.deploy(
    instance_type="ml.g4dn.xlarge",
    initial_instance_count=1,
)

Google Cloud Platform (GCP)

Deploy on GKE:

gcloud container clusters create triton-cluster \
--machine-type n1-standard-4 \
--num-nodes 2 \
--accelerator type=nvidia-tesla-t4,count=1

kubectl apply -f triton-deployment.yaml

Azure

Deploy on AKS with GPU:

az aks create \
--resource-group myResourceGroup \
--name tritonCluster \
--node-count 2 \
--node-vm-size Standard_NC6 \
--generate-ssh-keys

kubectl apply -f triton-deployment.yaml
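
Both the GKE and AKS examples apply a triton-deployment.yaml manifest that is not shown above. A minimal sketch follows; the names, image tag, and GPU count are assumptions to adapt, and you still need to mount or bake in a model repository at /models:

cat > triton-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-inference-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: triton-inference-server
  template:
    metadata:
      labels:
        app: triton-inference-server
    spec:
      containers:
      - name: triton
        image: nvcr.io/nvidia/tritonserver:24.10-py3
        command: ["tritonserver", "--model-repository=/models"]
        ports:
        - containerPort: 8000   # HTTP
        - containerPort: 8001   # gRPC
        - containerPort: 8002   # metrics
        resources:
          limits:
            nvidia.com/gpu: 1
EOF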

Install NVIDIA Container Toolkit (for GPU)

Required for running Triton with GPU support on Docker.

Ubuntu/Debian

# Add NVIDIA package repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Restart Docker
sudo systemctl restart docker

CentOS/RHEL

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
sudo tee /etc/yum.repos.d/nvidia-docker.repo

sudo yum install -y nvidia-container-toolkit
sudo systemctl restart docker

Verify GPU Access

docker run --rm --gpus all nvcr.io/nvidia/tritonserver:24.10-py3 nvidia-smi

Install Client Libraries

Install the Python client library used to send requests to Triton.

Using pip

pip install tritonclient[all]

Or install specific protocols:

# HTTP only
pip install tritonclient[http]

# GRPC only
pip install tritonclient[grpc]

Verify Client Installation

import tritonclient.http as httpclient

# This should not raise any errors
print("Triton HTTP client installed successfully")
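
With a Triton server running locally (see Quick Verification below), the client can also be exercised end to end; the URL assumes the default HTTP port 8000:

python3 - <<'EOF'
import tritonclient.http as httpclient

# Connect to a locally running server and query its health endpoints
client = httpclient.InferenceServerClient(url="localhost:8000")
print("server live: ", client.is_server_live())
print("server ready:", client.is_server_ready())
EOF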

Quick Verification

Test that everything is working correctly.

1. Create a Test Directory

mkdir -p /tmp/triton-test/models

2. Start Triton Server

docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v /tmp/triton-test/models:/models \
nvcr.io/nvidia/tritonserver:24.10-py3 \
tritonserver --model-repository=/models

3. Check Server Status

In another terminal:

curl -v localhost:8000/v2/health/ready

Expected response:

HTTP/1.1 200 OK
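
Beyond the readiness probe, the other default endpoints can be spot-checked as well (ports 8000 and 8002 match the run command above):

curl localhost:8000/v2              # server metadata (name, version, extensions)
curl localhost:8002/metrics | head  # Prometheus-format metrics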

Environment Variables

Configure Triton using environment variables:

Variable                        Description                       Default
TRITON_MODEL_REPOSITORY         Path to model repository          /models
TRITON_LOG_VERBOSE              Enable verbose logging            0
TRITON_MIN_COMPUTE_CAPABILITY   Minimum GPU compute capability    6.0
TRITON_SERVER_THREAD_COUNT      Number of server threads          Auto
CUDA_VISIBLE_DEVICES            GPUs visible to Triton            All

Example:

docker run --gpus all -e TRITON_LOG_VERBOSE=1 \
-v /models:/models \
nvcr.io/nvidia/tritonserver:24.10-py3 \
tritonserver --model-repository=/models

Common Installation Issues

Issue: CUDA Driver Version Mismatch

Error:

CUDA driver version is insufficient for CUDA runtime version

Solution: Update your NVIDIA drivers:

# Ubuntu
sudo apt-get update
sudo apt-get install --reinstall nvidia-driver-535

# Verify
nvidia-smi

Issue: Permission Denied on Model Repository

Error:

failed to load model: permission denied

Solution: Fix directory permissions:

chmod -R 755 /path/to/models

Issue: Out of GPU Memory

Error:

out of memory

Solution:

  • Limit GPU memory per model in the model configuration (see the config.pbtxt sketch below)
  • Reduce the model instance count
  • Use smaller batch sizes
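
As an illustration, the instance count and maximum batch size are set in the model's config.pbtxt; the field names below are standard Triton model configuration, while the values are examples to tune rather than recommendations:

# <model-repository>/<model-name>/config.pbtxt (excerpt)
max_batch_size: 8        # smaller batches reduce activation memory
instance_group [
  {
    count: 1             # fewer execution instances per GPU
    kind: KIND_GPU
  }
]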

Next Steps

Now that Triton is installed, you can: