
Troubleshooting

This guide helps you diagnose and resolve common issues when working with gVisor.

General Debugging Approach

Enable Debug Logging

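First, enable comprehensive logging to understand what's happening. Note that docker run has no option for passing runsc flags to a single container. Docker hands them to runsc through the runtimeArgs of a runtime registered in /etc/docker/daemon.json, so register a dedicated debug runtime:

# Register a "runsc-debug" runtime (runsc install edits /etc/docker/daemon.json)
# The trailing "/" makes --debug-log a directory with one log file per runsc command
# Add --log-packets to also log network packets
sudo runsc install --runtime=runsc-debug -- \
--debug \
--debug-log=/tmp/runsc/ \
--strace
sudo systemctl restart docker

# Test with debug logging
docker run --rm --runtime=runsc-debug \
alpine:latest echo "Debug test"

# Check debug logs
ls -l /tmp/runsc/
cat /tmp/runsc/*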
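If you would rather edit daemon.json by hand, a runtime entry has a simple shape: a `path` to the runsc binary and a `runtimeArgs` list of runsc flags. The sketch below prints such a fragment. `runsc_runtime_json` and the `RUNSC_PATH` override are hypothetical names, not part of gVisor. The helper assumes the flags contain no quotes or backslashes, and it does not merge with an existing file.

```bash
# Hypothetical helper (not part of gVisor): print a daemon.json fragment that
# registers runsc under NAME, passing the remaining arguments as runtimeArgs.
# Assumes flags contain no double quotes or backslashes (no JSON escaping).
runsc_runtime_json() {
  local name="$1" path="${RUNSC_PATH:-/usr/local/bin/runsc}" args="" flag
  shift
  for flag in "$@"; do
    # Append each flag as a quoted JSON string, comma-separated
    if [ -n "$args" ]; then args="$args, "; fi
    args="$args\"$flag\""
  done
  printf '{\n  "runtimes": {\n    "%s": {\n      "path": "%s",\n      "runtimeArgs": [%s]\n    }\n  }\n}\n' \
    "$name" "$path" "$args"
}
```

For example, `runsc_runtime_json runsc-debug --debug --debug-log=/tmp/runsc/ --strace` prints an entry that you can merge by hand into an existing /etc/docker/daemon.json before restarting Docker.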

Check System Information

Verify your environment setup:

# Check gVisor installation
runsc --version

# Check KVM availability (needed for the KVM platform) and the host kernel
ls -la /dev/kvm
cat /proc/version

# Check Docker runtime configuration
docker info | grep -i runtime

# Check containerd configuration (if using containerd)
sudo containerd config dump | grep runsc
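`docker info | grep -i runtime` prints the matching lines but leaves you to spot the name. A plain substring search for runsc would also match a runtime such as runsc-debug. The sketch below reads `docker info` output on stdin and succeeds only if the exact runtime name appears on the `Runtimes:` line. It assumes Docker's usual ` Runtimes: runc runsc ...` layout, and `has_docker_runtime` is a hypothetical name.

```bash
# Hypothetical helper: succeed if runtime NAME appears exactly on the
# "Runtimes:" line of `docker info` output read from stdin.
has_docker_runtime() {
  local want="$1" line rt
  while IFS= read -r line; do
    case "$line" in
      *Runtimes:*) ;;
      *) continue ;;
    esac
    # Split the runtime list on whitespace and compare each name exactly
    for rt in ${line#*Runtimes:}; do
      if [ "$rt" = "$want" ]; then return 0; fi
    done
  done
  return 1
}
```

Usage: `docker info 2>/dev/null | has_docker_runtime runsc && echo "runsc is registered"`.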

Installation Issues

gVisor Not Found

Problem: runsc: command not found or similar errors.

Solutions:

# Check if runsc is installed
which runsc
echo $PATH

# Install from repository
curl -fsSL https://gvisor.dev/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/gvisor-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/gvisor-archive-keyring.gpg] https://storage.googleapis.com/gvisor/releases release main" | sudo tee /etc/apt/sources.list.d/gvisor.list > /dev/null
sudo apt-get update && sudo apt-get install -y runsc

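# Or install manually (replace "latest" with a specific release to pin a version)
ARCH=$(uname -m)
URL=https://storage.googleapis.com/gvisor/releases/release/latest/${ARCH}
wget ${URL}/runsc ${URL}/runsc.sha512
sha512sum -c runsc.sha512
chmod a+rx runsc
sudo mv runsc /usr/local/bin/
# containerd users also need containerd-shim-runsc-v1 from the same URL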

Permission Issues

Problem: Permission denied accessing /dev/kvm or similar.

Solutions:

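# Check KVM permissions
ls -la /dev/kvm

# Temporary workaround only: makes KVM usable by every user and is reset on reboot
sudo chmod 666 /dev/kvm

# Add user to kvm group (permanent fix)
sudo usermod -aG kvm "$USER"
newgrp kvm # Start a shell with the new group (or log out and back in)

# Check if user is in kvm group
groups "$USER"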
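The KVM platform needs read-write access to /dev/kvm for the user that runs runsc. Under Docker, runsc normally runs as root, so this mostly matters when you invoke runsc yourself as an unprivileged user. The sketch below reports `ok`, `no-access`, or `missing` for a device path, which defaults to /dev/kvm. It uses the shell's `-r`/`-w` tests, and `device_access` is a hypothetical name.

```bash
# Hypothetical helper: report whether the current user can open DEVICE
# (default /dev/kvm) for both reading and writing.
device_access() {
  local dev="${1:-/dev/kvm}"
  if [ ! -e "$dev" ]; then
    echo missing
  elif [ -r "$dev" ] && [ -w "$dev" ]; then
    echo ok
  else
    echo no-access
  fi
}
```

Run `device_access` after changing group membership in a fresh login shell. If it still prints `no-access`, the group change has not taken effect yet.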

Docker Runtime Configuration Issues

Problem: Docker doesn't recognize the gVisor runtime.

Solutions:

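# Check current Docker daemon configuration
cat /etc/docker/daemon.json

# Register the runtime; runsc install adds the entry and keeps existing settings
sudo runsc install

# Or write the file by hand (this overwrites it, so merge any existing settings first)
sudo tee /etc/docker/daemon.json <<EOF
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}
EOF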

# Restart Docker daemon
sudo systemctl restart docker

# Verify runtime is available
docker info | grep -i runtime

Runtime Issues

Container Fails to Start

Problem: Container exits immediately or fails to start.

Debugging Steps:

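# Run with debug logging (a runtime registered with debug flags)
sudo runsc install --runtime=runsc-debug -- \
--debug \
--debug-log=/tmp/runsc/ \
--strace
sudo systemctl restart docker
docker run --rm --runtime=runsc-debug alpine:latest echo "test"

# Check the debug logs
grep -ri error /tmp/runsc/

# Try a different platform
sudo runsc install --runtime=runsc-ptrace -- --platform=ptrace
sudo systemctl restart docker
docker run --rm --runtime=runsc-ptrace alpine:latest echo "test"

# Check container logs (only for containers started without --rm)
docker logs container-name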
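When a debug-log directory holds many files, it can help to pull out only the warnings. The sketch below assumes gVisor's text log format, in which each line starts with a level letter and date (for example `W0102 15:04:05.000000 ...` for a warning). It also prints any line mentioning `panic`. `runsc_warnings` and its default directory are hypothetical choices.

```bash
# Hypothetical helper: print warning-level lines, plus any line mentioning "panic",
# from every file under a gVisor debug-log directory (grep -r prefixes file names).
# Assumes glog-style lines such as "W0102 15:04:05.000000 1 file.go:42] message".
runsc_warnings() {
  local dir="${1:-/tmp/runsc}"
  grep -rE -e '^W[0-9]{4} ' -e 'panic' "$dir" || true
}
```

Pass it the directory you gave to `--debug-log`, for example `runsc_warnings /tmp/runsc`.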

System Call Issues

Problem: Application fails with system call errors.

Solutions:

# Enable strace logging (a debug runtime with --strace)
sudo runsc install --runtime=runsc-debug -- \
--debug \
--debug-log=/tmp/runsc/ \
--strace
sudo systemctl restart docker
docker run --rm --runtime=runsc-debug problematic-app:latest

# Analyze system calls
grep -riE "not implemented|unsupported" /tmp/runsc/

# Try the shared file access mode (for files that change outside the sandbox)
sudo runsc install --runtime=runsc-shared -- --file-access=shared
sudo systemctl restart docker
docker run --rm --runtime=runsc-shared problematic-app:latest
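To see which system calls an application is missing, a count per syscall is often more useful than raw matches. This sketch assumes each report contains the text `Unsupported syscall NAME(`, the same string this guide greps for. The wording may differ between runsc versions, so adjust the pattern if nothing matches. `unsupported_syscalls` is a hypothetical helper that accepts one or more log files.

```bash
# Hypothetical helper: count "Unsupported syscall NAME(" reports per syscall
# across the given log files, most frequent first.
# Assumes that wording in the log; adjust the pattern if your runsc version differs.
unsupported_syscalls() {
  grep -ohE 'Unsupported syscall [A-Za-z0-9_]+' "$@" \
    | awk '{print $3}' \
    | sort | uniq -c | sort -rn
}
```

Usage: `unsupported_syscalls /tmp/runsc/*`, assuming /tmp/runsc is your debug-log directory.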

Memory Issues

Problem: Out of memory errors or excessive memory usage.

Debugging:

# Check the memory the sandbox reports under a limit
docker run --rm --runtime=runsc \
-m 512m \
alpine:latest head -n 10 /proc/meminfo

# Monitor memory usage
docker stats container-name

# Adjust memory with the container limit; gVisor generally reports
# the container's memory limit as the sandbox's total memory
docker run --rm --runtime=runsc \
-m 2g \
memory-intensive-app:latest
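To confirm that a memory limit reached the sandbox, compare it with the `MemTotal` the container sees. This minimal sketch converts the `MemTotal` line of /proc/meminfo, read from stdin, from kB to MiB. `memtotal_mib` is a hypothetical name.

```bash
# Hypothetical helper: read /proc/meminfo on stdin and print MemTotal in MiB.
memtotal_mib() {
  # MemTotal is reported in kB; integer-divide by 1024 for MiB
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }'
}
```

For example, run `docker run --rm --runtime=runsc -m 512m alpine:latest cat /proc/meminfo | memtotal_mib`. If the sandbox sizes its memory from the container limit, the result should be close to 512.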

Kubernetes-Specific Issues

RuntimeClass Not Working

Problem: Pods don't use gVisor runtime despite RuntimeClass configuration.

Debugging:

# Check RuntimeClass exists
kubectl get runtimeclass

# Verify RuntimeClass configuration
kubectl describe runtimeclass gvisor

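# Check the pod requested the RuntimeClass
kubectl get pod pod-name -o jsonpath='{.spec.runtimeClassName}'

# Check the node's container runtime; the RuntimeClass handler (e.g. runsc)
# must match a runtime configured in containerd on that node
kubectl describe node node-name | grep -i runtime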
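A RuntimeClass only takes effect if two conditions hold. Its `handler` must match a runtime name configured in containerd on the node, commonly `runsc`, and the pod must set `runtimeClassName`. The sketch below prints a minimal RuntimeClass plus a one-shot test pod. The names `gvisor` and `gvisor-test` and the `dmesg` command are example choices, and `gvisor_runtimeclass_manifest` is a hypothetical helper.

```bash
# Hypothetical helper: print a RuntimeClass whose handler is "runsc" and a
# one-shot test pod that selects it. "gvisor" and "gvisor-test" are example names.
gvisor_runtimeclass_manifest() {
  cat <<'YAML'
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
apiVersion: v1
kind: Pod
metadata:
  name: gvisor-test
spec:
  runtimeClassName: gvisor
  restartPolicy: Never
  containers:
  - name: dmesg
    image: alpine:latest
    command: ["dmesg"]
YAML
}
```

Usage: run `gvisor_runtimeclass_manifest | kubectl apply -f -`, then `kubectl logs gvisor-test`. gVisor's synthetic boot messages in the output indicate the pod ran in the sandbox. The handler of an existing RuntimeClass cannot be changed, so delete a mismatched one before applying.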

Pod Scheduling Issues

Problem: Pods remain in Pending state.

Solutions:

# Check pod events
kubectl describe pod pod-name

# Check node availability
kubectl get nodes -o wide

# Check node taints and pod tolerations
kubectl describe node node-name | grep -i taint
kubectl get pod pod-name -o yaml | grep -A5 tolerations

# Check resource availability
kubectl top nodes
kubectl describe node node-name | grep -A5 "Allocated resources"

Container Runtime Issues in Kubernetes

Problem: CRI-related errors or container runtime failures.

Debugging:

# Check containerd service
sudo systemctl status containerd

# Check containerd logs
sudo journalctl -u containerd -f

# Test containerd with gVisor directly (pull the image first; ctr does not pull on run)
sudo ctr image pull docker.io/library/alpine:latest
sudo ctr run --runtime io.containerd.runsc.v1 --rm \
docker.io/library/alpine:latest test echo "Hello"

# Check containerd configuration
sudo containerd config dump | grep -A10 runsc
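For containerd, and therefore most Kubernetes nodes, the runsc shim is registered in /etc/containerd/config.toml and reads its runsc flags from a separate runsc.toml. The sketch below prints both pieces for containerd's config version 2 layout (containerd 1.x). The `platform` argument defaults to `systrap`, assuming a gVisor release where that platform is available. `containerd_runsc_snippets` is a hypothetical name. Review the output before merging it into your existing files.

```bash
# Hypothetical helper: print the CRI runtime entry for the runsc shim and a
# matching runsc.toml that sets the gVisor platform (default: systrap).
# Assumes containerd config version 2 (containerd 1.x).
containerd_runsc_snippets() {
  local platform="${1:-systrap}"
  cat <<EOF
# --- add to /etc/containerd/config.toml ---
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc.options]
  TypeUrl = "io.containerd.runsc.v1.options"
  ConfigPath = "/etc/containerd/runsc.toml"

# --- /etc/containerd/runsc.toml ---
[runsc_config]
  platform = "${platform}"
EOF
}
```

After merging the output, restart containerd (`sudo systemctl restart containerd`) so that it picks up the runtime.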

Performance Issues

Slow Container Startup

Problem: Containers take a long time to start.

Solutions:

# Use the KVM platform for better performance (best on bare metal; needs /dev/kvm)
sudo runsc install --runtime=runsc-kvm -- --platform=kvm
sudo systemctl restart docker
docker run --rm --runtime=runsc-kvm \
alpine:latest echo "test"

# Keep the default file access mode: exclusive allows more caching than shared,
# so use --file-access=shared only when files change outside the sandbox

# VFS2 is the default in current releases (the old --vfs2 flag is not needed),
# and --gso only affects network throughput, not startup time
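When comparing platforms or runtimes, measure startup time rather than estimating it. The sketch below runs a command and discards its output. It prints the wall-clock time in milliseconds and returns the command's exit status. `elapsed_ms` is a hypothetical name, and the helper relies on GNU `date` supporting `%N`.

```bash
# Hypothetical helper: run a command, print its wall-clock duration in
# milliseconds, and return the command's exit status (needs GNU date for %N).
elapsed_ms() {
  local start end rc
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1
  rc=$?
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
  return "$rc"
}
```

Usage: `for rt in runc runsc; do echo "$rt: $(elapsed_ms docker run --rm --runtime=$rt alpine:latest true) ms"; done`. Run it once beforehand so that image pulls do not skew the first measurement.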

High CPU Usage

Problem: gVisor processes consuming excessive CPU.

Investigation:

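# Monitor gVisor processes
pgrep -a runsc
top -p "$(pgrep -d, runsc)"

# Profile CPU usage: register a runtime with profiling enabled
sudo runsc install --runtime=runsc-prof -- --profile
sudo systemctl restart docker
docker run --rm -d --name cpu-app --runtime=runsc-prof cpu-intensive-app:latest

# Collect a 30-second CPU profile from the running sandbox
sudo runsc --root /var/run/docker/runtime-runc/moby debug \
--profile-cpu=/tmp/cpu.prof --duration=30s \
$(docker inspect -f '{{.Id}}' cpu-app)

# Analyze CPU profile (if Go tools available)
go tool pprof /usr/local/bin/runsc /tmp/cpu.prof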

Network Performance Issues

Problem: Slow network performance in gVisor containers.

Tuning:

# Register a runtime with network-related flags (--platform=kvm needs /dev/kvm)
sudo runsc install --runtime=runsc-net -- \
--platform=kvm \
--network=sandbox \
--gso=true \
--software-gso=true \
--tx-checksum-offload=true \
--rx-checksum-offload=true
sudo systemctl restart docker

docker run --runtime=runsc-net \
network-app:latest

# Test network performance (curl prints the timings itself)
docker run --rm --runtime=runsc-net \
curlimages/curl -s -o /dev/null \
-w "connect: %{time_connect}s total: %{time_total}s\n" http://example.com

Application Compatibility Issues

Application Won't Run

Problem: Existing application fails to run in gVisor.

Troubleshooting:

# Check system call compatibility (a debug runtime with --strace)
sudo runsc install --runtime=runsc-debug -- \
--debug \
--debug-log=/tmp/runsc/ \
--strace
sudo systemctl restart docker
docker run --rm --runtime=runsc-debug your-app:latest

# Look for unsupported system calls
grep -r "Unsupported syscall" /tmp/runsc/

# Rule out platform and file-access effects
sudo runsc install --runtime=runsc-compat -- --platform=ptrace --file-access=shared
sudo systemctl restart docker
docker run --rm --runtime=runsc-compat your-app:latest

File System Issues

Problem: File operations fail or behave unexpectedly.

Solutions:

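# Test different file access modes (each registered as its own runtime)
sudo runsc install --runtime=runsc-exclusive -- --file-access=exclusive
sudo runsc install --runtime=runsc-shared -- --file-access=shared
sudo systemctl restart docker
docker run --rm -v /tmp/test:/test --runtime=runsc-exclusive alpine:latest ls -la /test
docker run --rm -v /tmp/test:/test --runtime=runsc-shared alpine:latest ls -la /test

# Check the root filesystem without the in-sandbox overlay
sudo runsc install --runtime=runsc-nooverlay -- --overlay2=none
sudo systemctl restart docker
docker run --rm --runtime=runsc-nooverlay alpine:latest df -h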

Database Issues

Problem: Databases don't work correctly with gVisor.

Common Solutions:

# For PostgreSQL - if the default POSIX dynamic shared memory fails, switch to mmap
# (dynamic_shared_memory_type=none no longer exists in PostgreSQL 12 and later)
docker run --runtime=runsc \
-e POSTGRES_PASSWORD=secret \
postgres:13 postgres -c dynamic_shared_memory_type=mmap

# For MySQL - adjust configuration
docker run --runtime=runsc \
-e MYSQL_ROOT_PASSWORD=secret \
mysql:8 --innodb-use-native-aio=0 --innodb-flush-method=fsync

# For Redis - disable background save
docker run --runtime=runsc redis:7 redis-server --save ""

Security Issues

Security Context Violations

Problem: Containers violate security policies.

Investigation:

# Check security context
kubectl get pod pod-name -o yaml | grep -A20 securityContext

# Check for privileged containers
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[*].securityContext.privileged}{"\n"}{end}'

# Verify gVisor isolation
docker run --rm --runtime=runsc alpine:latest dmesg
# Should print gVisor's synthetic boot messages; /proc/version and uname -r
# report gVisor's emulated kernel version, not the host kernel's
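For a scripted isolation check, note that gVisor emulates dmesg with its own boot messages. The output of `dmesg` inside a sandboxed container should therefore mention gVisor. The sketch below assumes that wording and searches stdin case-insensitively. `is_gvisor_dmesg` is a hypothetical name.

```bash
# Hypothetical helper: succeed if dmesg output on stdin mentions gVisor.
# Assumes gVisor's synthetic boot messages contain the word "gVisor".
is_gvisor_dmesg() {
  grep -qi 'gvisor'
}
```

Usage: `docker run --rm --runtime=runsc alpine:latest dmesg | is_gvisor_dmesg && echo "running under gVisor"`.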

Network Policy Issues

Problem: Network policies not working correctly.

Debugging:

# Test network connectivity
kubectl exec -it test-pod -- nc -zv target-service 80

# Check NetworkPolicy configuration
kubectl describe networkpolicy policy-name

# Verify pod labels match policy selectors
kubectl get pods --show-labels | grep target-app

Log Analysis

Understanding gVisor Logs

Common log entries and their meanings:

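# Enable comprehensive logging (a runtime with debug, strace, and packet logging)
sudo runsc install --runtime=runsc-full -- \
--debug \
--debug-log=/tmp/gvisor-full.log \
--strace \
--log-packets
sudo systemctl restart docker
docker run --rm --runtime=runsc-full your-app:latest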

# Analyze different types of log entries
grep "Unsupported syscall" /tmp/gvisor-full.log # Compatibility issues
grep "OOM" /tmp/gvisor-full.log # Memory issues
grep "Segmentation fault" /tmp/gvisor-full.log # Crash issues
grep "Permission denied" /tmp/gvisor-full.log # Permission issues

Structured Log Analysis

Parse logs systematically:

# Create log analysis script
cat > /tmp/analyze-logs.sh <<'EOF'
#!/bin/bash
LOG_FILE="${1:?usage: analyze-logs.sh LOG_FILE}"

echo "=== gVisor Log Analysis ==="
echo "Log file: $LOG_FILE"
echo

echo "Error Summary:"
# grep -c already prints 0 when nothing matches, so print the counts directly
echo "Error lines: $(grep -ci "error" "$LOG_FILE")"
echo "Fatal errors: $(grep -c "FATAL" "$LOG_FILE")"
echo "Panics: $(grep -c "panic" "$LOG_FILE")"
echo

echo "System Call Issues:"
grep "Unsupported syscall" "$LOG_FILE" | head -5
echo

echo "Memory Issues:"
grep -i "out of memory\|oom\|memory" "$LOG_FILE" | head -5
echo

echo "Platform Issues:"
grep -i "platform\|kvm\|ptrace" "$LOG_FILE" | head -5
echo

echo "Network Issues:"
grep -i "network\|socket\|connection" "$LOG_FILE" | head -5
EOF

chmod +x /tmp/analyze-logs.sh
/tmp/analyze-logs.sh /tmp/gvisor-full.log

Environment-Specific Troubleshooting

AWS EKS Issues

Problem: gVisor issues specific to AWS EKS.

Solutions:

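# Check the instance type; KVM is generally only exposed on bare-metal (*.metal) types
curl -s http://169.254.169.254/latest/meta-data/instance-type
# (instances that require IMDSv2 need a session token for this request)

# Without KVM, use ptrace (or systrap, the default in recent releases).
# RuntimeClass has no platform field; set it in each node's runsc shim config,
# assuming containerd's runsc options point ConfigPath at /etc/containerd/runsc.toml
sudo tee /etc/containerd/runsc.toml <<EOF
[runsc_config]
platform = "ptrace"
EOF
sudo systemctl restart containerd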

# Check EKS-specific networking
kubectl get pods -o wide
kubectl describe node node-name | grep -i vpc
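To script the instance-type check, the sketch below assumes, as has generally been the case on EC2, that only bare-metal instance types expose /dev/kvm. Such types have names containing `.metal`. `likely_has_kvm` is a hypothetical name. Checking for /dev/kvm on the node itself remains the authoritative test.

```bash
# Hypothetical helper: guess whether an EC2 instance type exposes /dev/kvm.
# Assumes only bare-metal types (names containing ".metal") do.
likely_has_kvm() {
  case "$1" in
    *.metal*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `likely_has_kvm "$(curl -s http://169.254.169.254/latest/meta-data/instance-type)" || echo "use ptrace or systrap"`.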

Google GKE Issues

Problem: gVisor issues specific to Google GKE.

Solutions:

# Enable GKE Sandbox (managed gVisor) on a dedicated node pool
gcloud container node-pools create gvisor-pool \
--cluster=cluster-name \
--sandbox type=gvisor

# Or use custom gVisor installation
kubectl apply -f gvisor-gke-setup.yaml

On-Premises Issues

Problem: Issues in on-premises Kubernetes clusters.

Common Solutions:

# Check kernel version
uname -r
# gVisor requires Linux 4.14+

# Verify cgroup configuration
mount | grep cgroup
ls -la /sys/fs/cgroup/

# Check apparmor/selinux
sudo aa-status
sestatus
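The kernel version requirement can also be checked in a script. This sketch compares only the major and minor numbers of a release string such as `5.15.0-91-generic` and does not check patch-level requirements. `kernel_at_least` is a hypothetical helper.

```bash
# Hypothetical helper: succeed if kernel release $1 is at least MAJOR ($2).MINOR ($3).
kernel_at_least() {
  local ver major minor
  ver="${1%%-*}"             # "5.15.0-91-generic" -> "5.15.0"
  major="${ver%%.*}"         # -> "5"
  minor="${ver#*.}"          # -> "15.0"
  minor="${minor%%.*}"       # -> "15"
  minor="${minor%%[!0-9]*}"  # drop suffixes such as "+" or "rc1"
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "${minor:-0}" -ge "$3" ]; }
}
```

Usage: `kernel_at_least "$(uname -r)" 4 14 && echo "kernel version OK"`.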

Advanced Debugging Techniques

Using strace with gVisor

# Enable system call tracing
sudo runsc install --runtime=runsc-debug -- \
--debug \
--debug-log=/tmp/runsc/ \
--strace
sudo systemctl restart docker
docker run --rm --runtime=runsc-debug problematic-app:latest

# Filter strace output (exit records, marked "X", include the return value)
grep -rE ' X [a-z0-9_]+\(' /tmp/runsc/ | head -20
grep -r "Unsupported syscall" /tmp/runsc/
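With --strace enabled, a count per syscall shows which system calls an application uses most. This sketch assumes that strace lines mark syscall exits with ` X name(` and entries with ` E name(`. Treat that pattern as an assumption and adjust it to match your runsc version's log. `syscall_histogram` is a hypothetical name.

```bash
# Hypothetical helper: count syscall exit records by name, most frequent first.
# Assumes exit lines look like "... <comm> X <name>(<args>) = <ret> ...".
syscall_histogram() {
  grep -ohE ' X [a-z0-9_]+\(' "$@" \
    | sed -E 's/^ X //; s/\($//' \
    | sort | uniq -c | sort -rn
}
```

Usage: `syscall_histogram /tmp/runsc/* | head -20`, assuming /tmp/runsc is your debug-log directory.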

Memory Profiling

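# Enable profiling support with a dedicated runtime
sudo runsc install --runtime=runsc-prof -- --profile
sudo systemctl restart docker
docker run --rm -d --name memory-app --runtime=runsc-prof memory-app:latest

# Collect a heap profile from the running sandbox
sudo runsc --root /var/run/docker/runtime-runc/moby debug \
--profile-heap=/tmp/heap.prof \
$(docker inspect -f '{{.Id}}' memory-app)

# Analyze memory profile (requires Go tools)
go tool pprof /usr/local/bin/runsc /tmp/heap.prof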

Network Debugging

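# Enable packet logging (written to the debug log, so --debug is set too)
sudo runsc install --runtime=runsc-packets -- \
--debug \
--debug-log=/tmp/packets/ \
--log-packets
sudo systemctl restart docker
docker run --runtime=runsc-packets network-app:latest

# Analyze network traffic on the host bridge (replace container-ip)
sudo tcpdump -i docker0 host container-ip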

Recovery Procedures

Recovering from gVisor Issues

Clean up problematic containers:

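# Stop all gVisor containers (docker ps has no runtime filter, so inspect each container)
docker ps -q | xargs -r docker inspect --format '{{.Id}} {{.HostConfig.Runtime}}' \
| awk '$2 ~ /^runsc/ {print $1}' | xargs -r docker stop

# Remove gVisor containers
docker ps -aq | xargs -r docker inspect --format '{{.Id}} {{.HostConfig.Runtime}}' \
| awk '$2 ~ /^runsc/ {print $1}' | xargs -r docker rm

# Clean up leftover gVisor processes (this also kills sandboxes that are still healthy)
sudo pkill -f runsc
sudo rm -rf /tmp/runsc-*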

Reset gVisor configuration:

# Backup current configuration
sudo cp /etc/docker/daemon.json /etc/docker/daemon.json.backup

# Reset to minimal configuration
sudo tee /etc/docker/daemon.json <<EOF
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}
EOF

# Restart services
sudo systemctl restart docker

Rollback Procedures

Rollback Kubernetes workloads:

# Remove gVisor RuntimeClass from deployment (changing the pod template triggers a rolling update)
kubectl patch deployment app-name -p '{"spec":{"template":{"spec":{"runtimeClassName":null}}}}'

# Wait until the pods have been replaced
kubectl rollout status deployment app-name

# Remove RuntimeClass if needed
kubectl delete runtimeclass gvisor

Getting Help

Community Support

Reporting Bugs

When reporting issues, include:

# Collect system information (point the log path below at your --debug-log location)
cat > /tmp/debug-info.txt <<EOF
System Information:
$(uname -a)

gVisor Version:
$(runsc --version)

Docker Version:
$(docker --version)

Kernel Version:
$(cat /proc/version)

Container Runtime:
$(docker info | grep -i runtime)

Error Logs:
$(tail -50 /var/log/gvisor/debug.log)
EOF

# Include this file with your bug report

Professional Support

For production environments, consider:

  • Google Cloud Support (for GKE)
  • Commercial Kubernetes distributions with gVisor support
  • Professional services from container security vendors

Remember: Always test gVisor thoroughly in non-production environments before deploying to production workloads.

Summary

This troubleshooting guide covers the most common issues encountered when using gVisor. Always start by enabling debug logging, check the basics (installation, permissions, configuration), and work systematically through the specific problem domain.

The key to successful gVisor troubleshooting is understanding that gVisor sits between your application and the host kernel, so issues can originate from:

  1. Application compatibility with gVisor's system call implementation
  2. gVisor configuration and platform selection
  3. Container runtime integration
  4. Host system configuration and resources

By following this systematic approach, you should be able to resolve most gVisor-related issues effectively.