Troubleshooting
This guide helps you diagnose and resolve common issues when working with gVisor.
General Debugging Approach
Enable Debug Logging
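First, enable comprehensive logging to understand what's happening. Docker has no per-container flag for runsc options: they are set as "runtimeArgs" on a runtime entry in /etc/docker/daemon.json. Wherever this guide writes --runtime-opt name=value (or a [runsc] config block), read it as --name=value in the runtimeArgs of a dedicated runtime entry, selected with --runtime.
# Register a debug runtime next to the default one
# (this overwrites daemon.json; merge by hand if you have other settings)
mkdir -p /tmp/gvisor-debug
sudo tee /etc/docker/daemon.json <<EOF
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    },
    "runsc-debug": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": [
        "--debug",
        "--debug-log=/tmp/gvisor-debug/debug.log",
        "--strace",
        "--log-packets"
      ]
    }
  }
}
EOF
sudo systemctl restart docker
# Test with debug logging
docker run --rm --runtime=runsc-debug \
alpine:latest echo "Debug test"
# Check debug logs
cat /tmp/gvisor-debug/debug.log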
Check System Information
Verify your environment setup:
# Check gVisor installation
runsc --version
# Check available platforms
ls -la /dev/kvm # For KVM support
cat /proc/version
# Check Docker runtime configuration
docker info | grep -i runtime
# Check containerd configuration (if using containerd)
sudo containerd config dump | grep runsc
Installation Issues
gVisor Not Found
Problem: runsc: command not found or similar errors.
Solutions:
# Check if runsc is installed
which runsc
echo $PATH
# Install from repository
curl -fsSL https://gvisor.dev/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/gvisor-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/gvisor-archive-keyring.gpg] https://storage.googleapis.com/gvisor/releases release main" | sudo tee /etc/apt/sources.list.d/gvisor.list > /dev/null
sudo apt-get update && sudo apt-get install -y runsc
# Or install manually
GVISOR_VERSION="20231113.0"
ARCH="x86_64"
wget https://storage.googleapis.com/gvisor/releases/release/${GVISOR_VERSION}/${ARCH}/runsc
chmod +x runsc
sudo mv runsc /usr/local/bin/
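A manually downloaded binary can be checked before it is installed. The sketch below assumes a runsc.sha512 checksum file is published next to the binary at the same URL (an assumption not stated above; confirm it against the release bucket). verify_download is a hypothetical helper name.

```shell
#!/bin/bash
# verify_download DIR SUMFILE: succeed only if every file listed in SUMFILE
# (sha512sum format, paths relative to DIR) matches its recorded checksum.
verify_download() {
  local dir="$1" sumfile="$2"
  ( cd "$dir" && sha512sum -c "$sumfile" >/dev/null 2>&1 )
}

# Example (assumes runsc and runsc.sha512 are both in the current directory):
#   verify_download . runsc.sha512 && chmod +x runsc && sudo mv runsc /usr/local/bin/
```

Running the check before chmod and mv means a truncated or corrupted download never reaches /usr/local/bin.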
Permission Issues
Problem: Permission denied accessing /dev/kvm or similar.
Solutions:
# Check KVM permissions
ls -la /dev/kvm
# Temporary workaround only: this opens /dev/kvm to every local user
sudo chmod 666 /dev/kvm
# Add user to kvm group (permanent fix)
sudo usermod -a -G kvm $USER
newgrp kvm # Apply group change immediately
# Check if user is in kvm group
groups $USER
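Group changes made with usermod only apply to new login sessions, so the group database can list kvm while the current shell still lacks it. The following sketch checks the groups of the running session instead. in_group is a hypothetical helper, not part of gVisor.

```shell
#!/bin/bash
# in_group GROUP "LIST": succeed if GROUP is one of the space-separated names in LIST.
in_group() {
  local group="$1" list="$2"
  case " $list " in
    *" $group "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: `id -nG` with no user argument reports the groups of the current process.
#   in_group kvm "$(id -nG)" || echo "kvm not active in this session: log in again or run newgrp kvm"
```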
Docker Runtime Configuration Issues
Problem: Docker doesn't recognize the gVisor runtime.
Solutions:
# Check current Docker daemon configuration
cat /etc/docker/daemon.json
# Fix daemon.json format
sudo tee /etc/docker/daemon.json <<EOF
{
"runtimes": {
"runsc": {
"path": "/usr/local/bin/runsc"
}
}
}
EOF
# Restart Docker daemon
sudo systemctl restart docker
# Verify runtime is available
docker info | grep -i runtime
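The grep above also matches the "Default Runtime" line, which can hide a missing registration. A stricter check, sketched here under the assumption that docker info prints a single "Runtimes:" line with space-separated names, looks only at that line. has_runtime is a hypothetical helper.

```shell
#!/bin/bash
# has_runtime NAME: read `docker info` output on stdin and succeed only if
# NAME appears as a whole word on the "Runtimes:" line.
has_runtime() {
  local name="$1"
  grep -i '^ *Runtimes:' | tr -s ' ' '\n' | grep -x -- "$name" >/dev/null
}

# Example:
#   docker info 2>/dev/null | has_runtime runsc || echo "runsc is not registered with Docker"
```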
Runtime Issues
Container Fails to Start
Problem: Container exits immediately or fails to start.
Debugging Steps:
# Run with debug logging
docker run --rm --runtime=runsc \
--runtime-opt debug=true \
--runtime-opt debug-log=/tmp/runsc.log \
alpine:latest echo "test"
# Check the debug log
grep -i error /tmp/runsc.log
# Try different platform
docker run --rm --runtime=runsc \
--runtime-opt platform=ptrace \
alpine:latest echo "test"
# Check container logs (start the container without --rm, otherwise it is deleted on exit and leaves no logs)
docker logs container-name
System Call Issues
Problem: Application fails with system call errors.
Solutions:
# Enable strace logging
docker run --rm --runtime=runsc \
--runtime-opt debug=true \
--runtime-opt strace=true \
--runtime-opt debug-log=/tmp/strace.log \
problematic-app:latest
# Analyze system calls
grep -i "not implemented\|unsupported" /tmp/strace.log
# Try different file access mode
docker run --rm --runtime=runsc \
--runtime-opt file-access=shared \
problematic-app:latest
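A long strace log is easier to act on when the unsupported calls are counted per syscall. This sketch assumes the log contains lines with the text "Unsupported syscall" followed by the syscall name, which is the pattern this guide greps for elsewhere; the exact message format may differ between runsc versions. summarize_unsupported is a hypothetical helper.

```shell
#!/bin/bash
# summarize_unsupported LOGFILE: print "COUNT NAME" for each syscall named after
# the text "Unsupported syscall", most frequent first.
summarize_unsupported() {
  grep -o 'Unsupported syscall:\{0,1\} *[A-Za-z_0-9]*' "$1" \
    | awk '{ count[$NF]++ } END { for (s in count) print count[s], s }' \
    | sort -k1,1nr -k2,2
}

# Example:
#   summarize_unsupported /tmp/strace.log
```

The most frequent entries are usually the first ones to look up in the gVisor compatibility documentation.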
Memory Issues
Problem: Out of memory errors or excessive memory usage.
Debugging:
# Check memory limits
docker run --rm --runtime=runsc \
-m 512m \
alpine:latest sh -c 'cat /proc/meminfo | head -10'
# Monitor memory usage
docker stats container-name
# Adjust gVisor memory settings
cat > /tmp/memory-config.toml <<EOF
[runsc]
total-memory = "2GB"
memory-file = "/tmp/gvisor-memory"
EOF
docker run --rm --runtime=runsc \
--runtime-opt config=/tmp/memory-config.toml \
memory-intensive-app:latest
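To compare what the sandbox reports with the limit passed to -m, the MemTotal line can be reduced to a single number. This is a minimal sketch; whether the reported value tracks the container limit depends on the runsc version and settings. memtotal_mib is a hypothetical helper.

```shell
#!/bin/bash
# memtotal_mib: read /proc/meminfo text on stdin and print MemTotal in whole MiB.
memtotal_mib() {
  awk '/^MemTotal:/ { print int($2 / 1024); exit }'
}

# Example:
#   docker run --rm --runtime=runsc -m 512m alpine:latest cat /proc/meminfo | memtotal_mib
```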
Kubernetes-Specific Issues
RuntimeClass Not Working
Problem: Pods don't use gVisor runtime despite RuntimeClass configuration.
Debugging:
# Check RuntimeClass exists
kubectl get runtimeclass
# Verify RuntimeClass configuration
kubectl describe runtimeclass gvisor
# Check pod specification
kubectl get pod pod-name -o yaml | grep -i runtime
# Check node supports runtime
kubectl describe node node-name | grep -i runtime
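When the checks above show that the RuntimeClass is missing or malformed, it can help to compare against a known-minimal manifest. The sketch below prints one; it assumes the handler name configured in the node's container runtime is runsc, which you should confirm against your containerd configuration. runtimeclass_manifest is a hypothetical helper.

```shell
#!/bin/bash
# runtimeclass_manifest NAME HANDLER: print a minimal node.k8s.io/v1 RuntimeClass.
# HANDLER must match the runtime handler name configured on the nodes.
runtimeclass_manifest() {
  local name="$1" handler="$2"
  cat <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: ${name}
handler: ${handler}
EOF
}

# Example:
#   runtimeclass_manifest gvisor runsc | kubectl apply -f -
#   kubectl get pod pod-name -o jsonpath='{.spec.runtimeClassName}'
```

A pod only uses the sandbox if its spec.runtimeClassName equals the RuntimeClass name; the second example command prints that field directly.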
Pod Scheduling Issues
Problem: Pods remain in Pending state.
Solutions:
# Check pod events
kubectl describe pod pod-name
# Check node availability
kubectl get nodes -o wide
# Check node taints and pod tolerations
kubectl describe node node-name | grep -i taint
kubectl get pod pod-name -o yaml | grep -A5 tolerations
# Check resource availability
kubectl top nodes
kubectl describe node node-name | grep -A5 "Allocated resources"
Container Runtime Issues in Kubernetes
Problem: CRI-related errors or container runtime failures.
Debugging:
# Check containerd service
sudo systemctl status containerd
# Check containerd logs
sudo journalctl -u containerd -f
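# Test containerd with gVisor directly (ctr needs the full shim name and a locally pulled image)
sudo ctr --namespace k8s.io image pull docker.io/library/alpine:latest
sudo ctr --namespace k8s.io run --runtime io.containerd.runsc.v1 --rm \
docker.io/library/alpine:latest test echo "Hello"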
# Check containerd configuration
sudo containerd config dump | grep -A10 runsc
Performance Issues
Slow Container Startup
Problem: Containers take a long time to start.
Solutions:
# Use KVM platform for better performance
docker run --runtime=runsc \
--runtime-opt platform=kvm \
alpine:latest echo "test"
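# Optimize file access: exclusive mode lets gVisor cache aggressively;
# shared mode is slower and only needed when files change outside the sandbox
docker run --runtime=runsc \
--runtime-opt file-access=exclusive \
alpine:latest echo "test"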
# Combine the settings (recent releases always use VFS2, so it needs no switch)
cat > /tmp/performance.toml <<EOF
[runsc]
platform = "kvm"
file-access = "exclusive"
gso = true
EOF
docker run --runtime=runsc \
--runtime-opt config=/tmp/performance.toml \
alpine:latest echo "test"
High CPU Usage
Problem: gVisor processes consuming excessive CPU.
Investigation:
# Monitor gVisor processes
ps aux | grep '[r]unsc'
# top -p expects a comma-separated PID list, so join pgrep's output with -d,
top -p "$(pgrep -d, runsc)"
# Profile CPU usage
docker run --runtime=runsc \
--runtime-opt enable-profiling=true \
--runtime-opt profile-cpu=/tmp/cpu.prof \
cpu-intensive-app:latest
# Analyze CPU profile (if Go tools available)
go tool pprof /tmp/cpu.prof
Network Performance Issues
Problem: Slow network performance in gVisor containers.
Tuning:
# Enable network optimizations
cat > /tmp/network-tuning.toml <<EOF
[runsc]
platform = "kvm"
network = "sandbox"
gso = true
software-gso = true
tx-checksum-offload = true
rx-checksum-offload = true
EOF
docker run --runtime=runsc \
--runtime-opt config=/tmp/network-tuning.toml \
network-app:latest
# Test network performance (prints the total request time in seconds)
docker run --rm --runtime=runsc \
--runtime-opt config=/tmp/network-tuning.toml \
curlimages/curl -s -o /dev/null -w "%{time_total}\n" http://example.com
Application Compatibility Issues
Application Won't Run
Problem: Existing application fails to run in gVisor.
Troubleshooting:
# Check system call compatibility
docker run --runtime=runsc \
--runtime-opt debug=true \
--runtime-opt strace=true \
--runtime-opt debug-log=/tmp/compat.log \
your-app:latest
# Look for unsupported system calls
grep "Unsupported syscall" /tmp/compat.log
# Try compatibility mode
docker run --runtime=runsc \
--runtime-opt platform=ptrace \
--runtime-opt file-access=shared \
your-app:latest
File System Issues
Problem: File operations fail or behave unexpectedly.
Solutions:
# Test different file access modes
docker run --rm -v /tmp/test:/test --runtime=runsc \
--runtime-opt file-access=exclusive \
alpine:latest ls -la /test
docker run --rm -v /tmp/test:/test --runtime=runsc \
--runtime-opt file-access=shared \
alpine:latest ls -la /test
# Check overlay filesystem
docker run --rm --runtime=runsc \
--runtime-opt overlay=false \
alpine:latest df -h
Database Issues
Problem: Databases don't work correctly with gVisor.
Common Solutions:
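# For PostgreSQL - avoid POSIX shared memory for dynamic segments
# (dynamic_shared_memory_type=none was removed in PostgreSQL 12, so postgres:13 needs another type such as mmap)
docker run --runtime=runsc \
-e POSTGRES_PASSWORD=secret \
-e POSTGRES_INITDB_ARGS="--auth-host=trust --auth-local=trust" \
postgres:13 postgres -c shared_preload_libraries= -c dynamic_shared_memory_type=mmap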
# For MySQL - adjust configuration
docker run --runtime=runsc \
-e MYSQL_ROOT_PASSWORD=secret \
mysql:8 --innodb-use-native-aio=0 --innodb-flush-method=fsync
# For Redis - disable background save
docker run --runtime=runsc redis:7 redis-server --save ""
Security Issues
Security Context Violations
Problem: Containers violate security policies.
Investigation:
# Check security context
kubectl get pod pod-name -o yaml | grep -A20 securityContext
# Check for privileged containers
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[*].securityContext.privileged}{"\n"}{end}'
# Verify gVisor isolation
docker run --rm --runtime=runsc alpine:latest dmesg
# Should print gVisor's own boot messages (beginning with "Starting gVisor..."), not the host kernel log
Network Policy Issues
Problem: Network policies not working correctly.
Debugging:
# Test network connectivity
kubectl exec -it test-pod -- nc -zv target-service 80
# Check NetworkPolicy configuration
kubectl describe networkpolicy policy-name
# Verify pod labels match policy selectors
kubectl get pods --show-labels | grep target-app
Log Analysis
Understanding gVisor Logs
Common log entries and their meanings:
# Enable comprehensive logging
cat > /tmp/full-debug.toml <<EOF
[runsc]
debug = true
debug-log = "/tmp/gvisor-full.log"
log-level = "debug"
strace = true
log-packets = true
EOF
# Analyze different types of log entries
grep "Unsupported syscall" /tmp/gvisor-full.log # Compatibility issues
grep "OOM" /tmp/gvisor-full.log # Memory issues
grep "Segmentation fault" /tmp/gvisor-full.log # Crash issues
grep "Permission denied" /tmp/gvisor-full.log # Permission issues
Structured Log Analysis
Parse logs systematically:
# Create log analysis script
cat > /tmp/analyze-logs.sh <<'EOF'
#!/bin/bash
LOG_FILE="$1"
echo "=== gVisor Log Analysis ==="
echo "Log file: $LOG_FILE"
echo
echo "Error Summary:"
echo "Total errors: $(grep -c "ERROR" "$LOG_FILE")"
echo "Fatal errors: $(grep -c "FATAL" "$LOG_FILE")"
echo "Panics: $(grep -c "panic" "$LOG_FILE")"
echo
echo "System Call Issues:"
grep "Unsupported syscall" "$LOG_FILE" | head -5
echo
echo "Memory Issues:"
grep -i "out of memory\|oom\|memory" "$LOG_FILE" | head -5
echo
echo "Platform Issues:"
grep -i "platform\|kvm\|ptrace" "$LOG_FILE" | head -5
echo
echo "Network Issues:"
grep -i "network\|socket\|connection" "$LOG_FILE" | head -5
EOF
chmod +x /tmp/analyze-logs.sh
/tmp/analyze-logs.sh /tmp/gvisor-full.log
Environment-Specific Troubleshooting
AWS EKS Issues
Problem: gVisor issues specific to AWS EKS.
Solutions:
# Check instance type supports KVM
curl -s http://169.254.169.254/latest/meta-data/instance-type
# KVM may not be available on all instance types
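# Use ptrace platform on EKS. The platform is a runsc setting on each node, not a
# RuntimeClass field, so set it in the shim configuration on the node
# (typically /etc/containerd/runsc.toml) and restart containerd:
# [runsc_config]
# platform = "ptrace"
sudo systemctl restart containerd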
# Check EKS-specific networking
kubectl get pods -o wide
kubectl describe node node-name | grep -i vpc
Google GKE Issues
Problem: gVisor issues specific to Google GKE.
Solutions:
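# Enable GKE Sandbox (managed gVisor) by adding a sandboxed node pool to the cluster
gcloud container node-pools create pool-name \
--cluster=cluster-name \
--image-type=cos_containerd \
--sandbox type=gvisor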
# Or use custom gVisor installation
kubectl apply -f gvisor-gke-setup.yaml
On-Premises Issues
Problem: Issues in on-premises Kubernetes clusters.
Common Solutions:
# Check kernel version
uname -r
# gVisor requires Linux 4.14+
# Verify cgroup configuration
mount | grep cgroup
ls -la /sys/fs/cgroup/
# Check apparmor/selinux
sudo aa-status
sestatus
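The kernel requirement above can be checked mechanically rather than by eye. This sketch compares version strings with GNU sort -V and uses the 4.14 minimum stated in this guide; kernel_at_least is a hypothetical helper, and distribution suffixes such as "-91-generic" are stripped before comparing.

```shell
#!/bin/bash
# kernel_at_least REQUIRED ACTUAL: succeed if kernel version ACTUAL is >= REQUIRED.
# Relies on GNU sort -V for version ordering.
kernel_at_least() {
  local required="$1" actual="${2%%-*}"   # drop "-91-generic" style suffixes
  [ "$(printf '%s\n%s\n' "$required" "$actual" | sort -V | head -n1)" = "$required" ]
}

# Example:
#   kernel_at_least 4.14 "$(uname -r)" || echo "kernel too old for gVisor"
```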
Advanced Debugging Techniques
Using strace with gVisor
# Enable system call tracing
docker run --runtime=runsc \
--runtime-opt debug=true \
--runtime-opt strace=true \
--runtime-opt debug-log=/tmp/strace.log \
problematic-app:latest
# Filter strace output
grep "syscall.*=" /tmp/strace.log | head -20
grep "Unsupported syscall" /tmp/strace.log
Memory Profiling
# Enable memory profiling
docker run --runtime=runsc \
--runtime-opt enable-profiling=true \
--runtime-opt profile-heap=/tmp/heap.prof \
memory-app:latest
# Analyze memory profile (requires Go tools)
go tool pprof /tmp/heap.prof
Network Debugging
# Enable packet logging
docker run --runtime=runsc \
--runtime-opt log-packets=true \
--runtime-opt debug-log=/tmp/packets.log \
network-app:latest
# Analyze network traffic
tcpdump -i docker0 host container-ip
Recovery Procedures
Recovering from gVisor Issues
Clean up problematic containers:
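# Stop all gVisor containers (docker ps has no runtime filter, so select by the inspected runtime)
docker ps -q | xargs -r docker inspect --format '{{.Id}} {{.HostConfig.Runtime}}' \
| awk '$2 == "runsc" {print $1}' | xargs -r docker stop
# Remove problematic containers
docker ps -aq | xargs -r docker inspect --format '{{.Id}} {{.HostConfig.Runtime}}' \
| awk '$2 == "runsc" {print $1}' | xargs -r docker rm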
# Clean up gVisor resources
sudo pkill -f runsc
sudo rm -rf /tmp/runsc-*
Reset gVisor configuration:
# Backup current configuration
sudo cp /etc/docker/daemon.json /etc/docker/daemon.json.backup
# Reset to minimal configuration
sudo tee /etc/docker/daemon.json <<EOF
{
"runtimes": {
"runsc": {
"path": "/usr/local/bin/runsc"
}
}
}
EOF
# Restart services
sudo systemctl restart docker
Rollback Procedures
Rollback Kubernetes workloads:
# Remove gVisor RuntimeClass from deployment
kubectl patch deployment app-name -p '{"spec":{"template":{"spec":{"runtimeClassName":null}}}}'
# Scale down and up to restart pods
kubectl scale deployment app-name --replicas=0
kubectl scale deployment app-name --replicas=3
# Remove RuntimeClass if needed
kubectl delete runtimeclass gvisor
Getting Help
Community Support
- GitHub Issues: https://github.com/google/gvisor/issues
- GitHub Discussions: https://github.com/google/gvisor/discussions
- Stack Overflow: Tag questions with gvisor
Reporting Bugs
When reporting issues, include:
# Collect system information
cat > /tmp/debug-info.txt <<EOF
System Information:
$(uname -a)
gVisor Version:
$(runsc --version)
Docker Version:
$(docker --version)
Kernel Version:
$(cat /proc/version)
Container Runtime:
$(docker info | grep -i runtime)
Error Logs:
$(tail -50 /tmp/gvisor-debug/debug.log)
EOF
# Include this file with your bug report
Professional Support
For production environments, consider:
- Google Cloud Support (for GKE)
- Commercial Kubernetes distributions with gVisor support
- Professional services from container security vendors
Remember: Always test gVisor thoroughly in non-production environments before deploying to production workloads.
Summary
This troubleshooting guide covers the most common issues encountered when using gVisor. Always start with enabling debug logging, check the basics (installation, permissions, configuration), and work systematically through the specific problem domain.
The key to successful gVisor troubleshooting is understanding that gVisor sits between your application and the host kernel, so issues can originate from:
- Application compatibility with gVisor's system call implementation
- gVisor configuration and platform selection
- Container runtime integration
- Host system configuration and resources
By following this systematic approach, you should be able to resolve most gVisor-related issues effectively.