Security Best Practices

Learn how to maximize the security benefits of gVisor while following industry best practices for container security.

Security Model Overview

gVisor provides multiple layers of security:

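  1. Application Kernel Boundary: Applications make system calls to gVisor's user-space kernel (the Sentry) instead of directly to the host kernel
  2. System Call Filtering: The Sentry itself runs under a seccomp filter, so only a small set of system calls reaches the host kernel
  3. Memory Safety: The Sentry is written in Go, a memory-safe language, which rules out many classes of memory-corruption bugs
  4. Namespace Isolation: Each sandbox runs in its own set of namespaces, separating sandboxed applications from each other and from the host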

Core Security Principles

Principle of Least Privilege

Always run containers with minimal required permissions:

apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  runtimeClassName: gvisor
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    runAsGroup: 10001
    fsGroup: 10001
  containers:
  - name: app
    image: secure-app:latest
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
        add:
        - NET_BIND_SERVICE # Only if needed
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    - name: cache
      mountPath: /var/cache/app
  volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}
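
The least-privilege settings in the manifest above can also be checked before deployment. The following is a minimal sketch, assuming the manifest has already been parsed into a Python dict (for example by a YAML loader); the function name `least_privilege_findings` is hypothetical and not part of any Kubernetes or gVisor tooling.

```python
def least_privilege_findings(pod: dict) -> list[str]:
    """Return least-privilege problems found in a Pod manifest dict."""
    findings = []
    spec = pod.get("spec", {})
    # Pod-level check: the pod must refuse to run as root.
    if spec.get("securityContext", {}).get("runAsNonRoot") is not True:
        findings.append("pod: runAsNonRoot is not true")
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext", {})
        if sc.get("allowPrivilegeEscalation") is not False:
            findings.append(f"{name}: allowPrivilegeEscalation is not false")
        if sc.get("readOnlyRootFilesystem") is not True:
            findings.append(f"{name}: readOnlyRootFilesystem is not true")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            findings.append(f"{name}: capabilities.drop does not include ALL")
    return findings


# Hypothetical pod that mirrors the relevant fields of secure-pod.
secure_pod = {
    "spec": {
        "securityContext": {"runAsNonRoot": True},
        "containers": [{
            "name": "app",
            "securityContext": {
                "allowPrivilegeEscalation": False,
                "readOnlyRootFilesystem": True,
                "capabilities": {"drop": ["ALL"], "add": ["NET_BIND_SERVICE"]},
            },
        }],
    }
}
print(least_privilege_findings(secure_pod))
```

An empty list means none of the checked settings is missing. The check covers only the fields shown here; it does not replace admission control.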

Defense in Depth

Layer multiple security controls:

apiVersion: v1
kind: Pod
metadata:
  name: defense-in-depth
  labels:
    security.policy/enforce: "strict"
spec:
  runtimeClassName: gvisor
  # Pod-level security context
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534 # nobody
    fsGroup: 65534
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: app:secure
    # Container-level security context
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      capabilities:
        drop:
        - ALL
    # Resource limits prevent DoS
    resources:
      limits:
        memory: "512Mi"
        cpu: "500m"
        ephemeral-storage: "1Gi"
      requests:
        memory: "256Mi"
        cpu: "100m"
    # Health checks for integrity
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 30
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
    # Controlled volume access
    volumeMounts:
    - name: app-data
      mountPath: /data
      readOnly: true
    - name: tmp-data
      mountPath: /tmp
  volumes:
  - name: app-data
    secret:
      secretName: app-data
      defaultMode: 0400
  - name: tmp-data
    emptyDir:
      sizeLimit: 100Mi
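
Resource limits only help if every request has a limit that is at least as large. The sketch below checks that for a container's `resources` block. It is an illustration under stated assumptions: `parse_quantity` and `requests_within_limits` are hypothetical helpers, and the parser handles only the quantity suffixes used in this document (`m`, `Ki`, `Mi`, `Gi`), not the full Kubernetes quantity grammar.

```python
from decimal import Decimal

# Only the suffixes used in this document; Kubernetes supports more.
_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "m": Decimal("0.001")}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as '512Mi' or '500m' to a plain number."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * factor
    return Decimal(quantity)


def requests_within_limits(resources: dict) -> bool:
    """True if every requested resource has a limit at least as large."""
    limits = resources.get("limits", {})
    requests = resources.get("requests", {})
    return all(
        name in limits and parse_quantity(value) <= parse_quantity(limits[name])
        for name, value in requests.items()
    )


# Values copied from the defense-in-depth pod above.
resources = {
    "limits": {"memory": "512Mi", "cpu": "500m", "ephemeral-storage": "1Gi"},
    "requests": {"memory": "256Mi", "cpu": "100m"},
}
print(requests_within_limits(resources))
```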

gVisor-Specific Security Configuration

Platform Security Considerations

KVM Platform (Most Secure)

[runsc]
# KVM provides hardware-level isolation
platform = "kvm"

# Restrict file system access
file-access = "exclusive"

# Disable potentially risky features
overlay = false
host-network = false
host-pid = false

# Enable additional protections
vfs2 = true
seccomp = true

ptrace Platform (Broad Compatibility)

[runsc]
# ptrace with security focus
platform = "ptrace"

# More restrictive settings for ptrace
file-access = "exclusive"
overlay = false

# Limit system call access
disable-mount = true
host-network = false
host-pid = false
host-ipc = false

System Call Filtering

Configure system call filtering for enhanced security:

[runsc]
# Enable seccomp-bpf filtering
seccomp = true

# Custom seccomp profile (optional)
seccomp-profile = "/etc/gvisor/seccomp-strict.json"

# Log application system calls (verbose; intended for debugging and auditing)
strace = true
debug-log = "/var/log/gvisor/security.log"

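Custom seccomp profile example. Note that this allow list is very broad: it includes privileged calls such as ptrace, mount, reboot, init_module, and kexec_load, which a strict profile should not allow. Trim it to the calls your workload actually needs: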

{
"defaultAction": "SCMP_ACT_ERRNO",
"architectures": ["SCMP_ARCH_X86_64"],
"syscalls": [
{
"names": [
"read", "write", "open", "close", "stat", "fstat", "lstat", "poll",
"lseek", "mmap", "munmap", "brk", "rt_sigaction", "rt_sigprocmask",
"ioctl", "access", "pipe", "select", "sched_yield", "mremap",
"msync", "mincore", "madvise", "shmget", "shmat", "shmctl",
"dup", "dup2", "pause", "nanosleep", "getitimer", "alarm",
"setitimer", "getpid", "sendfile", "socket", "connect", "accept",
"sendto", "recvfrom", "sendmsg", "recvmsg", "shutdown", "bind",
"listen", "getsockname", "getpeername", "socketpair", "setsockopt",
"getsockopt", "clone", "fork", "vfork", "execve", "exit", "wait4",
"kill", "uname", "semget", "semop", "semctl", "shmdt", "msgget",
"msgsnd", "msgrcv", "msgctl", "fcntl", "flock", "fsync",
"fdatasync", "truncate", "ftruncate", "getdents", "getcwd",
"chdir", "fchdir", "rename", "mkdir", "rmdir", "creat", "link",
"unlink", "symlink", "readlink", "chmod", "fchmod", "chown",
"fchown", "lchown", "umask", "gettimeofday", "getrlimit", "getrusage",
"sysinfo", "times", "ptrace", "getuid", "syslog", "getgid", "setuid",
"setgid", "geteuid", "getegid", "setpgid", "getppid", "getpgrp",
"setsid", "setreuid", "setregid", "getgroups", "setgroups",
"setresuid", "getresuid", "setresgid", "getresgid", "getpgid",
"setfsuid", "setfsgid", "getsid", "capget", "capset", "rt_sigpending",
"rt_sigtimedwait", "rt_sigqueueinfo", "rt_sigsuspend", "sigaltstack",
"utime", "mknod", "personality", "ustat", "statfs", "fstatfs",
"sysfs", "getpriority", "setpriority", "sched_setparam",
"sched_getparam", "sched_setscheduler", "sched_getscheduler",
"sched_get_priority_max", "sched_get_priority_min", "sched_rr_get_interval",
"mlock", "munlock", "mlockall", "munlockall", "vhangup", "modify_ldt",
"pivot_root", "_sysctl", "prctl", "arch_prctl", "adjtimex", "setrlimit",
"chroot", "sync", "acct", "settimeofday", "mount", "umount2",
"swapon", "swapoff", "reboot", "sethostname", "setdomainname",
"iopl", "ioperm", "create_module", "init_module", "delete_module",
"get_kernel_syms", "query_module", "quotactl", "nfsservctl",
"getpmsg", "putpmsg", "afs_syscall", "tuxcall", "security",
"gettid", "readahead", "setxattr", "lsetxattr", "fsetxattr",
"getxattr", "lgetxattr", "fgetxattr", "listxattr", "llistxattr",
"flistxattr", "removexattr", "lremovexattr", "fremovexattr",
"tkill", "time", "futex", "sched_setaffinity", "sched_getaffinity",
"set_thread_area", "io_setup", "io_destroy", "io_getevents",
"io_submit", "io_cancel", "get_thread_area", "lookup_dcookie",
"epoll_create", "epoll_ctl_old", "epoll_wait_old", "remap_file_pages",
"getdents64", "set_tid_address", "restart_syscall", "semtimedop",
"fadvise64", "timer_create", "timer_settime", "timer_gettime",
"timer_getoverrun", "timer_delete", "clock_settime", "clock_gettime",
"clock_getres", "clock_nanosleep", "exit_group", "epoll_wait",
"epoll_ctl", "tgkill", "utimes", "vserver", "mbind", "set_mempolicy",
"get_mempolicy", "mq_open", "mq_unlink", "mq_timedsend", "mq_timedreceive",
"mq_notify", "mq_getsetattr", "kexec_load", "waitid", "add_key",
"request_key", "keyctl", "ioprio_set", "ioprio_get", "inotify_init",
"inotify_add_watch", "inotify_rm_watch", "migrate_pages", "openat",
"mkdirat", "mknodat", "fchownat", "futimesat", "newfstatat", "unlinkat",
"renameat", "linkat", "symlinkat", "readlinkat", "fchmodat", "faccessat",
"pselect6", "ppoll", "unshare", "set_robust_list", "get_robust_list",
"splice", "tee", "sync_file_range", "vmsplice", "move_pages",
"utimensat", "epoll_pwait", "signalfd", "timerfd_create", "eventfd",
"fallocate", "timerfd_settime", "timerfd_gettime", "accept4", "signalfd4",
"eventfd2", "epoll_create1", "dup3", "pipe2", "inotify_init1",
"preadv", "pwritev", "rt_tgsigqueueinfo", "perf_event_open", "recvmmsg",
"fanotify_init", "fanotify_mark", "prlimit64", "name_to_handle_at",
"open_by_handle_at", "clock_adjtime", "syncfs", "sendmmsg", "setns",
"getcpu", "process_vm_readv", "process_vm_writev"
],
"action": "SCMP_ACT_ALLOW"
}
]
}
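
An allow list this long is hard to review by eye. The following sketch lists the allowed system calls in a profile that also appear on a deny list. The `RISKY_SYSCALLS` set and the function name are assumptions chosen for illustration; the set is not exhaustive and is not an official gVisor or libseccomp list.

```python
import json

# Illustrative, non-exhaustive set of calls a strict profile would usually drop.
RISKY_SYSCALLS = {
    "ptrace", "mount", "umount2", "reboot", "kexec_load", "init_module",
    "delete_module", "swapon", "swapoff", "pivot_root", "chroot", "setns",
}


def risky_allowed_syscalls(profile: dict) -> list[str]:
    """Return the risky system calls that the profile explicitly allows."""
    allowed = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            allowed.update(rule.get("names", []))
    return sorted(allowed & RISKY_SYSCALLS)


# A shortened profile with the same structure as the example above.
profile = json.loads("""
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["read", "write", "ptrace", "mount"], "action": "SCMP_ACT_ALLOW"}
  ]
}
""")
print(risky_allowed_syscalls(profile))
```

Running the same function on the full profile would report every entry of the deny list that the profile allows, which gives a starting point for trimming it.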

Network Security

Network Isolation

Implement strong network isolation:

# NetworkPolicy for gVisor workloads
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: secure-app-network-policy
  namespace: secure-apps
spec:
  podSelector:
    matchLabels:
      app: secure-app
      security-tier: high
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Only allow ingress from specific namespaces
  - from:
    - namespaceSelector:
        matchLabels:
          name: trusted-gateway
    ports:
    - protocol: TCP
      port: 8080
  egress:
  # Only allow egress to database and external APIs
  - to:
    - namespaceSelector:
        matchLabels:
          name: database
    ports:
    - protocol: TCP
      port: 5432
  - to: [] # Allow DNS
    ports:
    - protocol: UDP
      port: 53
  - to:
    - namespaceSelector:
        matchLabels:
          name: external-apis
    ports:
    - protocol: TCP
      port: 443
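
In a NetworkPolicy, an egress rule with an empty or missing `to` list matches every destination, so the DNS rule above opens UDP port 53 to any address. The sketch below finds such rules in a parsed policy. It assumes the policy is available as a Python dict; `unrestricted_egress_ports` is a hypothetical helper name.

```python
def unrestricted_egress_ports(policy: dict) -> list[tuple[str, int]]:
    """Return (protocol, port) pairs that are open to any destination."""
    open_ports = []
    for rule in policy.get("spec", {}).get("egress", []):
        # An empty or absent "to" means the rule applies to all destinations.
        if not rule.get("to"):
            for port in rule.get("ports", []):
                open_ports.append((port.get("protocol", "TCP"), port["port"]))
    return open_ports


# Egress section copied from the policy above.
policy = {
    "spec": {
        "egress": [
            {"to": [{"namespaceSelector": {"matchLabels": {"name": "database"}}}],
             "ports": [{"protocol": "TCP", "port": 5432}]},
            {"to": [], "ports": [{"protocol": "UDP", "port": 53}]},
            {"to": [{"namespaceSelector": {"matchLabels": {"name": "external-apis"}}}],
             "ports": [{"protocol": "TCP", "port": 443}]},
        ]
    }
}
print(unrestricted_egress_ports(policy))
```

If open DNS egress is not acceptable, one option is to restrict the `to` selector of that rule to the namespace where the cluster DNS service runs.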

Service Mesh Security

Integrate with Istio for enhanced network security:

# Strict mTLS for gVisor workloads
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: gvisor-strict-mtls
  namespace: secure-apps
spec:
  selector:
    matchLabels:
      runtime: gvisor
  mtls:
    mode: STRICT
---
# Authorization policy for gVisor services
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: gvisor-authz
  namespace: secure-apps
spec:
  selector:
    matchLabels:
      runtime: gvisor
  rules:
  # Only allow requests from the gateway's mTLS identity.
  # requestPrincipals matches JWT identities (<issuer>/<subject>), not
  # service accounts, so it is not used here.
  - from:
    - source:
        principals: ["cluster.local/ns/secure-apps/sa/gateway"]
    to:
    - operation:
        methods: ["GET", "POST"]
    when:
    - key: request.headers[x-security-token]
      values: ["valid-token"]

Storage Security

Secure Volume Mounting

Implement secure volume practices:

apiVersion: v1
kind: Pod
metadata:
  name: secure-storage-pod
spec:
  runtimeClassName: gvisor
  securityContext:
    fsGroup: 2000
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 2000
  containers:
  - name: app
    image: secure-app:latest
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
    volumeMounts:
    # Secrets mounted read-only with restricted permissions
    - name: app-secrets
      mountPath: /etc/secrets
      readOnly: true
    # Configuration mounted read-only
    - name: app-config
      mountPath: /etc/config
      readOnly: true
    # Writable areas on tmpfs/emptyDir
    - name: tmp-data
      mountPath: /tmp
    - name: app-cache
      mountPath: /var/cache
    # Persistent data with specific mount options
    - name: app-data
      mountPath: /data
      mountPropagation: None
  volumes:
  - name: app-secrets
    secret:
      secretName: app-secrets
      defaultMode: 0400 # Read-only for owner
  - name: app-config
    configMap:
      name: app-config
      defaultMode: 0644
  - name: tmp-data
    emptyDir:
      medium: Memory # Use tmpfs for temporary data
      sizeLimit: 100Mi
  - name: app-cache
    emptyDir:
      sizeLimit: 500Mi
  - name: app-data
    persistentVolumeClaim:
      claimName: app-data-pvc
---
# PVC with security considerations
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: encrypted-ssd # Use encrypted storage class

Encrypted Storage

Configure encrypted persistent volumes:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: encrypted-gvisor-storage
provisioner: ebs.csi.aws.com # AWS EBS CSI driver (supports gp3); use the provisioner for your platform
parameters:
  type: gp3
  encrypted: "true"
  kmsKeyId: "arn:aws:kms:us-west-2:123456789012:key/12345678-1234-1234-1234-123456789012"
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: encrypted-data-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: encrypted-gvisor-storage
  resources:
    requests:
      storage: 20Gi

Secrets Management

Secure Secret Handling

Implement proper secrets management:

# External Secrets Operator configuration
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
  namespace: secure-apps
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-west-2
      auth:
        secretRef:
          accessKeyIDSecretRef:
            name: aws-credentials
            key: access-key-id
          secretAccessKeySecretRef:
            name: aws-credentials
            key: secret-access-key
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
  namespace: secure-apps
spec:
  refreshInterval: 5m
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: app-secrets
    creationPolicy: Owner
  data:
  - secretKey: database-password
    remoteRef:
      key: prod/database
      property: password
  - secretKey: api-key
    remoteRef:
      key: prod/external-api
      property: key
---
# Pod using external secrets
apiVersion: v1
kind: Pod
metadata:
  name: secure-app-with-secrets
  namespace: secure-apps
spec:
  runtimeClassName: gvisor
  containers:
  - name: app
    image: secure-app:latest
    env:
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: app-secrets
          key: database-password
    - name: API_KEY
      valueFrom:
        secretKeyRef:
          name: app-secrets
          key: api-key
    volumeMounts:
    - name: secrets-volume
      mountPath: /etc/secrets
      readOnly: true
  volumes:
  - name: secrets-volume
    secret:
      secretName: app-secrets
      defaultMode: 0400
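
The pod above exposes the same secret both as environment variables and as files under /etc/secrets. When a Secret is mounted as a volume, each key appears as a file named after the key. The sketch below shows how an application might read a key from the mounted files, which keeps the value out of the process environment. `read_secret` is a hypothetical helper, and the default directory is taken from the `mountPath` above.

```python
from pathlib import Path


def read_secret(key: str, base_dir: str = "/etc/secrets") -> str:
    """Read one key of a mounted Secret; each key is a file in base_dir."""
    base = Path(base_dir).resolve()
    path = (base / key).resolve()
    # Refuse keys such as "../other" that would escape the secrets directory.
    if base not in path.parents:
        raise ValueError(f"invalid secret key: {key!r}")
    # Drop a trailing newline only; other whitespace may be part of the value.
    return path.read_text(encoding="utf-8").rstrip("\n")
```

Inside the container, `read_secret("database-password")` would return the value that the ExternalSecret synced from the external store.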

Runtime Security Monitoring

Security Event Monitoring

Monitor security events with Falco:

# Falco rules for gVisor containers
apiVersion: v1
kind: ConfigMap
metadata:
  name: falco-gvisor-rules
  namespace: falco
data:
  gvisor-rules.yaml: |
    # suspicious_binaries and allowed_destinations are lists you define
    # for your environment before these rules are loaded.
    - rule: Unexpected gVisor System Call
      desc: Detect unexpected system calls in gVisor containers
      condition: >
        spawned_process and
        container.runtime = "runsc" and
        proc.name in (suspicious_binaries)
      output: >
        Suspicious process in gVisor container
        (user=%user.name container=%container.id image=%container.image.repository proc=%proc.cmdline)
      priority: WARNING

    - rule: gVisor Container Privilege Escalation
      desc: Detect privilege escalation attempts in gVisor containers
      condition: >
        spawned_process and
        container.runtime = "runsc" and
        (proc.name in (su, sudo, doas) or
        proc.name startswith setuid or
        proc.name startswith setgid)
      output: >
        Privilege escalation attempt in gVisor container
        (user=%user.name container=%container.id proc=%proc.cmdline)
      priority: CRITICAL

    - rule: gVisor File System Modification
      desc: Monitor unauthorized file system modifications
      condition: >
        open_write and
        container.runtime = "runsc" and
        fd.typechar = 'f' and
        (fd.filename startswith /etc or
        fd.filename startswith /usr/bin or
        fd.filename startswith /usr/sbin)
      output: >
        Unauthorized file modification in gVisor container
        (user=%user.name file=%fd.filename container=%container.id)
      priority: WARNING

    - rule: gVisor Network Connection
      desc: Monitor unexpected network connections
      condition: >
        outbound and
        container.runtime = "runsc" and
        not fd.sip in (allowed_destinations)
      output: >
        Unexpected network connection from gVisor container
        (user=%user.name dest=%fd.sip:%fd.sport container=%container.id)
      priority: INFO
---
# Falco deployment for gVisor monitoring
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: falco-gvisor
  namespace: falco
spec:
  selector:
    matchLabels:
      app: falco-gvisor
  template:
    metadata:
      labels:
        app: falco-gvisor
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - name: falco
        image: falcosecurity/falco:latest
        args:
        - /usr/bin/falco
        - --cri
        - /host/var/run/containerd/containerd.sock
        - -K
        - /var/run/secrets/kubernetes.io/serviceaccount/token
        - -k
        - https://kubernetes.default
        - -pk
        volumeMounts:
        - name: containerd-sock
          mountPath: /host/var/run/containerd/containerd.sock
        - name: falco-config
          mountPath: /etc/falco
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: boot
          mountPath: /host/boot
        - name: lib-modules
          mountPath: /host/lib/modules
        - name: usr
          mountPath: /host/usr
        - name: etc
          mountPath: /host/etc
      volumes:
      - name: containerd-sock
        hostPath:
          path: /var/run/containerd/containerd.sock
      - name: falco-config
        configMap:
          name: falco-gvisor-rules
      - name: dev
        hostPath:
          path: /dev
      - name: proc
        hostPath:
          path: /proc
      - name: boot
        hostPath:
          path: /boot
      - name: lib-modules
        hostPath:
          path: /lib/modules
      - name: usr
        hostPath:
          path: /usr
      - name: etc
        hostPath:
          path: /etc

Compliance and Governance

Policy Enforcement with OPA Gatekeeper

Enforce security policies across all gVisor workloads:

# Require gVisor for sensitive workloads
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: requiregvisor
spec:
  crd:
    spec:
      names:
        kind: RequireGvisor
      validation:
        openAPIV3Schema:
          type: object
          properties:
            namespaces:
              type: array
              items:
                type: string
            exemptUsers:
              type: array
              items:
                type: string
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package requiregvisor

      violation[{"msg": msg}] {
        input.review.kind.kind == "Pod"
        namespace := input.review.object.metadata.namespace
        namespace == input.parameters.namespaces[_]
        not input.review.object.spec.runtimeClassName == "gvisor"
        not exempt_user
        msg := sprintf("Pod %s in namespace %s must use gVisor runtime", [input.review.object.metadata.name, namespace])
      }

      # True when the requesting user is on the exemption list
      exempt_user {
        input.review.userInfo.username == input.parameters.exemptUsers[_]
      }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: RequireGvisor
metadata:
  name: secure-namespaces-require-gvisor
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
  parameters:
    namespaces: ["financial", "healthcare", "pci-compliant"]
    exemptUsers: ["system:admin", "ci-cd-service"]
---
# Security Context Policy
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: gvisorsecuritycontext
spec:
  crd:
    spec:
      names:
        kind: GvisorSecurityContext
      validation:
        openAPIV3Schema:
          type: object
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package gvisorsecuritycontext

      violation[{"msg": msg}] {
        input.review.kind.kind == "Pod"
        input.review.object.spec.runtimeClassName == "gvisor"
        container := input.review.object.spec.containers[_]

        # Check for required security context
        not container.securityContext.allowPrivilegeEscalation == false
        msg := sprintf("gVisor container %s must set allowPrivilegeEscalation to false", [container.name])
      }

      violation[{"msg": msg}] {
        input.review.kind.kind == "Pod"
        input.review.object.spec.runtimeClassName == "gvisor"
        container := input.review.object.spec.containers[_]

        # Check for read-only root filesystem
        not container.securityContext.readOnlyRootFilesystem == true
        msg := sprintf("gVisor container %s must use read-only root filesystem", [container.name])
      }

      violation[{"msg": msg}] {
        input.review.kind.kind == "Pod"
        input.review.object.spec.runtimeClassName == "gvisor"

        # Check for non-root user
        not input.review.object.spec.securityContext.runAsNonRoot == true
        msg := "gVisor pods must run as non-root user"
      }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: GvisorSecurityContext
metadata:
  name: enforce-gvisor-security-context
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
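
It can help to reason about the RequireGvisor policy outside the cluster before deploying it. The following sketch restates the intent of that rule in Python so that its behavior can be tried on sample admission reviews. It is an illustration only: Gatekeeper evaluates the Rego, not this function, and `require_gvisor_violations` is a hypothetical name.

```python
def require_gvisor_violations(review: dict, parameters: dict) -> list[str]:
    """Mirror the RequireGvisor rule: return violation messages for a review."""
    if review.get("kind", {}).get("kind") != "Pod":
        return []
    pod = review.get("object", {})
    meta = pod.get("metadata", {})
    # The policy applies only to the listed namespaces.
    if meta.get("namespace") not in parameters.get("namespaces", []):
        return []
    if pod.get("spec", {}).get("runtimeClassName") == "gvisor":
        return []
    # Exempt users may create pods without the gVisor runtime.
    if review.get("userInfo", {}).get("username") in parameters.get("exemptUsers", []):
        return []
    return [
        f"Pod {meta.get('name')} in namespace {meta.get('namespace')} "
        "must use gVisor runtime"
    ]


parameters = {
    "namespaces": ["financial", "healthcare", "pci-compliant"],
    "exemptUsers": ["system:admin", "ci-cd-service"],
}
review = {
    "kind": {"kind": "Pod"},
    "userInfo": {"username": "alice"},
    "object": {"metadata": {"name": "ledger", "namespace": "financial"}, "spec": {}},
}
print(require_gvisor_violations(review, parameters))
```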

Incident Response

Security Incident Playbook

Define procedures for security incidents involving gVisor workloads:

# Incident response automation
apiVersion: v1
kind: ConfigMap
metadata:
  name: incident-response-scripts
  namespace: security
data:
  isolate-pod.sh: |
    #!/bin/bash
    # Isolate compromised gVisor pod
    set -euo pipefail

    POD_NAME=$1
    NAMESPACE=$2

    # Label pod as compromised; the isolation policy selects on this label
    kubectl label pod "${POD_NAME}" -n "${NAMESPACE}" security.status=compromised --overwrite

    # Apply network isolation
    kubectl apply -f - <<EOF
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: isolate-${POD_NAME}
      namespace: ${NAMESPACE}
    spec:
      podSelector:
        matchLabels:
          security.status: compromised
      policyTypes:
      - Ingress
      - Egress
      # No ingress or egress rules = deny all
    EOF

    # Trigger forensics collection
    kubectl create job "forensics-${POD_NAME}" --from=cronjob/forensics-collector -n security

    # Scale down the owning Deployment if applicable. A pod is owned by a
    # ReplicaSet, which in turn is owned by the Deployment. Scaling to zero
    # deletes the pod, so collect evidence first.
    RS=$(kubectl get pod "${POD_NAME}" -n "${NAMESPACE}" -o jsonpath='{.metadata.ownerReferences[?(@.kind=="ReplicaSet")].name}')
    if [ -n "${RS}" ]; then
      DEPLOYMENT=$(kubectl get replicaset "${RS}" -n "${NAMESPACE}" -o jsonpath='{.metadata.ownerReferences[?(@.kind=="Deployment")].name}')
      if [ -n "${DEPLOYMENT}" ]; then
        kubectl scale deployment "${DEPLOYMENT}" -n "${NAMESPACE}" --replicas=0
      fi
    fi

  collect-evidence.sh: |
    #!/bin/bash
    # Collect forensic evidence from gVisor container

    POD_NAME=$1
    NAMESPACE=$2
    EVIDENCE_DIR="/tmp/evidence/${POD_NAME}-$(date +%Y%m%d-%H%M%S)"

    mkdir -p "${EVIDENCE_DIR}"

    # Collect pod definition
    kubectl get pod "${POD_NAME}" -n "${NAMESPACE}" -o yaml > "${EVIDENCE_DIR}/pod.yaml"

    # Collect logs (--previous fails if the container has not restarted)
    kubectl logs "${POD_NAME}" -n "${NAMESPACE}" --previous > "${EVIDENCE_DIR}/logs-previous.txt" 2>/dev/null || true
    kubectl logs "${POD_NAME}" -n "${NAMESPACE}" > "${EVIDENCE_DIR}/logs-current.txt"

    # Collect events
    kubectl get events --field-selector "involvedObject.name=${POD_NAME}" -n "${NAMESPACE}" -o yaml > "${EVIDENCE_DIR}/events.yaml"

    # Collect gVisor debug info. runsc is a host binary and is normally not
    # present inside the container; if this step produces no output, run
    # runsc debug on the node instead.
    kubectl exec "${POD_NAME}" -n "${NAMESPACE}" -- runsc --debug debug --pid=1 > "${EVIDENCE_DIR}/gvisor-debug.txt" 2>/dev/null || true

    # Collect network connections
    kubectl exec "${POD_NAME}" -n "${NAMESPACE}" -- netstat -tulpn > "${EVIDENCE_DIR}/network-connections.txt" 2>/dev/null || true

    # Collect process list
    kubectl exec "${POD_NAME}" -n "${NAMESPACE}" -- ps aux > "${EVIDENCE_DIR}/processes.txt" 2>/dev/null || true

    # Create evidence archive
    tar czf "${EVIDENCE_DIR}.tar.gz" -C /tmp/evidence "$(basename "${EVIDENCE_DIR}")"

    echo "Evidence collected: ${EVIDENCE_DIR}.tar.gz"
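
The deny-all NetworkPolicy used for isolation can also be generated programmatically, which avoids shell quoting problems when pod names or labels come from an alerting system. This is a minimal sketch; `isolation_policy` is a hypothetical helper, and the label passed in the usage example is an assumption about how the compromised pod is marked.

```python
import json


def isolation_policy(pod_name: str, namespace: str, match_labels: dict) -> dict:
    """Build a NetworkPolicy that denies all ingress and egress for matching pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"isolate-{pod_name}", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": dict(match_labels)},
            # Both types are listed and no rules are given, so all traffic is denied.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


policy = isolation_policy("payments-7d9f", "financial", {"security.status": "compromised"})
# kubectl accepts JSON manifests, so this output can be piped to `kubectl apply -f -`.
print(json.dumps(policy, indent=2))
```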

Security Hardening Checklist

Use this checklist to ensure your gVisor deployments follow security best practices:

✅ Platform Configuration

  • Use KVM platform when available for hardware-level isolation
  • Use exclusive file access for highest security workloads
  • Enable seccomp filtering
  • Disable unnecessary features (mount, host networking, etc.)
  • Configure appropriate resource limits

✅ Container Security

  • Run as non-root user
  • Use read-only root filesystem
  • Drop all capabilities, add only necessary ones
  • Set allowPrivilegeEscalation to false
  • Use distroless or minimal base images
  • Implement proper health checks

✅ Network Security

  • Implement NetworkPolicies for micro-segmentation
  • Use service mesh with mTLS when available
  • Restrict egress to known destinations
  • Monitor network traffic for anomalies
  • Use encrypted communication channels

✅ Storage Security

  • Use encrypted storage classes
  • Mount secrets with restrictive permissions
  • Use tmpfs for temporary data
  • Implement proper backup and recovery
  • Regular vulnerability scanning of images

✅ Monitoring and Compliance

  • Deploy runtime security monitoring (Falco)
  • Implement policy enforcement (OPA Gatekeeper)
  • Enable audit logging
  • Regular security assessments
  • Incident response procedures defined

✅ Operational Security

  • Regular gVisor updates
  • Automated vulnerability scanning
  • Access control to Kubernetes API
  • Secrets management with external providers
  • Regular backup and disaster recovery testing

Following these practices helps you get the full isolation benefit of gVisor without sacrificing operational reliability.
