Security Best Practices
Learn how to maximize the security benefits of gVisor while following industry best practices for container security.
Security Model Overview
gVisor provides multiple layers of security:
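- Application Kernel Boundary: applications' system calls are handled by gVisor's user-space kernel (the Sentry) instead of going directly to the host kernel
- System Call Filtering: the Sentry itself runs under a seccomp-bpf filter, so only a small set of host system calls is reachable from the sandbox
- Memory Safety: the Sentry is written in Go, a memory-safe language, which rules out many classes of memory-corruption bugs
- Sandbox Isolation: each pod runs in its own sandbox with its own Sentry instance, separating it from other sandboxed workloads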
Core Security Principles
Principle of Least Privilege
Always run containers with minimal required permissions:
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  runtimeClassName: gvisor
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    runAsGroup: 10001
    fsGroup: 10001
  containers:
    - name: app
      image: secure-app:latest
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
          add:
            - NET_BIND_SERVICE # Only if needed
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /var/cache/app
  volumes:
    - name: tmp
      emptyDir: {}
    - name: cache
      emptyDir: {}
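Before a manifest like this is applied, a CI step can check the same settings. The following is a minimal Python sketch, assuming the manifest has already been parsed into a dict (for example with a YAML library); the function name `least_privilege_findings` is hypothetical, and the checks cover only the fields used in the example above.

```python
from typing import Any


def least_privilege_findings(pod: dict[str, Any]) -> list[str]:
    """Return human-readable findings for a Pod manifest parsed into a dict.

    Checks the settings used in the secure-pod example: gVisor runtime class,
    non-root user, no privilege escalation, read-only root filesystem, and all
    capabilities dropped. Missing fields count as findings.
    """
    findings: list[str] = []
    spec = pod.get("spec", {})
    if spec.get("runtimeClassName") != "gvisor":
        findings.append("pod does not use runtimeClassName 'gvisor'")
    pod_ctx = spec.get("securityContext", {})
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext", {})
        # Container-level runAsNonRoot overrides the pod-level value.
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot"))
        if run_as_non_root is not True:
            findings.append(f"{name}: runAsNonRoot is not true")
        if ctx.get("allowPrivilegeEscalation") is not False:
            findings.append(f"{name}: allowPrivilegeEscalation is not false")
        if ctx.get("readOnlyRootFilesystem") is not True:
            findings.append(f"{name}: root filesystem is writable")
        dropped = ctx.get("capabilities", {}).get("drop", [])
        if "ALL" not in dropped:
            findings.append(f"{name}: capabilities are not dropped with ALL")
    return findings
```

An empty list means every container matches the example's least-privilege settings; any other result names the container and the setting that differs.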
Defense in Depth
Layer multiple security controls:
apiVersion: v1
kind: Pod
metadata:
  name: defense-in-depth
  labels:
    security.policy/enforce: "strict"
spec:
  runtimeClassName: gvisor
  # Pod-level security context
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534 # nobody
    fsGroup: 65534
    # Under gVisor, this profile is enforced inside the sandbox only when
    # runsc's oci-seccomp option is enabled
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: app:secure
      # Container-level security context
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        runAsNonRoot: true
        capabilities:
          drop:
            - ALL
      # Resource limits bound the impact of resource-exhaustion (DoS) attacks
      resources:
        limits:
          memory: "512Mi"
          cpu: "500m"
          ephemeral-storage: "1Gi"
        requests:
          memory: "256Mi"
          cpu: "100m"
      # Health checks let Kubernetes restart or stop routing to unhealthy containers
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 30
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
      # Controlled volume access
      volumeMounts:
        - name: app-data
          mountPath: /data
          readOnly: true
        - name: tmp-data
          mountPath: /tmp
  volumes:
    - name: app-data
      secret:
        secretName: app-data
        defaultMode: 0400
    - name: tmp-data
      emptyDir:
        sizeLimit: 100Mi
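Resource limits only help if they are set and consistent. This sketch, assuming container specs already parsed into dicts, converts Kubernetes quantity strings such as 512Mi and 500m into numbers and flags missing limits or requests that exceed their limits. `parse_quantity` and `resource_findings` are hypothetical helpers; the suffix handling covers the common binary (Ki, Mi, Gi, ...) and decimal (m, k, M, G, ...) forms, not every corner of the Kubernetes quantity grammar.

```python
from decimal import Decimal

# Binary (power-of-two) and decimal (power-of-ten) suffixes used by Kubernetes quantities.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
_DECIMAL = {"m": Decimal("0.001"), "k": 10**3, "M": 10**6, "G": 10**9,
            "T": 10**12, "P": 10**15, "E": 10**18}


def parse_quantity(value: str) -> Decimal:
    """Parse a simple Kubernetes quantity such as '512Mi', '500m' or '2'."""
    # Binary suffixes are checked first so that 'Mi' is not mistaken for 'M'.
    for suffix, factor in _BINARY.items():
        if value.endswith(suffix):
            return Decimal(value[: -len(suffix)]) * factor
    for suffix, factor in _DECIMAL.items():
        if value.endswith(suffix):
            return Decimal(value[: -len(suffix)]) * factor
    return Decimal(value)


def resource_findings(container: dict) -> list[str]:
    """Flag missing memory/cpu limits and requests that exceed their limits."""
    findings = []
    resources = container.get("resources", {})
    limits = resources.get("limits", {})
    requests = resources.get("requests", {})
    for key in ("memory", "cpu"):
        if key not in limits:
            findings.append(f"missing {key} limit")
        elif key in requests and parse_quantity(requests[key]) > parse_quantity(limits[key]):
            findings.append(f"{key} request exceeds limit")
    return findings
```

Using `Decimal` keeps millicore values such as 100m exact, so comparisons between requests and limits do not suffer from floating-point rounding.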
gVisor-Specific Security Configuration
Platform Security Considerations
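KVM Platform (Hardware Virtualization)
Platform options go in the [runsc_config] table of containerd's runsc configuration file (typically /etc/containerd/runsc.toml, referenced by ConfigPath in the containerd runtime options). Each key is a runsc flag name, and values are given as strings, as in gVisor's own examples.
[runsc_config]
  # KVM uses hardware virtualization support to isolate the Sentry's address spaces;
  # it needs access to /dev/kvm (bare metal or nested virtualization)
  platform = "kvm"
  # "exclusive" (the default) tells gVisor that nothing outside the sandbox modifies
  # the container's files, which allows aggressive caching; it is not an access control
  file-access = "exclusive"
  # Keep gVisor's user-space network stack (the default) rather than "host"
  network = "sandbox"
  # Enforce the pod's OCI seccomp profile inside the sandbox
  oci-seccomp = "true"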
ptrace Platform (Broad Compatibility)
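[runsc_config]
  # ptrace runs anywhere, including VMs without nested virtualization, but is slow;
  # recent gVisor releases default to the systrap platform, which is similarly
  # flexible to deploy and performs better
  platform = "ptrace"
  file-access = "exclusive"
  # Keep the sandboxed network stack; "none" removes networking entirely
  network = "sandbox"
  oci-seccomp = "true"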
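To catch configuration drift across nodes, the [runsc_config] table can be audited after parsing runsc.toml. This sketch assumes the file has already been loaded into a dict (on Python 3.11+, `tomllib.load` does this) and that values are strings as in gVisor's examples. `audit_runsc_config` is a hypothetical helper, and its three checks (host networking, oci-seccomp, which as we understand it makes runsc enforce the pod's seccomp profile inside the sandbox, and strace) reflect our reading of the runsc flags rather than an official policy.

```python
def _truthy(value) -> bool:
    # runsc.toml examples quote flag values ("true"); accept real booleans too.
    return str(value).lower() == "true"


def audit_runsc_config(cfg: dict) -> list[str]:
    """Audit the [runsc_config] table of a parsed containerd runsc.toml."""
    runsc = cfg.get("runsc_config", {})
    findings = []
    if runsc.get("network", "sandbox") == "host":
        findings.append("network=host bypasses gVisor's user-space network stack")
    if not _truthy(runsc.get("oci-seccomp", "false")):
        findings.append("oci-seccomp is off: pod seccomp profiles are not enforced in the sandbox")
    if _truthy(runsc.get("strace", "false")):
        findings.append("strace is on: system call arguments are written to debug logs")
    return findings
```

On a node, the same function could be fed with `tomllib.load(open("/etc/containerd/runsc.toml", "rb"))`, adjusting the path to wherever ConfigPath points.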
System Call Filtering
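gVisor installs its own seccomp-bpf filter on the Sentry, limiting the host system calls it can make; no configuration is needed for that layer. Filtering the application's own system calls is a separate, optional layer: with oci-seccomp enabled, runsc enforces the pod's seccomp profile inside the sandbox. The tracing options below log every application system call (not only blocked ones) and add significant overhead, so enable them only while investigating:
[runsc_config]
  # Enforce the pod's seccomp profile (RuntimeDefault or Localhost) inside the sandbox
  oci-seccomp = "true"
  # Troubleshooting only: write every system call the application makes to the debug log
  debug = "true"
  strace = "true"
  debug-log = "/var/log/gvisor/security.log"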
A custom profile is attached through the pod's seccompProfile (type: Localhost, with localhostProfile given relative to the kubelet's seccomp directory, /var/lib/kubelet/seccomp by default). The example below allows common system calls and leaves out privileged ones such as mount, reboot, ptrace, kexec_load, setns, and kernel module loading; test it against your workload before enforcing it:
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": [
        "read", "write", "open", "close", "stat", "fstat", "lstat", "poll",
        "lseek", "mmap", "mprotect", "munmap", "brk", "rt_sigaction",
        "rt_sigprocmask", "rt_sigreturn", "ioctl", "pread64", "pwrite64",
        "readv", "writev", "access", "pipe", "select", "sched_yield", "mremap",
        "msync", "mincore", "madvise", "shmget", "shmat", "shmctl", "shmdt",
        "dup", "dup2", "pause", "nanosleep", "getitimer", "alarm", "setitimer",
        "getpid", "sendfile", "socket", "connect", "accept", "sendto",
        "recvfrom", "sendmsg", "recvmsg", "shutdown", "bind", "listen",
        "getsockname", "getpeername", "socketpair", "setsockopt", "getsockopt",
        "clone", "clone3", "fork", "vfork", "execve", "exit", "wait4", "kill",
        "uname", "semget", "semop", "semctl", "semtimedop", "fcntl", "flock",
        "fsync", "fdatasync", "truncate", "ftruncate", "getdents", "getdents64",
        "getcwd", "chdir", "fchdir", "rename", "mkdir", "rmdir", "creat", "link",
        "unlink", "symlink", "readlink", "chmod", "fchmod", "chown", "fchown",
        "lchown", "umask", "gettimeofday", "getrlimit", "getrusage", "sysinfo",
        "times", "getuid", "getgid", "geteuid", "getegid", "setpgid", "getppid",
        "getpgrp", "setsid", "getgroups", "getresuid", "getresgid", "getpgid",
        "getsid", "capget", "rt_sigpending", "rt_sigtimedwait",
        "rt_sigqueueinfo", "rt_sigsuspend", "sigaltstack", "utime", "statfs",
        "fstatfs", "getpriority", "setpriority", "sched_getparam",
        "sched_getscheduler", "sched_get_priority_max", "sched_get_priority_min",
        "sched_rr_get_interval", "mlock", "munlock", "prctl", "arch_prctl",
        "setrlimit", "sync", "gettid", "readahead", "getxattr", "lgetxattr",
        "fgetxattr", "listxattr", "llistxattr", "flistxattr", "tkill", "time",
        "futex", "sched_setaffinity", "sched_getaffinity", "set_thread_area",
        "get_thread_area", "io_setup", "io_destroy", "io_getevents",
        "io_submit", "io_cancel", "epoll_create", "set_tid_address",
        "restart_syscall", "fadvise64", "timer_create", "timer_settime",
        "timer_gettime", "timer_getoverrun", "timer_delete", "clock_gettime",
        "clock_getres", "clock_nanosleep", "exit_group", "epoll_wait",
        "epoll_ctl", "tgkill", "utimes", "waitid", "inotify_init",
        "inotify_add_watch", "inotify_rm_watch", "openat", "mkdirat",
        "fchownat", "futimesat", "newfstatat", "unlinkat", "renameat",
        "renameat2", "linkat", "symlinkat", "readlinkat", "fchmodat",
        "faccessat", "faccessat2", "pselect6", "ppoll", "set_robust_list",
        "get_robust_list", "splice", "tee", "sync_file_range", "vmsplice",
        "utimensat", "epoll_pwait", "signalfd", "timerfd_create", "eventfd",
        "fallocate", "timerfd_settime", "timerfd_gettime", "accept4",
        "signalfd4", "eventfd2", "epoll_create1", "dup3", "pipe2",
        "inotify_init1", "preadv", "pwritev", "rt_tgsigqueueinfo", "recvmmsg",
        "prlimit64", "syncfs", "sendmmsg", "getcpu", "getrandom",
        "memfd_create", "statx", "rseq", "close_range"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
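A quick sanity check for any seccomp profile is whether it allows system calls that an application container should never need. This sketch loads an OCI seccomp profile as JSON and reports allowed entries from an illustrative `DANGEROUS` set (an assumption, not an official list). It ignores argument filters (`args`) and per-architecture details, so treat the result as a prompt for review rather than a verdict; `risky_allowed_syscalls` is a hypothetical helper.

```python
import json

# Illustrative set of system calls an application container should not need.
DANGEROUS = {
    "mount", "umount2", "pivot_root", "chroot", "reboot", "kexec_load",
    "init_module", "finit_module", "delete_module", "ptrace", "setns",
    "unshare", "swapon", "swapoff", "bpf", "perf_event_open",
    "open_by_handle_at", "process_vm_readv", "process_vm_writev",
    "add_key", "request_key", "keyctl", "settimeofday", "clock_settime",
}


def risky_allowed_syscalls(profile_json: str) -> set[str]:
    """Return the DANGEROUS system calls that an OCI seccomp profile allows."""
    profile = json.loads(profile_json)
    rules = profile.get("syscalls", [])
    if profile.get("defaultAction") == "SCMP_ACT_ALLOW":
        # Allow-by-default: everything is reachable unless a rule blocks it.
        blocked = {
            name
            for rule in rules
            if rule.get("action") != "SCMP_ACT_ALLOW"
            for name in rule.get("names", [])
        }
        return DANGEROUS - blocked
    # Deny-by-default: only names in SCMP_ACT_ALLOW rules are reachable.
    allowed: set[str] = set()
    for rule in rules:
        if rule.get("action") == "SCMP_ACT_ALLOW":
            allowed.update(rule.get("names", []))
    return allowed & DANGEROUS
```

Running this against a profile before rolling it out makes accidental allowances, such as a copied list that still contains mount or reboot, visible immediately.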
Network Security
Network Isolation
Implement strong network isolation:
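# NetworkPolicy for gVisor workloads
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: secure-app-network-policy
  namespace: secure-apps
spec:
  podSelector:
    matchLabels:
      app: secure-app
      security-tier: high
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Only allow ingress from specific namespaces. Every namespace carries the
    # kubernetes.io/metadata.name label automatically (Kubernetes 1.21+)
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: trusted-gateway
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # Only allow egress to the database, DNS, and external APIs
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: database
      ports:
        - protocol: TCP
          port: 5432
    # Allow DNS to any destination (an empty "to" matches all); DNS also uses TCP
    - to: []
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # namespaceSelector only matches pods inside the cluster; for API endpoints
    # outside the cluster, use an ipBlock peer instead
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: external-apis
      ports:
        - protocol: TCP
          port: 443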
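NetworkPolicy egress rules are combined with OR: a connection is allowed if any rule admits it, and within a rule both the destination and the port must match. An empty or missing `to` or `ports` list matches everything. The sketch below models just those semantics for namespaceSelector peers so that a policy's intent can be unit-tested; `egress_allowed` is hypothetical, and ipBlock, podSelector, and named ports are deliberately ignored.

```python
def _peer_matches(peer: dict, ns_labels: dict) -> bool:
    # Only namespaceSelector.matchLabels is modelled; other peer types never match here.
    selector = peer.get("namespaceSelector")
    if selector is None:
        return False
    wanted = selector.get("matchLabels", {})
    return all(ns_labels.get(k) == v for k, v in wanted.items())


def _port_matches(entry: dict, protocol: str, port: int) -> bool:
    # Protocol defaults to TCP; an entry without "port" covers every port.
    return entry.get("protocol", "TCP") == protocol and entry.get("port") in (None, port)


def egress_allowed(rules: list, ns_labels: dict, protocol: str, port: int) -> bool:
    """Return True if any egress rule admits traffic to a namespace with ns_labels."""
    for rule in rules:
        peers = rule.get("to") or []
        ports = rule.get("ports") or []
        to_ok = not peers or any(_peer_matches(p, ns_labels) for p in peers)
        port_ok = not ports or any(_port_matches(e, protocol, port) for e in ports)
        if to_ok and port_ok:
            return True
    return False
```

Tests built on a model like this document what the policy is meant to allow, which helps when someone later edits the YAML.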
Service Mesh Security
Integrate with Istio for enhanced network security:
# Strict mTLS for gVisor workloads
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: gvisor-strict-mtls
  namespace: secure-apps
spec:
  selector:
    matchLabels:
      runtime: gvisor
  mtls:
    mode: STRICT
---
# Authorization policy for gVisor services
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: gvisor-authz
  namespace: secure-apps
spec:
  selector:
    matchLabels:
      runtime: gvisor
  rules:
    # Only allow requests from the gateway's mTLS identity. (requestPrincipals
    # matches JWT issuer/subject pairs, not service account identities.)
    - from:
        - source:
            principals: ["cluster.local/ns/secure-apps/sa/gateway"]
      to:
        - operation:
            methods: ["GET", "POST"]
      # Illustrative header check; a static value is a shared secret, so rely
      # on the mTLS identity above for authentication
      when:
        - key: request.headers[x-security-token]
          values: ["valid-token"]
Storage Security
Secure Volume Mounting
Implement secure volume practices:
apiVersion: v1
kind: Pod
metadata:
  name: secure-storage-pod
spec:
  runtimeClassName: gvisor
  securityContext:
    fsGroup: 2000
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 2000
  containers:
    - name: app
      image: secure-app:latest
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
      volumeMounts:
        # Secrets mounted read-only with restricted permissions
        - name: app-secrets
          mountPath: /etc/secrets
          readOnly: true
        # Configuration mounted read-only
        - name: app-config
          mountPath: /etc/config
          readOnly: true
        # Writable areas on tmpfs/emptyDir
        - name: tmp-data
          mountPath: /tmp
        - name: app-cache
          mountPath: /var/cache
        # Persistent data with specific mount options
        - name: app-data
          mountPath: /data
          mountPropagation: None
  volumes:
    - name: app-secrets
      secret:
        secretName: app-secrets
        defaultMode: 0400 # Read-only for owner
    - name: app-config
      configMap:
        name: app-config
        defaultMode: 0644
    - name: tmp-data
      emptyDir:
        medium: Memory # Use tmpfs for temporary data
        sizeLimit: 100Mi
    - name: app-cache
      emptyDir:
        sizeLimit: 500Mi
    - name: app-data
      persistentVolumeClaim:
        claimName: app-data-pvc
---
# PVC with security considerations
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: encrypted-ssd # Use encrypted storage class
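Modes such as `0400` are octal. Kubernetes accepts octal mode bits in YAML, but JSON has no octal literals, so JSON manifests need the decimal value (0400 becomes 256). This sketch converts an octal string and renders a Secret volume entry as JSON; `mode_for_json` and `secret_volume_json` are hypothetical helpers.

```python
import json


def mode_for_json(octal_mode: str) -> int:
    """Convert an octal file mode such as '0400' to the decimal integer JSON requires."""
    value = int(octal_mode, 8)
    # Kubernetes mode bits must lie between 0 and 0777 (511 decimal).
    if not 0 <= value <= 0o777:
        raise ValueError(f"mode out of range: {octal_mode}")
    return value


def secret_volume_json(volume_name: str, secret_name: str, octal_mode: str) -> str:
    """Render a Secret volume entry as JSON, with defaultMode in decimal."""
    volume = {
        "name": volume_name,
        "secret": {"secretName": secret_name, "defaultMode": mode_for_json(octal_mode)},
    }
    return json.dumps(volume)
```

Writing `"defaultMode": 400` in JSON by mistake would give mode 0620 (400 decimal), which is group-writable and not owner-readable, exactly the opposite of the intent.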
Encrypted Storage
Configure encrypted persistent volumes:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: encrypted-gvisor-storage
# AWS EBS CSI driver; the legacy in-tree kubernetes.io/aws-ebs plugin is deprecated
provisioner: ebs.csi.aws.com # or the appropriate provisioner for your platform
parameters:
  type: gp3
  encrypted: "true"
  kmsKeyId: "arn:aws:kms:us-west-2:123456789012:key/12345678-1234-1234-1234-123456789012"
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: encrypted-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: encrypted-gvisor-storage
  resources:
    requests:
      storage: 20Gi
Secrets Management
Secure Secret Handling
Implement proper secrets management:
# External Secrets Operator configuration
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
  namespace: secure-apps
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-west-2
      # Static keys shown for brevity; prefer workload identity (e.g. IRSA) where available
      auth:
        secretRef:
          accessKeyIDSecretRef:
            name: aws-credentials
            key: access-key-id
          secretAccessKeySecretRef:
            name: aws-credentials
            key: secret-access-key
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
  namespace: secure-apps
spec:
  refreshInterval: 5m
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: app-secrets
    creationPolicy: Owner
  data:
    - secretKey: database-password
      remoteRef:
        key: prod/database
        property: password
    - secretKey: api-key
      remoteRef:
        key: prod/external-api
        property: key
---
# Pod using external secrets (must live in the same namespace as the Secret)
apiVersion: v1
kind: Pod
metadata:
  name: secure-app-with-secrets
  namespace: secure-apps
spec:
  runtimeClassName: gvisor
  containers:
    - name: app
      image: secure-app:latest
      # Environment variables are fixed at container start and inherited by child
      # processes; the file mount below is refreshed when the Secret changes
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: database-password
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: api-key
      volumeMounts:
        - name: secrets-volume
          mountPath: /etc/secrets
          readOnly: true
  volumes:
    - name: secrets-volume
      secret:
        secretName: app-secrets
        defaultMode: 0400
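Reading secrets from the mounted files rather than environment variables has two practical advantages: files in a Secret volume are refreshed when the Secret changes (for example after a rotation picked up by the ExternalSecret's refresh interval), while environment variables are fixed at container start and are inherited by child processes. A minimal sketch of the reading side, assuming each Secret key appears as a file under /etc/secrets; `read_secret` is a hypothetical helper.

```python
from pathlib import Path


def read_secret(key: str, base_dir: str = "/etc/secrets") -> str:
    """Read one key of a Secret mounted as a volume.

    Each key of a Secret volume appears as a file named after the key.
    Trailing newlines, often added by tooling, are stripped.
    """
    # Reject names that could escape the mount directory. The files themselves
    # are symlinks managed by the kubelet, so resolving paths is not a usable check.
    if not key or "/" in key or key in (".", ".."):
        raise ValueError(f"invalid secret key: {key!r}")
    return (Path(base_dir) / key).read_text(encoding="utf-8").rstrip("\n")
```

Reading the file on each use (or on a timer) lets the application pick up rotated values without a restart; note that Secret volumes mounted with subPath are not refreshed.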
Runtime Security Monitoring
Security Event Monitoring
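Monitor security events with Falco. Falco's kernel module and eBPF drivers observe host system calls, which for a gVisor workload are the Sentry's calls rather than the application's. To see activity inside sandboxes, Falco has to consume events from gVisor's runtime monitoring, which runsc enables through its pod-init-config option: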
# Falco rules for gVisor containers
apiVersion: v1
kind: ConfigMap
metadata:
  name: falco-gvisor-rules
  namespace: falco
data:
  gvisor-rules.yaml: |
    # Lists referenced by the rules below; the items are examples to adapt.
    # With Falco's gVisor engine every event comes from a sandbox, and
    # container.id != host additionally excludes host processes.
    - list: suspicious_binaries
      items: [nc, ncat, nmap, socat]
    - list: allowed_destinations
      items: ['"10.0.0.0/8"']
    - rule: Suspicious Process in gVisor Container
      desc: Detect network and reconnaissance tools started in a sandboxed container
      condition: >
        spawned_process and
        container.id != host and
        proc.name in (suspicious_binaries)
      output: >
        Suspicious process in gVisor container
        (user=%user.name container=%container.id image=%container.image.repository proc=%proc.cmdline)
      priority: WARNING
    - rule: gVisor Container Privilege Escalation
      desc: Detect privilege escalation attempts in gVisor containers
      condition: >
        spawned_process and
        container.id != host and
        proc.name in (su, sudo, doas)
      output: >
        Privilege escalation attempt in gVisor container
        (user=%user.name container=%container.id proc=%proc.cmdline)
      priority: CRITICAL
    - rule: gVisor File System Modification
      desc: Monitor unauthorized file system modifications
      condition: >
        open_write and
        container.id != host and
        fd.typechar = 'f' and
        (fd.name startswith /etc/ or
         fd.name startswith /usr/bin/ or
         fd.name startswith /usr/sbin/)
      output: >
        Unauthorized file modification in gVisor container
        (user=%user.name file=%fd.name container=%container.id)
      priority: WARNING
    - rule: gVisor Network Connection
      desc: Monitor unexpected network connections
      condition: >
        outbound and
        container.id != host and
        not fd.snet in (allowed_destinations)
      output: >
        Unexpected network connection from gVisor container
        (user=%user.name dest=%fd.sip:%fd.sport container=%container.id)
      priority: INFO
---
# Falco deployment for gVisor monitoring
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: falco-gvisor
  namespace: falco
spec:
  selector:
    matchLabels:
      app: falco-gvisor
  template:
    metadata:
      labels:
        app: falco-gvisor
    spec:
      hostNetwork: true
      hostPID: true
      containers:
        - name: falco
          image: falcosecurity/falco:latest # pin a tested version in production
          # Kernel drivers only see the Sentry's host system calls. To observe
          # application events inside gVisor sandboxes, configure Falco's gVisor
          # engine as described in the Falco documentation for your version.
          args:
            - /usr/bin/falco
            - --cri
            - /host/var/run/containerd/containerd.sock
          volumeMounts:
            - name: containerd-sock
              mountPath: /host/var/run/containerd/containerd.sock
            # Mount custom rules into rules.d so the default falco.yaml stays in place
            - name: falco-config
              mountPath: /etc/falco/rules.d
            - name: dev
              mountPath: /host/dev
            - name: proc
              mountPath: /host/proc
            - name: boot
              mountPath: /host/boot
            - name: lib-modules
              mountPath: /host/lib/modules
            - name: usr
              mountPath: /host/usr
            - name: etc
              mountPath: /host/etc
      volumes:
        - name: containerd-sock
          hostPath:
            path: /var/run/containerd/containerd.sock
        - name: falco-config
          configMap:
            name: falco-gvisor-rules
        - name: dev
          hostPath:
            path: /dev
        - name: proc
          hostPath:
            path: /proc
        - name: boot
          hostPath:
            path: /boot
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: usr
          hostPath:
            path: /usr
        - name: etc
          hostPath:
            path: /etc
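A rule that references a list nobody defined does not do what its author intended, and the mistake is easy to miss in review. This sketch scans rules (already parsed from YAML into dicts) for single-name references such as `in (suspicious_binaries)` and reports those without a matching `- list:` entry. `undefined_lists` is hypothetical; its regular expression only recognizes the `in` and `pmatch` operators with one bare name in the parentheses, and lists defined in other loaded rules files are not seen.

```python
import re

# "in (name)" or "pmatch (name)" with a single bare identifier, which is how
# Falco conditions typically reference a named list.
_LIST_REF = re.compile(r"\b(?:in|pmatch)\s*\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*\)")


def undefined_lists(rules_file: list[dict]) -> set[str]:
    """Return list names referenced by rule conditions but never defined."""
    defined = {item["list"] for item in rules_file if "list" in item}
    referenced: set[str] = set()
    for item in rules_file:
        if "rule" in item:
            referenced.update(_LIST_REF.findall(item.get("condition", "")))
    return referenced - defined
```

Inline lists with several items, such as `(su, sudo, doas)`, are skipped by the pattern, so only plausible list references are reported.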
Compliance and Governance
Policy Enforcement with OPA Gatekeeper
Enforce security policies across all gVisor workloads:
# Require gVisor for sensitive workloads
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: requiregvisor
spec:
  crd:
    spec:
      names:
        kind: RequireGvisor
      validation:
        openAPIV3Schema:
          type: object
          properties:
            namespaces:
              type: array
              items:
                type: string
            exemptUsers:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package requiregvisor
        violation[{"msg": msg}] {
          input.review.kind.kind == "Pod"
          input.review.object.metadata.namespace == input.parameters.namespaces[_]
          not input.review.object.spec.runtimeClassName == "gvisor"
          not exempt_user
          msg := sprintf("Pod %s in namespace %s must use gVisor runtime", [input.review.object.metadata.name, input.review.object.metadata.namespace])
        }
        # Helper rule instead of the `in` keyword, which Rego v0 only accepts
        # after `import future.keywords.in`
        exempt_user {
          input.review.userInfo.username == input.parameters.exemptUsers[_]
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: RequireGvisor
metadata:
  name: secure-namespaces-require-gvisor
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    namespaces: ["financial", "healthcare", "pci-compliant"]
    # userInfo is the identity that submits the Pod: Pods created by a Deployment
    # are submitted by the ReplicaSet controller, and service accounts appear as
    # system:serviceaccount:<namespace>:<name>
    exemptUsers: ["system:admin", "ci-cd-service"]
---
# Security Context Policy
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: gvisorsecuritycontext
spec:
  crd:
    spec:
      names:
        kind: GvisorSecurityContext
      validation:
        openAPIV3Schema:
          type: object
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package gvisorsecuritycontext
        violation[{"msg": msg}] {
          input.review.kind.kind == "Pod"
          input.review.object.spec.runtimeClassName == "gvisor"
          container := input.review.object.spec.containers[_]
          # Check for required security context
          not container.securityContext.allowPrivilegeEscalation == false
          msg := sprintf("gVisor container %s must set allowPrivilegeEscalation to false", [container.name])
        }
        violation[{"msg": msg}] {
          input.review.kind.kind == "Pod"
          input.review.object.spec.runtimeClassName == "gvisor"
          container := input.review.object.spec.containers[_]
          # Check for read-only root filesystem
          not container.securityContext.readOnlyRootFilesystem == true
          msg := sprintf("gVisor container %s must use read-only root filesystem", [container.name])
        }
        violation[{"msg": msg}] {
          input.review.kind.kind == "Pod"
          input.review.object.spec.runtimeClassName == "gvisor"
          # Check for non-root user (pod-level setting required by this rule)
          not input.review.object.spec.securityContext.runAsNonRoot == true
          msg := "gVisor pods must run as non-root user"
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: GvisorSecurityContext
metadata:
  name: enforce-gvisor-security-context
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
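Policy logic is easier to review with unit tests. Gatekeeper's gator CLI can run tests against the Rego itself; as a lighter illustration, this sketch mirrors the RequireGvisor rule in Python over an AdmissionReview-shaped dict. `require_gvisor_violations` is hypothetical and is not how Gatekeeper evaluates policies.

```python
def require_gvisor_violations(review: dict, namespaces: list[str],
                              exempt_users: list[str]) -> list[str]:
    """Python mirror of the RequireGvisor Rego rule, for checking policy intent."""
    if review.get("kind", {}).get("kind") != "Pod":
        return []
    obj = review.get("object", {})
    meta = obj.get("metadata", {})
    if meta.get("namespace") not in namespaces:
        return []
    if obj.get("spec", {}).get("runtimeClassName") == "gvisor":
        return []
    # The exemption applies to the identity that submitted the request.
    if review.get("userInfo", {}).get("username") in exempt_users:
        return []
    return [f"Pod {meta.get('name')} in namespace {meta.get('namespace')} must use gVisor runtime"]
```

For example, a Pod created by a Deployment is submitted under the ReplicaSet controller's identity (typically system:serviceaccount:kube-system:replicaset-controller), so exempting a human or CI username does not exempt such a Pod.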
Incident Response
Security Incident Playbook
Define procedures for security incidents involving gVisor workloads:
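# Incident response automation
apiVersion: v1
kind: ConfigMap
metadata:
  name: incident-response-scripts
  namespace: security
data:
  isolate-pod.sh: |
    #!/bin/bash
    # Isolate a compromised gVisor pod. Usage: isolate-pod.sh <pod> <namespace>
    set -euo pipefail
    POD_NAME="$1"
    NAMESPACE="$2"
    # Label the pod first so the isolation policy selects exactly the compromised pod
    kubectl label pod "${POD_NAME}" -n "${NAMESPACE}" security.status=compromised --overwrite
    # Deny all ingress and egress for pods labeled as compromised.
    # NetworkPolicies are additive: this does not revoke traffic that other
    # policies already allow for the pod, so also check policies that select it.
    kubectl apply -f - <<EOF
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: isolate-${POD_NAME}
      namespace: ${NAMESPACE}
    spec:
      podSelector:
        matchLabels:
          security.status: compromised
      policyTypes:
        - Ingress
        - Egress
      # No ingress or egress rules = deny all
    EOF
    # Trigger forensics collection (assumes a forensics-collector CronJob in the security namespace)
    kubectl create job "forensics-${POD_NAME}" --from=cronjob/forensics-collector -n security
    # A Deployment's pods are owned by a ReplicaSet, which is owned by the Deployment
    OWNER_KIND=$(kubectl get pod "${POD_NAME}" -n "${NAMESPACE}" -o jsonpath='{.metadata.ownerReferences[0].kind}')
    OWNER_NAME=$(kubectl get pod "${POD_NAME}" -n "${NAMESPACE}" -o jsonpath='{.metadata.ownerReferences[0].name}')
    if [ "${OWNER_KIND}" = "ReplicaSet" ]; then
      DEPLOYMENT=$(kubectl get replicaset "${OWNER_NAME}" -n "${NAMESPACE}" -o jsonpath='{.metadata.ownerReferences[0].name}')
      if [ -n "${DEPLOYMENT}" ]; then
        # Scaling to zero deletes the pod, so wait for evidence collection to finish
        kubectl wait --for=condition=complete "job/forensics-${POD_NAME}" -n security --timeout=15m
        kubectl scale deployment "${DEPLOYMENT}" -n "${NAMESPACE}" --replicas=0
      fi
    fi
  collect-evidence.sh: |
    #!/bin/bash
    # Collect forensic evidence from a gVisor pod. Usage: collect-evidence.sh <pod> <namespace>
    set -uo pipefail
    POD_NAME="$1"
    NAMESPACE="$2"
    EVIDENCE_DIR="/tmp/evidence/${POD_NAME}-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "${EVIDENCE_DIR}"
    # Collect pod definition
    kubectl get pod "${POD_NAME}" -n "${NAMESPACE}" -o yaml > "${EVIDENCE_DIR}/pod.yaml"
    # Collect logs (--previous fails if the container has not restarted)
    kubectl logs "${POD_NAME}" -n "${NAMESPACE}" --previous > "${EVIDENCE_DIR}/logs-previous.txt" 2>&1 || true
    kubectl logs "${POD_NAME}" -n "${NAMESPACE}" > "${EVIDENCE_DIR}/logs-current.txt"
    # Collect events
    kubectl get events --field-selector "involvedObject.name=${POD_NAME}" -n "${NAMESPACE}" -o yaml > "${EVIDENCE_DIR}/events.yaml"
    # gVisor state: runsc runs on the node, not inside the container, so it cannot be
    # reached with kubectl exec. On the node hosting the pod, run for example
    #   sudo runsc --root /run/containerd/runsc/k8s.io debug --stacks <container-id>
    # using the container ID without its containerd:// prefix. Record node and IDs:
    kubectl get pod "${POD_NAME}" -n "${NAMESPACE}" \
      -o jsonpath='{.spec.nodeName}{"\n"}{range .status.containerStatuses[*]}{.containerID}{"\n"}{end}' \
      > "${EVIDENCE_DIR}/node-and-container-ids.txt"
    # Network connections and process list (tools may be missing from minimal images)
    kubectl exec "${POD_NAME}" -n "${NAMESPACE}" -- netstat -tulpn > "${EVIDENCE_DIR}/network-connections.txt" 2>/dev/null || true
    kubectl exec "${POD_NAME}" -n "${NAMESPACE}" -- ps aux > "${EVIDENCE_DIR}/processes.txt" 2>/dev/null || true
    # Create evidence archive
    tar czf "${EVIDENCE_DIR}.tar.gz" -C /tmp/evidence "$(basename "${EVIDENCE_DIR}")"
    echo "Evidence collected: ${EVIDENCE_DIR}.tar.gz"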
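Isolating a workload means finding the object that owns a pod, which for a Deployment requires following two ownerReferences (Pod to ReplicaSet to Deployment). This Python sketch shows that resolution over objects as returned by `kubectl get ... -o json`; `top_level_owner` is hypothetical, and it picks the reference marked `controller: true` rather than assuming the first entry.

```python
from typing import Optional, Tuple


def _controller_ref(obj: dict) -> Optional[dict]:
    # The owner reference with controller: true is the one that manages the object.
    for ref in obj.get("metadata", {}).get("ownerReferences", []):
        if ref.get("controller"):
            return ref
    return None


def top_level_owner(pod: dict, replicasets: dict) -> Optional[Tuple[str, str]]:
    """Resolve a Pod to the (kind, name) of the workload that manages it.

    `replicasets` maps ReplicaSet names to their manifests. Pods owned by a
    ReplicaSet that is itself owned by a Deployment resolve to the Deployment.
    """
    ref = _controller_ref(pod)
    if ref is None:
        return None  # bare Pod: nothing to scale
    if ref["kind"] == "ReplicaSet":
        parent = _controller_ref(replicasets.get(ref["name"], {}))
        if parent is not None and parent["kind"] == "Deployment":
            return ("Deployment", parent["name"])
    return (ref["kind"], ref["name"])
```

StatefulSets and DaemonSets own their pods directly, so for them the first hop already gives the workload.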
Security Hardening Checklist
Use this checklist to ensure your gVisor deployments follow security best practices:
✅ Platform Configuration
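- Use the KVM platform where hardware virtualization is available
- Keep file-access at its default (exclusive) unless files must be shared with processes outside the sandbox
- Enable oci-seccomp so pod seccomp profiles are enforced inside the sandbox
- Keep the sandboxed network stack (avoid network=host) and avoid host namespaces (hostNetwork, hostPID, hostIPC)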
- Configure appropriate resource limits
✅ Container Security
- Run as non-root user
- Use read-only root filesystem
- Drop all capabilities, add only necessary ones
- Set allowPrivilegeEscalation to false
- Use distroless or minimal base images
- Implement proper health checks
✅ Network Security
- Implement NetworkPolicies for micro-segmentation
- Use service mesh with mTLS when available
- Restrict egress to known destinations
- Monitor network traffic for anomalies
- Use encrypted communication channels
✅ Storage Security
- Use encrypted storage classes
- Mount secrets with restrictive permissions
- Use tmpfs for temporary data
- Implement proper backup and recovery
- Scan images for vulnerabilities regularly
✅ Monitoring and Compliance
- Deploy runtime security monitoring (Falco)
- Implement policy enforcement (OPA Gatekeeper)
- Enable audit logging
- Regular security assessments
- Incident response procedures defined
✅ Operational Security
- Regular gVisor updates
- Automated vulnerability scanning
- Access control to Kubernetes API
- Secrets management with external providers
- Regular backup and disaster recovery testing
Following these practices helps you realize the security benefits of gVisor without sacrificing operational reliability.
Next Steps
Continue learning about gVisor:
- Troubleshooting - Diagnose and resolve issues
- Advanced Examples - Complex deployment scenarios