Zero Trust Security in Kubernetes: Kyverno & Istio Ambient

Subtitle: How we transformed a GKE cluster from a "Development Playground" into a "Zero Trust Fortress" using Policy-as-Code and Sidecar-less Mesh.
Introduction: The Missing Layer
Welcome back to the Building a Production-Grade SRE Platform on Kubernetes series.
In the previous posts, we built a platform that works:
Part 1: Infrastructure (GKE)
Part 2: GitOps Engine (ArgoCD)
Part 3: Observability (LGTM)
Part 4: The CI/CD Factory
But just working isn't enough. In a shared Kubernetes environment, a single bad deployment, such as a container running as root, can compromise the entire node. In this post, we stop trusting developers implicitly and start enforcing rules.
We implemented strict enforcement at two critical layers:
Admission Control: Blocking insecure configurations before they even enter the cluster.
Network Security: Encrypting traffic after it enters, with zero code changes.
The Tech Stack
Governance: Kyverno (Kubernetes Native Policy Engine)
Service Mesh: Istio Ambient Mesh (Sidecar-less Architecture)
Data Plane: Ztunnel (Rust-based Zero Trust Tunnel)
Target App: manifest-generator (Go API)
Step 1: The Foundation (Bootstrapping Security)
Consistent with our GitOps philosophy, we didn't just helm install these tools manually. We created a dedicated Security Stack application in ArgoCD to manage them.
File: kubernetes/bootstrap/security.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: security-stack
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://github.com/anantvaid/otel-platform-infra.git
    targetRevision: main
    path: kubernetes/platform/security
    directory:
      recurse: true # Automatically picks up both Kyverno and Istio folders
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
Note: The Kyverno and Istio folders are omitted here for brevity; the full code is in the repository.

Step 2: Governance & Policy Enforcement (Kyverno)
We installed Kyverno to act as the cluster's compliance engine. Our primary goal was simple: No Root Users.

The Policy
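We defined a ClusterPolicy that validates every Pod request. If a Pod does not declare runAsNonRoot: true in its Pod-level securityContext, or a container explicitly sets runAsNonRoot to anything else, Kyverno rejects it at admission. With that flag in place, the kubelet in turn refuses to start any container that would run as UID 0 (root).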
File: kubernetes/platform/security/kyverno/disallow-root.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-root-user
  annotations:
    policies.kyverno.io/title: Disallow Root User
    policies.kyverno.io/category: Pod Security Standards (Restricted)
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: check-runasnonroot
      match:
        any:
          - resources:
              kinds:
                - Pod
      # Critical: Don't block system components!
      exclude:
        any:
          - resources:
              namespaces:
                - kube-system
                - kyverno
                - argocd
                - istio-system
      validate:
        message: "Running as root is forbidden. Please set runAsNonRoot: true."
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true
            containers:
              - =(securityContext):
                  =(runAsNonRoot): true

kubectl apply -f kubernetes/platform/security/kyverno/disallow-root.yaml

The "Hacker" Test
To verify the policy was active, we attempted to deploy a standard Nginx pod, which defaults to root.
kubectl run hacker-root --image=nginx --restart=Never
Result: BLOCKED.

This proved that security compliance is now code.
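For contrast, here is a minimal sketch of a Pod that the policy above should admit. The Pod name is hypothetical, and the nginxinc/nginx-unprivileged image is an assumption (it is not part of the original setup); it is used here because it runs as a numeric non-root user, which the kubelet needs in order to verify runAsNonRoot.

```yaml
# Sketch only: name and image are illustrative assumptions, not from the original setup.
apiVersion: v1
kind: Pod
metadata:
  name: compliant-nginx
spec:
  # Pod-level flag required by the disallow-root-user pattern
  securityContext:
    runAsNonRoot: true
  containers:
    - name: nginx
      # Unprivileged Nginx variant that runs as a non-root UID
      image: nginxinc/nginx-unprivileged
      # No container-level securityContext: the =() anchors make it optional
```

The container-level securityContext is left out on purpose to show that the conditional anchors in the policy only check runAsNonRoot on containers that actually declare it.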
Step 3: Network Security (Istio Ambient Mesh)
For network security, we deployed Istio in Ambient mode. Unlike sidecar mode, which injects an Envoy proxy into every pod, Ambient uses a shared Ztunnel (Zero Trust Tunnel) on every node to handle mTLS encryption.

Battle Scar: The GKE CNI Path
This was the hardest part of Phase 5. When we first deployed Istio, the istio-cni-node pods kept crashing with read-only file system errors.
The Root Cause: GKE nodes keep CNI binaries in /home/kubernetes/bin rather than the /opt/cni/bin default that Istio's CNI chart assumes on vanilla Kubernetes, so the installer was trying to write to a location it could not modify.
The Fix: We had to patch the Helm values to point specifically to GKE's directory structure.
File: kubernetes/platform/security/istio/values.yaml
base:
  global:
    istioNamespace: istio-system
cni:
  profile: ambient
  ambient:
    enabled: true
  # The GKE specific fix 👇
  cniBinDir: /home/kubernetes/bin
  cniConfDir: /etc/cni/net.d
ztunnel:
  global:
    istioNamespace: istio-system
Once applied, the Ztunnels came up healthy, creating an invisible mesh across the cluster.

Step 4: Enrollment (Zero Trust in Action)
We deployed our target app, manifest-gen, and enrolled it into the mesh.
A. Adapting the Application
Because Kyverno is watching, we couldn't just deploy a plain container. We had to modify our Deployment to run as a non-root user (UID 1000).
File: kubernetes/apps/manifest-gen/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: manifest-gen
  namespace: manifest-gen
spec:
  # selector and Pod labels omitted for brevity
  template:
    spec:
      # Satisfying Kyverno
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
      containers:
        - name: manifest-gen
          image: anantvaid4/manifest-generator-api
          securityContext:
            runAsNonRoot: true
B. The "Invisible" Enrollment
We didn't need to restart the app or inject sidecars. We simply labeled the namespace:
kubectl label namespace manifest-gen istio.io/dataplane-mode=ambient
Instantly, the Ztunnel on the node began intercepting 100% of the traffic entering and leaving that namespace, wrapping it in mTLS.
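Since the rest of the platform is managed through GitOps, the same label could also live in Git instead of being applied by hand. A minimal sketch, assuming the namespace is defined as a manifest in the repository (the original setup only shows the kubectl command):

```yaml
# Sketch: declarative equivalent of the kubectl label command above.
apiVersion: v1
kind: Namespace
metadata:
  name: manifest-gen
  labels:
    # Enrolls every pod in this namespace into the ambient data plane
    istio.io/dataplane-mode: ambient
```

Removing the label takes the namespace back out of the mesh, again without restarting any pods.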

Step 5: Verification Tests
Test A: The Transparency Test
We ran a curl pod inside the cluster to hit the API. Note that we had to override the security context even for the test pod, or Kyverno would have blocked it!
kubectl run curl-test --image=curlimages/curl --restart=Never --rm -it \
--overrides='{"spec": {"securityContext": {"runAsNonRoot": true, "runAsUser": 1000}}}' \
-- curl -v http://manifest-gen.manifest-gen.svc.cluster.local:8080
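Result: HTTP 200 OK.
Insight: To the app, it looked like plain HTTP. Between enrolled workloads, however, Ztunnel carries the traffic on the wire as mTLS-encrypted HBONE (HTTP-Based Overlay Network Environment). Note that this only holds when the client pod's namespace is enrolled in ambient mode as well; a client outside the mesh still reaches the app in plaintext unless strict mTLS is enforced.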
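By default, Istio accepts both mTLS and plaintext on the receiving side, so a successful curl does not by itself prove the request was encrypted. One way to rule out plaintext is a PeerAuthentication in STRICT mode. This is a sketch that was not part of the original setup; the resource name is a common convention, not a requirement.

```yaml
# Sketch: reject any traffic to manifest-gen workloads that is not mTLS.
# Not part of the original setup; shown as a possible hardening step.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: manifest-gen
spec:
  mtls:
    mode: STRICT
```

With this applied, a curl pod running in a namespace that is not enrolled in the mesh should fail to connect, while clients in enrolled namespaces keep working.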

Test B: The "Block" Test
To prove the Ztunnel was actually in charge, we applied a DENY-ALL AuthorizationPolicy.
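apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: manifest-gen
spec:
  action: DENY
  rules:
    # A single empty rule matches every request.
    # An empty rules list would match nothing and deny nothing.
    - {}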
Result: Recv failure: Connection reset by peer.
Why not 403? Because Ztunnel operates at Layer 4 (TCP). It never speaks HTTP to the client, so it cannot return a polite 403 response; it slammed the connection shut. This confirmed that Zero Trust enforcement was active.
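A deny-all policy proves enforcement, but day-to-day Zero Trust usually means allowing only known callers. The sketch below stays within what Ztunnel can evaluate at Layer 4: the caller's mTLS identity and the destination port. The policy name is hypothetical, and the principal assumes the client runs under the default service account of the default namespace, in a namespace enrolled in the mesh, with the default cluster.local trust domain.

```yaml
# Sketch: allow only one workload identity to reach port 8080.
# Name and principal are illustrative assumptions.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-default-sa
  namespace: manifest-gen
spec:
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              # SPIFFE-style identity: <trust-domain>/ns/<namespace>/sa/<service-account>
              - cluster.local/ns/default/sa/default
      to:
        - operation:
            ports: ["8080"]
```

Once any ALLOW policy applies to a workload, requests that match no ALLOW rule are rejected, so every other identity is locked out without a separate deny policy.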
The Unexplored Territory: What We Didn't Turn On
We focused on the Foundation: Identity (mTLS) and Compliance (No Root). However, both Kyverno and Istio Ambient offer advanced capabilities that turn a Platform into a Product.
Here is what we left on the table (for now):
1. Kyverno: "Mutation" & "Generation" (Automation)
We used Kyverno purely as a Validator (blocking bad requests). But Kyverno can also be a Builder.
Mutation (The Auto-Corrector): Instead of rejecting a Pod because it's missing a label, Kyverno can inject it automatically.
Generation (Namespace-as-a-Service): Kyverno can watch for a new Namespace creation and automatically generate default resources (NetworkPolicy, ResourceQuota) inside it.
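To make the two ideas concrete, here is a hedged sketch of one mutate rule and one generate rule. The policy names, the team label, and the quota value are all hypothetical; nothing below was deployed in this phase. Depending on the Kyverno version, the generate rule may also need extra RBAC so Kyverno is allowed to create ResourceQuota objects.

```yaml
# Sketch 1: mutation. Add a default label to Pods that do not set one.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-team-label   # hypothetical name
spec:
  rules:
    - name: add-team-label
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              # +(key) means "add only if the key is not already present"
              +(team): unassigned
---
# Sketch 2: generation. Create a ResourceQuota in every new Namespace.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-namespace-quota      # hypothetical name
spec:
  rules:
    - name: generate-resourcequota
      match:
        any:
          - resources:
              kinds:
                - Namespace
      generate:
        apiVersion: v1
        kind: ResourceQuota
        name: default-resourcequota
        # Place the quota inside the namespace that triggered the rule
        namespace: "{{request.object.metadata.name}}"
        # Keep the generated object in sync if someone edits or deletes it
        synchronize: true
        data:
          spec:
            hard:
              pods: "50"         # illustrative limit
```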
2. Istio Ambient: The "Waypoint" Proxy (Layer 7)
We strictly used the Ztunnel (Layer 4) component. It handles TCP, mTLS, and simple "Allow/Deny" logic based on identity and port. But Ztunnel does not parse HTTP, so it cannot see methods, paths, or headers.
To unlock Layer 7 features (like "Allow GET /products but Block DELETE /products"), Istio Ambient uses a Waypoint Proxy. We skipped this for now to keep the footprint small (Ztunnel is a per-node DaemonSet, while Waypoints are separate Envoy deployments, typically one per namespace), but it is the next logical step for granular traffic control.
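As a rough sketch of what that next step could look like: a waypoint is declared as a Gateway API resource, and Layer 7 policies are then attached to it. This assumes the Gateway API CRDs are installed and an Istio version that supports targetRefs on AuthorizationPolicy; the policy name is hypothetical and /products is taken from the example above, not from the real API of manifest-gen.

```yaml
# Sketch: a namespace-scoped waypoint proxy (not deployed in this phase).
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint
  namespace: manifest-gen
  labels:
    # Process traffic addressed to Services in this namespace
    istio.io/waypoint-for: service
spec:
  gatewayClassName: istio-waypoint
  listeners:
    - name: mesh
      port: 15008
      protocol: HBONE
---
# Sketch: Layer 7 rule enforced by the waypoint instead of Ztunnel.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: products-read-only       # hypothetical name
  namespace: manifest-gen
spec:
  targetRefs:
    - kind: Gateway
      group: gateway.networking.k8s.io
      name: waypoint
  action: ALLOW
  rules:
    - to:
        - operation:
            methods: ["GET"]
            paths: ["/products"]
```

The namespace also has to opt in to the waypoint, for example with the label istio.io/use-waypoint=waypoint. Because the policy is an allow-list, DELETE /products is rejected simply by not being listed, and this time the client does get an HTTP 403.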
The Takeaway
This phase wasn't just about typing kubectl apply or deploying helm charts. It was about resilience.
Fighting the Zombies: When removing Kyverno to test fresh installs, we hit a "Terminating" namespace loop. We had to surgically remove finalizers using kubectl replace --raw to clean the cluster state.
Platform Awareness: Debugging the cniBinDir paths proved that you can't just copy-paste tutorials. You have to understand the underlying infrastructure (GKE's filesystem) to make open-source tools work.
Invisible Security: We proved that developer experience doesn't have to suffer for security. Developers write code; the Platform handles the encryption.
Status: Security Framework Complete. Next Up: Progressive Delivery (Argo Rollouts).




