Zero Trust Security in Kubernetes: Kyverno & Istio Ambient


An Aspiring DevOps Engineer passionate about automation, CI/CD, and cloud technologies. On a journey to simplify and optimize development workflows.

Subtitle: How we transformed a GKE cluster from a "Development Playground" into a "Zero Trust Fortress" using Policy-as-Code and Sidecar-less Mesh.

Introduction: The Missing Layer

Welcome back to the Building a Production-Grade SRE Platform on Kubernetes series.

In the previous posts, we built a platform that works.

But just working isn't enough. In a shared Kubernetes environment, a single bad deployment, such as a container running as root, can compromise the entire node. In this post, we stop trusting developers implicitly and start enforcing rules.

We implemented strict enforcement at two critical layers:

  1. Admission Control: Blocking insecure configurations before they even enter the cluster.

  2. Network Security: Encrypting traffic after it enters, with zero code changes.


The Tech Stack

  • Governance: Kyverno (Kubernetes Native Policy Engine)

  • Service Mesh: Istio Ambient Mesh (Sidecar-less Architecture)

  • Data Plane: Ztunnel (Rust-based Zero Trust Tunnel)

  • Target App: manifest-generator (Go API)


Step 1: The Foundation (Bootstrapping Security)

Consistent with our GitOps philosophy, we didn't just helm install these tools manually. We created a dedicated Security Stack application in ArgoCD to manage them.

File: kubernetes/bootstrap/security.yaml

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: security-stack
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://github.com/anantvaid/otel-platform-infra.git
    targetRevision: main
    path: kubernetes/platform/security
    directory:
      recurse: true # Automatically picks up both Kyverno and Istio folders
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

Note: The contents of the Kyverno and Istio folders are omitted for brevity; you can find them in the repository referenced above.


Step 2: Governance & Policy Enforcement (Kyverno)

We installed Kyverno to act as the cluster's compliance engine. Our primary goal was simple: No Root Users.

The Policy

We defined a ClusterPolicy that validates every Pod request. Any Pod that does not declare runAsNonRoot: true is rejected at admission, so a container can never start as UID 0 (root).

File: kubernetes/platform/security/kyverno/disallow-root.yaml

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-root-user
  annotations:
    policies.kyverno.io/title: Disallow Root User
    policies.kyverno.io/category: Pod Security Standards (Restricted)
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: check-runasnonroot
      match:
        any:
        - resources:
            kinds:
              - Pod
      # Critical: Don't block system components!
      exclude:
        any:
        - resources:
            namespaces:
              - kube-system
              - kyverno
              - argocd
              - istio-system
      validate:
        message: "Running as root is forbidden. Please set runAsNonRoot: true."
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true
            containers:
              - =(securityContext):
                  =(runAsNonRoot): true

kubectl apply -f kubernetes/platform/security/kyverno/disallow-root.yaml

The "Hacker" Test

To verify the policy was active, we attempted to deploy a standard Nginx pod, which defaults to root.

kubectl run hacker-root --image=nginx --restart=Never

Result: BLOCKED.

This proved that security compliance is now code.
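
For contrast, here is a minimal sketch of a Pod that should pass the policy. The Pod name is hypothetical, and the nginxinc/nginx-unprivileged image (which runs as UID 101) is an assumption for illustration; it was not part of the original test.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: compliant-nginx          # hypothetical name
spec:
  # Pod-level context: this is the field the policy pattern requires
  securityContext:
    runAsNonRoot: true
    runAsUser: 101               # assumption: the non-root UID used by this image
  containers:
    - name: nginx
      image: nginxinc/nginx-unprivileged   # assumption: swapped in for the root-based nginx image
      securityContext:
        runAsNonRoot: true       # optional per container, but must be true if present
```

Because the pod-level securityContext declares runAsNonRoot: true, the pattern in the policy matches and admission should succeed.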


Step 3: Network Security (Istio Ambient Mesh)

For network security, we deployed Istio in Ambient Mode. Unlike classic sidecar mode, which injects an Envoy proxy into every pod, Ambient runs a shared Ztunnel (Zero Trust Tunnel) on every node to handle mTLS encryption.

Battle Scar: The GKE CNI Path

This was the hardest part of Phase 5. When we first deployed Istio, the istio-cni-node pods kept crashing with read-only file system errors.

The Root Cause: GKE nodes keep CNI binaries in /home/kubernetes/bin rather than the /opt/cni/bin path used by vanilla Kubernetes, and the default path is not writable on GKE nodes.

The Fix: We had to patch the Helm values to point specifically to GKE's directory structure.

File: kubernetes/platform/security/istio/values.yaml

base:
  global:
    istioNamespace: istio-system

cni:
  profile: ambient
  ambient:
    enabled: true
  # The GKE specific fix 👇
  cniBinDir: /home/kubernetes/bin
  cniConfDir: /etc/cni/net.d

ztunnel:
  global:
    istioNamespace: istio-system

Once applied, the Ztunnels came up healthy, creating an invisible mesh across the cluster.


Step 4: Enrollment (Zero Trust in Action)

We deployed our target app, manifest-gen, and enrolled it into the mesh.

A. Adapting the Application

Because Kyverno is watching, we couldn't just deploy a plain container. We had to modify our Deployment to run as a non-root user (UID 1000).

File: kubernetes/apps/manifest-gen/deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: manifest-gen
  namespace: manifest-gen
spec:
  # Excerpt: replicas, selector, and template labels omitted for brevity
  template:
    spec:
      # Satisfying Kyverno
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
      containers:
      - name: manifest-gen
        image: anantvaid4/manifest-generator-api
        securityContext:
          runAsNonRoot: true

B. The "Invisible" Enrollment

We didn't need to restart the app or inject sidecars. We simply labeled the namespace:

kubectl label namespace manifest-gen istio.io/dataplane-mode=ambient

Instantly, the Ztunnel on the node began intercepting 100% of the traffic entering and leaving that namespace, wrapping it in mTLS.
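
Since the rest of the platform is managed through GitOps, the same label could also live in Git instead of being applied by hand. A minimal sketch, assuming the namespace is defined as a manifest in the repository (the original setup applied the label with kubectl):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: manifest-gen
  labels:
    # Same label as the kubectl command above; enrolls every pod in the namespace
    istio.io/dataplane-mode: ambient
```

Removing the label from the manifest would take the namespace back out of the mesh on the next sync.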


Step 5: Verification Tests

Test A: The Transparency Test

We ran a curl pod inside the cluster to hit the API. Note that we had to override the security context even for the test pod, or Kyverno would have blocked it!

kubectl run curl-test --image=curlimages/curl --restart=Never --rm -it \
  --overrides='{"spec": {"securityContext": {"runAsNonRoot": true, "runAsUser": 1000}}}' \
  -- curl -v http://manifest-gen.manifest-gen.svc.cluster.local:8080

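Result: HTTP 200 OK.

Insight: To the app, it looked like plain HTTP. But on the wire, traffic between mesh-enrolled pods travels as encrypted HBONE (HTTP-Based Overlay Network Environment). The encryption only applies end to end when the client pod's namespace is enrolled in the mesh as well.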

Test B: The "Block" Test

To prove the Ztunnel was actually in charge, we applied a DENY-ALL AuthorizationPolicy.

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: manifest-gen
spec:
  action: DENY
  # A single empty rule matches every request; an empty list (rules: []) would match nothing
  rules:
  - {}

Result: Recv failure: Connection reset by peer.

Why Not 403? Because Ztunnel operates at Layer 4 (TCP). It didn't send a polite HTTP 403 response; it slammed the connection shut. This confirmed that Zero Trust enforcement was active.
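
Deny-all is the blunt end of the spectrum. As a sketch of a narrower Layer 4 rule, the policy below would admit only one client identity. The policy name, the client namespace, and the service account are all hypothetical, and the client must itself be enrolled in the mesh so that Ztunnel can verify its identity.

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-trusted-client     # hypothetical name
  namespace: manifest-gen
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        # Hypothetical identity: trust domain / namespace / service account
        - cluster.local/ns/client-ns/sa/client-sa
```

Once an ALLOW policy applies to a workload, anything that matches none of its rules is refused, so this also acts as a default-deny for every other caller.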


The Unexplored Territory: What We Didn't Turn On

We focused on the Foundation: Identity (mTLS) and Compliance (No Root). However, both Kyverno and Istio Ambient offer advanced capabilities that turn a Platform into a Product.

Here is what we left on the table (for now):

1. Kyverno: "Mutation" & "Generation" (Automation)

We used Kyverno purely as a Validator (blocking bad requests). But Kyverno can also be a Builder.

  • Mutation (The Auto-Corrector): Instead of rejecting a Pod because it's missing a label, Kyverno can inject it automatically.

  • Generation (Namespace-as-a-Service): Kyverno can watch for a new Namespace creation and automatically generate default resources (NetworkPolicy, ResourceQuota) inside it.
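
To make both ideas concrete, here is a rough sketch of what such policies could look like. Neither was deployed in this phase; the policy names, the team label, and its default value are hypothetical.

```yaml
# Mutation: add a default label to Pods that don't already carry one
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-team-label   # hypothetical name
spec:
  rules:
    - name: add-team-label
      match:
        any:
        - resources:
            kinds:
              - Pod
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              # The +() anchor adds the key only if it is missing
              +(team): unassigned   # hypothetical label and value
---
# Generation: create a default-deny NetworkPolicy in every new Namespace
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-deny         # hypothetical name
spec:
  rules:
    - name: default-deny
      match:
        any:
        - resources:
            kinds:
              - Namespace
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny
        namespace: "{{request.object.metadata.name}}"
        synchronize: true        # keep the generated resource in sync with this policy
        data:
          spec:
            podSelector: {}      # selects every pod in the namespace
            policyTypes:
              - Ingress
              - Egress
```

Depending on the Kyverno version, the controller that performs generation may need extra RBAC permissions for the resource kinds it creates.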

2. Istio Ambient: The "Waypoint" Proxy (Layer 7)

We strictly used the Ztunnel (Layer 4) component. It handles TCP, mTLS, and simple "Allow/Deny" logic. But Ztunnel does not parse HTTP, so it cannot make decisions based on methods, paths, or headers.

To unlock Layer 7 features (like "Allow GET /products but Block DELETE /products"), Istio Ambient uses a Waypoint Proxy. We skipped this for now to keep the footprint small (Ztunnel is a per-node DaemonSet, while Waypoints are additional Envoy deployments, typically one per namespace), but it is the next logical step for granular traffic control.
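
As a sketch of where that next step could lead, the manifests below declare a waypoint for the namespace and attach a Layer 7 rule to it. None of this was deployed in this phase. The waypoint name and policy name are hypothetical, the /products path comes from the example above rather than from the real API, and the Gateway API CRDs are assumed to be installed.

```yaml
# Waypoint proxy for services in the namespace
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint                 # hypothetical name
  namespace: manifest-gen
  labels:
    istio.io/waypoint-for: service
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008
    protocol: HBONE
---
# Layer 7 rule enforced by the waypoint: block DELETE, leave other methods alone
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: block-product-deletes    # hypothetical name
  namespace: manifest-gen
spec:
  targetRefs:
  - kind: Service
    group: ""
    name: manifest-gen
  action: DENY
  rules:
  - to:
    - operation:
        methods: ["DELETE"]
        paths: ["/products*"]    # illustrative path from the example above
```

The namespace would also need the istio.io/use-waypoint=waypoint label so that traffic is actually routed through the waypoint before it reaches the app.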


The Takeaway

This phase wasn't just about typing kubectl apply or deploying Helm charts. It was about resilience.

  1. Fighting the Zombies: When removing Kyverno to test fresh installs, we hit a "Terminating" namespace loop. We had to surgically remove finalizers using kubectl replace --raw to clean the cluster state.

  2. Platform Awareness: Debugging the cniBinDir paths proved that you can't just copy-paste tutorials. You have to understand the underlying infrastructure (GKE's filesystem) to make open-source tools work.

  3. Invisible Security: We proved that developer experience doesn't have to suffer for security. Developers write code; the Platform handles the encryption.

Status: Security Framework Complete. Next Up: Progressive Delivery (Argo Rollouts).


Code & Resources

Building a Production-Grade SRE Platform on Kubernetes

Part 5 of 8

This series explores how to design and operate a production-grade SRE platform on Kubernetes, covering infrastructure, GitOps, observability, security, SLOs, service mesh, and chaos engineering.
