Beyond Ingress: Building a "Keyless" Platform with GKE Gateway API

Abandoning legacy patterns for a modern stack: Workload Identity, ArgoCD, and Global Load Balancing.

Published
8 min read

An Aspiring DevOps Engineer passionate about automation, CI/CD, and cloud technologies. On a journey to simplify and optimize development workflows.

In Part 1, we provisioned a cost-effective GKE cluster using Spot Instances and Terraform. But right now, it’s just an empty shell. To make this environment actually usable for developers, we need to solve three fundamental problems:

  1. Identity: How do we access GCP APIs (like Secret Manager) without dangerous JSON keys?

  2. Ingress: How do we expose services to the public using the modern Gateway API?

  3. GitOps: How do we deploy applications without manual kubectl apply?

In this guide, we will implement a "Keyless" architecture using Workload Identity Federation and deploy ArgoCD behind a Google Global Load Balancer.

The Challenge: This setup involves a circular dependency (The "Deadlock") between the Gateway Load Balancer and Cert-Manager. This guide includes the specific workaround to break that loop.


The Architecture

  • Ingress: Kubernetes Gateway API (GKE L7 Global Load Balancer).

  • Identity: GCP Workload Identity (No Service Account Keys).

  • GitOps: ArgoCD (Running in insecure mode to offload SSL to the Gateway).

  • Secrets: External Secrets Operator (ESO) syncing from GCP Secret Manager.


Step 1: Install Platform Binaries (Terraform)

We avoid the hassle of installing Helm charts manually. We will use Terraform to install the required controllers (ArgoCD, External Secrets, Cert-Manager) via Helm.

1. Clone the Repo & Initialize

git clone https://github.com/anantvaid/otel-platform-infra
cd otel-platform-infra/terraform
terraform init

2. Review the Configuration

We are installing ArgoCD with a critical flag: --insecure.

  • Why? To prevent an Infinite Redirect Loop.

  • The Mechanics: The Google Gateway terminates SSL (HTTPS) at the edge and forwards traffic to the ArgoCD backend via plain HTTP. By default, ArgoCD sees the plain HTTP request and answers with a "307 Redirect" to the HTTPS URL. Since the user is already on HTTPS (talking to the Load Balancer), the redirect leads straight back to the same place: User -> LB (HTTPS) -> ArgoCD (HTTP) -> 307 Redirect -> User -> LB (HTTPS) -> and so on.

  • The Fix: The --insecure flag tells ArgoCD to accept plain HTTP traffic from the Load Balancer without forcing a redirect.

argocd.tf:

resource "helm_release" "argocd" {
  name       = "argocd"
  repository = "https://argoproj.github.io/argo-helm"
  chart      = "argo-cd"
  version    = "9.2.2"
  # ...
  set = [{
    name  = "configs.params.server.insecure"
    value = "true"
  },
  {
    name  = "server.extraArgs[0]"
    value = "--insecure"
  }]
}

We also install External Secrets and bind it to our Google Service Account using Workload Identity annotations.

external_secrets.tf:

resource "helm_release" "external_secrets" {
  # ...
  values = [
    yamlencode({
      serviceAccount = {
        annotations = {
          "iam.gke.io/gcp-service-account" = google_service_account.eso_gsa.email
        }
      }
    })
  ]
}
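The annotation is only half of the Workload Identity link: the Google Service Account must also allow the Kubernetes ServiceAccount to impersonate it through the roles/iam.workloadIdentityUser role. The repository presumably handles this in Terraform, so treat the following as a rough sketch of the equivalent gcloud call rather than the project's actual method. The GSA email in the example is a placeholder, and the external-secrets namespace and ServiceAccount names are assumed from the ClusterSecretStore used later in this guide.

```shell
#!/bin/sh
# Sketch: allow a Kubernetes ServiceAccount to act as a Google Service Account.

# wi_member PROJECT_ID NAMESPACE KSA_NAME
# Prints the IAM member string: serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# bind_workload_identity GSA_EMAIL PROJECT_ID NAMESPACE KSA_NAME
# Defined only; call it explicitly once gcloud is authenticated.
bind_workload_identity() {
  gcloud iam service-accounts add-iam-policy-binding "$1" \
    --project "$2" \
    --role roles/iam.workloadIdentityUser \
    --member "$(wi_member "$2" "$3" "$4")"
}

# Example (the GSA email is a placeholder, not the name used in the repository):
# bind_workload_identity eso-gsa@sre-portfolio-platform.iam.gserviceaccount.com \
#   sre-portfolio-platform external-secrets external-secrets
```

Without this binding the annotation alone is not enough, and requests from the pod to Secret Manager would be rejected.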

3. Apply the Infrastructure

terraform apply

4. Connect to the Cluster

Once the apply completes, fetch the cluster credentials (written to ~/.kube/config) so you can start running kubectl commands.

gcloud container clusters get-credentials sre-portfolio-cluster \
    --zone us-central1-a \
    --project sre-portfolio-platform


Step 2: Verify Access (The "Proof of Life")

With the charts and CRDs installed, we should be able to reach ArgoCD. Before we complicate things with the public internet, let’s verify the installation locally.

1. Get the Initial Password

ArgoCD generates a random password on install. Let's retrieve it:

kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo

2. Port-Forward the Service

We will forward the ArgoCD service to localhost to check that the UI is up.

kubectl port-forward svc/argocd-server -n argocd 8080:80

3. Check the UI

Navigate to http://localhost:8080 in your browser. Because ArgoCD is running with --insecure, it serves plain HTTP, so use http:// rather than https:// at this stage. Log in with admin and the password you just retrieved.

It’s alive! The application is running.


Step 3: The Gateway "Deadlock" Workaround

The next logical step is to expose ArgoCD to the world. Theoretically, we just apply a Gateway resource and we are done. But as I discovered, it’s not that simple.

We are using Cert-Manager with Let's Encrypt to issue SSL certificates automatically. However, we face a "Chicken and Egg" problem:

  1. The Gateway needs a TLS Secret to start the HTTPS listener.

  2. Cert-Manager needs the Gateway to be UP (on HTTP) to solve the ACME challenge and create that secret.

If you apply everything at once, the Gateway hangs in Pending. We must apply it in stages.

Phase A: The HTTP-Only Boot

1. Create the Manifests

We need to comment out the HTTPS sections initially.

kubernetes/platform/gateway-api/gateway.yaml:

kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: external-gateway
  namespace: default
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: All

  ## COMMENT THIS OUT INITIALLY (Deadlock Prevention)
  # - name: https
  #   protocol: HTTPS
  #   port: 443
  #   tls:
  #     mode: Terminate
  #     certificateRefs:
  #     - name: argocd-tls
  #   allowedRoutes:
  #     namespaces:
  #       from: All

  addresses:
  - type: NamedAddress
    value: sre-portfolio-gateway-ip

kubernetes/platform/gateway-api/argocd-route.yaml:

kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: argocd-route
  namespace: argocd
spec:
  parentRefs:
  - name: external-gateway
    namespace: default
    sectionName: http

  ## COMMENT THIS OUT INITIALLY
  # - name: external-gateway
  #   namespace: default
  #   sectionName: https

  hostnames:
  - "argocd.techtalkswithanant.online"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: argocd-server
      port: 80

The Critical Fix: Health Check Policy

Before we apply the Gateway, we must solve a hidden issue. The Google Load Balancer defaults to an HTTP health check. However, ArgoCD often returns redirects or 404s on the root path during startup, causing the Load Balancer to mark the backend as "Unhealthy" (502 Bad Gateway).

To fix this, we define a HealthCheckPolicy that overrides the default behavior. Instead of requesting a URL path, the Load Balancer performs a simple TCP check on port 8080, the port the argocd-server container listens on. If the port accepts connections, the backend is considered healthy.

kubernetes/platform/gateway-api/argocd-health-policy.yaml:

apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: argocd-health-check
  namespace: argocd
spec:
  default:
    checkIntervalSec: 30
    timeoutSec: 10
    healthyThreshold: 1
    unhealthyThreshold: 2
    logConfig:
      enabled: true
    config:
      type: TCP
      tcpHealthCheck:
        port: 8080
  targetRef:
    group: ""
    kind: Service
    name: argocd-server

2. Apply the "Insecure" Layer

Run these commands to provision the Load Balancer on Port 80.

kubectl apply -f kubernetes/platform/gateway-api/gateway.yaml
kubectl apply -f kubernetes/platform/gateway-api/argocd-route.yaml
kubectl apply -f kubernetes/platform/gateway-api/argocd-health-policy.yaml
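Provisioning the Load Balancer takes a few minutes, so it can help to poll until the Gateway reports an address before moving on. The sketch below is one way to do that. The retry helper is a hypothetical convenience function, and the check assumes the Gateway publishes its IP under status.addresses, as the Gateway API specifies.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# gateway_has_address NAME NAMESPACE
# Succeeds once the Gateway status lists at least one address.
gateway_has_address() {
  addr="$(kubectl get gateway "$1" -n "$2" \
    -o jsonpath='{.status.addresses[0].value}' 2>/dev/null)"
  [ -n "$addr" ]
}

# Example: poll every 15 seconds for up to 10 minutes.
# retry 40 15 gateway_has_address external-gateway default && echo "Gateway has an address"
```

If the loop times out, kubectl describe gateway external-gateway -n default usually shows the reason in its events.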

3. Update DNS

Go to your DNS provider (e.g., Namecheap, Cloudflare) and create an A record pointing your domain (e.g., argocd.techtalkswithanant.online) to the static IP provisioned by Terraform (gateway_static_ip).
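Cert-Manager's challenge will only succeed once the record actually resolves, so a quick check before continuing can save a failed attempt. This is a minimal sketch assuming dig is installed; the IP in the example is a documentation placeholder, not the real gateway_static_ip value.

```shell
#!/bin/sh
# has_ip IP: succeeds if IP appears as a whole line on stdin.
has_ip() {
  grep -qxF -- "$1"
}

# dns_points_to HOSTNAME IP: succeeds if HOSTNAME currently has an A record equal to IP.
# Requires dig; nothing is queried until the function is called.
dns_points_to() {
  dig +short "$1" A | has_ip "$2"
}

# Example (replace 203.0.113.10 with your gateway_static_ip value):
# dns_points_to argocd.techtalkswithanant.online 203.0.113.10 && echo "DNS is ready"
```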


Step 4: Issue the Certificate

Now that Port 80 is open and the A record resolves to the Gateway, Cert-Manager can solve the ACME HTTP-01 challenge.

cluster-issuer.yaml:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <email_id@email.com>   # Change this to your email ID
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
    - http01:
        gatewayHTTPRoute:
          parentRefs:
          - name: external-gateway
            namespace: default
            kind: Gateway

certificate.yaml:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: argocd-cert
  namespace: default
spec:
  secretName: argocd-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  commonName: argocd.techtalkswithanant.online
  dnsNames:
  - argocd.techtalkswithanant.online

1. Apply the Issuer and Certificate

kubectl apply -f kubernetes/platform/cert-management/cluster-issuer.yaml
kubectl apply -f kubernetes/platform/cert-management/certificate.yaml

2. Verify Success

Wait 3-5 minutes. Watch the certificate's READY column move from False to True.

kubectl get certificate argocd-cert -n default -w

Once this says True, the secret argocd-tls has been created.
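If you would rather block until the certificate is ready than watch the output, kubectl wait can do that, and the pending ACME challenges are the first place to look when it times out. The sketch below wraps both. The KUBECTL override is a convention of this example (handy for previewing the commands with KUBECTL=echo), not something kubectl or cert-manager defines.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example KUBECTL=echo) to preview commands without a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# wait_for_cert NAME NAMESPACE [TIMEOUT]: block until the Certificate reports Ready.
wait_for_cert() {
  "$KUBECTL" wait --for=condition=Ready "certificate/$1" -n "$2" --timeout="${3:-600s}"
}

# debug_cert NAME NAMESPACE: show Certificate events and any ACME challenges still pending.
debug_cert() {
  "$KUBECTL" describe certificate "$1" -n "$2"
  "$KUBECTL" get challenges -n "$2"
}

# Example:
# wait_for_cert argocd-cert default || debug_cert argocd-cert default
```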


Step 5: Enabling HTTPS (Closing the Loop)

Now that the secret exists, we can fully enable the Gateway.

1. Uncomment Manifests

Go back to gateway.yaml and argocd-route.yaml and uncomment the HTTPS sections we commented out in Step 3 (Phase A).

2. Apply the "Secure" Layer

kubectl apply -f kubernetes/platform/gateway-api/gateway.yaml
kubectl apply -f kubernetes/platform/gateway-api/argocd-route.yaml

Wait 2-3 minutes for the Load Balancer to update its frontend configuration. You should now be able to access https://argocd.techtalkswithanant.online with a valid Let's Encrypt padlock!
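To confirm from the command line that the Load Balancer is serving the new certificate, you can inspect the issuer with openssl. This is a rough sketch assuming openssl is installed and the host is reachable; the exact issuer text varies with the Let's Encrypt intermediate in use, so the check only looks for the organization name.

```shell
#!/bin/sh
# cert_issuer HOST: print the issuer line of the certificate served on HOST:443.
# Requires openssl and network access; nothing runs until the function is called.
cert_issuer() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null |
    openssl x509 -noout -issuer
}

# is_lets_encrypt ISSUER_LINE: succeeds if the issuer text mentions Let's Encrypt.
is_lets_encrypt() {
  case "$1" in
    *"Let's Encrypt"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
# is_lets_encrypt "$(cert_issuer argocd.techtalkswithanant.online)" && echo "certificate is live"
```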


Step 6: The "Keyless" Secret Store

Finally, let's prove our Identity Federation works. We will fetch a secret from GCP without ever downloading a JSON key.

1. Configure the Trust Bridge

Apply the ClusterSecretStore. This tells the cluster who it is (Workload Identity) and where to get secrets (GCP Project).

kubernetes/platform/secret-management/cluster-secret-store.yaml:

apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: gcp-secret-store
spec:
  provider:
    gcpsm:
      projectID: sre-portfolio-platform
      auth:
        workloadIdentity:
          clusterLocation: us-central1-a
          clusterName: sre-portfolio-cluster
          serviceAccountRef:
            name: external-secrets
            namespace: external-secrets

kubectl apply -f kubernetes/platform/secret-management/cluster-secret-store.yaml

2. Sync a Secret

Create a secret called db-password in Google Secret Manager. Then apply this manifest to fetch it:

kubectl apply -f kubernetes/platform/secret-management/db-secret.yaml
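The contents of db-secret.yaml are not reproduced in this post. As a minimal sketch of what such a manifest could look like, the snippet below writes an ExternalSecret to a scratch file. The resource names are inferred from the commands in this step (my-db-secret, the password key, db-password, gcp-secret-store); the refresh interval and the file name are assumptions, and the file in the repository may differ.

```shell
#!/bin/sh
# Write an example ExternalSecret to a scratch file (the file name is arbitrary).
cat > db-secret.example.yaml <<'EOF'
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: my-db-secret
  namespace: default
spec:
  refreshInterval: 1h           # assumed value; how often ESO re-reads the secret
  secretStoreRef:
    kind: ClusterSecretStore
    name: gcp-secret-store      # the store applied in the previous step
  target:
    name: my-db-secret          # Kubernetes Secret that ESO creates
  data:
  - secretKey: password         # key read back by the verify command
    remoteRef:
      key: db-password          # secret name in Google Secret Manager
EOF

# Apply it once the ClusterSecretStore is ready:
# kubectl apply -f db-secret.example.yaml
```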

3. Verify

kubectl get secret my-db-secret -o jsonpath="{.data.password}" | base64 -d; echo

It works! We have bridged the gap between GCP and GKE using purely IAM roles - zero keys involved.


SRE War Stories: The "Gotchas"

This setup wasn't smooth sailing. Here are three specific issues that tripped me up - documenting them so you don't have to suffer.

1. The "Ghost IP" Error

  • Issue: After destroying and recreating the cluster with Terraform, kubectl kept timing out.

  • Root Cause: My local kubeconfig was pointing to the old cluster IP.

  • Fix: Always refresh credentials after a re-apply:

      gcloud container clusters get-credentials sre-portfolio-cluster --zone us-central1-a
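A quick way to tell whether the kubeconfig is stale is to compare the API server address it holds with the endpoint GKE currently reports. The sketch below assumes the cluster uses a public IP endpoint and that the current kubectl context is the one for this cluster; the helper names are illustrative.

```shell
#!/bin/sh
# strip_scheme URL: remove a leading https:// or http:// so endpoints can be compared.
strip_scheme() {
  s="${1#https://}"
  printf '%s\n' "${s#http://}"
}

# kubeconfig_endpoint: API server address of the current kubectl context.
kubeconfig_endpoint() {
  strip_scheme "$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')"
}

# gke_endpoint CLUSTER ZONE: API server address GKE currently reports for the cluster.
gke_endpoint() {
  gcloud container clusters describe "$1" --zone "$2" --format='value(endpoint)'
}

# Example:
# [ "$(kubeconfig_endpoint)" = "$(gke_endpoint sre-portfolio-cluster us-central1-a)" ] \
#   || echo "kubeconfig is stale; run get-credentials again"
```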
    

2. The Infinite Redirect Loop

  • Issue: The site was accessible, but the browser threw ERR_TOO_MANY_REDIRECTS.

  • Root Cause: The Google Load Balancer terminates SSL at the edge (HTTPS) and talks to the backend via HTTP. ArgoCD, by default, detects the HTTP request and sends a "307 Redirect" back to HTTPS, creating an infinite loop.

  • Fix: Configuring ArgoCD with --insecure (in argocd.tf) forces it to accept the plain HTTP traffic from the Load Balancer.
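The loop is easy to confirm without a browser: make one request without following redirects and see whether the response points back at the URL you asked for. This is a minimal sketch assuming curl is installed; the function names are illustrative.

```shell
#!/bin/sh
# probe URL: print "<status> <redirect target>" for one request, without following redirects.
# Requires curl and network access; nothing runs until the function is called.
probe() {
  curl -sS -o /dev/null -w '%{http_code} %{redirect_url}\n' "$1"
}

# is_self_redirect URL STATUS TARGET: succeeds when a 3xx response points back at URL.
is_self_redirect() {
  case "$2" in
    3??) [ "${3%/}" = "${1%/}" ] ;;
    *) return 1 ;;
  esac
}

# check_loop URL: report whether URL redirects to itself.
check_loop() {
  read -r status target <<EOF
$(probe "$1")
EOF
  if is_self_redirect "$1" "$status" "$target"; then
    echo "redirect loop: $1 answers $status and points back to itself"
  else
    echo "no self-redirect: status ${status:-unknown}"
  fi
}

# Example:
# check_loop https://argocd.techtalkswithanant.online/
```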

3. The "Unhealthy Upstream" (502 Error)

  • Issue: Even with the redirect fixed, the Load Balancer often marked the backend as Unhealthy, causing sporadic 502 errors.

  • Root Cause: The default GKE health check attempts an HTTP request. If ArgoCD doesn't respond exactly as expected (e.g., auth redirects), the LB assumes the pod is dead.

  • Fix: I applied a custom HealthCheckPolicy to force a simple TCP check. If Port 8080 is open, the pod is healthy.

argocd-health-policy.yaml (excerpt):

apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
spec:
  default:
    config:
      type: TCP
      tcpHealthCheck:
        port: 8080

Note: You may notice slight variations in resource names across screenshots, as they were captured during different deployment runs.


What's Next?

We now have a secure, GitOps-enabled platform. In Part 3, we will tackle Observability - setting up the LGTM Stack (Loki, Grafana, Tempo, proMetheus) to get full visibility into our applications.

You can find my GitHub link here - [Link to GitHub Repo].