Multi-Cloud Kubernetes: AWS, GCP, Azure Deployment Guide

August 2, 2024
14 min read

Don't put all your eggs in one cloud basket. Multi-cloud Kubernetes deployments protect against outages, prevent vendor lock-in, and let you use the best features from each provider. Here's how to do it right.

Why Go Multi-Cloud?

Avoid Vendor Lock-in

Negotiate better pricing and terms

Geographic Coverage

Deploy closer to your users worldwide

Disaster Recovery

Survive cloud-wide outages

Best-of-Breed Services

Use each cloud's strengths

Multi-Cloud Architecture

AWS

Production Workloads

EKS clusters in us-east-1, eu-west-1

GCP

Data Processing & Analytics

GKE clusters with BigQuery integration

Azure

Enterprise Integration

AKS clusters with Active Directory

Step 1: Design Cloud-Agnostic Architecture

Start with portable configurations that work everywhere:

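# base/deployment.yaml - Works on any cloud
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:latest  # Pin a version tag or digest in production
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        # Mount the claim so the container can actually use it
        volumeMounts:
        - name: data
          mountPath: /data  # Example path; adjust to your app

      # Cloud-agnostic storage: the claim name is identical on every cloud
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: app-data

---
# StorageClass abstraction: same name everywhere, different provisioner per cloud
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
# Changed per cloud. The EBS CSI driver replaces the deprecated in-tree
# kubernetes.io/aws-ebs provisioner.
provisioner: ebs.csi.aws.com
parameters:
  type: gp3  # AWS: type gp3, GCP: type pd-ssd, Azure: skuName Premium_LRS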

💡 Pro Tip: Use Kustomize overlays or Helm values to handle cloud-specific configurations while keeping base manifests portable.
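If you would rather generate the cloud-specific piece than maintain three copies by hand, a small shell helper can do it. This is a minimal sketch, not a replacement for Kustomize or Helm: the function name `render_storage_class` is made up for illustration, and it assumes each cluster runs its cloud's CSI disk driver.

```shell
#!/usr/bin/env bash
# Sketch: emit a per-cloud StorageClass that keeps the shared name "fast-ssd".
# render_storage_class is a hypothetical helper, not part of any tool.
# Assumption: every cluster has its cloud's CSI disk driver installed.

render_storage_class() {
  local cloud="$1" name="${2:-fast-ssd}"
  local provisioner param_key param_value

  case "$cloud" in
    aws)   provisioner="ebs.csi.aws.com";       param_key="type";    param_value="gp3" ;;
    gcp)   provisioner="pd.csi.storage.gke.io"; param_key="type";    param_value="pd-ssd" ;;
    azure) provisioner="disk.csi.azure.com";    param_key="skuName"; param_value="Premium_LRS" ;;
    *)     echo "unknown cloud: $cloud" >&2; return 1 ;;
  esac

  # Print the manifest on stdout so it can be piped to kubectl or saved in an overlay.
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${name}
provisioner: ${provisioner}
parameters:
  ${param_key}: ${param_value}
EOF
}
```

For example, `render_storage_class gcp | kubectl --context=gcp-prod apply -f -` would create the GCP variant, while workloads keep requesting `fast-ssd` everywhere.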

Step 2: Set Up Unified Management

Deploy a centralized control plane to manage all clusters:

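# Install Rancher for multi-cloud management
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo update
kubectl create namespace cattle-system

# ingress.tls.source=letsEncrypt needs cert-manager in the cluster and a contact email.
# RANCHER_BOOTSTRAP_PASSWORD and the email address are placeholders; never ship "admin".
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.company.com \
  --set bootstrapPassword="$RANCHER_BOOTSTRAP_PASSWORD" \
  --set ingress.tls.source=letsEncrypt \
  --set letsEncrypt.email=admin@company.com

# Import existing clusters (requires the Rancher CLI and a prior `rancher login`)
rancher cluster import aws-prod-cluster
rancher cluster import gcp-analytics-cluster
rancher cluster import azure-enterprise-cluster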
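Before importing, it helps to confirm that every cluster is reachable from the machine you are working on. The sketch below runs the same `kubectl` arguments against a list of contexts. The context names are assumptions (only `aws-prod` and `gcp-prod` appear later in this guide), and `for_each_cluster` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: run one kubectl command against every cluster.
# Context names are assumptions; list yours with `kubectl config get-contexts`.
CLUSTER_CONTEXTS="${CLUSTER_CONTEXTS:-aws-prod gcp-prod azure-prod}"

# for_each_cluster ARGS...: run `kubectl ARGS` per context and prefix
# every output line with the context it came from.
for_each_cluster() {
  local ctx
  for ctx in $CLUSTER_CONTEXTS; do
    kubectl --context="$ctx" "$@" 2>&1 | sed "s/^/[$ctx] /"
  done
}
```

Running `for_each_cluster get nodes` gives a quick side-by-side view of node status across all three clouds.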

Cluster Federation Setup

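# Federate services across clouds
# Note: the KubeFed project is archived upstream; evaluate maintained
# alternatives before adopting it for new deployments.
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: app
  namespace: production
spec:
  template:
    spec:
      replicas: 6  # Default per cluster; the full Deployment spec (selector, pod template) also goes here
  placement:
    clusters:
    - name: aws-us-east-1
    - name: gcp-us-central1
    - name: azure-eastus  # No override below, so this cluster runs the default of 6
  overrides:
  - clusterName: aws-us-east-1
    clusterOverrides:
    - path: "/spec/replicas"
      value: 3
  - clusterName: gcp-us-central1
    clusterOverrides:
    - path: "/spec/replicas"
      value: 2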
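Per-cluster replica overrides are easier to keep consistent if you derive them from a total and a set of weights instead of typing each number by hand. Here is a rough sketch of that arithmetic. The `split_replicas` helper and the weights are illustrative assumptions, not something KubeFed provides.

```shell
#!/usr/bin/env bash
# Sketch: split a total replica count across clusters in proportion to weights.
# split_replicas TOTAL WEIGHT... prints one count per weight; the counts sum to TOTAL.
split_replicas() {
  local total="$1" sum=0 floors=0 leftover share w out=""
  shift
  for w in "$@"; do sum=$((sum + w)); done
  [ "$sum" -gt 0 ] || { echo "weights must sum to a positive number" >&2; return 1; }

  # First pass: how many replicas the rounded-down shares account for.
  for w in "$@"; do floors=$((floors + total * w / sum)); done
  leftover=$((total - floors))

  # Second pass: rounded-down share, plus one leftover replica for the
  # earliest clusters that have a non-zero weight.
  for w in "$@"; do
    share=$((total * w / sum))
    if [ "$leftover" -gt 0 ] && [ "$w" -gt 0 ]; then
      share=$((share + 1))
      leftover=$((leftover - 1))
    fi
    out="${out}${out:+ }${share}"
  done
  echo "$out"
}
```

With weights 3, 2 and 1 for AWS, GCP and Azure, `split_replicas 6 3 2 1` prints `3 2 1`, which you could then paste into the `overrides` section.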

Step 3: Configure Cross-Cloud Networking

Set up secure connectivity between clouds:

# Service mesh for cross-cloud communication
# Install Istio with multi-cluster configuration

# Primary cluster (AWS)
istioctl install --set values.pilot.env.EXTERNAL_ISTIOD=true

# Remote clusters (GCP, Azure)
# Run this lookup against the primary cluster. On AWS the load balancer
# exposes .hostname instead of .ip, so adjust the jsonpath accordingly.
export DISCOVERY_ADDRESS=$(kubectl \
  -n istio-system get svc istio-eastwestgateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

istioctl install \
  --set values.global.remotePilotAddress="$DISCOVERY_ADDRESS" \
  --set values.pilot.env.EXTERNAL_ISTIOD=true

# Enable cross-cluster service discovery
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    istio: eastwestgateway
  servers:
  - port:
      number: 15443
      name: tls
      protocol: TLS
    tls:
      # The east-west gateway forwards mTLS traffic by SNI without terminating it
      mode: AUTO_PASSTHROUGH
    hosts:
    - "*.local"
EOF

Network Architecture

VPN Connectivity

Site-to-site VPN between VPCs/VNets for secure communication

Service Mesh

Istio or Linkerd for cross-cluster service discovery and mTLS

Global Load Balancing

Route traffic to nearest healthy cluster
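To make "nearest healthy cluster" concrete, here is a toy version of the decision a global load balancer makes. It is only a sketch: real load balancers measure latency and health themselves, and the input format (`<cluster> <latency_ms> <healthy|unhealthy>`) is an assumption chosen for this example.

```shell
#!/usr/bin/env bash
# Sketch: choose the healthy cluster with the lowest latency.
# Input on stdin, one cluster per line: "<cluster> <latency_ms> <healthy|unhealthy>"
# (an assumed format for illustration). Exits non-zero if nothing is healthy.
pick_cluster() {
  awk '$3 == "healthy" && (best == "" || $2 + 0 < min) { best = $1; min = $2 + 0 }
       END { if (best == "") exit 1; print best }'
}
```

For instance, if `aws-us-east-1` is the closest cluster at 20 ms but unhealthy, and `azure-eastus` (30 ms) and `gcp-us-central1` (45 ms) are both healthy, the function prints `azure-eastus`.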

Step 4: Implement Data Replication

Keep data synchronized across clouds:

# PostgreSQL cross-cloud replication with Patroni
apiVersion: v1
kind: ConfigMap
metadata:
  name: patroni-config
data:
  patroni.yml: |
    scope: postgres-cluster
    namespace: /db/
    # Patroni reads this file literally, so a {POD_NAME} placeholder is not expanded.
    # Set the member name per pod through the PATRONI_NAME environment variable.

    bootstrap:
      dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576  # bytes (1 MiB)

        # Cluster-wide PostgreSQL settings live in the dynamic (DCS) configuration
        postgresql:
          use_pg_rewind: true
          parameters:
            wal_level: replica
            hot_standby: "on"
            max_wal_senders: 10
            max_replication_slots: 10

        # Cross-cloud standby configuration (also part of the DCS configuration)
        standby_cluster:
          host: postgres-primary.aws.company.com
          port: 5432
          primary_slot_name: cloud_replica
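The `maximum_lag_on_failover` value above is expressed in bytes of WAL, so 1048576 means a replica more than 1 MiB behind will not be promoted. A minimal sketch of the same check, useful for dashboards or pre-failover scripts, could look like this. The `replica_eligible` helper is hypothetical, and the `psql` query in the comment is one assumed way to read the lag on the primary.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a replica is close enough to the primary to be promoted.
# Mirrors maximum_lag_on_failover from the Patroni config (1 MiB of WAL).
MAX_LAG_BYTES="${MAX_LAG_BYTES:-1048576}"

# replica_eligible LAG_BYTES: exit 0 if the lag is within the limit, 1 otherwise.
replica_eligible() {
  local lag="$1"
  [ "$lag" -le "$MAX_LAG_BYTES" ]
}

# One assumed way to read the lag in bytes, run against the primary:
#   psql -At -c "SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)
#                FROM pg_stat_replication;"
```

Cross-cloud links add latency, so it is worth watching this number for a while before settling on a threshold.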

Cross-Cloud Data Sync Options

Database Replication

• PostgreSQL: Streaming replication with Patroni

• MySQL: Group replication or Galera Cluster

• MongoDB: Replica sets across regions

• CockroachDB: Built-in multi-region support

Object Storage Sync

• Rclone for S3 ↔ GCS ↔ Azure Blob sync

• MinIO for unified object storage API

• Native replication (S3 Cross-Region Replication)

Application-Level Sync

• Apache Kafka with MirrorMaker 2

• Redis with active-active replication

• Custom CDC (Change Data Capture) pipelines
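As an illustration of the Rclone option, the sketch below prints the commands for a one-way sync that treats S3 as the source of truth. The remote names (`s3`, `gcs`, `azblob`) and the `sync_plan` helper are assumptions; the remotes must match what you created with `rclone config`. The commands are printed rather than executed, and they include `--dry-run` so nothing is changed until you remove it.

```shell
#!/usr/bin/env bash
# Sketch: print one-way rclone sync commands, with S3 as the source of truth.
# Remote names s3, gcs and azblob are hypothetical; define yours with `rclone config`.
# Note: `rclone sync` makes the destination match the source, deleting extra files there.

# sync_plan BUCKET: print one rclone command per destination remote.
sync_plan() {
  local bucket="$1" dest
  for dest in gcs azblob; do
    echo "rclone sync s3:${bucket} ${dest}:${bucket} --checksum --dry-run"
  done
}
```

Review the output of `sync_plan my-bucket`, run the commands with `--dry-run` first, and only then schedule them without it.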

Step 5: Test Disaster Recovery

Implement and test failover procedures:

# Automated failover with health checks
apiVersion: v1
kind: ConfigMap
metadata:
  name: failover-script
data:
  failover.sh: |
    #!/bin/bash

    # Check primary cluster health: count nodes reporting Ready=True
    PRIMARY_HEALTH=$(kubectl --context=aws-prod get nodes -o json | \
      jq -r '.items[].status.conditions[] |
        select(.type=="Ready") | .status' | grep -c True)

    # Fewer than 3 Ready nodes (or an unreachable API server) counts as unhealthy
    if [ "${PRIMARY_HEALTH:-0}" -lt 3 ]; then
      echo "Primary cluster unhealthy, initiating failover..."

      # Update DNS to point to secondary.
      # Route 53 alias records can only target AWS resources or records in the
      # same hosted zone, so use a plain A record for the GCP load balancer.
      # 203.0.113.10 is a placeholder; use the load balancer's static IP.
      aws route53 change-resource-record-sets \
        --hosted-zone-id Z123456 \
        --change-batch '{
          "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
              "Name": "api.company.com",
              "Type": "A",
              "TTL": 60,
              "ResourceRecords": [{"Value": "203.0.113.10"}]
            }
          }]
        }'

      # Scale up secondary cluster
      kubectl --context=gcp-prod scale deployment app --replicas=6

      # Notify team (slack-notify is a placeholder for your own alerting command)
      slack-notify "Failover completed to GCP cluster"
    fi
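A single failed health check is a weak signal, and failing over on it can cause flapping. One common mitigation is to require several consecutive unhealthy checks before acting. The sketch below shows that idea. The `record_check` helper, the state file and the thresholds are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: only fail over after several consecutive unhealthy checks.
# record_check STATE_FILE READY_NODES MIN_READY THRESHOLD
#   Stores the number of consecutive unhealthy checks in STATE_FILE.
#   Exits 0 only when that number has reached THRESHOLD.
record_check() {
  local state_file="$1" ready="$2" min_ready="$3" threshold="$4" failures=0
  if [ -f "$state_file" ]; then
    failures="$(cat "$state_file")"
    failures="${failures:-0}"   # treat an empty file as zero failures
  fi

  if [ "$ready" -ge "$min_ready" ]; then
    failures=0                  # a healthy check resets the streak
  else
    failures=$((failures + 1))
  fi

  echo "$failures" > "$state_file"
  [ "$failures" -ge "$threshold" ]
}
```

A cron job could call `record_check /tmp/failover.state "$PRIMARY_HEALTH" 3 3 && ./failover.sh`, so the DNS change only happens after three bad checks in a row.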

Multi-Cloud Cost Optimization

Spot/Preemptible Instances

Use for non-critical workloads

Estimated savings: 70%

Reserved Instances

Commit to baseline capacity

Estimated savings: 40%

Cloud Arbitrage

Run workloads where cheapest

Estimated savings: 30%
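These discounts only apply to the share of spend you actually move to each pricing model, so the blended saving is smaller than any single figure. Here is a rough worked example using the 40% reserved and 70% spot figures from this section. The baseline bill and the split are hypothetical, the arbitrage figure is left out for simplicity, and `blended_cost` is an illustrative helper.

```shell
#!/usr/bin/env bash
# Sketch: blended monthly cost after moving part of the spend to cheaper pricing.
# blended_cost BASELINE_USD RESERVED_PCT SPOT_PCT
#   RESERVED_PCT / SPOT_PCT: share of spend (0-100) moved to each model;
#   the remainder stays on-demand at full price.
#   Discounts follow this section: reserved pays 60% of list, spot pays 30%.
blended_cost() {
  local baseline="$1" reserved_pct="$2" spot_pct="$3"
  local on_demand_pct=$((100 - reserved_pct - spot_pct))
  # Each term is share (%) times remaining price (%), hence the division by 100*100.
  echo $(( baseline * (reserved_pct * 60 + spot_pct * 30 + on_demand_pct * 100) / 10000 ))
}
```

For a hypothetical $10,000 monthly bill with 50% reserved and 30% on spot, `blended_cost 10000 50 30` prints `5900`: 3,000 for reserved, 900 for spot and 2,000 left on-demand, a 41% overall reduction.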

Common Multi-Cloud Patterns

Active-Active

Run full stack in multiple clouds simultaneously:

  • Best for: Maximum availability
  • Challenge: Data consistency
  • Cost: Highest (full redundancy)

Active-Passive

Primary cloud with standby in secondary:

  • Best for: Cost-effective DR
  • Challenge: Keeping standby updated
  • Cost: Moderate (partial redundancy)

Cloud Bursting

Overflow to secondary clouds during peaks:

  • Best for: Variable workloads
  • Challenge: Fast scaling
  • Cost: Lowest (pay for burst only)
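The cloud bursting pattern boils down to a simple rule: fill the primary cloud up to its capacity and send only the overflow elsewhere. A minimal sketch of that rule, where the `burst_replicas` helper and the capacity figure are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Sketch: split desired replicas between the primary cloud and a burst cloud.
# burst_replicas DESIRED PRIMARY_CAPACITY
#   Prints "<primary> <burst>": replicas kept on the primary cloud and
#   replicas that overflow to the secondary.
burst_replicas() {
  local desired="$1" capacity="$2"
  if [ "$desired" -le "$capacity" ]; then
    echo "$desired 0"
  else
    echo "$capacity $((desired - capacity))"
  fi
}
```

With a primary capacity of 6, `burst_replicas 10 6` prints `6 4`, and the second number is what you would pass to `kubectl --context=gcp-prod scale deployment app --replicas=...`.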

Essential Multi-Cloud Tools

Management Platforms

  • Rancher - Multi-cluster management
  • Anthos - Google's hybrid platform
  • Azure Arc - Microsoft's hybrid solution

Networking

  • Istio - Service mesh
  • Cilium - eBPF networking
  • Submariner - Multi-cluster networking

Storage & Data

  • Portworx - Cross-cloud storage
  • Kasten K10 - Backup & DR
  • Stork - Storage orchestration

Observability

  • Datadog - Unified monitoring
  • Grafana Cloud - Multi-cloud metrics
  • Elastic Cloud - Centralized logging

Multi-Cloud Benefits

✓ 99.99% Availability

Survive cloud-wide outages

✓ 40% Cost Reduction

Use cheapest resources from each cloud

✓ No Vendor Lock-in

Negotiate from position of strength

✓ Global Performance

Deploy anywhere your users are
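To put the availability figure in perspective, it helps to convert percentages into a downtime budget. The sketch below does that arithmetic and also shows the textbook formula for combining clouds. The helper names are illustrative, and the combined figure assumes failures in different clouds are fully independent, which real outages (shared DNS, bad deploys) often are not.

```shell
#!/usr/bin/env bash
# Sketch: availability arithmetic.

# downtime_minutes_per_year AVAILABILITY_PCT
#   Minutes of allowed downtime in a 365-day year.
downtime_minutes_per_year() {
  awk -v a="$1" 'BEGIN { printf "%.2f\n", (100 - a) / 100 * 365 * 24 * 60 }'
}

# combined_availability AVAILABILITY_PCT CLOUDS
#   Availability when the service is up as long as at least one of CLOUDS
#   independent clouds is up: 1 - (1 - a)^n. Independence is an assumption.
combined_availability() {
  awk -v a="$1" -v n="$2" 'BEGIN { printf "%.4f\n", 100 * (1 - (1 - a / 100) ^ n) }'
}
```

`downtime_minutes_per_year 99.99` prints `52.56`, so a 99.99% target allows under an hour of downtime per year. `combined_availability 99.9 2` prints `99.9999`, the theoretical best case for two clouds at 99.9% each.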

Simplify multi-cloud Kubernetes with KTL.AI

KTL.AI provides unified management for AWS, GCP, and Azure Kubernetes clusters. Deploy anywhere, manage from one place, with built-in cost optimization and disaster recovery.