Multi-Cloud Kubernetes: AWS, GCP, Azure Deployment Guide
Don't put all your eggs in one cloud basket. Multi-cloud Kubernetes deployments protect against outages, prevent vendor lock-in, and let you use the best features from each provider. Here's how to do it right.
Why Go Multi-Cloud?
Avoid Vendor Lock-in
Negotiate better pricing and terms
Geographic Coverage
Deploy closer to your users worldwide
Disaster Recovery
Survive cloud-wide outages
Best-of-Breed Services
Use each cloud's strengths
Multi-Cloud Architecture
Production Workloads
EKS clusters in us-east-1, eu-west-1
Data Processing & Analytics
GKE clusters with BigQuery integration
Enterprise Integration
AKS clusters with Active Directory
Step 1: Design Cloud-Agnostic Architecture
Start with portable configurations that work everywhere:
# base/deployment.yaml - Works on any cloud
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:latest  # pin a specific tag in production for reproducible rollouts
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
      # Cloud-agnostic storage
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: app-data
---
# StorageClass abstraction
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com  # CSI driver; change per cloud (GCP: pd.csi.storage.gke.io, Azure: disk.csi.azure.com)
parameters:
  type: gp3  # AWS: gp3, GCP: pd-ssd, Azure: Premium_LRS
💡 Pro Tip: Use Kustomize overlays or Helm values to handle cloud-specific configurations while keeping base manifests portable.
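Following that tip, a per-cloud Kustomize overlay can patch only the cloud-specific fields while reusing the base manifests. A minimal sketch (the overlay path and GCP parameters are illustrative):

```yaml
# overlays/gcp/kustomization.yaml - hypothetical GCP overlay
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base
patches:
- target:
    kind: StorageClass
    name: fast-ssd
  # Inline JSON6902 patch: swap only the cloud-specific fields
  patch: |-
    - op: replace
      path: /provisioner
      value: pd.csi.storage.gke.io
    - op: replace
      path: /parameters/type
      value: pd-ssd
```

Deploying to GCP then becomes `kubectl apply -k overlays/gcp`, with AWS and Azure overlays differing only in these two values.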
Step 2: Set Up Unified Management
Deploy a centralized control plane to manage all clusters:
# Install Rancher for multi-cloud management
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
kubectl create namespace cattle-system
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.company.com \
  --set bootstrapPassword=admin \
  --set ingress.tls.source=letsEncrypt
# Import existing clusters
rancher cluster import aws-prod-cluster
rancher cluster import gcp-analytics-cluster
rancher cluster import azure-enterprise-cluster
Cluster Federation Setup
# Federate services across clouds
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: app
  namespace: production
spec:
  template:
    spec:
      replicas: 6
  placement:
    clusters:
    - name: aws-us-east-1
    - name: gcp-us-central1
    - name: azure-eastus
  overrides:
  - clusterName: aws-us-east-1
    clusterOverrides:
    - path: "/spec/replicas"
      value: 3
  - clusterName: gcp-us-central1
    clusterOverrides:
    - path: "/spec/replicas"
      value: 2
Step 3: Configure Cross-Cloud Networking
Set up secure connectivity between clouds:
# Service mesh for cross-cloud communication
# Install Istio with multi-cluster configuration
# Primary cluster (AWS)
istioctl install --set values.pilot.env.EXTERNAL_ISTIOD=true
# Remote clusters (GCP, Azure)
export DISCOVERY_ADDRESS=$(kubectl -n istio-system get svc istio-eastwestgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')  # AWS ELBs report a hostname, not an IP: use '{.status.loadBalancer.ingress[0].hostname}' there
istioctl install --set values.global.remotePilotAddress=$DISCOVERY_ADDRESS --set values.pilot.env.EXTERNAL_ISTIOD=true
# Enable cross-cluster service discovery
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    istio: eastwestgateway
  servers:
  - port:
      number: 15443
      name: tls
      protocol: TLS
    tls:
      mode: ISTIO_MUTUAL
    hosts:
    - "*.local"
EOF
Network Architecture
VPN Connectivity
Site-to-site VPN between VPCs/VNets for secure communication
Service Mesh
Istio or Linkerd for cross-cluster service discovery and mTLS
Global Load Balancing
Route traffic to nearest healthy cluster
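Beyond the east-west gateway, individual remote services can be made explicitly addressable from the mesh with a ServiceEntry. A minimal sketch, assuming a hypothetical analytics service in GCP reachable at `analytics.gcp.company.com`:

```yaml
# Hypothetical ServiceEntry exposing a GCP-hosted service to the AWS mesh
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: analytics-gcp
  namespace: production
spec:
  hosts:
  - analytics.gcp.company.com   # placeholder hostname
  location: MESH_EXTERNAL       # the workload lives outside this cluster
  ports:
  - number: 443
    name: tls
    protocol: TLS
  resolution: DNS
```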
Step 4: Implement Data Replication
Keep data synchronized across clouds:
# PostgreSQL cross-cloud replication with Patroni
apiVersion: v1
kind: ConfigMap
metadata:
  name: patroni-config
data:
  patroni.yml: |
    scope: postgres-cluster
    namespace: /db/
    name: postgres-{POD_NAME}
    bootstrap:
      dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
          use_pg_rewind: true
          parameters:
            wal_level: replica
            hot_standby: "on"
            max_wal_senders: 10
            max_replication_slots: 10
    # Cross-cloud standby configuration
    standby_cluster:
      host: postgres-primary.aws.company.com
      port: 5432
      primary_slot_name: cloud_replica
Cross-Cloud Data Sync Options
Database Replication
• PostgreSQL: Streaming replication with Patroni
• MySQL: Group replication or Galera Cluster
• MongoDB: Replica sets across regions
• CockroachDB: Built-in multi-region support
Object Storage Sync
• Rclone for S3 ↔ GCS ↔ Azure Blob sync
• MinIO for unified object storage API
• Native replication (S3 Cross-Region Replication)
Application-Level Sync
• Apache Kafka with MirrorMaker 2
• Redis with active-active replication
• Custom CDC (Change Data Capture) pipelines
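For the Rclone option, most of the work is configuration rather than code. A minimal sketch of the remotes file (remote names, project, and credentials are placeholders):

```ini
# ~/.config/rclone/rclone.conf - placeholder remotes
[s3]
type = s3
provider = AWS
env_auth = true
region = us-east-1

[gcs]
type = google cloud storage
project_number = my-gcp-project
service_account_file = /etc/rclone/gcs-sa.json
```

A scheduled job (e.g., a Kubernetes CronJob) can then run something like `rclone sync s3:prod-assets gcs:prod-assets --checksum` to keep the buckets aligned; the bucket names here are placeholders.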
Step 5: Test Disaster Recovery
Implement and test failover procedures:
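Before wiring up DNS changes, it helps to isolate the go/no-go decision into a small function that can be exercised without touching a live cluster. A minimal sketch (the threshold and names are illustrative):

```shell
#!/bin/sh
# Sketch: decide whether to fail over based on the number of Ready
# nodes in the primary cluster. Pure logic, testable offline.
decide_failover() {
  ready_nodes=$1    # in the real script: kubectl get nodes ... | grep -c True
  min_required=$2
  if [ "$ready_nodes" -lt "$min_required" ]; then
    echo "failover"
  else
    echo "healthy"
  fi
}

decide_failover 2 3   # 2 Ready nodes against a minimum of 3
```

The full script below then only needs to feed this function the live node count before triggering DNS and scaling changes.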
# Automated failover with health checks
apiVersion: v1
kind: ConfigMap
metadata:
  name: failover-script
data:
  failover.sh: |
    #!/bin/bash
    # Check primary cluster health
    PRIMARY_HEALTH=$(kubectl --context=aws-prod get nodes -o json | jq -r '.items[].status.conditions[] | select(.type=="Ready") | .status' | grep -c True)
    if [ "$PRIMARY_HEALTH" -lt 3 ]; then
      echo "Primary cluster unhealthy, initiating failover..."
      # Update DNS to point to secondary
      aws route53 change-resource-record-sets --hosted-zone-id Z123456 --change-batch '{
        "Changes": [{
          "Action": "UPSERT",
          "ResourceRecordSet": {
            "Name": "api.company.com",
            "Type": "A",
            "AliasTarget": {
              "HostedZoneId": "Z789012",
              "DNSName": "gcp-lb.company.com",
              "EvaluateTargetHealth": false
            }
          }
        }]
      }'
      # Scale up secondary cluster
      kubectl --context=gcp-prod scale deployment app --replicas=6
      # Notify team
      slack-notify "Failover completed to GCP cluster"
    fi
Multi-Cloud Cost Optimization
• Spot/Preemptible Instances - use for non-critical workloads (up to ~70% savings)
• Reserved Instances - commit to baseline capacity (up to ~40% savings)
• Cloud Arbitrage - run workloads where they're cheapest (up to ~30% savings)
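Cloud arbitrage boils down to comparing normalized prices and scheduling accordingly. A toy sketch of the selection step; the rates below are hypothetical placeholders, and in practice you would feed in live data from each provider's pricing API:

```shell
#!/bin/sh
# Sketch: pick the provider with the lowest hourly rate for an
# interruptible batch job. Rates are made-up placeholders for a
# 4 vCPU / 16 GiB node.
cheapest_cloud() {
  printf '%s\n' \
    "aws 0.192" \
    "gcp 0.189" \
    "azure 0.208" |
    sort -k2 -n |    # sort numerically by price, ascending
    head -n 1 |      # cheapest row
    cut -d' ' -f1    # keep only the provider name
}

cheapest_cloud
```

The output could then drive a scheduler hint, e.g., a node selector or a federation placement override targeting the cheapest cluster.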
Common Multi-Cloud Patterns
Active-Active
Run full stack in multiple clouds simultaneously:
• Best for: Maximum availability
• Challenge: Data consistency
• Cost: Highest (full redundancy)
Active-Passive
Primary cloud with standby in secondary:
• Best for: Cost-effective DR
• Challenge: Keeping standby updated
• Cost: Moderate (partial redundancy)
Cloud Bursting
Overflow to secondary clouds during peaks:
• Best for: Variable workloads
• Challenge: Fast scaling
• Cost: Lowest (pay for burst only)
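Cloud bursting typically starts with ordinary horizontal autoscaling; the cross-cloud part is handled by whatever provisions the burst capacity (cluster autoscaler, federation placement, and so on). A minimal HPA sketch with illustrative names and thresholds:

```yaml
# Hypothetical HPA for the burst tier; assumes an autoscaler adds
# nodes in the secondary cloud once local capacity is exhausted.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 3     # steady-state capacity in the primary cloud
  maxReplicas: 20    # headroom that spills into burst capacity
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```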
Essential Multi-Cloud Tools
Management Platforms
• Rancher - Multi-cluster management
• Anthos - Google's hybrid platform
• Azure Arc - Microsoft's hybrid solution
Networking
• Istio - Service mesh
• Cilium - eBPF networking
• Submariner - Multi-cluster networking
Storage & Data
• Portworx - Cross-cloud storage
• Kasten K10 - Backup & DR
• Stork - Storage orchestration
Observability
• Datadog - Unified monitoring
• Grafana Cloud - Multi-cloud metrics
• Elastic Cloud - Centralized logging
Multi-Cloud Benefits
✓ 99.99% Availability
Survive cloud-wide outages
✓ 40% Cost Reduction
Use cheapest resources from each cloud
✓ No Vendor Lock-in
Negotiate from position of strength
✓ Global Performance
Deploy anywhere your users are
Simplify multi-cloud Kubernetes with KTL.AI
KTL.AI provides unified management for AWS, GCP, and Azure Kubernetes clusters. Deploy anywhere, manage from one place, with built-in cost optimization and disaster recovery.