Executive Summary: Google Anthos provides a unified platform for managing applications across on-premises data centers, Google Cloud, and other cloud providers. This comprehensive guide explores Anthos’s enterprise capabilities, from GKE Enterprise and Config Management to Service Mesh and multi-cluster networking. After implementing hybrid cloud architectures for enterprises with complex compliance and data residency requirements, I’ve found Anthos delivers exceptional value through consistent Kubernetes management, policy-as-code governance, and unified observability across environments. Organizations should leverage Anthos for workload portability, centralized policy management, and gradual cloud migration while implementing proper cluster fleet management and security controls from the start.
Anthos Architecture: Unified Hybrid Cloud Platform
Anthos extends Google’s Kubernetes expertise beyond GCP to any environment. GKE Enterprise provides managed Kubernetes on Google Cloud with advanced features like multi-cluster management, fleet-wide policies, and integrated security. Anthos on VMware runs Kubernetes on existing VMware infrastructure, enabling organizations to modernize applications without immediate cloud migration. Anthos on bare metal eliminates the virtualization layer for maximum performance in edge and high-performance computing scenarios.
Fleet management provides a single pane of glass for clusters across all environments. Register clusters from any provider (GKE, EKS, AKS, on-premises) into a fleet for centralized management. Fleet-scoped resources like namespaces and RBAC policies apply consistently across all member clusters. This abstraction enables platform teams to manage hundreds of clusters with consistent governance while allowing application teams to deploy without environment-specific configurations.
Connect Gateway provides secure, identity-aware access to registered clusters without exposing Kubernetes APIs to the internet. Users authenticate through Google Cloud IAM, and Connect Gateway proxies requests to clusters through an outbound connection from the cluster. This architecture eliminates the need for VPNs or public endpoints while maintaining full kubectl compatibility and audit logging.
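The gateway flow above can be sketched in a few lines of Python. This is a minimal sketch, not an official client: the endpoint format follows the Connect Gateway documentation, while the project number and membership name are placeholders, and `list_namespaces` assumes Application Default Credentials are available.

```python
# Sketch: reaching a fleet member's Kubernetes API through Connect Gateway.
# Endpoint format per the Connect Gateway docs; project number and
# membership name below are placeholders.

def gateway_endpoint(project_number: str, membership: str,
                     location: str = "global") -> str:
    """Build the Connect Gateway base URL for a fleet membership."""
    return (
        "https://connectgateway.googleapis.com/v1"
        f"/projects/{project_number}/locations/{location}"
        f"/gkeMemberships/{membership}"
    )

def list_namespaces(project_number: str, membership: str) -> dict:
    """List namespaces via the gateway using Application Default Credentials."""
    import google.auth
    from google.auth.transport.requests import AuthorizedSession

    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    session = AuthorizedSession(credentials)
    url = f"{gateway_endpoint(project_number, membership)}/api/v1/namespaces"
    return session.get(url).json()
```

Because the gateway proxies standard Kubernetes API paths, the same IAM-authenticated session works for any resource kubectl can reach, and every request lands in Cloud Audit Logs.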
Config Management and Policy Governance
Anthos Config Management implements GitOps for fleet-wide configuration. Store Kubernetes manifests, policies, and configurations in Git repositories, and Config Management automatically syncs them to registered clusters. Config Sync handles the synchronization, supporting Kustomize and Helm for environment-specific customization. Changes flow through standard Git workflows—pull requests, reviews, and approvals—providing audit trails and rollback capabilities.
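Under the hood, Config Sync reconciles a RootSync object that points at the Git source of truth. The sketch below builds that object as a plain dict; the field names follow the `configsync.gke.io/v1beta1` API, but the repository URL, branch, and directory here are placeholders for your own repo layout.

```python
# Sketch of the RootSync object Config Sync reconciles. Field names follow
# the configsync.gke.io/v1beta1 API; repo/branch/dir values are placeholders.

def root_sync_manifest(repo: str, branch: str, directory: str) -> dict:
    """Build a RootSync manifest pointing Config Sync at a Git directory."""
    return {
        "apiVersion": "configsync.gke.io/v1beta1",
        "kind": "RootSync",
        "metadata": {
            "name": "root-sync",
            "namespace": "config-management-system",
        },
        "spec": {
            "sourceFormat": "unstructured",
            "git": {
                "repo": repo,
                "branch": branch,
                "dir": directory,          # subdirectory synced to this cluster
                "auth": "token",           # matches secret_type in Terraform
                "secretRef": {"name": "git-creds"},
            },
        },
    }
```

Pointing each cluster's `dir` at a different overlay directory is how one repository serves environment-specific configuration without duplicating manifests.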
Policy Controller enforces guardrails across the fleet using Open Policy Agent (OPA) Gatekeeper. Define constraints that prevent policy violations before resources are created—block privileged containers, require resource limits, enforce naming conventions, or mandate specific labels. Policy bundles provide pre-built policies for common compliance requirements (CIS benchmarks, PCI-DSS, HIPAA). Audit mode allows testing policies without enforcement, identifying violations before enabling blocking.
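A constraint like the ones described above is itself just a Kubernetes object. The sketch below builds a Gatekeeper `K8sRequiredLabels` constraint (one of the template-library constraints), toggling between `dryrun` (audit mode) and `deny` (enforcement); the constraint name and matched kinds are illustrative choices.

```python
# Sketch of a Gatekeeper constraint requiring labels on namespaces.
# "dryrun" corresponds to audit mode; "deny" blocks violating resources.
# Constraint name and matched kinds below are illustrative.

def required_labels_constraint(labels: list, enforce: bool = False) -> dict:
    """Build a K8sRequiredLabels constraint, defaulting to audit mode."""
    return {
        "apiVersion": "constraints.gatekeeper.sh/v1beta1",
        "kind": "K8sRequiredLabels",
        "metadata": {"name": "require-team-label"},
        "spec": {
            "enforcementAction": "deny" if enforce else "dryrun",
            "match": {"kinds": [{"apiGroups": [""], "kinds": ["Namespace"]}]},
            "parameters": {"labels": labels},
        },
    }
```

Committing the audit-mode version first, reviewing the reported violations, then flipping `enforcementAction` to `deny` in a follow-up pull request mirrors the rollout sequence recommended above.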
Config Controller provides a managed control plane for provisioning GCP resources using Kubernetes Resource Model (KRM). Define GCP infrastructure (VPCs, Cloud SQL, GKE clusters) as Kubernetes manifests, and Config Controller reconciles the desired state with actual infrastructure. This enables infrastructure-as-code with the same GitOps workflows used for application configuration, unifying infrastructure and application management.
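As a concrete KRM example, the sketch below expresses a VPC as the Kubernetes object Config Controller would reconcile. The API group and kind follow Config Connector's compute resources; the network name and project annotation are placeholders.

```python
# Sketch: a GCP VPC expressed in KRM, as Config Controller reconciles it.
# API group/kind follow Config Connector's compute resources; the network
# name and project id are placeholders.

def krm_network(name: str, project_id: str) -> dict:
    """Build a ComputeNetwork KRM manifest for Config Controller."""
    return {
        "apiVersion": "compute.cnrm.cloud.google.com/v1beta1",
        "kind": "ComputeNetwork",
        "metadata": {
            "name": name,
            # Config Connector targets the project named in this annotation
            "annotations": {"cnrm.cloud.google.com/project-id": project_id},
        },
        "spec": {"autoCreateSubnetworks": False},
    }
```

Because this is just another manifest, it lives in the same Git repository and flows through the same pull-request reviews as application configuration.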
Production Terraform Configuration
Here’s a comprehensive Terraform configuration for Anthos with fleet management and Config Management:
# Anthos Enterprise Configuration
terraform {
  required_version = ">= 1.5.0"
  required_providers {
    google = { source = "hashicorp/google", version = "~> 5.0" }
  }
}

variable "project_id" { type = string }

variable "region" {
  type    = string
  default = "us-central1"
}

# Enable required APIs
resource "google_project_service" "apis" {
  for_each = toset([
    "anthos.googleapis.com",
    "gkehub.googleapis.com",
    "anthosconfigmanagement.googleapis.com",
    "mesh.googleapis.com",
    "connectgateway.googleapis.com",
    "gkeconnect.googleapis.com",
  ])

  service            = each.value
  disable_on_destroy = false
}
# GKE cluster for production workloads.
# Fleet registration happens via google_gke_hub_membership.prod below;
# adding a fleet {} block here as well would create a duplicate membership.
resource "google_container_cluster" "prod" {
  name     = "prod-cluster"
  location = var.region

  # Autopilot: Google manages nodes, scaling, and upgrades
  enable_autopilot = true

  # Binary Authorization
  binary_authorization {
    evaluation_mode = "PROJECT_SINGLETON_POLICY_ENFORCE"
  }

  # Workload Identity
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Private cluster configuration
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  # Network configuration
  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.subnet.name

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  # Security configuration
  master_authorized_networks_config {
    cidr_blocks {
      cidr_block   = "10.0.0.0/8"
      display_name = "Internal"
    }
  }

  # Logging and monitoring
  logging_config {
    enable_components = ["SYSTEM_COMPONENTS", "WORKLOADS"]
  }
  monitoring_config {
    enable_components = ["SYSTEM_COMPONENTS"]
    managed_prometheus {
      enabled = true
    }
  }
}
# VPC for clusters
resource "google_compute_network" "vpc" {
  name                    = "anthos-vpc"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "subnet" {
  name          = "anthos-subnet"
  ip_cidr_range = "10.0.0.0/20"
  region        = var.region
  network       = google_compute_network.vpc.id

  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.1.0.0/16"
  }

  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.2.0.0/20"
  }

  private_ip_google_access = true
}
# Fleet membership for an external cluster (example)
resource "google_gke_hub_membership" "external" {
  membership_id = "external-cluster"

  endpoint {
    gke_cluster {
      resource_link = "//container.googleapis.com/projects/${var.project_id}/locations/${var.region}/clusters/external"
    }
  }

  authority {
    issuer = "https://container.googleapis.com/v1/projects/${var.project_id}/locations/${var.region}/clusters/external"
  }
}
# Config Management feature
resource "google_gke_hub_feature" "configmanagement" {
  name     = "configmanagement"
  location = "global"

  # depends_on cannot reference individual for_each instances,
  # so depend on the whole resource
  depends_on = [google_project_service.apis]
}

# Config Management for prod cluster
resource "google_gke_hub_feature_membership" "prod_config" {
  location   = "global"
  feature    = google_gke_hub_feature.configmanagement.name
  membership = google_gke_hub_membership.prod.membership_id

  configmanagement {
    version = "1.17.0"

    config_sync {
      source_format = "unstructured"

      git {
        sync_repo   = "https://github.com/your-org/config-repo"
        sync_branch = "main"
        policy_dir  = "clusters/prod"
        secret_type = "token"
      }
    }

    policy_controller {
      enabled                    = true
      template_library_installed = true
      referential_rules_enabled  = true

      monitoring {
        backends = ["PROMETHEUS"]
      }
    }
  }
}
resource "google_gke_hub_membership" "prod" {
  membership_id = "prod-cluster"

  endpoint {
    gke_cluster {
      # resource_link requires the //container.googleapis.com/ prefix
      resource_link = "//container.googleapis.com/${google_container_cluster.prod.id}"
    }
  }
}
# Service Mesh feature
resource "google_gke_hub_feature" "servicemesh" {
  name     = "servicemesh"
  location = "global"

  depends_on = [google_project_service.apis]
}

# Service Mesh for prod cluster
resource "google_gke_hub_feature_membership" "prod_mesh" {
  location   = "global"
  feature    = google_gke_hub_feature.servicemesh.name
  membership = google_gke_hub_membership.prod.membership_id

  mesh {
    management = "MANAGEMENT_AUTOMATIC"
  }
}
# Multi-cluster ingress
resource "google_gke_hub_feature" "multiclusteringress" {
  name     = "multiclusteringress"
  location = "global"

  spec {
    multiclusteringress {
      config_membership = google_gke_hub_membership.prod.id
    }
  }
}

# Fleet namespace for team isolation
resource "google_gke_hub_namespace" "team_a" {
  scope_namespace_id = "team-a"
  scope_id           = google_gke_hub_scope.teams.scope_id
  scope              = google_gke_hub_scope.teams.name
}

resource "google_gke_hub_scope" "teams" {
  scope_id = "teams"
}

# Fleet RBAC for team access
resource "google_gke_hub_scope_rbac_role_binding" "team_a_admin" {
  scope_rbac_role_binding_id = "team-a-admin"
  scope_id                   = google_gke_hub_scope.teams.scope_id

  role {
    predefined_role = "ADMIN"
  }

  group = "team-a-admins@example.com"
}
Python SDK for Fleet Management
This Python implementation demonstrates enterprise patterns for Anthos fleet management and policy compliance:
"""Anthos Fleet Manager - Enterprise Python Implementation"""
from dataclasses import dataclass
from typing import List, Dict, Optional
from google.cloud import gke_hub_v1
from google.cloud import container_v1
import logging
from datetime import datetime
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@dataclass
class ClusterInfo:
    name: str
    location: str
    membership_id: str
    state: str
    config_sync_status: str
    policy_controller_status: str


@dataclass
class PolicyViolation:
    cluster: str
    namespace: str
    resource_type: str
    resource_name: str
    constraint: str
    message: str
class AnthosFleetManager:
    """Enterprise Anthos fleet management."""

    def __init__(self, project_id: str):
        self.project_id = project_id
        self.hub_client = gke_hub_v1.GkeHubClient()
        self.container_client = container_v1.ClusterManagerClient()
        self.parent = f"projects/{project_id}/locations/global"

    def list_fleet_members(self) -> List[ClusterInfo]:
        """List all clusters in the fleet."""
        request = gke_hub_v1.ListMembershipsRequest(parent=self.parent)
        clusters = []
        for membership in self.hub_client.list_memberships(request=request):
            # Config Management status requires a separate feature query
            config_sync = "Unknown"
            policy_controller = "Unknown"
            state = membership.state.code.name if membership.state else "Unknown"
            if membership.endpoint.gke_cluster:
                # resource_link format:
                # //container.googleapis.com/projects/P/locations/L/clusters/C
                # so the location is the seventh path segment (index 6)
                location = membership.endpoint.gke_cluster.resource_link.split("/")[6]
            else:
                location = "external"
            clusters.append(ClusterInfo(
                name=membership.name.split("/")[-1],
                location=location,
                membership_id=membership.name,
                state=state,
                config_sync_status=config_sync,
                policy_controller_status=policy_controller,
            ))
        return clusters

    def register_cluster(self, cluster_name: str, cluster_location: str) -> str:
        """Register a GKE cluster to the fleet."""
        membership = gke_hub_v1.Membership(
            endpoint=gke_hub_v1.MembershipEndpoint(
                gke_cluster=gke_hub_v1.GkeCluster(
                    resource_link=(
                        f"//container.googleapis.com/projects/{self.project_id}"
                        f"/locations/{cluster_location}/clusters/{cluster_name}"
                    )
                )
            )
        )
        request = gke_hub_v1.CreateMembershipRequest(
            parent=self.parent,
            membership_id=cluster_name,
            resource=membership,
        )
        operation = self.hub_client.create_membership(request=request)
        result = operation.result()
        logger.info(f"Registered cluster: {cluster_name}")
        return result.name

    def unregister_cluster(self, membership_id: str) -> bool:
        """Unregister a cluster from the fleet."""
        try:
            request = gke_hub_v1.DeleteMembershipRequest(
                name=f"{self.parent}/memberships/{membership_id}"
            )
            operation = self.hub_client.delete_membership(request=request)
            operation.result()
            logger.info(f"Unregistered cluster: {membership_id}")
            return True
        except Exception as e:
            logger.error(f"Failed to unregister cluster: {e}")
            return False

    def get_fleet_health(self) -> Dict:
        """Get overall fleet health status."""
        clusters = self.list_fleet_members()
        health = {
            "total_clusters": len(clusters),
            "healthy": 0,
            "degraded": 0,
            "unhealthy": 0,
            "clusters": [],
        }
        for cluster in clusters:
            status = "healthy"
            if cluster.state != "READY":
                status = "unhealthy"
            elif cluster.config_sync_status == "ERROR":
                status = "degraded"
            health["clusters"].append({
                "name": cluster.name,
                "status": status,
                "state": cluster.state,
            })
            health[status] += 1
        return health

    def sync_config(self, membership_id: str) -> bool:
        """Trigger config sync for a cluster."""
        # This would typically interact with the Config Sync API;
        # for now, log the action.
        logger.info(f"Triggering config sync for: {membership_id}")
        return True

    def get_policy_violations(self) -> List[PolicyViolation]:
        """Get policy violations across the fleet."""
        # Placeholder: in production, query Cloud Logging for Policy
        # Controller audit results, e.g.
        # filter = 'resource.type="k8s_cluster" AND jsonPayload.kind="ConstraintViolation"'
        violations = []
        return violations

    def apply_fleet_policy(self, policy_name: str, policy_spec: Dict) -> bool:
        """Apply a policy across the fleet."""
        # This would create a constraint template and constraint
        # that Config Management syncs to all clusters.
        logger.info(f"Applying fleet policy: {policy_name}")
        return True

    def get_workload_distribution(self) -> Dict:
        """Get workload distribution across clusters."""
        clusters = self.list_fleet_members()
        distribution = {}
        for cluster in clusters:
            # In production, query each cluster for workload counts
            distribution[cluster.name] = {
                "deployments": 0,
                "pods": 0,
                "services": 0,
            }
        return distribution
class ConfigSyncManager:
    """Manage Config Sync across the fleet."""

    def __init__(self, project_id: str):
        self.project_id = project_id
        self.hub_client = gke_hub_v1.GkeHubClient()

    def get_sync_status(self, membership_id: str) -> Dict:
        """Get Config Sync status for a cluster."""
        # Placeholder: query the feature membership for sync status
        return {
            "synced": True,
            "last_sync": datetime.utcnow().isoformat(),
            "commit": "abc123",
            "errors": [],
        }

    def force_sync(self, membership_id: str) -> bool:
        """Force a sync for a specific cluster."""
        logger.info(f"Forcing sync for: {membership_id}")
        return True
Cost Optimization and Best Practices
Anthos pricing is based on per-vCPU charges for registered clusters and their features. GKE Enterprise provides the full Anthos feature set for GKE clusters at a per-vCPU fee on top of the standard cluster management charge, and for on-premises and multi-cloud deployments the same per-vCPU metering applies across all registered clusters. Optimize costs by right-sizing clusters and using Autopilot mode where appropriate to eliminate over-provisioning.
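Because the per-vCPU model scales linearly with registered capacity, a rough estimate is simple arithmetic. The sketch below uses a deliberately hypothetical rate; consult current Anthos/GKE Enterprise pricing before budgeting with real numbers.

```python
# Back-of-the-envelope fleet cost sketch. The per-vCPU-hour rate used in the
# example is a placeholder, NOT a published price -- check current
# Anthos/GKE Enterprise pricing before relying on figures like this.

HOURS_PER_MONTH = 730  # common billing approximation

def monthly_vcpu_cost(vcpus: int, rate_per_vcpu_hour: float) -> float:
    """Estimate monthly licensing cost for a given registered vCPU count."""
    return round(vcpus * rate_per_vcpu_hour * HOURS_PER_MONTH, 2)

# e.g. three clusters of 48 vCPUs each at a hypothetical $0.01/vCPU-hour:
fleet_cost = monthly_vcpu_cost(3 * 48, 0.01)
```

Running the estimate for a few candidate cluster sizes makes the case for right-sizing concrete: halving idle vCPUs halves the licensing line item, independent of compute savings.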
Fleet architecture significantly impacts operational efficiency. Group clusters by environment (dev, staging, prod) or by team for appropriate policy inheritance. Use fleet namespaces for multi-tenancy rather than creating separate clusters for each team. Implement cluster templates to ensure consistent configuration across new clusters.
Config Management reduces operational overhead but requires investment in GitOps practices. Start with a simple repository structure and evolve as needs grow. Use Kustomize overlays for environment-specific configuration rather than duplicating manifests. Implement policy constraints in audit mode first, then gradually enable enforcement after addressing existing violations.

Key Takeaways and Best Practices
Anthos provides a unified platform for managing Kubernetes across any environment with consistent policies and observability. Use fleet management to centralize cluster operations while maintaining environment-specific configurations through Config Management. Implement Policy Controller to enforce security and compliance guardrails across all clusters.
Start with GKE Enterprise for cloud-native workloads, then extend to on-premises or multi-cloud as requirements dictate. The Terraform and Python examples provided here establish patterns for production-ready Anthos deployments that scale from single clusters to enterprise-wide hybrid cloud architectures while maintaining security and operational efficiency.