Google Vertex AI Fundamentals#

What Is Vertex AI?#

Vertex AI is Google Cloud’s managed AI platform. Anthropic’s Claude models are available as partner models through Vertex AI, giving you a GCP-native API that uses the same IAM, billing, networking, and compliance infrastructure you already use for everything else in Google Cloud.

Analogy for Cloud Foundry practitioners: Same abstraction as Bedrock – developers call an API, the platform team controls networking, access, cost, and compliance underneath. The difference is which cloud’s control plane you’re working with.

Key Properties for Enterprise Deployment#

Managed Inference, Not Model Hosting#

You don’t deploy models or manage GPU clusters. You call the Vertex AI prediction endpoint and Google handles the rest, running the Claude models on its own infrastructure under a contractual relationship with Anthropic.

Data Boundary Guarantees#

  • Customer inputs and outputs are not used to train or improve foundation models
  • Data is not shared with model providers (Anthropic)
  • Data is not stored beyond immediate request processing (unless you explicitly enable logging)
  • Your code/prompts go to Google Cloud, not to Anthropic directly

Inherits the Google Cloud Security Stack#

  • IAM policies control who can invoke which models (roles/aiplatform.user)
  • Cloud Audit Logs log every API call
  • VPC Service Controls keep traffic within a defined security perimeter
  • Private Google Access keeps traffic off the public internet
  • CMEK encryption for data at rest
  • Compliance certifications: SOC 2, ISO 27001, HIPAA, FedRAMP, etc.
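For example, granting invocation rights is a single IAM role binding. A hedged sketch – the project ID and group below are placeholders, and the command is echoed rather than executed so it can be reviewed first:

```shell
# Placeholders: substitute your own project and group.
PROJECT_ID="my-claude-project"
MEMBER="group:ai-developers@example.com"

# Echoed for safety; drop the leading 'echo' to apply the binding for real.
echo gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="$MEMBER" \
  --role="roles/aiplatform.user"
```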

Pricing Model#

  • On-demand: Pay-per-token, no upfront commitment
  • Provisioned throughput: Guaranteed capacity (requires regional endpoints, not available on global)
  • Prompt caching: Same pricing structure as direct API – 90% discount on cache reads
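A quick back-of-envelope on what the cache-read discount means in practice. The $3.00-per-million base rate below is an assumed figure for illustration only (check current Vertex AI pricing); only the 90% discount comes from the list above:

```shell
# Illustrative cache economics; the base rate is an assumption, not a quote.
awk 'BEGIN {
  base   = 3.00              # assumed $/1M input tokens
  cached = base * (1 - 0.90) # cache reads billed at 10% of the base rate
  printf "base: $%.2f/M  cache read: $%.2f/M\n", base, cached
}'
```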

How Claude Code Talks to Vertex AI#

Setting CLAUDE_CODE_USE_VERTEX=1 switches Claude Code from calling api.anthropic.com to using the Google Cloud Vertex AI prediction endpoint. The Claude models are the same – same Sonnet, same Opus, same capabilities – but the request path changes:

Without Vertex AI:
  developer laptop -> internet -> api.anthropic.com -> Claude model

With Vertex AI:
  developer laptop -> corporate network -> Vertex AI endpoint -> Claude model (in GCP)

With Vertex AI + VPC Service Controls + Private Google Access:
  developer laptop -> corporate network -> restricted.googleapis.com -> Vertex AI -> Claude
  (nothing touches the public internet at any point)

Environment Variables#

Required#

| Variable | Example | Purpose |
| --- | --- | --- |
| CLAUDE_CODE_USE_VERTEX | 1 | Enables Vertex AI integration |
| CLOUD_ML_REGION | us-east5 or global | Sets the endpoint region |
| ANTHROPIC_VERTEX_PROJECT_ID | my-gcp-project | Identifies the GCP project |
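Putting the three required variables together, a minimal working configuration looks like this (the region and project values are examples):

```shell
# Minimal Vertex AI configuration for Claude Code.
export CLAUDE_CODE_USE_VERTEX=1
export CLOUD_ML_REGION=us-east5
export ANTHROPIC_VERTEX_PROJECT_ID=my-gcp-project

echo "Vertex project: $ANTHROPIC_VERTEX_PROJECT_ID ($CLOUD_ML_REGION)"
```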

Optional#

| Variable | Example | Purpose |
| --- | --- | --- |
| ANTHROPIC_MODEL | claude-opus-4-6 | Overrides the primary model |
| ANTHROPIC_SMALL_FAST_MODEL | claude-haiku-4-5@20251001 | Overrides the small/fast model |
| ANTHROPIC_DEFAULT_HAIKU_MODEL | claude-haiku-4-5@20251001 | Manual Haiku upgrade (Vertex won’t auto-upgrade from 3.5 to 4.5) |
| CLAUDE_CODE_SKIP_VERTEX_AUTH | 1 | Skips Vertex auth (for LiteLLM proxy setups) |
| ANTHROPIC_VERTEX_BASE_URL | URL | Custom base URL (for LiteLLM/gateway pass-through) |
| DISABLE_PROMPT_CACHING | 1 | Disables prompt caching |

Per-Model Region Overrides (when CLOUD_ML_REGION=global)#

| Variable | Example |
| --- | --- |
| VERTEX_REGION_CLAUDE_3_5_HAIKU | us-east5 |
| VERTEX_REGION_CLAUDE_4_0_SONNET | us-east5 |
| VERTEX_REGION_CLAUDE_4_0_OPUS | europe-west1 |

Project resolution falls back through: ANTHROPIC_VERTEX_PROJECT_ID -> GCLOUD_PROJECT -> GOOGLE_CLOUD_PROJECT -> GOOGLE_APPLICATION_CREDENTIALS.
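The precedence can be illustrated with ordinary shell parameter expansion – a sketch of the documented order, not Claude Code’s actual implementation:

```shell
# Pick the first non-empty project variable, mirroring the documented order.
resolve_project() {
  echo "${ANTHROPIC_VERTEX_PROJECT_ID:-${GCLOUD_PROJECT:-${GOOGLE_CLOUD_PROJECT:-}}}"
}

GCLOUD_PROJECT="fallback-project"
ANTHROPIC_VERTEX_PROJECT_ID="primary-project"
resolve_project   # the explicit variable wins -> primary-project
```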

Dual-Model Usage#

Claude Code uses two models simultaneously:

  • Primary model (Sonnet or Opus): Heavy reasoning, code generation, analysis
  • Fast model (Haiku): Lightweight tasks – summarization, classification, quick checks

Both must be available in your Vertex AI project. Defaults on Vertex: Sonnet 4.5 (primary) and Haiku 4.5 (fast).
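To pin both models explicitly rather than rely on the defaults, set the two override variables together (the IDs below are the current Vertex model IDs listed in the Model ID Format section):

```shell
# Pin the primary and fast models to specific Vertex model IDs.
export ANTHROPIC_MODEL='claude-sonnet-4-5@20250929'
export ANTHROPIC_SMALL_FAST_MODEL='claude-haiku-4-5@20251001'
```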

Model ID Format#

Vertex AI uses a different model ID format than Bedrock or the direct API:

| Provider | Format | Example |
| --- | --- | --- |
| Vertex AI | claude-{tier}-{version}@{date} | claude-sonnet-4-5@20250929 |
| Bedrock | {region}.anthropic.claude-{tier}-{version}-{date}-v1:0 | us.anthropic.claude-sonnet-4-5-20250929-v1:0 |
| Direct API | claude-{tier}-{version}-{date} | claude-sonnet-4-5-20250929 |

Current Vertex AI model IDs:

| Model | Vertex AI Model ID |
| --- | --- |
| Claude Opus 4.6 | claude-opus-4-6 |
| Claude Opus 4.5 | claude-opus-4-5@20251101 |
| Claude Sonnet 4.5 | claude-sonnet-4-5@20250929 |
| Claude Haiku 4.5 | claude-haiku-4-5@20251001 |
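Because the Vertex format differs from the direct-API format only in the separator before the date stamp, translating between the two is mechanical. A hypothetical helper (the function name is mine, not part of any tool, and it only applies to IDs that end in a date stamp – not, for example, claude-opus-4-6):

```shell
# Convert claude-{tier}-{version}-{date} to claude-{tier}-{version}@{date}
to_vertex_id() {
  local id="$1"
  echo "${id%-*}@${id##*-}"   # replace the final '-' with '@'
}

to_vertex_id claude-sonnet-4-5-20250929   # -> claude-sonnet-4-5@20250929
```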

Region Availability#

Regional Endpoints#

United States: us-east1, us-east4, us-east5, us-central1, us-south1, us-west1, us-west4

Europe: europe-west1, europe-west4

Asia Pacific: asia-southeast1, asia-east1

Global vs. Regional Endpoints#

| Feature | Global | Regional |
| --- | --- | --- |
| Routing | Dynamic, maximum availability | Fixed to specified region |
| Data residency | Not guaranteed | Guaranteed |
| Pricing | Standard | 10% premium (Sonnet 4.5+ and future models) |
| Provisioned throughput | Not supported | Supported |

Gotcha: Not all Claude models support the global endpoint. If a model doesn’t support it and you have CLOUD_ML_REGION=global, you get 404 errors. Use VERTEX_REGION_<MODEL_NAME> overrides for those models.
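For example, to keep global routing as the default while pinning a model that lacks global support to a regional endpoint (the region values are illustrative):

```shell
# Global routing by default; pin Haiku 3.5 to a regional endpoint.
export CLOUD_ML_REGION=global
export VERTEX_REGION_CLAUDE_3_5_HAIKU=us-east5

echo "haiku 3.5 pinned to $VERTEX_REGION_CLAUDE_3_5_HAIKU"
```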

Known Gotchas#

Model Availability Lag#

Vertex AI may lag behind Anthropic’s direct API when new model versions release. Check Model Garden for current availability. Model access may require explicit request and 24–48 hours for approval.

Haiku Version Pinning#

Claude Code will not automatically upgrade from Haiku 3.5 to Haiku 4.5 on Vertex. Set ANTHROPIC_DEFAULT_HAIKU_MODEL explicitly:

export ANTHROPIC_DEFAULT_HAIKU_MODEL='claude-haiku-4-5@20251001'

Prompt Caching Differences#

Prompt caching on Vertex AI works the same as the direct API, with two nuances:

  • Caches are isolated per GCP project (not shared across projects)
  • Google treats prompt cache hashes as “User Metadata” rather than “Customer Data” – a data governance nuance worth noting for compliance teams

Authentication#

The /login and /logout commands are disabled when using Vertex AI. Authentication is handled entirely through Google Cloud credentials (gcloud auth application-default login).

Environment Variable Presence#

Having Vertex environment variables set (regardless of value) may trigger Vertex AI detection. Unset them completely when switching providers.
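The safe way to switch away from Vertex is therefore to unset every related variable, not merely set them to empty strings:

```shell
# Remove all Vertex-related variables before switching providers;
# presence alone (even with an empty value) can trigger Vertex detection.
unset CLAUDE_CODE_USE_VERTEX CLOUD_ML_REGION ANTHROPIC_VERTEX_PROJECT_ID \
      ANTHROPIC_VERTEX_BASE_URL CLAUDE_CODE_SKIP_VERTEX_AUTH

echo "vertex vars cleared: ${CLAUDE_CODE_USE_VERTEX:-unset}"
```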

VPC Service Controls and Private Google Access#

This is the GCP equivalent of AWS PrivateLink. VPC Service Controls create a security perimeter around your GCP project, and Private Google Access routes traffic to Google APIs without traversing the public internet.

Architecture#

┌──────────────────────────────────────────────────────┐
│  VPC Service Controls Perimeter                      │
│                                                      │
│  ┌─────────────────┐    ┌──────────────────────────┐ │
│  │ Developer VMs / │    │ Vertex AI                │ │
│  │ GKE Cluster     │───>│ (aiplatform.googleapis)  │ │
│  │                 │    │                          │ │
│  └─────────────────┘    └──────────────────────────┘ │
│          │                                           │
│          │  Private Google Access                    │
│          │  (restricted.googleapis.com)              │
│          │  199.36.153.4/30                          │
└──────────│───────────────────────────────────────────┘
           │
    Cloud Interconnect / Partner Interconnect / Cloud VPN
           │
    ┌──────────────┐
    │ Corporate    │
    │ Network      │
    └──────────────┘

Key Terraform Resources#

IMPORTANT: GCP enforces a hard limit of one Access Context Manager access policy per organization. If your org already has an access policy, you MUST use a data source to reference it instead of creating a new one. The example below assumes no existing policy – for enterprise deployments, replace the resource with data "google_access_context_manager_access_policy" "existing" and reference var.access_policy_name.

# Access policy (organization-level, one per org)
# WARNING: This will fail if your org already has an access policy
# Use data source instead: data "google_access_context_manager_access_policy" "existing"
resource "google_access_context_manager_access_policy" "policy" {
  parent = "organizations/${var.org_id}"
  title  = "Claude Code VPC-SC Policy"
}

# Access level: allow traffic from corporate IP ranges
resource "google_access_context_manager_access_level" "corporate" {
  parent = "accessPolicies/${google_access_context_manager_access_policy.policy.name}"
  name   = "accessPolicies/${google_access_context_manager_access_policy.policy.name}/accessLevels/corporate_access"
  title  = "Corporate Network Access"

  basic {
    conditions {
      ip_subnetworks = var.corporate_cidr_ranges
    }
  }
}

# Service perimeter: restrict Vertex AI to the project
resource "google_access_context_manager_service_perimeter" "vertex_perimeter" {
  parent = "accessPolicies/${google_access_context_manager_access_policy.policy.name}"
  name   = "accessPolicies/${google_access_context_manager_access_policy.policy.name}/servicePerimeters/vertex_ai_perimeter"
  title  = "Vertex AI Perimeter"

  status {
    restricted_services = ["aiplatform.googleapis.com"]
    resources           = ["projects/${var.project_number}"]
    access_levels       = [google_access_context_manager_access_level.corporate.name]
  }
}

Private Google Access DNS Configuration#

Configure DNS to resolve *.googleapis.com to restricted.googleapis.com (199.36.153.4/30). This ensures all API traffic – including Vertex AI calls – routes through Google’s private backbone rather than the public internet.

Use restricted.googleapis.com (not private.googleapis.com) when deploying with VPC Service Controls. The restricted domain only allows access to APIs supported by VPC-SC, blocking access to unsupported APIs as a defense-in-depth measure.

Enable Private Google Access on all VPC subnets so VMs and GKE nodes without external IPs can reach Vertex AI.
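Assuming a hypothetical subnet name and region, enabling it looks like the following – the command is echoed rather than executed so it can be reviewed first:

```shell
# Placeholders: substitute your own subnet and region.
CMD="gcloud compute networks subnets update my-subnet --region=us-east5 --enable-private-ip-google-access"

# Echoed for safety; run "$CMD" directly to apply.
echo "$CMD"
```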

Dedicated GCP Project#

Anthropic recommends a dedicated GCP project for Claude Code usage. Benefits:

  • Cost attribution: All Vertex AI costs in one project, simple to track
  • IAM boundaries: Separate IAM policies from production workloads
  • Audit scoping: Cloud Audit Logs scoped to the Claude project
  • VPC-SC isolation: Simpler perimeter configuration

Validation Checklist#

  • VPC Service Controls perimeter configured with aiplatform.googleapis.com
  • DNS resolves *.googleapis.com to restricted.googleapis.com (199.36.153.4/30)
  • Private Google Access enabled on all relevant subnets
  • Cloud Audit Logs enabled for Vertex AI API calls
  • Cloud Interconnect / VPN connection verified with latency < 50ms
  • Claude Code successfully invokes model through the private path
  • No public internet egress observed in VPC flow logs during Claude Code usage

Important: VPC Service Controls changes can take up to 30 minutes to propagate, even after the configuration API call returns successfully. Plan for this during initial setup and testing.

Gateway Deployment on GCP#

The LLM gateway pattern is the same regardless of cloud provider – see LLM Gateway Design for the full rationale. This section covers GCP-specific deployment details.

Developer-Facing Configuration#

export CLAUDE_CODE_USE_VERTEX=1
export ANTHROPIC_VERTEX_BASE_URL='https://llm-gateway.internal.corp.com/vertex'
export CLAUDE_CODE_SKIP_VERTEX_AUTH=1  # Gateway handles GCP auth

Deployment Topology#

┌─────────────────────────────────────┐
│  LLM Gateway (internal service)     │
│                                     │
│  Deployment: Cloud Run or GKE       │
│  URL: llm-gateway.internal.corp     │
│  Auth: SSO / OIDC                   │
│                                     │
│  Upstream: Vertex AI endpoint       │
│  (via Private Google Access)        │
└─────────────────────────────────────┘

Credential Management via Workload Identity#

The gateway authenticates to Vertex AI using Workload Identity Federation – no service account keys needed.

For GKE:

  1. Enable Workload Identity Federation on the cluster

  2. Create a Kubernetes ServiceAccount for the gateway

  3. Grant roles/aiplatform.user directly to the Kubernetes principal:

    principal://iam.googleapis.com/projects/<PROJECT_NUMBER>/locations/global/workloadIdentityPools/<PROJECT_ID>.svc.id.goog/subject/ns/<NAMESPACE>/sa/<KSA_NAME>
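A hedged sketch of the GKE role grant in step 3, with every identifier a placeholder and the command echoed rather than executed:

```shell
# Placeholders throughout: substitute your own project, namespace, and SA.
PROJECT_ID="my-claude-project"
PROJECT_NUMBER="123456789012"
NAMESPACE="llm-gateway"
KSA_NAME="gateway-sa"

PRINCIPAL="principal://iam.googleapis.com/projects/${PROJECT_NUMBER}/locations/global/workloadIdentityPools/${PROJECT_ID}.svc.id.goog/subject/ns/${NAMESPACE}/sa/${KSA_NAME}"

# Echoed for review; drop the leading 'echo' to apply the binding for real.
echo gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --role="roles/aiplatform.user" \
  --member="$PRINCIPAL"
```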

For Cloud Run: Assign roles/aiplatform.user to the Cloud Run service’s service account. No additional configuration needed.

GCP-Specific Observability#

Push OpenTelemetry metrics to Cloud Monitoring:

  • Per-user token consumption
  • Latency percentiles (p50, p95, p99)
  • Error rates by model and user
  • Request volume over time
  • Budget utilization per team

Corporate Network Connectivity#

| Option | Bandwidth | Description |
| --- | --- | --- |
| Dedicated Interconnect | 10–100 Gbps | Direct physical connection at a Google colocation facility |
| Partner Interconnect | 50 Mbps–50 Gbps | Connection through a service provider already connected to Google |
| HA Cloud VPN | Up to 3 Gbps per tunnel | Encrypted IPsec tunnels over the internet |
| Cross-Cloud Interconnect | 10–100 Gbps | Direct connection between GCP and another cloud provider |

All options support Cloud Router for dynamic BGP routing. For enterprise Vertex AI deployments, Dedicated or Partner Interconnect is recommended to keep AI inference traffic off the public internet.

Redundancy#

  • Redundant connections: Two Interconnect circuits from different providers, or Interconnect + HA VPN as backup
  • Monitoring: Cloud Monitoring alerts on Interconnect attachment status and VPN tunnel state
  • Failover testing: Test failover quarterly

Provider Comparison#

For a side-by-side comparison of Bedrock, Vertex AI, and Azure Foundry, see the Provider Selection table in Amazon Bedrock Fundamentals.