Phased Rollout – Cohort Strategy#
Principle#
Don’t light up 500 developers at once. Each cohort discovers different classes of issues and builds institutional knowledge for the next cohort.
Cohort 1: Power Users (25 developers, Weeks 5–6)#
Selection Criteria#
- Hand-picked across 3–4 teams representing different technology stacks
- Mix of senior engineers (who know the patterns) and enthusiastic mid-levels (who’ll push boundaries)
- At least one developer per major codebase
- Include developers who are already CLI/terminal-native
Objectives#
- Validate infrastructure end-to-end (Bedrock routing, gateway, PrivateLink)
- Test managed-settings.json enforcement – do the deny rules work? Does bypass mode stay disabled?
- Write the first project CLAUDE.md and agent_docs/ files for their repos
- Co-create the initial 5–8 org-wide skills based on real workflows
- Become internal champions who can support Cohort 2
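The enforcement questions above (do the deny rules exist, does bypass mode stay disabled) can be partly automated with a static check of the deployed file. The sketch below is a minimal example, not a definitive validator: the key names `permissions.deny` and `permissions.disableBypassPermissionsMode`, the sample rules, and the function name `check_managed_settings` are assumptions to confirm against the Claude Code settings reference.

```python
import json

# Hypothetical sample of a managed-settings.json payload. The key names and
# rule strings are assumptions for illustration; verify them against the
# Claude Code settings reference before relying on this check.
SAMPLE_SETTINGS = """
{
  "permissions": {
    "deny": ["Read(./.env)", "Bash(curl:*)"],
    "disableBypassPermissionsMode": "disable"
  }
}
"""


def check_managed_settings(raw_json, required_deny_rules):
    """Return a list of problems; an empty list means the file passes."""
    settings = json.loads(raw_json)
    permissions = settings.get("permissions", {})
    problems = []

    # Every rule the security team requires must appear verbatim in the deny list.
    deny_rules = set(permissions.get("deny", []))
    for rule in required_deny_rules:
        if rule not in deny_rules:
            problems.append(f"missing deny rule: {rule}")

    # Bypass mode counts as disabled only when the setting is explicitly "disable".
    if permissions.get("disableBypassPermissionsMode") != "disable":
        problems.append("bypass mode is not disabled")

    return problems


if __name__ == "__main__":
    for problem in check_managed_settings(SAMPLE_SETTINGS, ["Read(./.env)"]):
        print(problem)
```

A check like this only confirms what the file says. Cohort 1 still needs to attempt a denied action by hand to confirm the rules actually block it.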
Expected Discoveries#
- Bedrock inference profiles required for on-demand usage (common first gotcha)
- Prompt caching behavior differs from direct API
- Some model versions lag behind on Bedrock
- Specific CLAUDE.md instructions that Claude follows well vs. ignores
- MCP servers that are most valuable for the org’s toolchain
- Skills that need more or fewer steps than initially designed
- Edge cases in deny rules (false positives blocking legitimate work)
Success Metrics#
- All 25 developers able to use Claude Code through the enterprise infrastructure
- At least 3 project CLAUDE.md files written and reviewed
- At least 5 org-wide skills tested and iterated
- Zero security policy violations (deny rules holding)
- Qualitative feedback: “This makes me faster” vs. “This gets in the way”
Platform Team Commitment#
- Dedicated Slack channel for issues, monitored daily during the first week
- 30-minute stand-up twice per week with Cohort 1
- Same-day turnaround on configuration issues
Cohort 2: Full Teams (100 developers, Weeks 7–9)#
Selection Criteria#
- Expand to complete teams (beyond individual developers)
- Include at least one team that’s skeptical – they’ll surface real objections
- Include the team with the most complex codebase
Objectives#
- Stress-test the LLM gateway’s rate limiting and budget model
- Validate that team-level CLAUDE.md and skills work across a full team
- Discover the actual cost model (tokens per developer per day)
- Instrument OpenTelemetry metrics
New Infrastructure Requirements#
- Gateway rate limiting tuned based on Cohort 1 usage patterns
- Per-team token budgets configured
- CloudWatch dashboards showing per-user token consumption, latency percentiles, error rates
- Cost allocation tags in AWS for team-level billing
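The per-team token budget can be pictured as a simple daily cap that the gateway consults before forwarding a request. The sketch below assumes a hypothetical in-memory `TeamBudget` class; a real gateway would persist usage and likely enforce limits through its own configuration rather than custom code.

```python
from dataclasses import dataclass


@dataclass
class TeamBudget:
    """Tracks one team's token usage against a daily cap (hypothetical model)."""

    daily_token_limit: int
    used_today: int = 0

    def try_consume(self, tokens: int) -> bool:
        """Record the usage and return True, or return False if it would exceed the cap."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used_today + tokens > self.daily_token_limit:
            return False  # request rejected; usage is left unchanged
        self.used_today += tokens
        return True

    def reset(self) -> None:
        """Clear usage at the start of a new budget period."""
        self.used_today = 0


if __name__ == "__main__":
    budget = TeamBudget(daily_token_limit=1_000_000)
    print(budget.try_consume(250_000), budget.used_today)
```

Rejecting the whole request when it would cross the cap, rather than allowing a partial overrun, keeps the budget a hard ceiling. Whether a hard or soft limit suits a given team is one of the things Cohort 2 is meant to reveal.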
Expected Discoveries#
- Real-world token consumption rates (expect 50K–200K tokens/developer/day depending on usage intensity)
- Teams that use Claude Code very differently (some use it for code gen, others for review, others for documentation)
- Teams that need different model access (Opus for architecture work, Sonnet for routine coding)
- Edge cases in project CLAUDE.md that only surface with diverse usage patterns
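The expected consumption range translates directly into a first-pass cost model. At 50K–200K tokens per developer per day, 100 developers consume roughly 5M–20M tokens per day. The sketch below works through that arithmetic; the blended price per million tokens and the 21 working days per month are placeholder assumptions, since real pricing differs for input, output, and cached tokens.

```python
def daily_token_range(developers, low_per_dev=50_000, high_per_dev=200_000):
    """Return (low, high) total tokens per day for a cohort of the given size."""
    return developers * low_per_dev, developers * high_per_dev


def monthly_cost_usd(tokens_per_day, usd_per_million_tokens, working_days=21):
    """Estimate monthly spend from daily tokens and a blended price.

    usd_per_million_tokens is a hypothetical blended rate, not a published price.
    """
    return tokens_per_day / 1_000_000 * usd_per_million_tokens * working_days


if __name__ == "__main__":
    low, high = daily_token_range(100)
    print(f"Cohort 2 daily tokens: {low:,} to {high:,}")
    # Placeholder rate of 10 USD per million tokens, purely for illustration.
    print(f"Monthly cost range: {monthly_cost_usd(low, 10.0):,.0f} "
          f"to {monthly_cost_usd(high, 10.0):,.0f} USD")
```

Replacing the placeholder rate and per-developer range with figures measured during Cohort 2 turns this into the validated projection that finance needs before Cohort 3.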
Success Metrics#
- Gateway handles 100 concurrent developers without latency spikes
- Per-team budgets prevent runaway costs
- At least 80% of developers reporting increased productivity
- Zero infrastructure outages
- Cost model validated and predictable
Platform Team Commitment#
- Slack channel continues, monitored 8am–6pm
- Weekly office hours for questions
- Bi-weekly feedback surveys
Cohort 3: Full Organization (remaining 375 developers, Weeks 10–12)#
Prerequisites (Must Be True Before Launching)#
- Gateway proven at 100-developer scale
- managed-settings.json deployed to all developer machines via Mobile Device Management (MDM)
- Internal documentation written by Cohort 1 champions
- Onboarding guide tested with Cohort 2 (developers who weren’t hand-picked)
- Cost projections validated and approved by finance
- CISO sign-off on the security architecture still current
Rollout Approach#
- Department-by-department, not all-at-once
- Each department gets a 15-minute onboarding session led by a Cohort 1/2 champion
- Self-service documentation available for async onboarding
- Platform team monitors gateway metrics for capacity issues
Expected Discoveries#
- Long-tail support issues from developers with non-standard setups
- Teams that need custom skills not anticipated during Cohort 1/2
- Organizational patterns in how different teams use Claude Code
- Real productivity data at scale for leadership reporting
Success Metrics#
- 90%+ of developers have used Claude Code at least once within 2 weeks of access
- Weekly active usage rate > 60% after first month
- Support ticket volume declining (not growing) after first 2 weeks
- Total cost within 10% of projection
- No security incidents
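The quantitative metrics above can be evaluated mechanically once usage and cost data are available. The sketch below is one possible encoding; the function name and input shape are hypothetical, "within 10%" is read as a deviation in either direction, and the support ticket trend is left out because it needs a time series rather than a single number.

```python
def evaluate_cohort3(total_devs, used_within_two_weeks, weekly_active,
                     actual_cost, projected_cost, security_incidents):
    """Return a pass/fail flag per Cohort 3 success metric."""
    return {
        # 90%+ of developers used the tool at least once (integer math avoids rounding).
        "adoption_at_least_90_percent": used_within_two_weeks * 10 >= total_devs * 9,
        # Weekly active usage strictly above 60%.
        "weekly_active_above_60_percent": weekly_active * 10 > total_devs * 6,
        # Actual cost deviates from the projection by no more than 10%.
        "cost_within_10_percent": abs(actual_cost - projected_cost) <= 0.10 * projected_cost,
        "no_security_incidents": security_incidents == 0,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not measured data.
    result = evaluate_cohort3(500, 460, 320, 105_000, 100_000, 0)
    for metric, passed in result.items():
        print(f"{metric}: {'pass' if passed else 'fail'}")
```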
Rollback Plan#
Principle#
Rollback is cohort-level, not all-or-nothing. If a cohort encounters serious issues, pause that cohort while earlier cohorts continue operating.
Rollback Triggers#
- Security incident (deny rule bypass, data exposure)
- Infrastructure instability (gateway outages, Bedrock throttling) affecting developer productivity
- Cost significantly exceeding projections (>25% over forecast)
- Widespread negative developer feedback (>40% reporting “gets in the way” rather than “makes me faster”)
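The two numeric triggers (cost more than 25% over forecast, more than 40% negative feedback) lend themselves to an automated check alongside the two judgment-based ones. The sketch below is a hypothetical helper that only reports which triggers have fired. Choosing between pause, reconfigure, and full rollback remains a human decision.

```python
def rollback_triggers(security_incident, infra_unstable,
                      actual_cost, forecast_cost, negative_feedback_share):
    """Return the list of rollback triggers that have fired for a cohort.

    negative_feedback_share is the fraction (0.0 to 1.0) of developers
    reporting that the tool "gets in the way".
    """
    fired = []
    if security_incident:
        fired.append("security incident")
    if infra_unstable:
        fired.append("infrastructure instability")
    if actual_cost > 1.25 * forecast_cost:
        fired.append("cost more than 25% over forecast")
    if negative_feedback_share > 0.40:
        fired.append("negative feedback above 40%")
    return fired


if __name__ == "__main__":
    # Illustrative inputs only, not measured data.
    print(rollback_triggers(False, False, 130_000, 100_000, 0.20))
```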
Rollback Actions by Severity#
Pause (reversible): Disable the gateway route for the affected cohort’s users. Developers can no longer send requests through the gateway, but nothing is uninstalled. Resume when the issue is resolved.
Reconfigure: Push updated managed-settings.json via MDM to tighten deny rules, change model routing, or adjust budgets. No developer action required.
Full rollback: Remove Claude Code from developer machines via MDM. Remove managed-settings.json and managed CLAUDE.md. This is the nuclear option – use only if a fundamental security concern is discovered.
What Rollback Does NOT Require#
- Reverting code already written with Claude Code – it’s in git like any other code
- Removing CLAUDE.md files from repos – they’re inert without the CLI
- Canceling the Bedrock account – infrastructure can stay warm for re-engagement
Re-Engagement After Rollback#
If you roll back a cohort, fix the root cause and re-launch with a smaller pilot (5–10 developers) before re-expanding. Don’t re-launch at the previous cohort size until the fix is validated.
Timeline Summary#
Weeks 1–4: Phase 0 – Infrastructure build
Weeks 3–6: Phase 1 – Platform engineering (overlapping)
Weeks 5–6: Cohort 1 – 25 power users
Weeks 7–9: Cohort 2 – 100 developers in full teams
Weeks 10–12: Cohort 3 – 375 remaining developers
Week 12+: Phase 3 – Ongoing governance and optimization