vSAN 2-Node Stretched Cluster Deployment Guide
Overview
This guide documents the deployment of a VMware vSAN 2-node stretched cluster, with a witness appliance, on MINISFORUM MS-A2 hosts. The configuration provides high-performance all-NVMe storage that tolerates the failure of a single data node.
Final Configuration
Hardware Architecture
- 2x MINISFORUM MS-A2 hosts (data nodes)
- 1x Mac Pro Late 2013 (witness host)
- Network: Dedicated VLAN 30 for vSAN traffic
- Storage: All-NVMe configuration with cache/capacity tiers
Storage Configuration Per Host
- Cache Tier: WD_BLACK SN850X 4TB (write buffer; does not contribute to usable capacity)
- Capacity Tier: Samsung 990 PRO 4TB (provides all usable capacity)
- Total Raw Capacity: 7.28TB (2 hosts × 3.64TB)
- Usable Capacity: 3.64TB (after RAID-1 mirroring)
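The raw and usable figures above are just unit conversion plus RAID-1 arithmetic: a marketing "4TB" drive holds 4×10^12 bytes, which ESXi reports as roughly 3.64 TiB. A quick sanity check of the math:

```shell
# A "4TB" NVMe drive is 4e12 bytes; ESXi reports capacity in binary units (TiB).
per_disk=$(awk 'BEGIN { printf "%.2f", 4e12 / 1024^4 }')         # one capacity-tier disk
raw=$(awk 'BEGIN { printf "%.2f", 2 * 4e12 / 1024^4 }')          # two hosts, one capacity disk each
usable=$(awk 'BEGIN { printf "%.2f", (2 * 4e12 / 1024^4) / 2 }') # RAID-1 keeps one mirror copy
echo "per-disk: ${per_disk} TiB, raw: ${raw} TiB, usable: ${usable} TiB"
```

The cache-tier drives are deliberately absent from this calculation: in vSAN OSA the cache device serves only as a buffer and adds nothing to usable capacity.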
Prerequisites
Network Configuration
- Management Network: VLAN 10 (192.168.10.0/24)
- vSAN Network: VLAN 30 (192.168.30.0/24)
- Inter-VLAN routing: Configured for management access
Host Specifications
MS-A2-01: 192.168.10.12 (vSAN: 192.168.30.12)
MS-A2-02: 192.168.10.13 (vSAN: 192.168.30.13)
Witness: 192.168.10.20 (vSAN: 192.168.30.20)
Storage Devices
# MS-A2-01 Storage Devices
Samsung 990 PRO 4TB: t10.NVMe____Samsung_SSD_990_PRO_4TB_________________72A9415145382500
WD_BLACK SN850X: t10.NVMe____WD_BLACK_SN850X_4000GB__________________E92917428B441B00
# MS-A2-02 Storage Devices
Samsung 990 PRO 4TB: t10.NVMe____Samsung_SSD_990_PRO_4TB_________________84A9415145382500
WD_BLACK SN850X: t10.NVMe____WD_BLACK_SN850X_4000GB__________________3C2917428B441B00
Phase 1: Network Infrastructure Setup
1.1 Create vSAN VMkernel Interfaces
On each MS-A2 host via vCenter:
1. Navigate to: Host → Configure → Networking → VMkernel adapters
2. Click "Add networking"
3. Select "VMkernel Network Adapter"
4. Choose "Home-DVS" distributed switch
5. Configure new port group:
- Name: vSAN
- VLAN ID: 30
6. Configure VMkernel settings:
- IP: 192.168.30.12 (MS-A2-01) or 192.168.30.13 (MS-A2-02)
- Subnet: 255.255.255.0
- Gateway: default (inherited from the management TCP/IP stack; no override is needed because all vSAN endpoints, including the witness, share 192.168.30.0/24)
7. Enable services: ✓ vSAN traffic
8. Complete configuration
Result: vmk6 interface created on VLAN 30 for vSAN traffic
1.2 Verify vSAN Network Connectivity
# Test network connectivity between hosts
ssh esxi-ms-a2-01 "vmkping -I vmk6 192.168.30.13"
ssh esxi-ms-a2-02 "vmkping -I vmk6 192.168.30.12"
# Expected output: 0% packet loss, <1ms latency
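The vmkping summary can be checked in a script rather than by eye. A minimal sketch, run here against a canned sample line (a real run would substitute the captured output of the ssh commands above):

```shell
# Canned vmkping summary line; in practice this would be captured from the ssh command.
sample='3 packets transmitted, 3 packets received, 0% packet loss'
# Pull the loss percentage out of the summary.
loss=$(printf '%s\n' "$sample" | sed -n 's/.* \([0-9][0-9]*\)% packet loss.*/\1/p')
if [ "$loss" = "0" ]; then result=PASS; else result=FAIL; fi
echo "vSAN link check: $result (${loss}% loss)"
```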
Phase 2: vSAN Witness Appliance Deployment
2.1 Deploy Witness Appliance
Via vCenter on Mac Pro host:
1. Deploy OVF template: VMware vSAN Witness Appliance
2. Configure deployment:
- Name: vsan-witness
- Host: esxi-mac-pro.lab.markalston.net
- Storage: Local datastore
3. Network configuration:
- Management: VM Network (192.168.10.20)
- vSAN: VM Network (will configure VLAN later)
4. Power on and complete initial setup
2.2 Configure Witness Networking
Configure VLAN 30 for vSAN traffic:
1. Edit witness VM settings
2. Add second network adapter on Home-DVS
3. Configure for VLAN 30
4. SSH to witness and configure IP:
- vmk1: 192.168.30.20/24
- Enable vSAN traffic
2.3 Add Witness to vCenter
vCenter → Hosts and Clusters → Add Host
- Host: 192.168.10.20
- Credentials: root / <witness root password>
- Certificate: Accept
Phase 3: Storage Preparation and Device Claiming
3.1 Verify Storage Devices
# Check available storage devices on each host
ssh esxi-ms-a2-01 "esxcli storage core device list | grep -E '(Samsung|WD_BLACK)'"
ssh esxi-ms-a2-02 "esxcli storage core device list | grep -E '(Samsung|WD_BLACK)'"
# Verify devices are not claimed by other systems
ssh esxi-ms-a2-01 "ls -la /vmfs/devices/disks/ | grep -E '(Samsung|WD_BLACK)'"
3.2 Clear Any Existing Partitions
# Clear Samsung 990 PRO partitions (if any exist)
ssh esxi-ms-a2-01 "partedUtil setptbl /vmfs/devices/disks/t10.NVMe____Samsung_SSD_990_PRO_4TB_________________72A9415145382500 gpt"
ssh esxi-ms-a2-02 "partedUtil setptbl /vmfs/devices/disks/t10.NVMe____Samsung_SSD_990_PRO_4TB_________________84A9415145382500 gpt"
# Clear WD_BLACK partitions (if any exist)
ssh esxi-ms-a2-01 "partedUtil setptbl /vmfs/devices/disks/t10.NVMe____WD_BLACK_SN850X_4000GB__________________E92917428B441B00 gpt"
ssh esxi-ms-a2-02 "partedUtil setptbl /vmfs/devices/disks/t10.NVMe____WD_BLACK_SN850X_4000GB__________________3C2917428B441B00 gpt"
# Rescan storage adapters
ssh esxi-ms-a2-01 "esxcli storage core adapter rescan --all"
ssh esxi-ms-a2-02 "esxcli storage core adapter rescan --all"
Phase 4: vSAN Cluster Configuration
4.1 Enable vSAN on Cluster (CLI Method)
Create new vSAN cluster:
# On MS-A2-01 (will be cluster master)
ssh esxi-ms-a2-01 "esxcli vsan cluster new"
# Get cluster UUID
ssh esxi-ms-a2-01 "esxcli vsan cluster get | grep 'Sub-Cluster UUID'"
# Example output: Sub-Cluster UUID: 52bd3208-de4b-7b12-528f-09bae1dd2054
# Join MS-A2-02 to the cluster
ssh esxi-ms-a2-02 "esxcli vsan cluster join -u 52bd3208-de4b-7b12-528f-09bae1dd2054"
# Verify cluster membership
ssh esxi-ms-a2-01 "esxcli vsan cluster get | grep 'Member Count'"
ssh esxi-ms-a2-02 "esxcli vsan cluster get | grep 'Member Count'"
# Expected: Sub-Cluster Member Count: 2
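Copying the UUID by hand is error-prone; it can instead be parsed out of the `Sub-Cluster UUID` line and used to build the join command. A sketch against a sample output line (UUID taken from the example above):

```shell
# Sample line from `esxcli vsan cluster get`; in practice, captured via ssh from MS-A2-01.
sample='   Sub-Cluster UUID: 52bd3208-de4b-7b12-528f-09bae1dd2054'
uuid=$(printf '%s\n' "$sample" | awk -F': ' '/Sub-Cluster UUID/ {print $2}')
# Command to run on MS-A2-02 to join the cluster.
join_cmd="esxcli vsan cluster join -u $uuid"
echo "$join_cmd"
```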
Alternative: Enable via vCenter UI:
1. Navigate to: MS-A2-Cluster → Configure → vSAN → Services
2. Click "Enable vSAN"
3. Select "Configure vSAN"
4. Choose cluster type: "2-node cluster"
5. Complete wizard
4.2 Verify vSAN Network Configuration
# Check vSAN network interfaces
ssh esxi-ms-a2-01 "esxcli vsan network list"
ssh esxi-ms-a2-02 "esxcli vsan network list"
# Expected output shows vmk6 with proper configuration
Phase 5: Disk Group Creation and Storage Claiming
5.1 Claim Storage Devices via vCenter UI
Navigate to vSAN Disk Management:
vCenter → MS-A2-Cluster → Configure → vSAN → Disk Management
Claim Disks for MS-A2-01:
1. Click "Claim Disks" or "Create Disk Group"
2. Select devices for MS-A2-01:
- Cache Tier: ✓ NVMe WD_BLACK SN850X 4000GB
- Capacity Tier: ✓ NVMe Samsung SSD 990 PRO 4TB
3. Click "Create"
4. Wait for disk group creation to complete
Claim Disks for MS-A2-02:
1. Repeat process for MS-A2-02:
- Cache Tier: ✓ NVMe WD_BLACK SN850X 4000GB
- Capacity Tier: ✓ NVMe Samsung SSD 990 PRO 4TB
2. Click "Create"
3. Wait for completion
5.2 Verify Disk Group Configuration
# Check vSAN storage configuration
ssh esxi-ms-a2-01 "esxcli vsan storage list"
ssh esxi-ms-a2-02 "esxcli vsan storage list"
# Expected output:
# - WD_BLACK: Is Capacity Tier: false (cache)
# - Samsung: Is Capacity Tier: true (capacity)
# - Both: Used by this host: true, In CMMDS: true
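The tier assignment can also be verified mechanically by counting the `Is Capacity Tier` flags: each host should report exactly one `false` (the cache device) and one `true` (the capacity device). A sketch against a condensed sample of the output:

```shell
# Condensed sample of `esxcli vsan storage list` output for one host (fields trimmed).
sample='Device: WD_BLACK SN850X 4000GB
   Is Capacity Tier: false
Device: Samsung SSD 990 PRO 4TB
   Is Capacity Tier: true'
cache_count=$(printf '%s\n' "$sample" | grep -c 'Is Capacity Tier: false')
capacity_count=$(printf '%s\n' "$sample" | grep -c 'Is Capacity Tier: true')
echo "cache devices: $cache_count, capacity devices: $capacity_count"
```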
Phase 6: Witness Configuration for 2-Node Cluster
6.1 Configure Stretched Cluster via vCenter
vCenter → MS-A2-Cluster → Configure → vSAN → Fault Domains & Stretched Cluster
Set up stretched cluster:
1. Configuration type: ✓ Stretched cluster
2. Witness host: Select "vsan-witness.lab.markalston.net"
3. Preferred site: Choose MS-A2-01 as preferred
4. Apply configuration
6.2 Verify Witness Integration
Check witness status:
# Verify witness is part of cluster
ssh root@192.168.10.20 "esxcli vsan cluster get | grep 'Local Node Type'"
# Expected: Local Node Type: WITNESS
# Check cluster membership from all nodes
ssh esxi-ms-a2-01 "esxcli vsan cluster get | grep 'Member Count'"
ssh esxi-ms-a2-02 "esxcli vsan cluster get | grep 'Member Count'"
ssh root@192.168.10.20 "esxcli vsan cluster get | grep 'Member Count'"
# Expected: All should show Member Count: 3 (2 data + 1 witness)
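Checking all three nodes can be collapsed into one loop. The sketch below iterates over canned `Member Count` lines; a real run would substitute the three ssh commands above and flag any node that disagrees:

```shell
# Canned per-node output lines; a real run would capture these via the ssh commands above.
expected=3
ok=yes
for line in 'Sub-Cluster Member Count: 3' \
            'Sub-Cluster Member Count: 3' \
            'Sub-Cluster Member Count: 3'; do
  count=$(printf '%s\n' "$line" | awk -F': ' '{print $2}')
  [ "$count" = "$expected" ] || ok=no
done
echo "membership check: $ok (expected $expected members on every node)"
```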
Phase 7: Final Configuration and Verification
7.1 Verify vSAN Datastore
Check datastore creation:
vCenter → Storage → Datastores
- Should show "vsanDatastore" with 3.64TB capacity
- Status: Normal
- Type: vSAN
Via ESXi host:
# Check datastore status
ssh esxi-ms-a2-01 "df -h | grep vsan"
ssh esxi-ms-a2-02 "df -h | grep vsan"
# Expected output: vsanDatastore with 3.6TB capacity
7.2 vSAN Health Check
Via vCenter UI:
vCenter → MS-A2-Cluster → Monitor → vSAN → Health
- Review all health checks
- Address any warnings or errors
- Key areas: Cluster, Network, Physical disk
Via CLI:
# Check vSAN cluster health
ssh esxi-ms-a2-01 "esxcli vsan cluster get"
# Verify: Local Node Health State: HEALTHY
# Check disk health
ssh esxi-ms-a2-01 "esxcli vsan storage list | grep 'Checksum OK'"
ssh esxi-ms-a2-02 "esxcli vsan storage list | grep 'Checksum OK'"
# Expected: All disks show "Checksum OK: true"
7.3 Performance Verification
Test vSAN performance:
# Basic I/O test (vSAN stores files inside namespace directories, so create one first)
ssh esxi-ms-a2-01 "mkdir /vmfs/volumes/vsanDatastore/iotest && vmkfstools -c 1G /vmfs/volumes/vsanDatastore/iotest/test.vmdk"
ssh esxi-ms-a2-01 "vmkfstools -U /vmfs/volumes/vsanDatastore/iotest/test.vmdk && rmdir /vmfs/volumes/vsanDatastore/iotest"
# Monitor vSAN performance in vCenter
# Navigate to: MS-A2-Cluster → Monitor → vSAN → Performance
Phase 8: Troubleshooting Common Issues
8.1 Network Partition Errors
Symptoms:
- “vSAN cluster is network partitioned” error
- Hosts showing as separate single-member clusters
Resolution:
# Reset vSAN cluster membership
ssh esxi-ms-a2-02 "esxcli vsan cluster leave"
ssh esxi-ms-a2-01 "esxcli vsan cluster leave"
# Restart vSAN services
ssh esxi-ms-a2-01 "/etc/init.d/vsanmgmtd restart"
ssh esxi-ms-a2-02 "/etc/init.d/vsanmgmtd restart"
# Recreate cluster
ssh esxi-ms-a2-01 "esxcli vsan cluster new"
# Get new cluster UUID and have MS-A2-02 join
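Because the order of these steps matters (both nodes must leave before the new cluster is created), the sequence can be kept as a script. This sketch prints the steps dry-run style; the final join from MS-A2-02 is intentionally omitted because it needs the UUID printed by `cluster new`, as in Phase 4:

```shell
# Dry run: print the recovery steps in order instead of executing them.
# Hostnames match this guide; swap the printf for actual execution once reviewed.
steps="ssh esxi-ms-a2-02 'esxcli vsan cluster leave'
ssh esxi-ms-a2-01 'esxcli vsan cluster leave'
ssh esxi-ms-a2-01 '/etc/init.d/vsanmgmtd restart'
ssh esxi-ms-a2-02 '/etc/init.d/vsanmgmtd restart'
ssh esxi-ms-a2-01 'esxcli vsan cluster new'"
printf '%s\n' "$steps"
step_count=$(printf '%s\n' "$steps" | grep -c .)
```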
8.2 No Disks Available for Claiming
Symptoms:
- vCenter shows no disks under “Claim Disks”
- Storage devices not visible to vSAN
Resolution:
# Check device claiming conflicts
ssh esxi-ms-a2-01 "esxcli storage core device list | grep -A 10 'Samsung\|WD_BLACK'"
# Remove any existing partitions
ssh esxi-ms-a2-01 "partedUtil setptbl /vmfs/devices/disks/[DEVICE_NAME] gpt"
# Rescan storage
ssh esxi-ms-a2-01 "esxcli storage core adapter rescan --all"
# Restart hostd if needed
ssh esxi-ms-a2-01 "/etc/init.d/hostd restart"
8.3 UI Display Issues After Reboot
Symptoms:
- Individual host storage view shows 0B capacity
- vSAN cluster view shows correct capacity
Resolution:
- This is typically a vCenter UI caching issue
- Refresh browser or re-login to vCenter
- Force a storage rescan: esxcli storage core adapter rescan --all
- Wait 5-10 minutes for the UI to refresh
Final Configuration Summary
Cluster Configuration
Cluster Name: MS-A2-Cluster
vSAN Version: 8.0.3
Cluster Type: 2-node stretched cluster with witness
Deduplication: Disabled
Compression: Disabled
Encryption: Disabled
Storage Layout
Total Raw Capacity: 7.28TB (2 × 3.64TB capacity-tier disks, as reported by ESXi)
Usable Capacity: 3.64TB (RAID-1 mirroring across the two hosts)
Cache Tier: 2 × 4TB WD_BLACK SN850X (write buffer; not counted in usable capacity)
Capacity Tier: 2 × 4TB Samsung 990 PRO (8TB nominal, reported as 7.28TB raw)
Network Configuration
Management: VLAN 10 (192.168.10.0/24)
vSAN Traffic: VLAN 30 (192.168.30.0/24)
Multicast: Disabled (Unicast mode)
Performance Characteristics
Expected Latency: <1ms (local NVMe)
Expected IOPS: 500K+ mixed workload
Expected Bandwidth: ~6GB/s per host
Fault Tolerance: Single host failure
Maintenance and Operations
Regular Health Checks
# Weekly health verification
ssh esxi-ms-a2-01 "esxcli vsan cluster get | grep Health"
ssh esxi-ms-a2-02 "esxcli vsan cluster get | grep Health"
# Monthly storage verification
ssh esxi-ms-a2-01 "esxcli vsan storage list | grep 'Checksum OK'"
ssh esxi-ms-a2-02 "esxcli vsan storage list | grep 'Checksum OK'"
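The per-host checks above can be wrapped in one loop. The sketch prints each command (dry run); dropping the `echo` wrapper, i.e. running the ssh call directly, performs the actual check:

```shell
# Dry run: build and print the weekly health check for each data node.
for host in esxi-ms-a2-01 esxi-ms-a2-02; do
  cmd="ssh $host \"esxcli vsan cluster get | grep Health\""
  echo "$cmd"   # replace this echo with the bare ssh call to run for real
done
```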
Backup Procedures
# Backup vSAN configuration
# Use vCenter backup or configuration export
# Document cluster UUIDs and disk group configurations
Scaling Considerations
Adding Storage:
- Additional NVMe devices can be added to existing disk groups
- New disk groups can be created with additional devices
- Witness appliance may need storage expansion
Performance Tuning:
- Monitor cache hit ratios in vCenter
- Consider deduplication/compression if capacity becomes constrained
- Adjust vSAN policies for specific workload requirements
Related Documentation
- NVMe Memory Tiering Alternatives
- Network VLAN Configuration
- ESXi 8.0.3 Installation Guide
- Synology Storage Configuration
Document Version: 1.0
Last Updated: August 2025
Deployment Date: August 21-22, 2025
Environment: Mark’s Homelab Infrastructure