vSAN 2-Node Stretched Cluster Deployment Guide

Overview

This guide documents the complete deployment of a VMware vSAN 2-node stretched cluster with witness appliance on MINISFORUM MS-A2 hosts. The configuration provides high-performance all-NVMe storage with fault tolerance and enterprise-grade features.

Final Configuration

Hardware Architecture

  • 2x MINISFORUM MS-A2 hosts (data nodes)
  • 1x Mac Pro Late 2013 (witness host)
  • Network: Dedicated VLAN 30 for vSAN traffic
  • Storage: All-NVMe configuration with cache/capacity tiers

Storage Configuration Per Host

  • Cache Tier: WD_BLACK SN850X 4TB (optimized for cache performance)
  • Capacity Tier: Samsung 990 PRO 4TB (high-capacity storage)
  • Total Raw Capacity: 7.28TB (2 hosts × 3.64TB)
  • Usable Capacity: 3.64TB (after RAID-1 mirroring)
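The capacity figures above mix marketing terabytes and the tebibytes ESXi actually reports. A quick sketch of the arithmetic, assuming each "4 TB" drive holds 4 × 10^12 bytes:

```shell
# A "4 TB" drive is 4 * 10^12 bytes, which ESXi reports as ~3.64 TiB (2^40 bytes).
# RAID-1 mirroring across the two hosts halves the usable space.
per_disk=$(awk 'BEGIN { printf "%.2f", 4e12 / 2^40 }')
raw=$(awk 'BEGIN { printf "%.2f", 2 * 4e12 / 2^40 }')
usable=$(awk 'BEGIN { printf "%.2f", 2 * 4e12 / 2^40 / 2 }')
echo "per-disk=${per_disk}TiB raw=${raw}TiB usable=${usable}TiB"
```

This reproduces the 3.64/7.28/3.64 figures quoted in this guide (labels here say TiB, matching what the UI rounds to "TB").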

Prerequisites

Network Configuration

  • Management Network: VLAN 10 (192.168.10.0/24)
  • vSAN Network: VLAN 30 (192.168.30.0/24)
  • Inter-VLAN routing: Configured for management access

Host Specifications

MS-A2-01: 192.168.10.12 (vSAN: 192.168.30.12)
MS-A2-02: 192.168.10.13 (vSAN: 192.168.30.13)
Witness:  192.168.10.20 (vSAN: 192.168.30.20)

Storage Devices

# MS-A2-01 Storage Devices
Samsung 990 PRO 4TB: t10.NVMe____Samsung_SSD_990_PRO_4TB_________________72A9415145382500
WD_BLACK SN850X:     t10.NVMe____WD_BLACK_SN850X_4000GB__________________E92917428B441B00

# MS-A2-02 Storage Devices  
Samsung 990 PRO 4TB: t10.NVMe____Samsung_SSD_990_PRO_4TB_________________84A9415145382500
WD_BLACK SN850X:     t10.NVMe____WD_BLACK_SN850X_4000GB__________________3C2917428B441B00

Phase 1: Network Infrastructure Setup

1.1 Create vSAN VMkernel Interfaces

On each MS-A2 host via vCenter:

1. Navigate to: Host → Configure → Networking → VMkernel adapters
2. Click "Add networking"
3. Select "VMkernel Network Adapter"
4. Choose "Home-DVS" distributed switch
5. Configure new port group:
   - Name: vSAN
   - VLAN ID: 30
6. Configure VMkernel settings:
   - IP: 192.168.30.12 (MS-A2-01) or 192.168.30.13 (MS-A2-02)
   - Subnet: 255.255.255.0
   - Gateway: 192.168.10.1 (default TCP/IP stack gateway; no override needed, since all vSAN peers sit on 192.168.30.0/24)
7. Enable services: ✓ vSAN traffic
8. Complete configuration

Result: vmk6 interface created on VLAN 30 for vSAN traffic

1.2 Verify vSAN Network Connectivity

# Test network connectivity between hosts
ssh esxi-ms-a2-01 "vmkping -I vmk6 192.168.30.13"
ssh esxi-ms-a2-02 "vmkping -I vmk6 192.168.30.12"

# Expected output: 0% packet loss, <1ms latency
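Beyond basic reachability, it is worth confirming the path carries full-size frames. A sketch using vmkping's don't-fragment mode, assuming the default 1500-byte MTU on vmk6 (substitute `-s 8972` if jumbo frames are enabled); this requires SSH access to the lab hosts:

```shell
# -d sets don't-fragment; -s 1472 fills a 1500-byte frame (1472 + 28 bytes ICMP/IP headers).
# A hang or "Message too long" here points at an MTU mismatch on the vSAN VLAN.
ssh esxi-ms-a2-01 "vmkping -I vmk6 -d -s 1472 192.168.30.13"
ssh esxi-ms-a2-02 "vmkping -I vmk6 -d -s 1472 192.168.30.12"
```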

Phase 2: vSAN Witness Appliance Deployment

2.1 Deploy Witness Appliance

Via vCenter on Mac Pro host:

1. Deploy OVF template: VMware vSAN Witness Appliance
2. Configure deployment:
   - Name: vsan-witness
   - Host: esxi-mac-pro.lab.markalston.net
   - Storage: Local datastore
3. Network configuration:
   - Management: VM Network (192.168.10.20)
   - vSAN: VM Network (VLAN 30 is configured later, in step 2.2)
4. Power on and complete initial setup

2.2 Configure Witness Networking

Configure VLAN 30 for vSAN traffic:

1. Edit witness VM settings
2. Add second network adapter on Home-DVS
3. Configure for VLAN 30
4. SSH to witness and configure IP:
   - vmk1: 192.168.30.20/24
   - Enable vSAN traffic
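Step 4 can be done with esxcli once you are on the witness over SSH. A sketch, assuming the second adapter backs vmk1 on the witness (adjust the interface name if your appliance differs):

```shell
# Assign the static vSAN IP to vmk1 on the witness appliance
esxcli network ip interface ipv4 set -i vmk1 -I 192.168.30.20 -N 255.255.255.0 -t static

# Tag vmk1 for vSAN traffic (skip if the appliance already tags it)
esxcli vsan network ip add -i vmk1

# Confirm vmk1 now appears as a vSAN interface
esxcli vsan network list
```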

2.3 Add Witness to vCenter

vCenter → Hosts and Clusters → Add Host
- Host: 192.168.10.20
- Credentials: root / <witness root password set during deployment>
- Certificate: Accept

Phase 3: Storage Preparation and Device Claiming

3.1 Verify Storage Devices

# Check available storage devices on each host
ssh esxi-ms-a2-01 "esxcli storage core device list | grep -E '(Samsung|WD_BLACK)'"
ssh esxi-ms-a2-02 "esxcli storage core device list | grep -E '(Samsung|WD_BLACK)'"

# Verify devices are not claimed by other systems
ssh esxi-ms-a2-01 "ls -la /vmfs/devices/disks/ | grep -E '(Samsung|WD_BLACK)'"

3.2 Clear Any Existing Partitions

# Write an empty GPT label to the Samsung 990 PRO (destructive: removes any existing partitions)
ssh esxi-ms-a2-01 "partedUtil setptbl /vmfs/devices/disks/t10.NVMe____Samsung_SSD_990_PRO_4TB_________________72A9415145382500 gpt"
ssh esxi-ms-a2-02 "partedUtil setptbl /vmfs/devices/disks/t10.NVMe____Samsung_SSD_990_PRO_4TB_________________84A9415145382500 gpt"

# Write an empty GPT label to the WD_BLACK (destructive: removes any existing partitions)
ssh esxi-ms-a2-01 "partedUtil setptbl /vmfs/devices/disks/t10.NVMe____WD_BLACK_SN850X_4000GB__________________E92917428B441B00 gpt"
ssh esxi-ms-a2-02 "partedUtil setptbl /vmfs/devices/disks/t10.NVMe____WD_BLACK_SN850X_4000GB__________________3C2917428B441B00 gpt"

# Rescan storage adapters
ssh esxi-ms-a2-01 "esxcli storage core adapter rescan --all"
ssh esxi-ms-a2-02 "esxcli storage core adapter rescan --all"
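With four long t10 identifiers in play, typos are easy. A dry-run sketch that generates the clear commands from a host/device table (it only prints the ssh commands; remove the `echo`-style quoting and execute the printed lines once you are sure, as they are destructive):

```shell
# Host/device pairs from the "Storage Devices" section above
table="
esxi-ms-a2-01 t10.NVMe____Samsung_SSD_990_PRO_4TB_________________72A9415145382500
esxi-ms-a2-01 t10.NVMe____WD_BLACK_SN850X_4000GB__________________E92917428B441B00
esxi-ms-a2-02 t10.NVMe____Samsung_SSD_990_PRO_4TB_________________84A9415145382500
esxi-ms-a2-02 t10.NVMe____WD_BLACK_SN850X_4000GB__________________3C2917428B441B00
"

# Build the four partition-clear commands without running them
cmds=$(printf '%s\n' "$table" | while read -r host dev; do
  [ -n "$host" ] || continue
  echo "ssh $host \"partedUtil setptbl /vmfs/devices/disks/$dev gpt\""
done)
printf '%s\n' "$cmds"
```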

Phase 4: vSAN Cluster Configuration

4.1 Enable vSAN on Cluster (CLI Method)

Create new vSAN cluster:

# On MS-A2-01 (will be cluster master)
ssh esxi-ms-a2-01 "esxcli vsan cluster new"

# Get cluster UUID
ssh esxi-ms-a2-01 "esxcli vsan cluster get | grep 'Sub-Cluster UUID'"
# Example output: Sub-Cluster UUID: 52bd3208-de4b-7b12-528f-09bae1dd2054

# Join MS-A2-02 to the cluster
ssh esxi-ms-a2-02 "esxcli vsan cluster join -u 52bd3208-de4b-7b12-528f-09bae1dd2054"

# Verify cluster membership
ssh esxi-ms-a2-01 "esxcli vsan cluster get | grep 'Member Count'"
ssh esxi-ms-a2-02 "esxcli vsan cluster get | grep 'Member Count'"
# Expected: Sub-Cluster Member Count: 2
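The UUID hand-off between the two hosts can be scripted so nothing is copied by hand. A sketch of the parsing step, run here against the sample output line shown above:

```shell
# Sample of the line that "esxcli vsan cluster get" prints (from the example above)
sample="   Sub-Cluster UUID: 52bd3208-de4b-7b12-528f-09bae1dd2054"

# The UUID is the last whitespace-separated field on that line
uuid=$(printf '%s\n' "$sample" | awk '/Sub-Cluster UUID/ { print $NF }')
echo "$uuid"

# Against the lab this becomes (hypothetical one-liner):
#   uuid=$(ssh esxi-ms-a2-01 "esxcli vsan cluster get" | awk '/Sub-Cluster UUID:/ { print $NF }')
#   ssh esxi-ms-a2-02 "esxcli vsan cluster join -u $uuid"
```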

Alternative: Enable via vCenter UI:

1. Navigate to: MS-A2-Cluster → Configure → vSAN → Services
2. Click "Enable vSAN"
3. Select "Configure vSAN"
4. Choose cluster type: "2-node cluster"
5. Complete wizard

4.2 Verify vSAN Network Configuration

# Check vSAN network interfaces
ssh esxi-ms-a2-01 "esxcli vsan network list"
ssh esxi-ms-a2-02 "esxcli vsan network list"

# Expected output shows vmk6 with proper configuration

Phase 5: Disk Group Creation and Storage Claiming

5.1 Claim Storage Devices via vCenter UI

Navigate to vSAN Disk Management:

vCenter → MS-A2-Cluster → Configure → vSAN → Disk Management

Claim Disks for MS-A2-01:

1. Click "Claim Disks" or "Create Disk Group"
2. Select devices for MS-A2-01:
   - Cache Tier: ✓ NVMe WD_BLACK SN850X 4000GB
   - Capacity Tier: ✓ NVMe Samsung SSD 990 PRO 4TB
3. Click "Create"
4. Wait for disk group creation to complete

Claim Disks for MS-A2-02:

1. Repeat process for MS-A2-02:
   - Cache Tier: ✓ NVMe WD_BLACK SN850X 4000GB
   - Capacity Tier: ✓ NVMe Samsung SSD 990 PRO 4TB
2. Click "Create"
3. Wait for completion

5.2 Verify Disk Group Configuration

# Check vSAN storage configuration
ssh esxi-ms-a2-01 "esxcli vsan storage list"
ssh esxi-ms-a2-02 "esxcli vsan storage list"

# Expected output:
# - WD_BLACK: Is Capacity Tier: false (cache)
# - Samsung: Is Capacity Tier: true (capacity)
# - Both: Used by this host: true, In CMMDS: true

Phase 6: Witness Configuration for 2-Node Cluster

6.1 Configure Stretched Cluster via vCenter

vCenter → MS-A2-Cluster → Configure → vSAN → Fault Domains & Stretched Cluster

Set up stretched cluster:

1. Configuration type: ✓ Stretched cluster
2. Witness host: Select "vsan-witness.lab.markalston.net"
3. Preferred site: Choose MS-A2-01 as preferred
4. Apply configuration

6.2 Verify Witness Integration

Check witness status:

# Verify witness is part of cluster
ssh root@192.168.10.20 "esxcli vsan cluster get | grep 'Local Node Type'"
# Expected: Local Node Type: WITNESS

# Check cluster membership from all nodes
ssh esxi-ms-a2-01 "esxcli vsan cluster get | grep 'Member Count'"
ssh esxi-ms-a2-02 "esxcli vsan cluster get | grep 'Member Count'"  
ssh root@192.168.10.20 "esxcli vsan cluster get | grep 'Member Count'"
# Expected: All should show Member Count: 3 (2 data + 1 witness)

Phase 7: Final Configuration and Verification

7.1 Verify vSAN Datastore

Check datastore creation:

vCenter → Storage → Datastores
- Should show "vsanDatastore" with 3.64TB capacity
- Status: Normal
- Type: vSAN

Via ESXi host:

# Check datastore status
ssh esxi-ms-a2-01 "df -h | grep vsan"
ssh esxi-ms-a2-02 "df -h | grep vsan"

# Expected output: vsanDatastore with 3.6TB capacity

7.2 vSAN Health Check

Via vCenter UI:

vCenter → MS-A2-Cluster → Monitor → vSAN → Health
- Review all health checks
- Address any warnings or errors
- Key areas: Cluster, Network, Physical disk

Via CLI:

# Check vSAN cluster health
ssh esxi-ms-a2-01 "esxcli vsan cluster get"
# Verify: Local Node Health State: HEALTHY

# Check disk health
ssh esxi-ms-a2-01 "esxcli vsan storage list | grep 'Checksum OK'"
ssh esxi-ms-a2-02 "esxcli vsan storage list | grep 'Checksum OK'"
# Expected: All disks show "Checksum OK: true"

7.3 Performance Verification

Test vSAN performance:

# Basic I/O test (on vSAN, files must live inside a directory, which backs a namespace object)
ssh esxi-ms-a2-01 "mkdir /vmfs/volumes/vsanDatastore/iotest && vmkfstools -c 1G /vmfs/volumes/vsanDatastore/iotest/test.vmdk"
ssh esxi-ms-a2-01 "vmkfstools -U /vmfs/volumes/vsanDatastore/iotest/test.vmdk && rmdir /vmfs/volumes/vsanDatastore/iotest"

# Monitor vSAN performance in vCenter
# Navigate to: MS-A2-Cluster → Monitor → vSAN → Performance

Phase 8: Troubleshooting Common Issues

8.1 Network Partition Errors

Symptoms:

  • “vSAN cluster is network partitioned” error
  • Hosts showing as separate single-member clusters

Resolution:

# Reset vSAN cluster membership
ssh esxi-ms-a2-02 "esxcli vsan cluster leave"
ssh esxi-ms-a2-01 "esxcli vsan cluster leave"

# Restart vSAN services
ssh esxi-ms-a2-01 "/etc/init.d/vsanmgmtd restart"
ssh esxi-ms-a2-02 "/etc/init.d/vsanmgmtd restart"

# Recreate cluster
ssh esxi-ms-a2-01 "esxcli vsan cluster new"
# Get new cluster UUID and have MS-A2-02 join

8.2 No Disks Available for Claiming

Symptoms:

  • vCenter shows no disks under “Claim Disks”
  • Storage devices not visible to vSAN

Resolution:

# Check device claiming conflicts
ssh esxi-ms-a2-01 "esxcli storage core device list | grep -A 10 'Samsung\|WD_BLACK'"

# Remove any existing partitions
ssh esxi-ms-a2-01 "partedUtil setptbl /vmfs/devices/disks/[DEVICE_NAME] gpt"

# Rescan storage
ssh esxi-ms-a2-01 "esxcli storage core adapter rescan --all"

# Restart hostd if needed
ssh esxi-ms-a2-01 "/etc/init.d/hostd restart"

8.3 UI Display Issues After Reboot

Symptoms:

  • Individual host storage view shows 0B capacity
  • vSAN cluster view shows correct capacity

Resolution:

  • This is typically a vCenter UI caching issue
  • Refresh browser or re-login to vCenter
  • Force storage rescan: esxcli storage core adapter rescan --all
  • Wait 5-10 minutes for UI to refresh

Final Configuration Summary

Cluster Configuration

Cluster Name: MS-A2-Cluster
vSAN Version: 8.0.3
Cluster Type: 2-node stretched cluster with witness
Deduplication: Disabled
Compression: Disabled
Encryption: Disabled

Storage Layout

Total Raw Capacity: 7.28TB (2 × 3.64TB)
Usable Capacity: 3.64TB (RAID-1 mirroring)
Cache Tier: 2 × 4TB WD_BLACK SN850X (one per host; cache is not counted toward datastore capacity)
Capacity Tier: 2 × 4TB Samsung 990 PRO (7.28TB raw as reported by ESXi)

Network Configuration

Management: VLAN 10 (192.168.10.0/24)
vSAN Traffic: VLAN 30 (192.168.30.0/24)
Multicast: Disabled (Unicast mode)

Performance Characteristics

Expected Latency: <1ms (local NVMe)
Expected IOPS: 500K+ mixed workload
Expected Bandwidth: ~6GB/s per host
Fault Tolerance: Single host failure

Maintenance and Operations

Regular Health Checks

# Weekly health verification
ssh esxi-ms-a2-01 "esxcli vsan cluster get | grep Health"
ssh esxi-ms-a2-02 "esxcli vsan cluster get | grep Health"

# Monthly storage verification  
ssh esxi-ms-a2-01 "esxcli vsan storage list | grep 'Checksum OK'"
ssh esxi-ms-a2-02 "esxcli vsan storage list | grep 'Checksum OK'"
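The per-host checks above fold naturally into one loop that could be cron-scheduled. A runbook sketch, assuming passwordless SSH to both hosts:

```shell
# Run the health and checksum checks against both data nodes in one pass
for h in esxi-ms-a2-01 esxi-ms-a2-02; do
  echo "== $h =="
  ssh "$h" "esxcli vsan cluster get | grep -i 'Health State'"
  ssh "$h" "esxcli vsan storage list | grep 'Checksum OK'"
done
```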

Backup Procedures

# Backup vSAN configuration
# Use vCenter backup or configuration export
# Document cluster UUIDs and disk group configurations

Scaling Considerations

Adding Storage:

  • Additional NVMe devices can be added to existing disk groups
  • New disk groups can be created with additional devices
  • Witness appliance may need storage expansion

Performance Tuning:

  • Monitor cache hit ratios in vCenter
  • Consider deduplication/compression if capacity becomes constrained
  • Adjust vSAN policies for specific workload requirements


Document Version: 1.0
Last Updated: August 2025
Deployment Date: August 21-22, 2025
Environment: Mark’s Homelab Infrastructure


This project is for educational and home lab purposes.