# Homelab Recovery Plan - DVS Migration Issues

## Current Situation

**Date**: 2025-07-21
**Issue**: Two Intel NUCs lost network connectivity during orphaned DVS removal attempts

## Infrastructure Status

| Host | IP | Status | Issue | Recovery Method |
|---|---|---|---|---|
| esxi-nuc-01.markalston.net | 192.168.10.8 | ❌ Offline | Management network migration failed | Console access required |
| esxi-nuc-02.markalston.net | 192.168.10.9 | ❌ Offline | DVS uplink manipulation failed | Console access required |
| esxi-nuc-03.markalston.net | 192.168.10.10 | ✅ Online | Has orphaned DVS | vCenter GUI removal |
| macpro.markalston.net | 192.168.10.7 | ✅ Online | Hosting vCenter | No action needed |
| vcsa.markalston.net | 192.168.10.11 | ✅ Online | vCenter Server | No action needed |
## Recovery Steps

### Phase 1: Console Recovery (esxi-nuc-01 & esxi-nuc-02)

**Required**: Physical console access or IPMI/BMC remote console

For each affected host:

1. **Access the console** via one of:
   - Physical console (monitor + keyboard)
   - IPMI/BMC web console
   - ESXi host remote console (if available)
2. **Log in as root** and execute these commands:
```bash
# Remove any broken VMkernel interfaces
esxcli network ip interface remove -i vmk0 2>/dev/null || true
esxcli network ip interface remove -i vmk1 2>/dev/null || true
esxcli network ip interface remove -i vmk2 2>/dev/null || true

# Ensure a standard vSwitch exists
esxcli network vswitch standard add -v vSwitch0 2>/dev/null || true

# Add the physical NIC to the standard switch
esxcli network vswitch standard uplink add -v vSwitch0 -u vmnic0 2>/dev/null || true

# Create the Management Network port group
esxcli network vswitch standard portgroup add -v vSwitch0 -p "Management Network" 2>/dev/null || true

# Recreate the management interface with the correct IP
# For esxi-nuc-01: 192.168.10.8
# For esxi-nuc-02: 192.168.10.9
esxcli network ip interface add -i vmk0 -p "Management Network"
esxcli network ip interface ipv4 set -i vmk0 -I 192.168.10.X -N 255.255.255.0 -t static
esxcli network ip interface tag add -i vmk0 -t Management

# Set the default gateway
esxcli network ip route ipv4 add -g 192.168.10.1 -n default

# Remove the orphaned DVS (if still present)
esxcfg-vswitch --delete --dvswitch vc01-dvs 2>/dev/null || true

# Restart the management agents (not network services)
/etc/init.d/hostd restart
/etc/init.d/vpxa restart
```
**Verify from an external machine**:

```bash
ping 192.168.10.X       # host IP
ssh root@192.168.10.X
```
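The per-host steps above differ only in the management IP, so they can be wrapped in a small dry-run-capable script. This is a hypothetical helper (not one of the repo's `scripts/`); `HOST_IP` defaults to esxi-nuc-01's address purely for illustration, and with `DRY_RUN=1` (the default here) it only prints the esxcli commands rather than executing them:

```shell
#!/bin/sh
# Hypothetical sketch: rebuild the standard-switch management path from the console.
HOST_IP="${HOST_IP:-192.168.10.8}"    # .8 = esxi-nuc-01; use .9 for esxi-nuc-02
NETMASK="${NETMASK:-255.255.255.0}"
GATEWAY="${GATEWAY:-192.168.10.1}"
DRY_RUN="${DRY_RUN:-1}"               # default to printing, not executing

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "WOULD RUN: $*"
  else
    "$@" 2>/dev/null || true   # tolerate "already exists" style errors
  fi
}

run esxcli network vswitch standard add -v vSwitch0
run esxcli network vswitch standard uplink add -v vSwitch0 -u vmnic0
run esxcli network vswitch standard portgroup add -v vSwitch0 -p "Management Network"
run esxcli network ip interface add -i vmk0 -p "Management Network"
run esxcli network ip interface ipv4 set -i vmk0 -I "$HOST_IP" -N "$NETMASK" -t static
run esxcli network ip interface tag add -i vmk0 -t Management
run esxcli network ip route ipv4 add -g "$GATEWAY" -n default
```

Review the printed commands first, then rerun with `DRY_RUN=0` on the host console.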
### Phase 2: Safe DVS Removal (esxi-nuc-03)
**Method**: Use vCenter GUI (safest approach)
1. **Access vCenter**: https://vcsa.markalston.net
- Username: administrator@vsphere.local
- Password: Cl0udFoundry!
2. **Add Host to vCenter** (if not already added):
- Right-click Homelab-DC → Add Host
- Host: esxi-nuc-03.markalston.net
3. **Remove from Distributed Switch**:
- Select esxi-nuc-03 in inventory
- Go to Configure → Networking → Virtual switches
- Find orphaned DVS (vc01-dvs) with warning icon
- Right-click → "Remove from distributed switch"
- Confirm removal
4. **Verify Standard Switch Configuration**:
- Ensure vSwitch0 exists with Management Network
- Verify vmnic0 is assigned to vSwitch0
- Confirm vmk0 is on Management Network portgroup
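After the GUI removal, the result can be double-checked over SSH from another machine (assuming SSH is enabled on the host). The `ssh` line in the comment is illustrative; `check_no_dvs` only inspects captured text, so the sketch works offline:

```shell
#!/bin/sh
# Capture the host's DVS list remotely (illustrative; requires SSH enabled):
#   dvs_out=$(ssh root@esxi-nuc-03.markalston.net esxcli network vswitch dvs vmware list)
# An empty listing means no distributed switch remains on the host.
check_no_dvs() {
  if [ -z "$(printf '%s' "$1" | tr -d '[:space:]')" ]; then
    echo "OK: no distributed switches on host"
  else
    echo "WARNING: DVS still present:"
    printf '%s\n' "$1"
  fi
}

# Example against a canned empty value (offline):
check_no_dvs ""   # prints: OK: no distributed switches on host
```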
### Phase 3: Verification and Cleanup

1. **Test All Hosts**:

   ```bash
   # Test connectivity
   ping 192.168.10.8    # esxi-nuc-01
   ping 192.168.10.9    # esxi-nuc-02
   ping 192.168.10.10   # esxi-nuc-03

   # Test SSH access
   ssh root@esxi-nuc-01.markalston.net "hostname"
   ssh root@esxi-nuc-02.markalston.net "hostname"
   ssh root@esxi-nuc-03.markalston.net "hostname"
   ```

2. **Verify Network Configuration**:

   ```bash
   # On each host
   esxcfg-vswitch -l                        # Should show only standard switches
   esxcli network ip interface list         # Should show vmk0 on Management Network
   esxcli network vswitch dvs vmware list   # Should return empty
   ```

3. **Add Hosts to vCenter**:
   - Add all three Intel NUCs to vCenter if not already present
   - Verify no warning icons on network configuration
### Phase 4: Continue with Original Plan

1. **Create Traditional Cluster**:
   - Use traditional baseline management (not vLCM)
   - Avoid single-image management due to community VIBs
2. **Configure Networking**:
   - Create new distributed switches if needed
   - Set up proper VLANs and port groups
   - Configure vMotion and storage networks
3. **Complete Infrastructure Setup**:
   - Configure datastores and storage policies
   - Set up HA/DRS for the cluster
   - Deploy test VMs
## Prevention Measures

### For Future DVS Operations

- **Always use the vCenter GUI** for DVS operations when possible
- **Have console access ready** before any management network changes
- **Test on one host first** before applying changes to all hosts
- **Use dual NICs** if available for safer migrations
- **Back up the network configuration** before major changes
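For the last point, a backup can be as simple as capturing the relevant listings to files before touching anything. A hedged sketch to run on the ESXi host; the `/tmp` location and file names are arbitrary choices (prefer a persistent datastore path, since `/tmp` does not survive a reboot):

```shell
#!/bin/sh
# Sketch: snapshot the host's network config to files before risky changes.
BACKUP_DIR="/tmp/netcfg-backup-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BACKUP_DIR"

# save <filename> <command...>: capture a command's output into the backup dir,
# noting a warning in the file if the command is unavailable or fails.
save() {
  out="$BACKUP_DIR/$1"; shift
  "$@" > "$out" 2>&1 || echo "warn: '$*' failed" >> "$out"
}

save vswitches.txt esxcfg-vswitch -l
save vmknics.txt   esxcli network ip interface list
save ipv4.txt      esxcli network ip interface ipv4 get
save routes.txt    esxcli network ip route ipv4 list
echo "Backup written to $BACKUP_DIR"
```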
### Commands to Avoid on Single-NIC Hosts

- `esxcfg-vswitch -Q/-U` (DVPort uplink manipulation)
- `esxcli network ip interface remove -i vmk0` (without an immediate replacement)
- Direct DVS manipulation when management is on the DVS
### Safe Commands for Future Reference

- `esxcfg-vswitch --delete --dvswitch <name>` (DVS removal)
- `esxcfg-vswitch -l` (list all switches)
- `esxcli network ip interface list` (list interfaces)
## Lessons Learned

- **Single NIC + DVS management = high risk**: any CLI manipulation risks immediate connectivity loss
- **vCenter GUI is safest**: use the GUI for complex network changes when possible
- **Console access is critical**: always have console access before network changes
- **Test first**: always test procedures on one host before applying to all
- **Correct commands matter**: use `esxcfg-vswitch` for DVS removal, not `esxcli`
## Files Modified/Created

- `docs/troubleshooting/dvs-migration-recovery.md` - Detailed recovery procedures
- `docs/recovery-plan.md` - This comprehensive recovery plan
- `scripts/migrate-management-network.sh` - Original migration script (flawed)
- `scripts/migrate-single-host.sh` - Single-host script (caused first failure)
- `scripts/safe-dvs-removal.sh` - Safer approach (incomplete)
- `scripts/direct-dvs-removal.sh` - Direct approach (caused second failure)
## Next Steps After Recovery
- Complete console recovery for esxi-nuc-01 and esxi-nuc-02
- Use vCenter to safely remove DVS from esxi-nuc-03
- Proceed with traditional cluster setup
- Document final working configuration for future reference