Isolation mode on VMware ESX servers
Last week at work, we had a network engineer look into some issued we’d been having with one of the switches in the server room. The switch in question was the main LAN switch to which all our servers are connected. Turns out, the switch needed to be rebooted to solve the problem. We didn’t think anything of it apart from the fact that the servers would be off the net for a few minutes.
The morning after we discovered, as we logged in to a few of the servers, that all virtual machines had shut down unexpectedly the night before. My first instinct was that the network engineer must have disconnected the power to the VMware servers for some odd reason and that this was the cause for the disturbance. But after some digging around, we found that none of the ESX servers had rebooted. So why had all the virtual machines shut down?
Turns out, there’s a feature in ESX called isolation mode. When you’ve setup your ESX cluster for HA (High Availability) and an ESX server loses contact with the other ESX servers and with its gateway, the server considers itself to be isolated and the default action when isolated is to shut down all virtual machines! In other words, if you have to disconnect your ESX servers from the network, look into isolation mode first.
I’m still not sure why the default is to shut down the machines. I mean, won’t forcibly shutting down a virtual machine potentially cause more problems than letting it run without a network connection?