My ramblings on the stuff that holds it all together
I have a 2-node vSphere cluster running on a pair of ML115 G5 servers (cheap ESX nodes, FT compatible). I was trying to put one of them into maintenance mode so I could update its host profile, but it got stuck at 2% while entering maintenance mode. It appeared to vMotion off the VMs it was running, as expected, but never got past the 2% mark.
After some investigation I noticed that a pair of virtual machines with FT enabled were still running on this host. Their secondaries were running on the other server, ML115-1 (i.e. not the one I wanted to switch to maintenance mode).
I couldn't use vMotion to put the primary and secondary VMs temporarily on the same ESX host (which wouldn't make much sense anyway).
That makes sense: the client doesn’t let you deliberately do something to that host that would break the FT protection, as there would be no node left to run the secondary copy. Incidentally, this is good UI design, because you have to opt in to break something. So you just have to temporarily disable FT and you should be able to proceed.
If I had a 3rd node in this cluster there wouldn’t be a problem, as it would vMotion the secondary (or primary) to an alternative node automatically (this can also be done manually).
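To make the placement rule concrete, here is a rough sketch in plain Python. It is not a vSphere API call, just a toy model of the reasoning above: an FT copy can only move to a host that is neither the one being evacuated nor the one its partner copy runs on. The function name, the VM name `vm1` and the host names `ML115-2` and `ML115-3` are made up for illustration; only `ML115-1` appears in my lab notes.

```python
def ft_blockers(host, cluster_hosts, ft_pairs):
    """Return the FT-protected VMs that would stop `host` being evacuated.

    ft_pairs maps a VM name to a (primary_host, secondary_host) tuple.
    This is a simplified model, not how vCenter actually decides.
    """
    blockers = []
    for vm, (primary, secondary) in ft_pairs.items():
        if host not in (primary, secondary):
            continue  # neither copy lives on the host we are evacuating
        partner = secondary if primary == host else primary
        # The copy on `host` needs a target that is not `host` itself
        # and not the host already running its partner copy.
        targets = [h for h in cluster_hosts if h != host and h != partner]
        if not targets:
            blockers.append(vm)
    return blockers


# Hypothetical inventory: one FT pair split across the two lab hosts.
pairs = {"vm1": ("ML115-2", "ML115-1")}
two_node = ft_blockers("ML115-2", ["ML115-1", "ML115-2"], pairs)
three_node = ft_blockers("ML115-2", ["ML115-1", "ML115-2", "ML115-3"], pairs)
```

With two hosts the pair has nowhere to go, so `vm1` is reported as a blocker; with a hypothetical third host the list comes back empty.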
However, in my case all of the options to disable or turn off FT were greyed out, so I appeared to be stuck and unable to progress.
The fix is pretty simple: cancel the maintenance mode job by right-clicking it in the recent tasks pane and choosing cancel. That re-enables the menu options and allows you to proceed. Then turn off (not disable, that doesn’t work) fault tolerance for the problematic virtual machines.
The virtual machine now doesn’t have FT turned on. If you just disable FT it doesn’t resolve the problem, as it leaves the secondary VM in situ; you need to turn it off.
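The difference between the two menu options can be sketched as a small state model. This is only my reading of the behaviour described above for a 2-node cluster, written in plain Python; the class and method names are hypothetical and are not part of any VMware SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FtVm:
    """Toy model of an FT-protected VM in a 2-node cluster (assumption)."""
    name: str
    primary_host: str
    secondary_host: Optional[str]  # None once FT has been turned off
    ft_enabled: bool = True

    def disable_ft(self) -> None:
        # Protection is suspended, but the secondary stays where it is.
        self.ft_enabled = False

    def turn_off_ft(self) -> None:
        # The secondary is removed, leaving an ordinary single VM.
        self.ft_enabled = False
        self.secondary_host = None

    def pins_host(self, host: str) -> bool:
        """True while a secondary exists and either copy lives on `host`."""
        if self.secondary_host is None:
            return False
        return host in (self.primary_host, self.secondary_host)


# Hypothetical VM: primary on the host I want to evacuate.
vm = FtVm("vm1", primary_host="ML115-2", secondary_host="ML115-1")
vm.disable_ft()
still_stuck = vm.pins_host("ML115-2")   # secondary still in place
vm.turn_off_ft()
now_free = not vm.pins_host("ML115-2")  # plain VM, free to vMotion
```

In this model, disabling leaves the host pinned because the secondary still exists, while turning FT off removes the secondary and lets the remaining VM be migrated like any other.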
So, the moral of the story is: if you’re stuck at 2%, look for virtual machines that can’t be vMotioned off the host. If you want to use FT, a 3rd node would be a good idea to keep the VM FT-protected during individual host maintenance. This is a lab environment rather than an enterprise-grade production system, but you could envision 2-node clusters for some SMB users, so it is worth bearing in mind if you work in that space.