Virtualization, Cloud, Infrastructure and all that stuff in-between
My ramblings on the stuff that holds it all together
Using VMware Fault Tolerance to Protect a Virtualized vCenter machine
In my lab I have a virtualized vCenter installation; it works well and I’ve had no problems with this configuration in the last year.
I wanted to build a 2-node demo cluster for my VMUG session and needed vCenter to be protected by FT, so that an individual host failure would not break vCenter during my demos.
My vCenter installation was thin-provisioned, which isn’t compatible with FT, so the quickest solution I found was to just clone it to a new VM with a fully provisioned (thick) disk.
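If you wanted to script that clone step, here’s a minimal pyVmomi sketch; the vCenter address, credentials and VM names are placeholders for my lab, so adjust to suit – this is a sketch, not a polished tool.

```python
# Minimal pyVmomi sketch of the thin -> thick clone step
# (names/credentials below are lab placeholders).
import ssl
from pyVim.connect import SmartConnect
from pyVmomi import vim

si = SmartConnect(host='vc.lab.local', user='administrator', pwd='password',
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

def find_vm(name):
    """Find a VM by name anywhere in the inventory."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    try:
        return next(v for v in view.view if v.name == name)
    finally:
        view.DestroyView()

vm = find_vm('vCenter01')
# transform='flat' asks the clone to inflate thin disks to thick
relocate = vim.vm.RelocateSpec(transform='flat')
spec = vim.vm.CloneSpec(location=relocate, powerOn=False, template=False)
task = vm.CloneVM_Task(folder=vm.parent, name='vCenter01-thick', spec=spec)
```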
Once the clone completed I powered up the new vCenter installation whilst quickly switching off the old one to avoid any IP conflicts. This worked fine and the ESX hosts didn’t really notice; I just had to re-connect my vSphere client.
I then enabled the FT features and, after it had done its thing, I had a fully protected ESX/vCenter installation using FT.
It’s worth noting that you can only enable FT when using a vSphere client connected to vCenter – you can’t enable it if you connect directly to the ESX host itself (which is why cloning was the easiest approach for me).
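For reference, the same “Turn On Fault Tolerance” action is exposed in the vSphere API as CreateSecondaryVM_Task, and – mirroring the client behaviour above – it only works through a vCenter connection. A rough sketch, reusing the session and find_vm helper from the clone example:

```python
# Turn On Fault Tolerance for the cloned vCenter VM; vCenter picks
# the host for the secondary unless you pass one explicitly.
vm = find_vm('vCenter01-thick')
task = vm.CreateSecondaryVM_Task()
```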
VMware FT, 2 Nodes and Stuck at 2% Entering Maintenance Mode
I have a 2-node vSphere cluster running on a pair of ML115g5 servers (cheap, FT-compatible ESX nodes) and I was trying to put one into maintenance mode so I could update its host profile. However, it got stuck at 2% entering maintenance mode; it appeared to vMotion off the VMs it was running, as expected, but never passed the 2% mark.
After some investigation I noticed a pair of virtual machines still running on this host with FT enabled – the secondary was running on the other server, ML115-1 (i.e. not the one I wanted to switch to maintenance mode).
I was unable to vMotion these VMs off, since that would have put the primary and secondary temporarily on the same ESX host (and that doesn’t make much sense anyway).
That makes sense: the client doesn’t let you deliberately do something to a host that would break the FT protection, as there would be no node left to run the secondary copy. Incidentally, this is good UI design – you have to opt in to break something – so you should just have to temporarily disable FT and then be able to proceed.
If I had a 3rd node in this cluster there wouldn’t be a problem, as it would vMotion the secondary (or primary) to an alternative node automatically (shown below is how to do this manually).
However, in my case all of the options to disable/turn off FT were greyed out, and you would appear to be stuck, unable to progress.
The fix is pretty simple: cancel the maintenance-mode task by right-clicking it in the recent tasks pane and choosing Cancel, which re-enables the menu options and allows you to proceed. Then turn off (not disable – that doesn’t work) Fault Tolerance for the problematic virtual machines.
The virtual machine now no longer has FT turned on. Just disabling FT doesn’t resolve this problem, as it leaves the secondary VM in-situ; you need to turn it off.
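The same distinction shows up in the vSphere API, which makes it clearer why only one of the two unblocks the host – a hedged sketch, again assuming a pyVmomi session and the find_vm helper from the earlier post, with a placeholder VM name:

```python
vm = find_vm('ft-protected-vm')  # placeholder name for the stuck VM

# "Turn Off Fault Tolerance" - unprotects the VM and removes the
# secondary entirely, freeing the host to enter maintenance mode:
task = vm.TurnOffFaultToleranceForVM_Task()

# "Disable Fault Tolerance" - merely pauses protection and leaves the
# secondary VM registered in-situ, which is why it doesn't help here:
# task = vm.DisableSecondaryVM_Task(vm=secondary_vm_object)
```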
So, the moral of the story is: if you’re stuck at 2%, look for virtual machines that can’t be vMotioned off the host (a quick way to check is sketched below). If you want to use FT, a 3rd node is a good idea so the VM stays FT-protected during individual host maintenance. This is a lab environment rather than an enterprise-grade production system, but you could envision some 2-node clusters for SMB users – worth bearing in mind if you work in that space.
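Here’s that check as a small pyVmomi sketch – it lists any FT-configured VMs still on a host before you try maintenance mode (the host name is a placeholder from my lab):

```python
# List FT-configured VMs on a host before entering maintenance mode.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
host = next(h for h in view.view if h.name == 'ml115-2')  # placeholder
view.DestroyView()

for vm in host.vm:
    state = vm.runtime.faultToleranceState
    if state and state != 'notConfigured':
        print('%s: %s' % (vm.name, state))
```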
vSphere – How to Enable FT for a Nested VM
As in my previous post, I am working on a lab with virtual ESX4 servers in it – I can vMotion VMs from a physical vSphere cluster into the virtual vSphere cluster perfectly, and performance is very good (just 1 dropped ping in my testing).
One of the physical hosts belongs to www.techhead.co.uk, who has kindly lent it for this joint experiment – see his posts here, here and here on running vSphere on these HP ML115g5 servers and their FT compatibility. We have some joint posts in the pipeline on guest performance with complicated apps like SQL & Exchange when protected via FT, so keep your eyes peeled.
As the physical ESX hosts themselves are FT-compatible, I thought I’d see if I could enable FT for a VM running inside a virtual ESX server cluster – so, a VM running inside a hypervisor, inside another hypervisor!
Out of the box, unfortunately not; it gives the following error message 😦
Power On virtual machine: Record/Replay is not supported on this CPU for this guest operating system. You may have an incompatible CPU, you may have specified the wrong guest operating system type, or you may have conflicting options set in your config file. See the online help for a list of supported guest operating systems, CPUs and associated config options. Unable to enter fault tolerance mode.
To work around this you can set the following advanced (and likely totally unsupported) option, which enables FT on the nested VM – the default is/was false (thanks to the comment on this post for the replay.allowBTOnly = TRUE setting!).
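For instance, here’s a hedged pyVmomi sketch that pushes the option into the nested VM’s configuration via the API – equivalent to adding replay.allowBTOnly = "TRUE" to its .vmx file by hand while it is powered off (the VM name is a placeholder, and you would connect to whichever vCenter manages the virtual ESX cluster):

```python
# Push the unsupported replay option into the nested VM's config
# (same effect as editing its .vmx directly).
vm = find_vm('nested-test-vm')  # placeholder name
opt = vim.option.OptionValue(key='replay.allowBTOnly', value='TRUE')
spec = vim.vm.ConfigSpec(extraConfig=[opt])
task = vm.ReconfigVM_Task(spec=spec)
```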
And here it is – Nested VM running, with FT enabled
Very nice
Later on you can see some warnings about hosts getting a bit behind. I also had some initial problems getting FT to bring up the 2nd VM properly: the UI said it was restarting and got stuck there. I dropped the virtual ESXi host down to a single vCPU rather than two, and it worked OK from then on. I decided to do this as the virtual ESXi nodes were coming up reporting 2 x quad-core CPUs whilst the physical host only has 1 x quad-core CPU, so I guess that was causing some confusion.
At this point both of my virtual ESXi hosts were on the same physical vSphere server, and I seemed to have problems with the secondary getting behind (the vLockstep interval).
In this instance my nested VM is running an x86 Windows 2003 unattended setup.
I vMotioned one of the virtual ESXi hosts to the second physical vSphere server (very cool in itself) and it seemed to be better for a while; I assume there had been some CPU contention from the nested VM.
However, in the end it flagged up similar errors; I assume this is due to the overhead of running a VM inside a hypervisor, inside another hypervisor 🙂 This is a lab setup, but it will prove very useful if you have to learn about this stuff or experiment with different configurations.
This is probably totally unsupported – use at your own risk – but it does work well enough to play about with in the lab.