My ramblings on the stuff that holds it all together
Category Archives: VMotion
VMware have an interesting proof of concept document posted online here. This is great progress for the platform, and it can only be helped by the close partnership with Cisco that has resulted in the Nexus 1000V switch.
I’m no networking expert, but to my understanding there are issues with extending Layer 2 networks across multiple physical locations that need to be resolved for this to be a safe configuration. Traditional technologies like spanning tree can present some challenges for inter-DC flat VLANs, so they need to be designed carefully, maybe using MPLS as a more suitable inter-DC protocol.
The interesting part for me is that this will be the nirvana for VMware’s vCloud programme, where services can be migrated on/off-premise to/from 3rd party providers as required and without downtime. This is do-able now with some downtime via careful planning and some tools, but this proposition extends the vMotion zero-downtime migration to vCloud.
As this technology and relevant VM/storage best-practice filters out of VMware and into service providers and customers this could become a supportable service offering for vCloud Service Providers.
To achieve this you still need storage access from both sites. To me the next logical step is to combine vMotion and FT technologies with some kind of host-based replication or storage virtualization like the Datacore products. This will remove the dependency (and thus potential SPOF) on a single storage device for vMotion/FT.
Virtualizing/replicating the actual VM storage between different arrays and storage types (EMC->HP, or even DAS->EMC) and encapsulating it over standard IP links, rather than relying on complicated and proprietary array-based replication and dedicated fibre connectivity, is going to be a key success factor for vCloud. It’s interesting to see all the recent work on formalising FCoE along with other WAN-capable standards like iSCSI.
Some further reading on how I see “the cloud” evolving at a more practical level here
A note to remember: check the duplex settings on NICs handling your vMotion traffic.
My updated clustered ESX test lab is progressing (more posts on that in the next week or so)… and I’m kind of limited in that I only have an old 24-port 100Mb Cisco hub for the networking at the moment.
vMotion warns about the switch speed as a possible issue.
I had my Service Console/vMotion NIC forced to 100/full, and when I first tried it vMotion took 2hrs to get to 10%. I changed it to auto-negotiate whilst the task was running and it completed in a couple of seconds without breaking the vMotion task, dropping only 1 ping to the VM I moved.
Cool. It’s not production or doing a lot of workload, but it’s useful to know that despite the warning it will work even if you’ve only got an old hub for your networking. It’s also worth remembering that duplex mismatches can literally add hours or days onto network transfers.
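To put rough numbers on that, here is a minimal Python sketch of the arithmetic. It assumes progress is linear (it rarely is exactly), and the function names and the 512 MB example size are mine, purely for illustration.

```python
def estimate_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linearly extrapolate the total task time from progress so far."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours * 100.0 / percent_done


def transfer_seconds(size_mb: float, link_mbit_per_s: float,
                     efficiency: float = 1.0) -> float:
    """Seconds needed to move size_mb megabytes over a link.

    efficiency is the assumed fraction of the nominal link speed that is
    actually achieved (1.0 means the full nominal rate).
    """
    if link_mbit_per_s <= 0 or efficiency <= 0:
        raise ValueError("link speed and efficiency must be positive")
    return (size_mb * 8) / (link_mbit_per_s * efficiency)


if __name__ == "__main__":
    # 2 hours to reach 10%, as seen with the forced 100/full setting
    print(estimate_total_hours(2, 10), "hours in total at that rate")
    # Hypothetical 512 MB of VM memory over a clean 100Mb link
    print(transfer_seconds(512, 100), "seconds at the full nominal rate")
```

At the rate I saw with the mismatch, the whole move would have needed about 20 hours, whereas the same hypothetical 512 MB over a healthy 100Mb link is well under a minute.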
It uses replication between 2 ESX hosts to allow you to configure DRS/HA etc.
Excellent. I’m going to procure another cheap ESX host in the next couple of weeks, so I will post back on my experiences with setting this up. My previous plan meant I’d have to get a 3rd box to run an iSCSI server like OpenFiler to enable this functionality, but I really like this approach.
Sidenote – Xtravirt also have some other useful downloads like Visio templates and an ESX deployment appliance available here
Following on from my earlier post, I upgraded my installation to the new build of 6.5. It un-installed the old build and re-installed the latest without a problem; it took about 30mins and required a reboot of the host OS.
All my previously suspended XP/2003 VMs resumed ok without a restart but needed an upgrade to the VMTools, which did require a restart of the guest OS – all completed with no problems.
Now, onto installing ESX….
I used the settings from Eric’s post here to edit my .vmx file
ethernet0.virtualDev = "e1000"
monitor.virtual_exec = "hardware"
monitor_control.restrict_backdoor = "true"
Note – you need to select an x64 Linux version from the VM type drop down. If you have to go back and change it via the GUI after you’ve edited the .vmx file, it overwrites the Ethernet card “e1000” setting with “vlance”, so you need to edit the file again; otherwise the ESX installer won’t find a compatible NIC and won’t install.
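Since the GUI can quietly undo the edit, a small script to re-apply the three settings is handy. This is only a sketch: it assumes the .vmx file is plain key = "value" lines and matches key names exactly, and the patch_vmx function and the file path in the usage note are my own inventions, not anything VMware ships.

```python
# The three settings from the post, keyed exactly as they appear in the .vmx
REQUIRED_SETTINGS = {
    "ethernet0.virtualDev": "e1000",
    "monitor.virtual_exec": "hardware",
    "monitor_control.restrict_backdoor": "true",
}


def patch_vmx(text: str, settings: dict = REQUIRED_SETTINGS) -> str:
    """Return .vmx text with each setting replaced, or appended if missing."""
    remaining = dict(settings)
    patched_lines = []
    for line in text.splitlines():
        # Only lines of the form key = value are candidates for replacement
        key = line.split("=", 1)[0].strip() if "=" in line else None
        if key in remaining:
            patched_lines.append(f'{key} = "{remaining.pop(key)}"')
        else:
            patched_lines.append(line)
    # Anything not already present in the file is added at the end
    for key, value in remaining.items():
        patched_lines.append(f'{key} = "{value}"')
    return "\n".join(patched_lines) + "\n"


if __name__ == "__main__":
    sample = 'memsize = "1024"\nethernet0.virtualDev = "vlance"\n'
    print(patch_vmx(sample))
```

To use it on a real VM, read the file, pass the text through patch_vmx and write it back (for example with pathlib.Path("esx.vmx").read_text() and write_text()), with the VM powered off and a backup copy of the original kept.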
It was initially very slow to boot: 5mins on my dual core laptop, with only one error, which was expected.
To improve the performance I changed my installation to run the non-debug version of the Workstation binaries (rename the vmware-vmx.exe to vmware-vmx-debug.exe).
(Note: this isn’t recommended unless you know what you are doing; VMware will rely on the output from the debug version of the code if you need to report any issues.)
It also seems to work for the installable version of ESX 3i… (although I’ve not quite figured out the point of that version yet :)).
It did fail with an error the first time round.
This was because I had specified an IDE disk as per the ESX instructions; I changed it to a SCSI one and it worked ok.
The ESX 3i install has a footprint of about 200MB on disk, and ESX 3.5 uses 1.5GB.
I’m going to keep the 3.5 install on my laptop and will try to use linked clones to maintain a couple of different versions/configs to save disk space.. I’m sure I could knock up a quick script to change the hostname/IP of each clone – if I do I’ll post it here.
Why would you want to do this? Well, because you can, of course 🙂 and it’s handy for testing patch updates and scripts for ESX management etc.
I will also try to get an ESX DRS cluster running under Workstation with a couple of ESX hosts and shared storage over iSCSI, using something like OpenFiler as shown here. It won’t exactly be production performance, but it will be useful for testing and demo’ing.
You’ve been able to buy solid state SAN technology like the Tera-RAMSAN from TMS, which gives you up to 1TB of storage presented over 4Gb/s fibre channel or Infiniband @10Gb/s. With the cost of flash storage dropping, it’s soon going to fall into the realms of affordability (from memory, a year ago a 1TB SSD SAN was about £250k, so I would assume that’s maybe £150k now – would be happy to see current pricing if anyone has it though).
If you were able to combine this with a set of ESX hosts dual-connected to the RAMSAN and traditional equipment (like an HP EVA or EMC Clariion) over a FC or iSCSI fabric then you could possibly leverage the new Storage vMotion features that are included in ESX 3.5 to achieve a 2nd level of performance and load levelling for a VM farm.
It’s pretty common knowledge that you can use vMotion and the DRS features to effectively load level or average VM CPU and memory load across a number of VMware nodes within a cluster.
Using the infrastructure discussed above could add a second tier of load balancing without downtime to a DRS cluster. If a VM needs more disk throughput or is suffering from latency, then you could move it to the more expensive solid-state storage tier, and move less demanding VMs back to FC-SCSI or even FATA disks. This ensures you are making the best use of fast, expensive storage vs. cheap, slow commodity storage.
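As a thought experiment, the decision logic for that second tier could be as simple as the Python sketch below. Everything in it is hypothetical: the tier names, the latency thresholds and the VmDiskStats structure are made up for illustration, and it doesn’t talk to Virtual Center or trigger a Storage vMotion at all; it only says where a VM’s disks ought to live.

```python
from dataclasses import dataclass

# Fastest / most expensive tier first (illustrative names only)
TIERS = ["solid-state", "fc-scsi", "fata"]


@dataclass
class VmDiskStats:
    name: str
    avg_latency_ms: float  # observed average disk latency for the VM
    current_tier: str      # one of TIERS


def recommend_tier(vm: VmDiskStats, high_ms: float = 20.0,
                   low_ms: float = 5.0) -> str:
    """Suggest a storage tier: promote on high latency, demote on low.

    high_ms and low_ms are assumed thresholds, not vendor guidance.
    """
    idx = TIERS.index(vm.current_tier)
    if vm.avg_latency_ms > high_ms and idx > 0:
        return TIERS[idx - 1]  # struggling: move one tier faster
    if vm.avg_latency_ms < low_ms and idx < len(TIERS) - 1:
        return TIERS[idx + 1]  # idle: free up the expensive storage
    return vm.current_tier


if __name__ == "__main__":
    for vm in (VmDiskStats("sql01", 35.0, "fc-scsi"),
               VmDiskStats("file01", 2.0, "fc-scsi"),
               VmDiskStats("web01", 10.0, "fc-scsi")):
        print(vm.name, "->", recommend_tier(vm))
```

Moving only one tier at a time and leaving a gap between the two thresholds is deliberate, so that a VM sitting near a threshold doesn’t get bounced back and forth between arrays.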
Even if Virtual Center doesn’t have a native API for exposing this type of functionality or criteria for the DRS configuration, you could leverage the plug-in or scripting architecture to use a manager of managers (or here) to map this across an enterprise and across multiple hypervisors (Sun, Xen, Hyper-V).
I also see EMC integrating flash storage into the array itself. It would be even better if you could transparently migrate LUNs to/from different arrays and disk storage without having to touch ESX at all.
Note: This is just a theory and I’ve not actually tried this, but I am hoping to get some eval kit and do a proof of concept…
As the Hoff posts here and on VMTN here, the proposed vulnerability, where you can manipulate and possibly compromise a VM during a VMotion process, isn’t exactly major. It’s clever, but like anything, if you don’t follow the best-practice recommendations then you expose yourself to these risks. It’s the same reason they recommend you lock your server room or don’t have blank passwords: this attack is akin to gaining physical access to the hardware or being able to sniff a physical switch port, except that in this instance it’s “virtual” hardware.
VMware have always recommended keeping the VMotion traffic on a separate VLAN or network.
The other vulnerability, where VMTools can be compromised, is different but again preventable, and it is not enabled on server instances of VMware.