My ramblings on the stuff that holds it all together
Category Archives: VMWare
Following on from my last post on problems entering maintenance mode with FT-enabled VMs, I seem to have found another one. If you have the rather excellent AppSpeed product deployed on an ESX cluster and you want to put a host into maintenance mode, it gets stuck at 2% as it can’t move the AppSpeed probe VM onto an alternative host.
If you try to manually vMotion the problematic probe off to another host in the cluster you get the following error
If you shut down or suspend the AppSpeed probe VM then the switch to maintenance mode continues as expected.
This would make sense as it plugs directly into a dedicated vSwitch on that host to monitor network traffic so vMotioning it off wouldn’t be of any use – assuming the other nodes in the cluster are also running AppSpeed probes.
However, it would be great if there was a more automated way to handle this. I guess it’s tricky: on one hand it’s great that AppSpeed doesn’t rely on any ESX-host agents and is essentially self-contained with probes running as VM appliances, but on the other hand the probe has no way of knowing that its host is being put into maintenance mode and that it should be shut down/suspended rather than vMotioned to an alternative host.
There is integration with the vCenter server via a plug-in so maybe in future versions that could trap a maintenance mode event and initiate (or suggest) shutting down the AppSpeed probes.
I have a 2 node vSphere cluster running on a pair of ML115 G5 servers (cheap ESX nodes, FT compatible) and I was trying to put one into maintenance mode so I could update its host profile. However, it got stuck at 2% entering maintenance mode; it appeared to vMotion off the VMs it was running as expected but never passed the 2% mark.
After some investigation I noticed there was a pair of virtual machines still running on this host with FT enabled – the secondary was running on the other server, ML115-1 (i.e. not the one I wanted to switch to maintenance mode).
I was unable to use vMotion to put the primary and secondary VMs temporarily on the same ESX host (and that wouldn’t make much sense anyway).
That makes sense: the client doesn’t let you deliberately do something to that host that would break the FT protection, as there would be no node to run the secondary copy. Incidentally this is good UI design – you have to opt in to break something – so you just have to temporarily disable FT and should be able to proceed.
If I had a 3rd node in this cluster there wouldn’t be a problem as it would vMotion the secondary (or primary) to an alternative node automatically (shown below is how to do this manually)
However in my case all of the options to disable/turn-off FT were greyed out and you would appear to be stuck and unable to progress.
The fix is pretty simple: you just need to cancel the maintenance mode job by right-clicking in the recent tasks pane and choosing cancel, which re-enables the menu options and allows you to proceed. Then turn off (not disable – that doesn’t work) fault tolerance for the problematic virtual machines.
The virtual machine now doesn’t have FT turned on. If you just disable FT it doesn’t resolve this problem as it leaves the secondary VM in situ; you need to turn it off.
So, the moral of the story is: if you’re stuck at 2%, look for virtual machines that can’t be vMotioned off the host. If you want to use FT, a 3rd node would be a good idea to keep the VM FT’d during individual host maintenance. This is a lab environment rather than an enterprise grade production system, but you could envision some 2-node clusters for some SMB users – worth bearing in mind if you work in that space.
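To make the 2-node problem concrete, here is a minimal Python sketch of the reasoning. It is only a toy model, not anything vCenter actually runs: the VM class, the maintenance_blockers function and the host names ML115-2 and ML115-3 are all made up for illustration (only ML115-1 appears above). The one rule it encodes is that an FT primary and secondary must stay on separate hosts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VM:
    name: str
    host: str                              # host this VM currently runs on
    ft_partner_host: Optional[str] = None  # host running its FT partner, if FT is turned on


def maintenance_blockers(vms: List[VM], hosts: List[str], target: str) -> List[str]:
    """Return the names of VMs that have nowhere to go if `target` is evacuated."""
    blockers = []
    for vm in vms:
        if vm.host != target:
            continue
        # Candidate destinations: any host except the one being evacuated and
        # the one already running this VM's FT partner.
        candidates = [h for h in hosts if h != target and h != vm.ft_partner_host]
        if not candidates:
            blockers.append(vm.name)
    return blockers


# Hypothetical lab: ML115-2 is the host going into maintenance mode.
vms = [
    VM("web01", host="ML115-2"),
    VM("ft-vm", host="ML115-2", ft_partner_host="ML115-1"),
]

print(maintenance_blockers(vms, ["ML115-1", "ML115-2"], "ML115-2"))             # ['ft-vm']
print(maintenance_blockers(vms, ["ML115-1", "ML115-2", "ML115-3"], "ML115-2"))  # []
```

With two nodes the FT-protected VM has no valid destination, which matches the stuck-at-2% behaviour; add a hypothetical third node and the list of blockers is empty.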
I am setting up a trial of the new vCenter Chargeback product in my lab environment, and have followed the instructions to configure the SQL database (new DB and new account with database owner permissions). However, when I try to configure the Windows application I get errors from the JDBC component as follows:
“The user is not associated with a trusted SQL Server connection”
If I try with the appliance version of the application it ignores the slash in the DOMAIN\USER syntax for the database permissions and puts in DOMAINUSER, which obviously doesn’t work.
For now I have configured it using SQL authentication, which works OK but isn’t ideal from a management point of view. It would be good to understand why this is, as the appliance issue looks like a bug to me.
VMware have an interesting proof of concept document posted online here. This is great progress for the platform and it can only be helped by the close partnership with Cisco that has resulted in the Nexus 1000V switch.
I’m no networking expert, but to my understanding there are issues with extending Layer 2 networks across multiple physical locations that need to be resolved for this to be a safe configuration. Traditional technologies like spanning tree can present some challenges for inter-DC flat VLANs, so they need to be designed carefully, maybe using MPLS as a more suitable inter-DC protocol.
The interesting part for me is that this will be the nirvana for VMware’s vCloud programme, where services can be migrated on/off-premise to/from 3rd party providers as required and without downtime. This is do-able now with some downtime via some careful planning and some tools, but this proposition extends the vMotion zero downtime migration to vCloud.
As this technology and relevant VM/storage best-practice filters out of VMware and into service providers and customers this could become a supportable service offering for vCloud Service Providers.
To achieve this you still need storage access from both sites. To me the next logical step is to combine vMotion and FT technologies with some kind of host based replication or storage virtualization like the DataCore products. This would remove the dependency (and thus potential SPOF) on a single storage device for vMotion/FT.
Virtualizing/replicating the actual VM storage between different arrays and storage types (EMC to HP, or even DAS to EMC) and encapsulating it over standard IP links, rather than relying on complicated and proprietary array based replication and dedicated fibre connectivity, is going to be a key success factor for vCloud. It’s interesting to see all the recent work on formalising FCoE along with other WAN-capable standards like iSCSI.
Some further reading on how I see “the cloud” evolving at a more practical level here
In the lab I am currently working with I have a set of vSphere 4 ESXi installations running as virtual machines and configured in an HA cluster. This is a great setup for testing VM patches and general ops procedures, or for learning about VMware HA/DRS/FT etc. (this lab is running on a pair of ML115 G5 servers but would work equally well on just one).
Everything installed OK and I can ping the virtual ESX servers from the vCenter host that manages the cluster (the warning triangle is because there is no management network redundancy – I can live with that in this lab).
All ESX hosts (physical and virtual) are connected via iSCSI to a machine running OpenFiler and the storage networking works OK. However, when I configured the vMotion & FT private networks between the virtual ESX hosts I could not ping the vMotion/FT IP addresses using vmkping, indicating that there were some communication problems. Normally this would be a VLAN or routing issue, but in this instance all the NICs and IP addresses for my lab reside on a flat 10.0.0.0/8 network (it’s not production, just a lab).
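A quick way to rule out addressing as the culprit is to check that every vmkernel address really does sit inside that flat network. This is just a sketch using Python's standard ipaddress module; the port names and the two 10.0.1.x addresses are invented for the example, only the 10.0.0.0/8 range comes from my lab.

```python
import ipaddress

lab_net = ipaddress.ip_network("10.0.0.0/8")

# Hypothetical vMotion/FT vmkernel addresses for the two virtual ESXi hosts.
vmk_addresses = {
    "vesxi1-vmotion": "10.0.1.11",
    "vesxi2-vmotion": "10.0.1.12",
}


def all_on_network(addresses, network):
    """True if every address in `addresses` falls inside `network`."""
    return all(ipaddress.ip_address(a) in network for a in addresses)


print(all_on_network(vmk_addresses.values(), lab_net))  # True
```

If this comes back True, as it would for any 10.x.x.x address, then routing and subnetting are not the problem and it is time to look at the vSwitch itself.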
After some digging I came across this post on running full ESX as a VM, and noted the section on setting the vSwitch to promiscuous mode, so I tried that with the vSwitch on the physical ESX host that the two ESXi VMs were running on.
And now the two virtual ESXi nodes can communicate via vmkping.
Problem solved and I can now vMotion nested VMs between each virtual ESX host – very clever!
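My understanding of why promiscuous mode fixes this (an assumption on my part, not something I have confirmed with VMware) is that the outer vSwitch only delivers unicast frames to a VM port when they are addressed to that VM's own vNIC MAC. The extra vmkernel ports inside the nested ESXi host have their own MAC addresses, so frames for them never reach the VM. Here is a toy Python model of that behaviour; the class and both MAC addresses are made up, and broadcast traffic is ignored.

```python
class VSwitchPort:
    """One port on the outer (physical host) vSwitch, attached to a VM's vNIC."""

    def __init__(self, vnic_mac, promiscuous=False):
        self.vnic_mac = vnic_mac
        self.promiscuous = promiscuous

    def accepts(self, dest_mac):
        # Deliver the frame if it is addressed to the vNIC itself,
        # or if the port is allowed to see all traffic.
        return self.promiscuous or dest_mac == self.vnic_mac


# Hypothetical MACs: the ESXi VM's vNIC, and a vmkernel port inside that VM.
ESXI_VM_VNIC = "00:50:56:aa:00:01"
NESTED_VMK = "00:50:56:bb:00:01"

port = VSwitchPort(ESXI_VM_VNIC)
print(port.accepts(NESTED_VMK))  # False: the vmkping never reaches the nested host

port.promiscuous = True
print(port.accepts(NESTED_VMK))  # True
```

The trade-off is that a promiscuous port sees all traffic on that vSwitch, which is fine in a lab but worth thinking about anywhere else.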
If you have a home lab setup or want to get going with learning VMware’s new vSphere product you will need an x64 capable machine to run it on, although it does also run under VMware Workstation – even supporting nested VMs and physical ESX to virtual ESX vMotion! Unfortunately it won’t run on my trusty old HP D530 desktops which I’ve used to run ESX 3.5 over the last year or so.
My lab setup uses a couple of HP ML110 servers. They are low-cost and pretty capable boxes; for example they both have 8GB of RAM and cost me less than £350 each with RAM and disks (although I’ve added storage from my spares pile).
Link to the Servers Plus £199 +VAT servers here (www.serversplus.com); if you tell them vinf.net or techhead.co.uk sent you they may cut you a deal on delivery as they have done in the past (no promises as I’ve not had a chance to speak to them).
A note of caution: if you are looking to try out the cool FT features of vSphere you will need specific CPUs, which may be more expensive. There is a good list of compatible CPUs on Eric’s blog here and some more reading here.
Check before you buy: you can look up the manufacturer’s part code to check which CPU each model has, or check with the supplier.
The dual-core Xeon CPU in my ML110 G5 is not compatible with FT 😦 but it does look like the AMD quad-cores may be compatible. Check first though, and don’t take my word for it: I HAVE NOT TRIED IT, but I would like to if someone wants to donate one 🙂
UPDATE: the ML110G5 with the AMD Quad Core CPU IS VMware FT compatible – see link here for more details; I am ordering one now!
If you are interested, here are some performance charts from my home lab running vSphere RC on an HP ML110 with 8GB RAM and 2 x 160GB SATA HDDs whilst doing various load tests of Exchange 2007 and Windows 2008 with up to 500 concurrent heavy profile users. These stats are not particularly scientific but give you an idea of what these boxes can do; I’ve been more than happy with mine and I would recommend you get some for your lab.
These are some general screengrabs. Note there are lots of warnings showing – this is what happens when you thin-provision all your VMs and then one fills up rapidly, making the VMFS volume itself run out of space. You have been warned!
I’m running 15 VMs on one ML110; the 2nd box only has 1 VM on it as I wanted to see how far I could push one box, and I’ve not found a real limit yet! It runs a mix of Windows 2003/2008 virtual machines and doesn’t generally break a sweat. Note the provisioned vs. used space columns – Thin Provisioning 🙂 – and I’m also over-subscribing the RAM significantly.
Getting ESX (in its various versions) to run under VMware Workstation has proven to be a very popular article on this blog. If you are a consultant who has to do product demos of VI3/vSphere or are studying for your VCP, it’s a very useful thing to be able to do on your own laptop rather than rely on remote connections or lugging around demo kit.
Good news: the RC build of vSphere will boot under the latest VMware Workstation build (6.5.2) without any of the .vmx hackery you had to do in previous versions, and it seems quite fast to boot.
Bad news: the RC build of vSphere needs at least 2GB of RAM to boot. This is a problem for a laptop with 4GB of RAM as it means you can only really run one at a time.
Luckily, Duncan Epping (or VCDX 007; licensed to design :)) has discovered how you can hack the startup script to allow it to run in less than 2GB of RAM – details here. This isn’t officially supported, but it does work.
In the interests of science I did some experimentation with VMs with decreasing amounts of RAM to find the bare minimum you can get away with for a VM’d version of vSphere RC.
The magic number seems to be 768MB of RAM; if you allocate less than this to the VM it results in a Purple Screen of Death (PSOD) at boot time.
Note – this may change for the GA/RTM final version – but these are my findings for RC
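To put those numbers in context, here is some back-of-the-envelope arithmetic in Python for how many virtual ESX instances a 4GB laptop could hold at 2GB versus 768MB each. The 1GB I reserve for the host OS and Workstation is purely my own assumption, and the sketch ignores per-VM overhead, so treat the result as a rough ceiling.

```python
def max_instances(total_mb, host_reserve_mb, per_vm_mb):
    """How many VMs of per_vm_mb fit once RAM for the host OS is set aside."""
    return max(0, (total_mb - host_reserve_mb) // per_vm_mb)


LAPTOP_MB = 4096
HOST_RESERVE_MB = 1024  # assumption: RAM kept back for the host OS and Workstation

print(max_instances(LAPTOP_MB, HOST_RESERVE_MB, 2048))  # 1
print(max_instances(LAPTOP_MB, HOST_RESERVE_MB, 768))   # 4
```

So on paper the hack takes you from one instance to as many as four, which is enough for a small cluster on a laptop.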
The relevant section of my /etc/vmware/init/init.d/00.vmnix file looks like the following (note it won’t actually boot with 512MB assigned to the VM)
Some screen captures of the vSphere RC boot process below
And finally the boot screen once it’s finished – it takes 2-3 mins with 768MB of RAM on my laptop to get to this boot screen.
I am doing this on a Dell D620 with 4GB RAM and Intel VT enabled in the BIOS, running Vista x86 and VMware Workstation v6.5.2 build 156735.
I haven’t tried, but I assume I can’t power on VMs under this instance of vSphere; I can, however, connect it to a vCenter 4 machine and practice with all the management and configuration tools.
Thanks to this post from Eric and following on from my last post on the subject of network diagrams here is a list of places to go and download good quality official Visio stencils for doing VMware related diagramming.
PPT objects http://viops.vmware.com/home/docs/DOC-1338
Visio objects http://viops.vmware.com/home/docs/DOC-1346
Some examples of the objects they contain are below:
I particularly like the “build your own” – which is a quick way of doing stack/consolidation diagrams in a uniform way.
It’s a shame that you can’t ungroup these sorts of shapes and split the components out or edit the text, though.
This was a very interesting session. It wasn’t on my printed programme so I assume it was re-organised from somewhere else, but as I was in the area I went to it rather than my planned session. The presentation was given by VMware’s CIO and was set to cover how VMware use virtualization internally to deliver normal business services to its internal users.
After some scene setting of what VMware technology can do for consolidation and workload management (I would think most people attending VMworld on the 2nd day would know this already.. but) he moved into describing what VMware use internally and how they have been through the same evolution as their customers. Their stated goal is to move to cloud services to make their own operations more efficient and flexible (well they would, wouldn’t they :)).
VMware have expanded very rapidly over the last 10 years taking on lots of staff and opening up global offices, data centres and labs; he confessed that often solutions had been put in place in haste and this had led to a growing pain in management and stability.
VMware run a large ERP system based around Oracle ERP and RAC to run their core business systems, as well as Microsoft Exchange based messaging.
VMware are starting to make heavy use of the VDI scenario with 550 users at the moment, they don’t silo VDI to particular job roles and users are a mix of engineers, sales and administrative staff.
The standard client is a Wyse thin-client with a 24” LCD monitor, and this is the CEO’s primary machine 🙂 On the back end the standard hardware configuration is the HP c7000 blade chassis with Cisco 3020 blade switches uplinked into Cisco 3750 L3 switches; storage is provided by an EMC CX3-80.
VMware say this VDI configuration saves them c. $900 USD per user over a typical notebook setup.
He confessed that VMware haven’t virtualized 100% of their internal IT (yet); 2 application services still remain on physical servers:
- VMware Capacity Planner – which is due to be virtualized in Q1’09
- Oracle RAC – which is due to be virtualized in Q2’09
I thought it was quite ironic that Capacity Planner still lives on a physical box but is responsible for the demise of so many physical servers on customer sites 🙂
There were some interesting diagrams of the blade layout for the Oracle RAC and Exchange systems, which I will try to download and post if allowed, but in the meantime it runs on 2 x HP c7000 blade chassis with 4GB RAM allocated per CPU core.
VMware also had a physical Exchange 2003 server until last year; as part of a migration to Exchange 2007 they implemented 14 virtualized Exchange 2007 servers; 11 mailbox nodes – mostly in a CCR configuration and the remainder in HT and CAS roles; split across 4 ESX blades (pretty sure he said 4).
Typical mailbox sizes are under 2GB but it was refreshing to hear that like anywhere else they had some challenging mailbox sizes, particularly the execs 🙂
One government customer has over 750k mailboxes running under virtualized instances of Exchange, which is good to hear.
As part of a general consolidation programme VMware introduced some standardised hardware configurations: HP c7000 series blades and a mix of EMC CX3-80 and EMC DMX4 SANs (the latter being for the more demanding enterprise applications like Oracle ERP). I would expect VMware IT get the luxury of a very healthy staff discount from EMC when designing such solutions :)
They consolidated down from 6 main data centres to 2, a tier 4 primary and a tier 2 DR site; they are implementing SRM for DC failover and relocation.
They have just opened a new 1500 rack “green” data centre in WA (wherever that is..) to host their vast R&D facility, which needs to test builds against lots of vendors’ kit, as well as two internal cloud facilities.
The new DC takes full advantage of hot/cold aisle and passive air cooling and recycled building materials. In fact, due to the climate and cooling tech they only need to run the chillers for 3 months out of the year, which vastly reduces cooling and thus power costs. In addition, power comes from a pair of redundant (yes, really!) hydro-electric power stations; I believe he said they were paying c. $0.02 per kWh for this “green” energy and are working on certifying it for LEED Platinum – which I assume is some EPA type programme.
In terms of supporting this environment they have achieved a level of 145 virtual machines per system administrator, which is pretty high. In general terms they have realised an overall 10:1 server consolidation ratio and have (honestly) experienced only 1 server crash in the last year that was attributed to VMware ESX.
Nothing like eating your own dogfood I guess. It’s interesting to hear that VMware have been through the same challenges as most other businesses in terms of growth and consolidation – it certainly adds some credibility to their message over and above what they have done with customers. It would have been interesting to have some coverage of how their development, lab and R&D systems work, but I guess that could be considered more sensitive in such a competitive market.
Well day 2 got underway with the much anticipated keynote session from Steve Herrod who is CTO and VP of R&D or “technical stuff”.
He covered some of the previous announcements and did manage to clarify that vSphere is the implementation of VDC-OS (so it’s the new name for Virtual Infrastructure).
Steve Herrod let on that he was watching twitter during the other keynotes and adjusted his presentation accordingly 🙂
There were some examples of Oracle OLTP application scaling that have been done in vSphere:
- <15% overhead on 8 way vCPU VM
- 24k DB transactions/sec
Some example disk I/O stats were shown: achieving 250MB/sec of disk I/O took 510 disk spindles to saturate I/O. The point being that you’ll need a very large amount of hardware before you start running into disk/VM bus performance issues, and this is constantly increasing.
Virtualizing Exchange is another area where VM’ing can take advantage of multi-core processors for large enterprise apps: break it into multiple virtualized mailbox servers to make best use of multi-core hardware. Exchange doesn’t really use the CPU horsepower of modern kit – it’s more about disk I/O (and as they showed this isn’t a practical blocker).
Steve ran over the components of vSphere again, adding a bit more detail – I won’t cover them again but they are:
- vStorage – extensible via API; storage vendors write their own thin provisioning or snapshot interfaces that hook into VMware.
- vNetwork – Distributed vSwitch maintains network state during vMotion
- vSphere = scale; 64TB RAM in a cluster
- Power thrifty (CPU power management features)
- vShield Zones follow a VM around under DRS – a DMZ for groups of VMs (demos tomorrow + breakout)
vCenter HA improvements with VC heartbeat: today 60% of people are running VC on a physical box to isolate the management tools from the execution platform, and this delivers high availability for them.
vCenter Server Heartbeat provides an active/passive cluster solution (but not using MSCS) and configuration change replication/rollback; it works over WAN or LAN – IP based with a floating IP address and efficient WAN transfers.
Monitors/provides HA for the following components;
- vCenter database
- Licensing server
- Upgrade manager
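For anyone who hasn't met active/passive clustering before, the general idea can be sketched in a few lines of Python. To be clear, this is the generic pattern and not how vCenter Server Heartbeat is actually implemented; the class name and the limit of three missed heartbeats are my own inventions.

```python
class PassiveNode:
    """Standby node that takes over after too many consecutive missed heartbeats."""

    def __init__(self, missed_limit=3):
        self.missed_limit = missed_limit
        self.missed = 0
        self.active = False  # True once this node has claimed the floating IP

    def tick(self, heartbeat_received):
        """Call once per heartbeat interval."""
        if self.active:
            return
        if heartbeat_received:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.missed_limit:
                self.active = True  # fail over: claim the floating IP


standby = PassiveNode(missed_limit=3)
for beat in [True, True, False, True, False, False, False]:
    standby.tick(beat)
print(standby.active)  # True
```

A single dropped heartbeat is forgiven because the counter resets on the next good one; only a run of consecutive misses triggers the takeover, which matters if the heartbeat is travelling over a WAN link.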
vCenter scalability: a 50% increase in capacity, with 3k VMs and 300 hosts per vCenter. In addition the VI client can now aggregate up to 10 vCenter servers in a single UI, with search and reporting functionality across them.
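Running the arithmetic on those figures in Python gives a feel for the scale. Two assumptions of mine are baked in: that the 50% increase applies equally to VMs and hosts, and that all 10 aggregated vCenter servers could sit at their individual limits (the supported aggregate may well be lower).

```python
VMS_PER_VCENTER = 3000
HOSTS_PER_VCENTER = 300
INCREASE = 0.50
AGGREGATED_VCENTERS = 10

# Limits implied for the previous release if both figures grew by 50%.
prev_vms = round(VMS_PER_VCENTER / (1 + INCREASE))
prev_hosts = round(HOSTS_PER_VCENTER / (1 + INCREASE))
print(prev_vms, prev_hosts)  # 2000 200

# Theoretical ceiling visible from a single VI client.
print(VMS_PER_VCENTER * AGGREGATED_VCENTERS, HOSTS_PER_VCENTER * AGGREGATED_VCENTERS)  # 30000 3000
```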
vCenter host profiles can enforce and replicate configuration changes across multiple hosts and monitor for deviations (profile compliance) – the UI looks much like Update Manager.
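The compliance idea is easy to picture as a comparison between a reference profile and each host's actual settings. The Python below is only a conceptual sketch: the setting names, host names and values are hypothetical and bear no relation to how host profiles are really stored.

```python
def compliance_report(profile, host_config):
    """Return {setting: (expected, actual)} for every deviation from the profile."""
    return {
        key: (expected, host_config.get(key))
        for key, expected in profile.items()
        if host_config.get(key) != expected
    }


# Hypothetical reference profile and two hosts to check against it.
profile = {"ntp_server": "10.0.0.1", "vmotion_enabled": True, "syslog_host": "10.0.0.2"}
hosts = {
    "esx1": {"ntp_server": "10.0.0.1", "vmotion_enabled": True, "syslog_host": "10.0.0.2"},
    "esx2": {"ntp_server": "10.0.0.9", "vmotion_enabled": True},
}

for name, config in hosts.items():
    deviations = compliance_report(profile, config)
    print(name, "compliant" if not deviations else deviations)
```

Here esx1 comes out compliant, while esx2 is flagged for a different NTP server and a missing syslog setting.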
The VI client performance looks much better in the demo 🙂 let’s hope it’s like that in real-life!
Biggest and most useful announcement for me was that vCenter on Linux is now available and shipping as a beta virtual appliance – just download and go. No more dependency on a Windows host to run VC; I will definitely be trying this out and you can download it yourself here.
In terms of vCloud, the federation and long-distance vMotion sound a bit like science fiction – but there was the same opinion of vMotion when it was first announced – look at it now, VMware know how to do this stuff 🙂
Long-distance vMotion is the eventual goal but there are some challenges to overcome in engineering a reliable solution, but in the meantime SRM can deliver a similar sort of overall service, automating DR failover with array based replication and an electronic, scripted run-book.
Long-distance vMotion has some other interesting use cases, such as enabling a follow the sun model for support and IT services – I’ve written about this previously here. This is a great goal and I would expand this suggestion to include follow the power, where you choose to move services around globally to take advantage of the most cost-efficient power, local support etc.
VMware are building an extensible and customisable portal for cloud providers based on Lab Manager, which is likely to be bundled as a product.
The vCenter vCloud plug-in was demoed; this was more advanced than I had anticipated, with the target scenario being that you can use one VI client to manage services across multiple clouds.
It stores auth details (cloud accounts) for each cloud, with the type (e.g. vCloud) picked from a drop-down, and works over a web services API to provision/change etc.
They showed how you can drag and drop a VM to and from the cloud.
This federation allows you to pick different types of cloud, for example providers that offer a Desktop as a Service (DaaS) type cloud, or one that runs entirely on “green” energy sources.
Desktop virtualization is another key initiative and focus of investment within VMware, building up the VDI offering(s) and providing centralised desktops as well as offline/distributed scenarios in future via the Client Virtualization Platform (CVP) – some of my more off the wall thoughts on that here
- Central management
- Online/offline scenarios
- Linked clone
- Thick client – push full VM down to machine
- Patching is a challenge – master disk + linked clones
- ThinApp makes patching/swapping out the underlying OS easier as apps are in a “bubble”.
- Leveraging ACE server; lock USB etc.
- CVP – client checks back to central policy server (polling)
- Allows for self-destruct or a leased virtual desktop, so you can’t run away with apps/data
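The polling and lease points in the list above can be sketched as a small Python model. This is my own guess at the general shape of such a scheme and not a description of how CVP or ACE actually work: the class, the 7-day lease and the times are all hypothetical.

```python
from dataclasses import dataclass

DAY = 24 * 3600


@dataclass
class DesktopLease:
    lease_seconds: int
    last_checkin: float   # time of the last successful poll of the policy server
    revoked: bool = False

    def poll(self, now, server_says_revoked):
        """Record a successful check-in with the central policy server."""
        self.last_checkin = now
        self.revoked = self.revoked or server_says_revoked

    def usable(self, now):
        """The desktop may run only if not revoked and the lease has not lapsed."""
        return not self.revoked and (now - self.last_checkin) <= self.lease_seconds


lease = DesktopLease(lease_seconds=7 * DAY, last_checkin=0)
print(lease.usable(3 * DAY))   # True: offline, but still inside the lease
print(lease.usable(8 * DAY))   # False: lease lapsed with no check-in
lease.poll(8 * DAY, server_says_revoked=False)
print(lease.usable(9 * DAY))   # True: the check-in renewed the lease
lease.poll(9 * DAY, server_says_revoked=True)
print(lease.usable(9 * DAY))   # False: the server revoked the desktop
```

The useful property is that a stolen or forgotten laptop stops working by itself once the lease runs out, even if it never contacts the policy server again.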
VMware are making heavy investment in PCoIP, providing 3D graphics online and offline for high-demand apps (video/graphics). Jerry Chan demoed some of the PCoIP solutions they are working on using Google Earth, which were impressive; Brian Madden has covered these in more detail here, but I did notice that Steve said vClient, which is the 1st time I have heard that name.
Finally, there was some coverage of the mobile phone VM platform; whilst I see what they are aiming for and the advantages of it to a Telco (single platform to test apps against), it’s personally of less interest to me. I do hope that VMware don’t go all Microsoft and start spreading themselves into every market just because they feel they need to have a presence (live search, live everything etc), rather than focusing on good, core products. Whilst they are the 1st people I’ve heard of seriously working on this I don’t know how it will pan out, but I will keep an open mind. I suppose a sandboxed, secured corporate phone build with a VoIP app, some heavy crypto and a 3G connection controlled under a hypervisor could be appealing to certain types of govt. “organisations”.
All in, a very good keynote session – much better focused at the main demographic of the conference (techies, well me anyway :)) and there are some good sessions scheduled for today.