Virtualization, Cloud, Infrastructure and all that stuff in-between
My ramblings on the stuff that holds it all together
Category Archives: VMworld Europe 2009
Hands-On Lab 12: Cisco Nexus 1000v Switch
This lab was very popular, but I got there early this morning so didn’t have to wait. It takes you through configuring the new Cisco Nexus virtual switch; I was keen to understand how it works and how it integrates with vSphere.
It works like this:
vSphere ESX is likely to ship with three types of virtual switch:
- vSwitch – the normal vSwitch that has always been there
- Virtual Distributed Switch – an enhanced vSwitch that can share a configuration and state across multiple ESX hosts; administered via the vSphere client (formerly the VI Client)
- NX1000v Virtual Switch from Cisco
The NX1000v will be included in the ESX build but is a separately licenced product which you will buy from VMware (via some kind of OEM agreement); you enable it via a licence key, and there are two components:
- VEM – Virtual Ethernet Module – runs inside the hypervisor but you don’t see it as a ‘normal’ VM – think of it in the same way as the service console is an ‘internal’ VM in ESX 3.5
- VSM – Virtual Supervisor Module – this is what you use to administer the VEM, and it runs IOS – you can use all the normal Cisco IOS commands. It’s downloadable as an .OVF, but it has been mentioned that it will also be available as a physical device – maybe a blade in one of the bigger Nexus chassis?
You can only carry out basic configuration via the vSphere Client; most of it is done via the IOS CLI or your Cisco-compatible configuration manager tools – it really is the same as a physical Cisco switch, just virtualized. My lab had some problems which the Cisco hands-on lab guys tried to fix: port-group config was set on the vSwitch but wasn’t propagating to the vSphere UI/ESX config… they couldn’t fix it in time and I restarted the lab on an alternative machine, which worked fine. This is still a pre-release implementation so it’s not surprising – but it does suggest that there is some back-end process on the VEM/VSM that synchronises configuration with ESX.
The HoL walks through configuring a port group on the NX1000v and then applying advanced ACLs to it, for example to filter RPC traffic. The UI gives quite a lot of information about port status and traffic – but most of the interface is via the IOS CLI.
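For anyone who hasn’t used the Nexus CLI, here is a rough sketch of the kind of configuration the lab walks you through: a port-profile (which surfaces as a port group in vSphere) with a simple ACL applied to filter RPC traffic. The profile, VLAN and ACL names are my own placeholders and the exact syntax may well differ in the shipping product, so treat it as illustrative only:

    ! ACL to block inbound RPC endpoint-mapper traffic (placeholder name)
    ip access-list block-rpc
      deny tcp any any eq 135
      permit ip any any

    ! Port-profile published to vCenter as a port group (placeholder names)
    port-profile WebVMs
      vmware port-group
      switchport mode access
      switchport access vlan 100
      ip port access-group block-rpc in
      no shutdown
      state enabled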
All in, an interesting lab – as good as the presentation sessions were, it makes it much easier to understand *how* these things work at a practical level when you get your hands on the UI.
The basic proposition is this: if you don’t have “network people” or just need basic switch capabilities then the vSwitch and vDistributed Switch suit the understanding and needs of most “server people” just fine, but if you need more advanced management and configuration tools, or need to have “network people” support the ESX switching infrastructure, then this is the way to go.
Hands-On Lab 01 – vSphere Features Overview
I decided to venture into some hands-on labs today, after hearing about all the new features over the last couple of days it was nice to finally get my hands on them!
The lab was set to cover the following areas of potential new functionality* in vSphere;
vStorage plug-in – pluggable drivers from storage vendors to enable enhanced snapshot functionality or improved multi-pathing with their arrays.
Hot-cloning of a running VM – handy.
Host profiles and compliance management – this was quite a nice feature: you define a host profile, or copy one from an existing host – it was a bit reminiscent of the Windows Group Policy Management Console in some ways – and you can link profiles to individual ESX hosts or to a cluster/DC object.
Storage vMotion via the GUI – functionality has been there since v3.5 but now has no reliance on a 3rd party GUI plug-in or command line.
Online VMFS expansion – handy; if you can extend a LUN from your array you can grow the VMFS into it online without downtime. Up until now the only alternatives were downtime, storage vMotion to a brand-new LUN, or using extents, which are not as safe.
Creating a vApp – this feature is similar to VM teaming in VMware Workstation but with the first of many functional additions.
- The main target scenario for vApps is multi-tier applications where you may have a database back-end and a front-end web server. You can define start-up and shutdown order.
- There are vApp networking settings where you appear to be able to define IP address allocations, private DHCP pools etc.
- It has an interface which is the same as the normal resource pool UI, so you can define reservations for a vApp (or collection of VMs) to provide a consistent service level.
- There wasn’t much else in there yet – but VMware have said they will be adding more features in later releases.
Configuring the distributed virtual switch (vDS)– this was an interesting lab, based around the built-in vDS which comes free with ESX, you can define port groups and uplink groups which are automatically propagated around all members of the vDS.
You have to assign the vDS to particular hosts; I’m not sure if you can attach it at a cluster or DC level. I have a separate post on the vDS and the Cisco NX1000V in the pipeline – for now, know that you have three switch options:
- vSwitch (same as previous ESX versions)
- Virtual Distributed Switch – distributed across multiple hosts (maybe only included in higher editions of ESX?)
- the Cisco NX1000V – which is a separately licenced add-on.
You can migrate normal vSwitch configurations into the vDS via the UI and it’s pretty simple to use.
Configuring VMware Fault Tolerance (FT) – this was a great lab and a great new feature: you just right-click on a VM and enable FT; it then automatically hot-clones a copy of the VM and keeps it in lockstep, with all of the CPU instructions executed on one VM shipped across the network to the secondary copy, which shows up as VM_NAME (Secondary) in the UI.
Once FT is enabled the summary screen shows you details of any lag between the protected VM and its secondary instance.
The lab gets you to kill the primary, and the failover was instant as far as I could tell with the very simple Debian OS we were protecting; it then automatically re-clones the secondary copy to re-establish FT – very cool. I’m looking forward to getting my hands on a real copy and putting it through its paces.
Overall the vSphere client (as it’s now renamed* in this lab at least) feels much quicker and more responsive than previous versions.
Interestingly, the back-end ESX lab environment is implemented as ESXi4 instances running as virtual machines, which is a brilliant way to do test and development work with ESX (some of my previous posts on this here). It has been hinted that this will be officially supported. We had to switch to a physical ESX farm to do the FT lab as it has specific hardware and CPU requirements, for which they were using HP DL385 servers, and the back-end storage was EMC.
*There were plenty of disclaimers over any product names being placeholders, so whilst I mention ESXi4 that does not constitute any kind of legal confirmation from VMware as to what anything was or will be called. It does hint that the ESXi and ESX with service-console model could continue through the next major release – I did hear one VMware chap refer to “ESX classic”, which I would assume is the version with the service console 🙂
VMWorld Europe Day 2: Wrap-up, a good day despite the curious lack of forks..
All in, a very good and busy day today – excellent keynote and some very interesting sessions; so far I’ve only managed to write up a couple of them (links below), and once I’ve clarified a few points I’ll write up the remainder.
- How VMware IT use VMware internally
- vExpert Award for vinf.net
- Cisco Nexus Switch Answers – vShield too?
- Day 2 Keynote
I discovered the press room today and obtained access via my bloggers pass; it was very handy to take an hour out to write up some of the earlier blog posts in somewhat breathless English. The “virtual firehose” phrase has never been so true – there is simply way more to take in than I could ever hope to digest and write up in detail.
Lack of an official vSphere/ESX4 release date at VMworld has been a bit disappointing, and I guess VMware will be adopting a “when it’s ready” policy. This is admirable, but it surely isn’t helping them maintain market share – IT investment in infrastructure, training and projects is all about budget planning and dates – and it also helps out Microsoft with their looming Windows Server 2008 R2 release; vSphere will move the game on further, but Microsoft will continue to gain traction, and the longer VMware leave it the more they fuel that.
The VMworld Europe party “Cloud9” was this evening and it was a grand affair – much better than any of the TechEd parties I’ve been to in recent years. VMware do tend to go all-out in making the events great (SF 2007 was amazing); Microsoft always seem to split theirs by country, whereas VMware group everyone together, which makes for a much better event.
There was a live band, two girls playing electric violins, lots of classic arcade games and lots of drink and food – but curiously, a distinct lack of forks or other such cutlery. They were later discovered hidden away at the far end of the room!
I sincerely hope we won’t have to wait until the next VMworld in September to have a general release date of vSphere, maybe VMware are going all Apple on us 😦
How VMware IT use VMware Internally
This was a very interesting session; it wasn’t on my printed programme so I assume it was re-organised from somewhere else, but as I was in the area I went to it rather than my planned session. The presentation was given by VMware’s CIO and was set to cover how VMware use virtualization internally to deliver normal business services to its internal users.
After some scene-setting on what VMware technology can do for consolidation and workload management (I would think most people attending VMworld on the 2nd day would know this already… but), he moved on to describe what VMware use internally and how they have been through the same evolution as their customers; their stated goal is to move to cloud services to make their own operations more efficient and flexible (well they would, wouldn’t they :)).
VMware have expanded very rapidly over the last 10 years, taking on lots of staff and opening up global offices, data centres and labs; he confessed that often solutions had been put in place in haste and this had led to growing pains in management and stability.
VMware run a large ERP system based around Oracle ERP and RAC for their core business systems, as well as Microsoft Exchange-based messaging.
VMware are starting to make heavy use of the VDI scenario, with 550 users at the moment; they don’t silo VDI to particular job roles, and users are a mix of engineers, sales and administrative staff.
The standard client is a Wyse thin-client with a 24” LCD monitor, and this is the CEO’s primary machine 🙂 On the back end the standard hardware configuration is the HP c7000 blade chassis with Cisco 3020 blade switches uplinked into Cisco 3750 L3 switches; storage is provided by an EMC CX3-80.
VMware say this VDI configuration saves them c.$900 USD per user over a typical notebook setup.
He confessed that VMware haven’t virtualized 100% of their internal IT (yet); two application services still remain on physical servers:
- VMware Capacity Planner – which is due to be virtualized in Q1’09
- Oracle RAC – which is due to be virtualized in Q2’09
I thought it was quite ironic that capacity planner still lives on a physical box but is responsible for the demise of so many physical servers on customer sites 🙂
There were some interesting diagrams on the blade layout for the Oracle RAC and Exchange systems, which I will try to download and post if allowed; in the meantime, it runs on 2 x HP c7000 blade chassis with 4GB RAM allocated per CPU core.
VMware also had a physical Exchange 2003 server until last year; as part of a migration to Exchange 2007 they implemented 14 virtualized Exchange 2007 servers: 11 mailbox nodes – mostly in a CCR configuration – and the remainder in HT and CAS roles, split across 4 ESX blades (pretty sure he said 4).
Typical mailbox sizes are under 2GB, but it was refreshing to hear that like anywhere else they had some challenging mailbox sizes, particularly the execs’ 🙂
One government customer has over 750k mailboxes running under virtualized instances of Exchange, which is good to hear.
As part of a general consolidation programme VMware introduced some standardised hardware configurations: HP c7000 series blades and a mix of EMC CX3-80 and EMC DMX4 SANs (the latter being for the more demanding enterprise applications like Oracle ERP) – I would expect VMware IT get the luxury of a very healthy staff discount from EMC when designing such solutions 🙂
They consolidated down from 6 main data centres to 2, a tier 4 primary and a tier 2 DR site; they are implementing SRM for DC failover and relocation.
They have just opened a new 1,500-rack “green” data centre in WA (wherever that is..) to host their vast R&D facility – they need to test builds against lots of vendors’ kit – as well as two internal cloud facilities.
The new DC takes full advantage of hot/cold aisle and passive air cooling and recycled building materials. In fact, due to the climate and cooling tech they only need to run the chillers for 3 months out of the year, which vastly reduces cooling and thus power costs. In addition, power comes from a pair of redundant (yes, really!) hydro-electric power stations; I believe he said they were paying c.$0.02 per kWh for this “green” energy and are working on certifying it for LEED Platinum – which I assume is some EPA-type programme.
In terms of supporting this environment they have achieved a level of 145 virtual machines per system administrator, which is pretty high; in general terms they have realised an overall 10:1 server consolidation ratio and have (honestly) experienced only 1 server crash in the last year which was attributed to VMware ESX.
Nothing like eating your own dogfood I guess; interesting to hear that VMware have been through the same challenges as most other businesses in terms of growth and consolidation – it certainly adds some credibility to their message over and above what they have done with customers. It would have been interesting to have some coverage of how their development, lab and R&D systems work, but I guess that could be considered more sensitive in such a competitive market.
Answers on the Cisco Nexus vSwitch – what is it and is vShield the same?
Just seen this post and was particularly interested in how the Cisco vSwitch works – it is shipped as part of ESX and enabled/unlocked by a licence key; you need to download an OVF virtual appliance to manage it.
That answers one of the big things I’ve been meaning to find out whilst I’m here; I also attended a session on vShield zones and came away with a mixed bag of thoughts – is it a baked-in part of the next version of ESX or is it run in a virtual machine? – I have resolved to head for the hands-on Labs to try it out for myself; hopefully I will get time.
VMworld Europe Day 2: Keynote
Well day 2 got underway with the much anticipated keynote session from Steve Herrod who is CTO and VP of R&D or “technical stuff”.
He covered some of the previous announcements and did manage to clarify that vSphere is the implementation of VDC-OS (so it’s the new name for Virtual Infrastructure).
Steve Herrod let on that he was watching twitter during the other keynotes and adjusted his presentation accordingly 🙂
vSphere
There were some examples of Oracle OLTP application scaling that have been done in vSphere;
- <15% overhead on 8 way vCPU VM
- 24k DB transactions/sec
Some example disk I/O stats were shown – achieving 250MB/sec of disk I/O took 510 disk spindles to saturate I/O… the point being that you’ll need a very large amount of hardware before you start running into disk/VM bus performance issues, and this ceiling is constantly increasing.
Virtualizing Exchange is another area where VM’ing can take advantage of multi-core processors for large enterprise apps – breaking it into multiple virtualized mailbox servers to make best use of multi-core hardware; Exchange doesn’t really use the CPU horsepower of modern kit – it’s more about disk I/O (and, as they showed, this isn’t a practical blocker).
Steve ran over the components of vSphere again, adding a bit more detail – I won’t cover them all again, but they are:
vStorage – extensible via API, storage vendors write their own thin provisioning or snapshot interfaces that hook into VMware.
vNetwork – Distributed vSwitch maintains network state in vMotion
vSphere = scale, 64TB RAM in cluster
Power thrifty (CPU power management features)
vShield Zones follows VMs around under DRS – a DMZ for groups of VMs (demos tomorrow + breakout)
vCenter HA improvements with vCenter Server Heartbeat; today 60% of people run VC on a physical box to isolate the management tools from the execution platform, and this delivers high availability for them.
vCenter Server Heartbeat provides an active/passive cluster solution (but not using MSCS) plus configuration change replication/rollback; it works over WAN or LAN – IP based with a floating IP address and efficient WAN transfers (a toy sketch of the heartbeat idea follows the component list below).
Monitors/provides HA for the following components;
- vCenter database
- Licencing server
- Upgrade manager
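I have no visibility of how vCenter Server Heartbeat is implemented internally, but the active/passive pattern it describes is easy to sketch: the passive node listens for heartbeats and promotes itself (claiming the floating IP and starting the services) once a few beats are missed. A toy Python illustration of just that detection logic, with placeholder port numbers and a stubbed-out takeover step:

    import socket
    import time

    HEARTBEAT_PORT = 9999     # placeholder port for this sketch
    TIMEOUT_SECONDS = 15      # promote after this long without a heartbeat

    def take_over_floating_ip():
        # Placeholder: a real product would claim the shared/floating IP address
        # and start the vCenter services on this (formerly passive) node.
        print("No heartbeat received - promoting passive node to active")

    def passive_node_monitor():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", HEARTBEAT_PORT))
        sock.settimeout(1.0)
        last_beat = time.time()
        while True:
            try:
                sock.recvfrom(1024)       # the active node sends periodic UDP beats
                last_beat = time.time()
            except socket.timeout:
                pass
            if time.time() - last_beat > TIMEOUT_SECONDS:
                take_over_floating_ip()
                break

    if __name__ == "__main__":
        passive_node_monitor()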
vCenter scalability: a 50% increase in capacity, with 3,000 VMs and 300 hosts per vCenter; in addition the VI client can now aggregate up to 10 vCenter servers in a single UI, with search and reporting functionality across them.
vCenter host profiles can enforce and replicate configuration changes across multiple hosts and monitor for deviations (profile compliance) – the UI looks much like Update Manager.
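I don’t know exactly how the compliance engine evaluates a host, but conceptually it is just a diff between the profile and each host’s live settings. A toy sketch of that idea in Python (the setting names are made up and have nothing to do with the real ESX data model):

    # Toy host-profile compliance check: report settings that deviate from the profile.
    profile = {"ntp_server": "10.0.0.1", "syslog_host": "10.0.0.2", "vswitch0_mtu": 1500}

    hosts = {
        "esx01": {"ntp_server": "10.0.0.1", "syslog_host": "10.0.0.2", "vswitch0_mtu": 1500},
        "esx02": {"ntp_server": "10.0.0.9", "syslog_host": "10.0.0.2", "vswitch0_mtu": 9000},
    }

    for host, settings in hosts.items():
        deviations = {k: (settings.get(k), v) for k, v in profile.items() if settings.get(k) != v}
        print(host, "compliant" if not deviations else "NON-COMPLIANT: %s" % deviations)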
The VI client performance looks much better in the demo 🙂 let’s hope it’s like that in real-life!
Biggest and most useful announcement for me was that vCenter on Linux is now available, shipping as a beta virtual appliance – just download and go, no more dependency on a Windows host to run VC. I will definitely be trying this out and you can download it yourself here.
vCloud
In terms of vCloud, the federation and long-distance vMotion sound a bit like science fiction – but there was the same opinion of vMotion when it was first announced – look at it now, VMware know how to do this stuff 🙂
Long-distance vMotion is the eventual goal but there are some challenges to overcome in engineering a reliable solution, but in the meantime SRM can deliver a similar sort of overall service, automating DR failover with array based replication and an electronic, scripted run-book.
Long-distance vMotion has some other interesting use cases, enabling a follow-the-sun model for support and IT services – I’ve written about this previously here – this is a great goal and I would expand this suggestion to include follow-the-power, where you choose to move services around globally to take advantage of the most cost-efficient power, local support etc.
VMware are building an extensible and customisable portal for cloud providers based on Lab Manager, which is likely to be bundled as a product.
The vCenter vCloud plug-in was demoed; this was more advanced than I had anticipated, with the target scenario being that you can use one VI client to manage services across multiple clouds.
It stores auth details for each cloud account type (vCloud, selected from a drop-down) and works over a web services API to provision and change services etc.
They showed how you can drag and drop a VM to and from the cloud.
This federation allows you to pick different types of cloud, for example providers that offer a Desktop as a Service (DaaS) type cloud, or one that runs entirely on “green” energy sources.
Virtual Desktop
This is another key initiative and focus of investment within VMware, building up the VDI offering(s) and providing centralised desktops as well as offline/distributed scenarios in future via the Client Virtualization Platform (CVP) – some of my more off-the-wall thoughts on that here.
Key points;
- Central management
- Online/offline scenarios
- Linked clones
- Thick client – push the full VM down to the machine
- Patching is a challenge – master disk + linked clones
- ThinApp makes patching/swapping out the underlying OS easier as apps are in a “bubble”
- Leveraging ACE server; lock USB etc.
- CVP – the client checks back to a central policy server (polling)
- Allows for self-destructing or leased virtual desktops – users can’t run away with apps/data
VMware are making heavy investment in PCoIP – providing 3D graphics online and offline for high-demand apps (video/graphics). Jerry Chan demoed some of the PCoIP solutions they are working on using Google Earth; whilst impressive, Brian Madden has covered these in more detail here – but I did notice that Steve said “vClient”, which is the first time I have heard that name.
Finally, there was some coverage of the mobile phone VM platform; whilst I see what they are aiming for and the advantages of it to a telco (a single platform to test apps against), it’s personally of less interest to me. I do hope that VMware don’t go all Microsoft and start spreading themselves into every market just because they can or feel they need to have a presence (Live Search, Live everything etc.), rather than focusing on good, core products. Whilst they are the first people I’ve heard of seriously working on this, I don’t know how it will pan out – but I will keep an open mind; I suppose a sandboxed, secured corporate phone build with a VoIP app, some heavy crypto and a 3G connection controlled under a hypervisor could be appealing to certain types of govt. “organisations”.
All in, a very good keynote session – much better focused at the main demographic of the conference (techies, well me anyway :)) and there are some good sessions scheduled for today.
More later.
VMware Client Hypervisor (CVP) – Grid Application Thoughts
Today VMware announced the client hypervisor they are producing and a collaboration with Intel on the hardware support (VT) and management (vPro), Citrix made a similar announcement last month (some analysis from the trusty Brian Madden here).
If the client side device is now running a hypervisor this would presumably extend the same encapsulation principles from datacentre/server virtualization to the desktop; where more than one OS instance could run on a client; for example a Linux and a Windows VM side by side, sharing data or isolated for security/compliance reasons – network traffic securely routed or encapsulated to keep it separate.
With most PC hardware, that’s probably still a lot of computing horsepower around the estate that is underused or idle while the user goes to lunch or does lightweight tasks.
Grid based applications are much discussed in the banking/geophysical world as they need to crunch vast amounts of data and are well suited to horizontal scaling. On an Internet scale, there are distributed grids like SETI or Folding@Home – crunching towards a common goal.
What if you have a centralised server that can stream down virtual appliances that run such applications and thus distributed services – isolated from the user through the hypervisor, resource-controlled so that they process in the background, when the CPU is idle, or under a central “resource policy”?
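To make the “process when the CPU is idle” part concrete, here is a trivial Python sketch of the scheduling idea: a worker that only pulls grid jobs when host CPU utilisation is low. It uses the psutil library for the CPU sample; the fetch/process functions are placeholders, and in the scenario above a client hypervisor would enforce this with resource controls rather than relying on the guest being polite:

    import time
    import psutil  # third-party library, used here to sample CPU utilisation

    IDLE_THRESHOLD = 20.0   # only crunch when the host is under 20% CPU

    def fetch_work_unit():
        # Placeholder: a real worker would pull a job from the grid scheduler.
        return {"job_id": 42, "payload": list(range(1000))}

    def process(unit):
        # Placeholder compute: just sum the payload.
        return sum(unit["payload"])

    while True:
        if psutil.cpu_percent(interval=5) < IDLE_THRESHOLD:
            unit = fetch_work_unit()
            print("processed job", unit["job_id"], "->", process(unit))
        else:
            time.sleep(30)  # back off while the user needs the machine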
What if you could then sell this compute capacity back to a “grid” provider – one which federates and dispatches grid jobs?
Of course, you can technically do this now – multi-tasking has been standard on most desktop operating systems since the late 80s – but security has always been a concern: what if that “grid” application contains malicious code or a bug which can leak data from your machine or the corporate network? This problem hasn’t really been solved to date; Java etc. provide sandboxes, but they depend on a lot of components from the core OS stack and don’t address network isolation.
Now you have an option to provide a high level of instance and network isolation between business systems and grid/public applications by using a client hypervisor – much in the same way that VMware ESX is the foundation for a multi-tenant cloud through vSwitches & Private VLANs etc.
Take that idea to the next level, what if you could distribute your server workload around your desktop estate rather than maintain a large central compute facility?
High availability through something like VMware FT and DRS/HA makes features of the underlying hardware like RAID and redundant power supplies less of a focus point; arguably you are providing high availability at the hypervisor/software level rather than with big iron.
You could also do something like provide a peer to peer file system leveraging local storage on the device to provide local LAN access to files from caches – the hypervisor isolates the virtual appliance from the end-user to divide administrative access to systems and services.
There is a lot of capacity in this “desktop cloud”… and maybe some smart ways to use it, conventional IT thinking says this is a bit wacky but I definitely think there is something in it….thoughts?
VMworld Europe Day 1: Wrap-Up
The first official day kicked off at VMworld, I covered the keynote this morning and have written up the more interesting sessions that I attended now that I have access to power again 🙂
Crowding isn’t as bad as I’d anticipated and getting about is pretty easy, the aircon could do with being a bit cooler as it got a bit sticky towards lunchtime. Queues to sessions are manageable and they have opened up bigger rooms & auditoriums than were used on Partner day. I was relieved to see that most of the queues you see are waiting for the session to open – I’ve not seen many people turned away from the sessions I attended.
I spent some time in some private meetings with Microsoft & VMware today around general virtualization things – reception drinks were popular in the solutions exchange and I think I ate way too much 🙂
The following are the more detailed posts I’ve done on sessions I attended;
Because I can’t possibly write everything up (well, it’s a decision between sleep and blogging…) here are some links to other bloggers with good content
vCenter Data Recovery http://www.virtuallifestyle.nl/2009/02/vmware-vcenter-data-recovery/
A view from afar http://rogerlunditblog.blogspot.com/2009/02/vmworld-europe-2009-tuesday-view-from.html
If you are at VMworld there are some interesting vendors in the solutions exchange; I recommend you check out:
HP – Flex 10 blade interconnects on display
Novell/PlateSpin have a large stand covering their management & migration product suites
Zeus – software based traffic manager (more info here)
Veeam win the award for most lurid green (and sheer number of people on their stand) 🙂
ioko – because I work for them and I’ve put a lot of effort into this whole vCloud thing 🙂
If you’re not here in Cannes I will endeavour to post up some of the interesting bits from my discussions with these vendors, maybe even a video 🙂
More tomorrow, must sleep.
DC14 – Overview of 2009 VMware Datacenter Products (VMworld Europe 2009)
This session discussed new features in vSphere – or is it VDC-OS? I’m a bit confused about that one; vSphere is the new name for “Virtual Infrastructure”? That would make sense to me.
As usual this session was prefixed with a slide stating that all material presented is not final and is not a commitment – things may change etc. – at least VMware point this out for the less aware people who then come and complain when something has changed at GA 🙂 This is my take on what was said… don’t sue me either 🙂
vApp is an OVF-based container format to describe a virtual machine (OS + app + data = workload), what resources it needs, what SLA needs to be met etc. I like this concept.
In later releases it will also include security requirements – they use the model that a vApp is like a barcode that describes a workload; the back-end vCenter suite knows how to provision and manage services to meet the requirements expressed by the vApp (resource allocation, HA/FT usage, etc.) and does so when you import the vApp.
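For reference, OVF already has a place for some of this metadata; a cut-down sketch of what the start-up ordering for a two-tier vApp might look like in the descriptor (element and attribute names are from my memory of the OVF 1.0 drafts, so treat this as illustrative rather than authoritative):

    <VirtualSystemCollection ovf:id="two-tier-app">
      <StartupSection>
        <Info>Start the database before the web front-end</Info>
        <Item ovf:id="db-vm" ovf:order="0" ovf:startDelay="120" ovf:stopAction="guestShutdown"/>
        <Item ovf:id="web-vm" ovf:order="1" ovf:startDelay="0" ovf:stopAction="guestShutdown"/>
      </StartupSection>
      <!-- ...VirtualSystem definitions for db-vm and web-vm would follow here... -->
    </VirtualSystemCollection>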
There was some coverage of VMware Fault Tolerance (FT) using the lockstep technology; this has been discussed at length by Scott here. However, if I understood correctly, it was said that at launch there would be some limitations: it’s going to be limited to 1 vCPU until a later update – or maybe they meant experimental support at GA, with full support at a later update (update 1 maybe?). Perhaps someone else at the session can clarify; otherwise there will hopefully be more details in the day 2 keynote by Steve Herrod tomorrow.
There is likely to be c.10% performance impact for VMware FT hosts due to the lockstep overhead (this was from an answer to a delegate question, rather than in the slides).
Ability to scale-up virtual machines through hot add vRAM and vCPU as well as hot-extension of disks.
The vSphere architecture is split into several key components (named using the v-prefix that is everywhere now! :))
vCompute – scaling up the capabilities and scale of individual VMs to meet high-demand workloads.
VMDirectIO – allowing direct hardware access from within a VM; for example – a VM using a physical NIC to do TCP offload etc. – the VM has the vendor driver installed rather than VMXNET etc. to increase performance (looks to have DRS/vMotion implications)
Support for 8 way vSMP (and hot-add)
255GB RAM for a VM
up to 40GB/s network speed within a VM.
vStorage – improved storage functionality
Thin-provisioning for pragmatic allocation of storage, can use storage vMotion to move data to larger LUNs if required without downtime – monitoring is key here – vCenter integration.
Online disk grow – increase disk size without downtime.
<2ms latency for disk I/O
API for snapshot access, enabling ISV solutions, custom bolt-ons
Storage Virtual Appliances – this is interesting to me, but no real details yet
vNetwork
Distributed Network vSwitch – some good info here – configure once, push config out to all hosts
3rd party software switches (Cisco 1000V)
vServices
vShield – a self-learning, self-configuring firewall service with firewall/trust zones to enforce security policies
vSafe – a framework for ISVs to plug in functionality like VM deep-inspection, essentially doing brain surgery on a running VM via an API.
Last point before I had to leave early for a vendor meeting was about Power – vSphere has support for power management technology like SpeedStep and core sleeping and DPM (Distributed Power Management) is moving from experimental to mainstream support. This is great as long as you make sure your data centre power feed can deal with surge capacity should you need to spin up extra hosts quickly; for example at a DR site when you invoke a recovery plan. This needs thought and sizing, rather than oversubscribing power because you think you can get away with it (or don’t realise DPM is sending your servers to sleep); otherwise you may be tripping some breakers and having to find the torches when you have to “burst”.
DC02 – Best Practices for Lab Manager (VMworld Europe 2009)
This was an interesting session; I’ve played a bit with Lab Manager but definitely intend to invest more time in it this year, key things for me were;
There are approx 1000 deployments of Lab Manager at customers, a large percentage in Europe.
You need to bear in mind VMFS constraints on the number of allowed hosts when using DRS with Lab Manager; LM typically provisions and de-provisions lots of VMs, so size hosts and clusters accordingly, and consider the storage bandwidth/disk groups etc. The self-service element could easily let this get out of control with over-zealous users – implement storage leases to avoid this (use it or lose it!).
Real-life Lab manager implementations have typically been for the following uses;
- Training – I hadn’t personally considered this use-case before but it’s popular
- Demo environments – McAfee use LM to run their online product demo environments, with some custom code to expose the VM console outside of VI into a browser.
- Development – VMware make heavy use of Lab Manager for their own dev environments; they have built end-to-end automation via the SOAP API to integrate with smoke-test tools and commercial tools like Mercury etc. Builds go through automated smoke tests, with the whole environment being captured with the bug in-situ and notifications and links sent to the relevant teams for investigation – excellent stuff; it would be good to see a more detailed case study on how this has been built (see the rough sketch after this list).
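I haven’t used the Lab Manager SOAP API myself yet, so the endpoint and operation names below are assumptions based on what was described rather than anything I have verified; a rough Python sketch (using the suds SOAP client) purely to show the shape of that kind of smoke-test automation:

    from suds.client import Client  # generic SOAP client library

    # Assumed WSDL location and operation name - check the Lab Manager SOAP API
    # documentation before relying on any of this; authentication headers omitted.
    client = Client("https://labmanager.example.com/LabManager/SOAP/LabManager.asmx?WSDL")

    config_id = 1234  # id of the captured library configuration to test (placeholder)
    client.service.ConfigurationDeploy(config_id, False)  # assumed operation and signature

    # ...run the automated smoke tests against the deployed environment here...
    # On a failure, capture the whole environment with the bug in situ and send
    # the owning team a notification with a link to the captured configuration.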
Multi-site Lab Manager implementations are tricky – and need manual template copies or localised installations of LM; may be addressed in future releases.
When backing up Lab Manager hosted VMs think about what you are backing up; guest-based backup tools (Symantec/NTBackup etc.) will expand out the data from each VM and will consume extra storage – Lab manager uses Linked-clones so the actual storage used on the VMFS is pretty efficient.
Ideally use SAN-based snapshots on the whole VMFS (or disk tree), and not individual VMDK backups – there is no file/VM granularity, but there is a good reason for this: because linked clones are so inter-dependent you need to back up the whole chain together, otherwise you risk consistency issues (the maximum number of linked clones is 30).
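The chain dependency is easier to see with a toy model: each linked clone stores only a delta and points at its parent, so restoring any one disk needs every ancestor back to the base. A quick Python sketch of that idea (nothing VMFS-specific, just the data structure):

    # Each linked clone records its parent; the base disk has no parent.
    parent = {
        "base.vmdk": None,
        "template-delta.vmdk": "base.vmdk",
        "vm1-delta.vmdk": "template-delta.vmdk",
        "vm2-delta.vmdk": "template-delta.vmdk",
    }

    def backup_set(disk):
        """Return every disk that must be backed up together for a consistent restore."""
        chain = []
        while disk is not None:
            chain.append(disk)
            disk = parent[disk]
        return chain

    print(backup_set("vm1-delta.vmdk"))
    # ['vm1-delta.vmdk', 'template-delta.vmdk', 'base.vmdk'] - backing up vm1's delta
    # on its own would be useless without its parents.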
VMware say there is no real performance penalty for using linked clones, SAN storage processors can cache the linked/differential parts of the VMDK files very efficiently (due to smaller size fitting in cache I guess?)
There is a tool called SSMove which can move virtual disk trees (linked-clone base disk + all children) between VMFS volumes – it is not Storage vMotion aware and needs downtime on that VM (and its children) to carry out.
There is a concept of organizations within Lab Manager which allows you to separate out access between multiple teams accessing the same Lab manager server and infrastructure.
Network fencing is a useful feature in Lab Manager; it means you can have multiple environments running with identical or conflicting IP address spaces – it automatically deploys a virtual appliance which functions as a NAT device and router between the environments, keeping traffic separate but allowing end-user access by automatically NAT’ing inbound connections to the appropriate environment/container.
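As a simplified illustration of what the fencing appliance is doing, think of it as a table of inbound NAT rules mapping unique external addresses onto the duplicated internal addresses inside each fenced configuration. A toy Python sketch with made-up addresses (the real appliance obviously does this at the packet level, plus outbound translation):

    # Two fenced configurations, both using the same internal subnet. The fence
    # appliance gives each one a unique external address and translates inbound
    # connections to the duplicated internal IPs.
    inbound_nat = {
        # (external_ip, external_port) -> (configuration, internal_ip, internal_port)
        ("192.168.50.10", 80): ("config-A", "10.0.0.5", 80),
        ("192.168.50.11", 80): ("config-B", "10.0.0.5", 80),  # same internal IP, different config
    }

    def route(external_ip, external_port):
        config, internal_ip, internal_port = inbound_nat[(external_ip, external_port)]
        return "forward to %s:%d inside %s" % (internal_ip, internal_port, config)

    print(route("192.168.50.10", 80))  # reaches the web VM in config-A
    print(route("192.168.50.11", 80))  # reaches the identically addressed web VM in config-B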
All in, there are some good features being added to Lab Manager, but it would be really good to see VMware working with PlateSpin to integrate the two products more tightly; out of the box Lab Manager doesn’t have a facility to import physical machines via P2V. VMware are focused on end-to-end VM lifecycle solutions, but PlateSpin could bring a lot to the table by keeping lab copies of physical servers refreshed – and, conversely, the ability to sync workload (OS/app/data) changes from development systems back out to physical machines (or other hypervisors – more on PlateSpin and its X2X facilities in a previous post here).