My ramblings on the stuff that holds it all together
Category Archives: Virtual Grid
There is a lot of talk about delivering cloud or elastic computing platforms, a lot of CxO’s are taking this all in and nodding enthusiastically, they can see the benefits.. so make it happen!….yesterday.
You can build your own cloud, and be choosy about what you give to others. building your own cloud makes a lot of sense, it’s not always cheap but its the kind of thing you can scale up (or down..) with a bit of up-front investment, in this article I’ll look at some of the practical; and more infrastructure focused ways in which you can do so.
Your “cloud platform” is essentially an internal shared services system where you can actually and practically implement a “platform” team that operates and capacity plans for the cloud platform; they manage it’s availability and maintenance day-day and expansion/contraction.
You then have a number of “service/application” teams that subscribe to services provided by your cloud platform team… they are essentially developers/support teams that manage individual applications or services (for example payroll or SAP, web sites etc.), business units and stakeholders etc.
Using the technology we discuss here you can delegate control to them over most aspects of the service they maintian – full access to app servers etc. and an interface (human or automated) to raise issues with the platform team or log change requests.
I’ve seen many attempts to implement this in the physical/old world and it just ends in tears as it builds a high level of expectation that the server/infrastructure team must be able to respond very quickly to the end-“customer” the customer/supplier relationship is very different… regardless of what OLA/SLA you put in place.
However the reality of traditional infrastructure is that the platform team can’t usually react as quick as the service/application teams need/want/expect because they need to have an engineer on-site, wait for an order and a delivery, a network provisioning order etc. etc (although banks do seems to have this down quite well, it’s still a delay.. and time is money, etc.)
Virtualization and some of the technology we discuss here enable the platform team to keep one step ahead of the service/application teams by allowing them to do proper capacity planning and maintain a pragmatic headroom of capacity and make their lives easier by consolidating the physical estate they manage. This extra headroom capacity can be quickly back-filled when it’s taken up by adopting a modular hardware architecture to keep ahead of the next requirement.
Traditional infrastructure = OS/App Installations
- 1 server per ‘workload’
- Silo’d servers for support
- Individually underused on average = overall wastage
- No easy way to move workload about
- Change = slow, person in DC, unplug, uninstall, move reinstall etc.
- HP/Dell/Sun Rack Mount Servers
- Cat 6 Cables, Racks and structured cabling
The ideal is to have an OS/app stack that can have workloads moved from host A to host B; this is a nice idea but there are a whole heap of dependencies with the typlical applications of today (IIS/apache + scripts, RoR, SQL DB, custom .net applications). Most big/important line of business apps are monolithic and today make this hard. Ever tried to move a SQL installation from OLD-SERVER-A to SHINY-NEW-SERVER-B? exactly. *NIX better at this, but not that much better.. downtime required or complicated fail over.
This can all be done today, virtualization is the key to doing it – makes it easy to move a workload from a to b we don’t care about the OS/hardware integration – we standardise/abstract/virtualize it and that allows us to quickly move it – it’s just a file and a bunch of configuration information in a text file… no obscure array controller firmware to extract data from or outdated NIC/video drivers to worry about.
Combine this with server (blade) hardware, modern VLAN/L3 switches with trunked connections, and virtualised firewalls then you have a very compelling solution that is not only quick to change, but makes more efficient use of the hardware you’ve purchased… so each KW/hr you consume brings more return, not less as you expand.
Now, move this forward and change the hardware for something much more commodity/standardised
Requirement: Fast, Scalable shared storage, filexible allocation of disk space and ability to de-duplicate data, reduce overhead etc, thin provisioning.
Solution: SAN Storage, EMC Clariion, HP-EVA, Sun StorageTek, iSCSI for lower requirements, or storage over single Ethernet fabric – NetApp/Equalogic
Requirement: Requirement Common chassis and server modules for quick, easy rip and replace and efficient power/cooling.
Solution: HP/Sun/Dell Blades
Requirement: quick change of network configurations, cross connects, increase & decrease bandwidth
Solution: Cisco switching, trunked interconnects, 10Gb/bonded 1GbE, VLAN isolation, quick change enabled as beyond initial installation there are fewer requirements to send an engineer to plug something in or move it, Checkpoint VSX firewalls to allow delegated firewall configurations or to allow multiple autonomous business units (or customers) to operate from a shared, high bandwidth platform.
Requirement: Ability to load balance and consolidate individual server workloads
Solution: VMWare Infrastructure 3 + management toolset (SCOM, Virtual Centre, Custom you-specific integrations using API/SDK etc.)
Requirement: Delegated control of systems to allow autonomy to teams, but within a controlled/auditable framework
Solution: Normal OS/app security delegation, Active Directory, NIS etc. Virtual Center, Checkpoint VSX, custom change request workflow and automation systems which are plugged into platform API/SDK’s etc.
the following diagram is my reference architecture for how I see these cloud platforms hanging together
As ever more services move into the “cloud” or the “mesh” then integrating them becomes simpler, you have less of a focus on the platform that runs it – and just build what you need to operate your business etc.
In future maybe you’ll be able to use the public cloud services like Amazon AWS to integrate with your own internal cloud, allowing you to retain the important internal company data but take advantage of external, utility computing as required, on demand etc.
I don’t think we’ll ever get to.. (or want) to be 100% in a public cloud, but this private/internal cloud allows an organisation to retain it’s own internal agility and data ownership.
I hope this post has demonstrated that whilst, architecturally “cloud” computing sounds a bit out-there, you can practically implement it now by adopting this approach for the underlying infrastructure for your current application landscape.
it uses replication between 2 ESX hosts to allow you to configure DRS/HA etc.
Excellent, I’m going to procure another cheap ESX host in the next couple of weeks so will post back on my experiences with setting this up, my previous plan meant I’d have to get a 3rd box to run an iSCSI server like OpenFiler to enable this functionality, but I really like this approach.
Sidenote – Xtravirt also have some other useful downloads like Viso templates and an ESX deployment appliance available here
Link here – good visualisation about 10mins in of how their new Chicago data centre is laid out internally.
With virtualisation breaking the traditional hardware/OS ties; this is becoming an increasingly appealing way of managing commodity compute grid resources for large organisations. Mike makes some good points about the de-comissioning of servers on a large scale where you are adding 10’s of thousands on a regular basis – you need to take them out at some point too, and that’s time consuming. at this scale of operation It’s more efficient to make the the container and/or datacentre the field replaceable unit (as I discussed a while back) in this scenario.
Also interesting point that water consumption may be the next environmental touch paper for legislation and disclosure for IT shops.
You’ve been able to buy solid state SAN technology like the Tera-RAMSAN from TMS which gives you up to 1Tb of storage, presented over 4Gb/s fibre channel or Infiniband @10Gb/s… with the cost of flash storage dropping its going to soon fall in to the realms of affordability (from memory a year ago 1Tb SSD SAN was about £250k, so would assume that’s maybe £150k now – would be happy to see current pricing if anyone has it though).
If you were able to combine this with a set of ESX hosts dual-connected to the RAMSAN and traditional equipment (like an HP EVA or EMC Clariion) over a FC or iSCSI fabric then you could possibly leverage the new Storage vMotion features that are included in ESX 3.5 to achieve a 2nd level of performance and load levelling for a VM farm.
It’s pretty common knowledge that you can use vMotion and the DRS features to effectively load level or average VM CPU and memory load across a number of VMWare nodes within a cluster.
Using the infrastructure discussed above could add a second tier of load balancing without downtime to a DRS cluster. If a VM needs more disk throughput or is suffering from latency then you could move them to/from the more expensive solid-state storage tiers to FC-SCSI or even FATA disks, this ensures you are making the best use of fast, expensive storage vs. cheap, slow commodity storage.
Even if Virtual Center doesn’t have a native API for exposing this type of functionality or criteria for the DRS configuration you could leverage the plug-in or scripting architecture to use a manager of managers (or here) to map this across an enterprise and across multiple hypervisors (Sun, Xen, Hyper V)
I also see EMC integrating flash storage into the array itself, would be even better if you could transparently migrate LUNS to/from different arrays and disk storage without having to touch ESX at all.
Note: This is just a theory I’ve not actually tried this – but am hoping to get some eval kit and do a proof on concept…
Interesting article here on how Cisco have made heavy use of virtualization within their new ASR series router platform, Linux underneath and 40 core CPUs!
This type of approach does make me wonder if we will get to the stage of running traditional “network” and “storage” services as VM’s under a shared hypervisor with traditional “servers”.. totally removing the dependency on dedicated or expensive single-vendor hardware.
Commodity server blade platforms like the HP or Sun blade systems are so powerful these days, with flexible interconnect/expansion options this type of approach makes a lot of sense to me and is totally flexible.
Maybe one day it will go the other way and all your Windows boxen will run inside a Cisco NX7000 lol!
On reflection maybe all those companies have too much of a vested interest in vendor lock-in and hardware sales to make this a reality!
Always an interesting point to bear in mind and useful in expectation setting for developers. You may want a dedicated CPU/core – but do you really need all of that CPU all of the time? in most cases I would guess not; and if you do need that level of performance – shouldn’t you be considering a physical platform rather than a virtual one?
Martin’s post here prompted me to blog something I’ve been meaning to do for a while.
Virtualization projects and services are cool; we all understand the advantages in power/cooling and the flexibility it can bring to our infrastructures.
But what about support, if you are a service provider (internal or outsourcing) you normally need to be able to offer an end-end SLA on your services. typically this would be backed off against a vendor like Microsoft or Oracle via one of their premium support arrangements.
From what I see in the industry, with most software vendors especially Microsoft there is almost no way a service provider can underwrite an SLA as application/OS vendors give themselves significant scope to say “unsupported configuration” if you are running it under a hypervisor or other VM technology… Microsoft use the term commercially reasonable in their official policy – who decides what this is?
I would totally accept that a vendor would not guarantee performance under a hypervisor – that’s understandable and we have tools to analyse, monitor and improve (Virtual Centre, MOM, DRS, increase resources etc.). but too many vendors seem to use it as a universal “get out of jail free card”.
Issues of applications with dependency on physical hardware aside (fax cards, realtime CPU, DSP, PCI cards etc.) In my entire career working with VM technology I’ve only ever seen one issue that could be directly attributed to being caused by virtualization – and to be fair that was really a VMTools issue; rather than VMWare itself.
Microsoft have an official list of their applications that are not supported here – why is this? speech server I could maybe understand as it would probably be timer/DSP sensitive – but the rest? Sharepoint? I know for a fact ISA does work under VMWare as I use it all the time.
Microsoft Virtual Server support policy http://support.microsoft.com/kb/897613
Support policy for Microsoft software running in non-Microsoft hardware virtualization software http://support.microsoft.com/kb/897615/
Exchange is specifically excluded (depending on how you read the articles)
· On the Exchange Server 2007 System requirements page it only mentioned Unified messaging as being unsupportable in a virtual environment http://technet.microsoft.com/en-us/library/aa996719.aspx
· Yet on TechNet it is clear stated that “Neither Exchange 2007 nor Exchange 2007 SP1 is supported in production in a virtual environment” http://technet.microsoft.com/en-us/library/bb232170(EXCHG.80).aspx
Credit due to a colleague for pulling together the relevant Microsoft linkage
But I know it….
a) works fully – I do it all the time.
b) Lots of people are doing this in production with lots of users (many people at VMWorld US last year)
c) VMWare have a fully-supportable x64 hypervisor – It’s just MS that don’t
Dont’ tell/ask – 99% of the time a tech support rep won’t know its running under VMWare/a.n.other hypervisor so why complicate matters by telling them – could of course back-fire on you!
Threaten – “If you won’t support under VMWare we’ll use one of your competitors applications”; however this only really works if you are the US govt. or Globocorp Inc. or operate in a very niche application market.
Mitigate – reflect this uncertainty in an SLA, best-endeavours etc. this would kill most virtualization efforts in their tracks for an enterprise customer.
The same support issue has been around for a long time; Citrix/Terminal Services, application packaging, automated installations, etc. are treated as “get out of jail free cards” by support organisations…
But whilst there are some technical constraints (usually only affecting badly written apps) with terminal services and packaging, virtualization changes the game and should make it simpler for a vendor to support as there is no complex runtime integration with a host OS + bolt-ons/hacks it’s just an emulated CPU/disk/RAM you can do whatever you like within it.
So – the open debate; what do you do? and how do you manage it?
There is a new site here (disclaimer: it does seem to be promoting a commercial service, but has some useful information that has been put into the public domain); describing some methods to roll your own P2V backup approach; I’ve not read in detail yet; but looks like Frane Borozan has solved some of the challenges I’ve encountered in the past automating the Free VMWare Convertor tool.
When I get some time I will revisit my build a better test lab series (and update it!) I hope to be able to integrate some of Frane’s ideas.
Thanks to Techhead for passing on the link; we worked together on the platform underlying the Build a better test lab series and he did a lot of work on the P2V and post-P2V automation tasks – he’s got a lot of handy scripts for doing this on an HP platform
Virtualized DR is going to be big this year; I have a long line of customers with this high on their list of priorities… Both for cross site 100% VMWare implementations and for the ability to backup/restore physical platforms to VMWare grid in a DR situation.
It just makes so much sense; no delay whilst racking & stacking recovery kit or problems restoring to different hardware etc. your admin’s can even do it from home – which can have some significant advantages in the event of a natural disaster like Katrina or floods like we had over the last couple of years in the UK
PlateSpin Forge is something we are seriously looking at as well as Symantec Backup Exec System Recovery Server Edition (who win a prize for extending the longest, most annoying product name! despite acquiring it from Veritas).
Will be an interesting year; I’m sure Sungard and all those recovery centre facilities will be moving to a grid/resource rental model rather than pure rack/floor space and retained hardware on-contract.