Virtualization, Cloud, Infrastructure and all that stuff in-between

My ramblings on the stuff that holds it all together

Monthly Archives: February 2011

Hardware is Hard, Software is Easy. Is 2011 the year of the VSA?

 

I have done a lot of lab work with Virtual Storage Appliances, mainly because proper shared storage is hard to come by for lab time, so for the last few years I’ve run the following inside virtual machines:

OpenFiler

FreeNAS

Celerra VSA Uber

HP LeftHand

NexentaStor

Vendors that release software versions of their kit, or emulators, are high on my list of things to watch, as IMHO it shows they are looking ahead.

Traditional storage vendors have made a very good living in the last decade selling custom, high-performance silicon, but this comes at a cost. Designing custom ASICs and the code that runs on them involves high-tech fabrication technologies, and even if the fabrication is outsourced it’s very expensive and time-consuming.

It’s also harder to “turn the ship” if the market moves, as a vendor has significant resources committed to product development.

Mainframes have held a similar position and have seen their market share eroded by commodity x86 hardware that, combined with clever software, delivers the same solutions with less hardware-vendor lock-in and typically at a lower cost.

Software is easy – well, relatively easy to change when compared to hardware – so R&D cycles can be shorter and more agile, and can respond more quickly to market changes.

Changes and upgrades to custom chips have development lifecycles measured in multiple years, and once a chip is fabricated and shipped to the masses it’s much harder to make changes if a problem is found. x86 builds on a well-used and field-proven architecture, typically adopting a scale-out design over standardised interconnects (InfiniBand/Ethernet) to achieve higher performance – why re-invent the wheel?

There will always be edge cases where ultra-low-latency interconnects can only be provided over on-die CPU traces. But for general compute, network and storage, as x86 and its ancillary interconnect technologies march ever faster, can equivalent functionality not be achieved using clever software on common hardware rather than raw physics and men in white coats?

As this cycle continues, can storage vendors continue to make those margins, respond to customer requirements and keep ahead of the competition when they are tied to a custom silicon architecture? Or is it more advantageous to move to a commodity platform to deliver their solutions?

Using “clever” software like a hypervisor to abstract a commodity x86 hardware architecture means you can push storage functions like snapshots, cloning and replication higher up the stack and make them less specific to hardware vendor X’s track/cylinder layout or backplane protocol Y.

Building on x86 also means you can be selective about how you deploy: on bare hardware, or with a hypervisor like ESXi. Both use-cases are equally valid and the cost to change between the two is minimal (in development terms).

EMC are already committed to an x86 scale-out architecture for their platforms for this reason. Even if the badge on the outside says EMC, it’s just commodity kit with clever software, rather than custom firmware running on custom chips. I expect all of the competition are considering whether being a niche edge-case player or a high-performance general storage player is the better business play.

The open-source community also have some excellent projects in this space which are being spun out into commercial products. Traditional storage vendors beware!

Virtual Storage Appliances (VSA) are the next logical step in de-coupling storage services from hardware.

Disclosure: I work for VMware, of whom EMC are a majority shareholder – however this isn’t an advert – it’s my opinion and experience.

Of ITIL and Cloud

 

I have been watching this conversation about ITIL, virtualization and cloud play out over the last year; some very enthusiastic bloggers loudly bash ITIL for how unsuitable it is in the modern cloud world. Change control for vMotion? Do you update your CMDB when you vMotion? Lunacy? How? Tools? Methods? Consultancy? Snake-oil? £££?

I spent a good chunk of my role before VMware as a project-based architect with significant interfaces to an ITIL-heavy managed-services team. They embraced ITIL, took it to the core of the operations side of the business, and had the scars as well as the trophies to show for it.

I have seen it help, and I have seen it hinder – but I think the core problem that people seem to have with ITIL is that they just don’t understand it or they are afraid it’s hard work.

Wake up; it is hard work, but it’s hard work for a reason.

ITIL is not prescriptive; it doesn’t tell you how to do things so that you change your business to fit around it. The truly successful ITIL organizations are those that understand that it’s a FRAMEWORK: they pick and choose the parts that apply or deliver benefit to their business and discard those that don’t.

ITIL also comes at a cost. ITIL is about best practice, information sharing, planning and auditability/accountability; this means systems, software and people time to make that happen. But that cost is also about reducing risk. Yes, it does slow down your reaction time by adding a layer of process and approval, but the trade-off is that when things do go wrong you know who did what, when, why and who said it was OK to do it (accountability), rather than being left with an unmanaged mess.

What does that deliver…?

More expensive operations (people time = £££, tools = £££)

More informed operations and business (downtime avoided, intellectual property retained = £££)

The two functions of ITIL that I see raise the most hackles are Change Control (and the notorious Change Advisory Board (CAB) meeting) and the Configuration Management Database (CMDB), which I will tackle in turn.

 

Change Control is about communications and planning. Those CAB meetings are there to disseminate information about what is going to happen and gain the buy-in of stakeholders. However obscure you may think your dependency on the change being implemented is, you have had your opportunity to air an opinion and contribute to the go/no-go based on your service requirements. It’s your responsibility to your service to engage with this process and not see it as a hindrance – neither IT nor business stands still, nor should you.

ITIL also makes techies stand back and think about what they are going to do before they do it – because you make them document it and explain it in English (or $LOCALE) to the people that matter (the stakeholders), not just allow them to get all Jackie Chan with the CLI. As techies, it’s all too easy to believe in your own command-line fu and forget that you are fallible and may have missed a critical dependency, or failed to convey the gravity and risk of what you are going to do to that customer.

Sometimes as a techie, the ITIL-induced CAB is your friend; this is your chance to convey the risk of something you have been asked to do. It’s your way of saying “you won’t spend £££ on redundant storage for this service migration, thus if this goes wrong you will be down for X hours at a cost of £Y”. That’s a very useful and practical way to put things into perspective for the stakeholder and budget-holder and to lubricate the flow of extra contingency budget to avert a potential disaster, and if it does go wrong you’ve CYOA.

The CMDB is just a database (or in some cases many databases), so what if you don’t have a single all-seeing and all-knowing CMDB? There may be very valid reasons to maintain multiple CMDBs – for example, some equipment may be owned/managed by service providers and some by internal IT. This isn’t new; it’s an age-old business IT problem, and in the real world (i.e. business) it’s solved by building interfaces, APIs and views. Why not treat your mythical and so-hard-to-manage CMDB as a meta-database, an index of where to go and find the relevant info (or better still, build an API to do it for you)?
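To make the meta-database idea a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `MetaCMDB` class, the source names and the sample records are made up for illustration, and a real one would call out to whatever APIs your actual CMDBs expose.

```python
from typing import Callable, Dict, Optional

# A lookup takes a host name and returns that source's record, or None.
Lookup = Callable[[str], Optional[dict]]


class MetaCMDB:
    """An index of where to find configuration data, not a copy of the data."""

    def __init__(self) -> None:
        self._sources: Dict[str, Lookup] = {}

    def register_source(self, owner: str, lookup: Lookup) -> None:
        # Each underlying CMDB is plugged in behind the same simple interface.
        self._sources[owner] = lookup

    def find(self, host: str) -> Optional[dict]:
        # Ask each source in registration order; the first that knows the host wins.
        for owner, lookup in self._sources.items():
            record = lookup(host)
            if record is not None:
                return {"owner": owner, **record}
        return None


# Hypothetical data standing in for two separately managed CMDBs.
internal_it = {"web01": {"site": "London"}}
service_provider = {"san01": {"site": "Hosted"}}

meta = MetaCMDB()
meta.register_source("internal-it", internal_it.get)
meta.register_source("service-provider", service_provider.get)
```

Calling `meta.find("san01")` returns the service provider's record tagged with who owns it, without anyone having to copy that record into a central database.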

And stop relying on people to populate the CMDB correctly – build tools to do it automatically, leverage that API and have hosts check themselves in and out of the cloud, or between clouds, or between clouds and internal infrastructure – this isn’t a problem with ITIL, this is a problem with doing things manually.
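As a rough sketch of what hosts checking themselves in and out might look like, assuming a hypothetical in-memory `SelfServiceCMDB` (a real one would sit behind an API and a proper datastore). The heartbeat timeout is my own illustrative addition for catching hosts that disappear without checking out.

```python
import time
from typing import Dict, List, Optional


class SelfServiceCMDB:
    """Hosts register themselves; hosts that go silent are aged out."""

    def __init__(self, max_silence_s: float = 300.0) -> None:
        self.max_silence_s = max_silence_s
        self._last_seen: Dict[str, float] = {}

    def check_in(self, host: str, now: Optional[float] = None) -> None:
        # Called by the host itself: on boot, after a move, and as a heartbeat.
        self._last_seen[host] = time.time() if now is None else now

    def check_out(self, host: str) -> None:
        # Called by the host when it is retired through the normal process.
        self._last_seen.pop(host, None)

    def age_out(self, now: Optional[float] = None) -> List[str]:
        # Drop hosts not heard from within max_silence_s and report them.
        now = time.time() if now is None else now
        stale = [h for h, seen in self._last_seen.items()
                 if now - seen > self.max_silence_s]
        for host in stale:
            del self._last_seen[host]
        return stale

    def hosts(self) -> List[str]:
        return sorted(self._last_seen)
```

The `now` parameter is only there so the behaviour can be exercised without waiting for real time to pass; in normal use the hosts would just call `check_in` on a schedule.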

 

Evolution

I honestly don’t see ITIL as a blocker for cloud; systems and people just get smarter to support quicker change and deliver a lower cost of operations, for example:

  • A list of pre-approved automated changes, with a notification list for when they are implemented – like adding more storage, adding hosts, vMotion, storage tiering etc. – that still keeps a detailed audit trail.
  • A budget of pre-approved changes/actions based on typical usage – this allows systems to trap/manage explosions of requests that could be caused by a problem.
  • Automated voting tools for change approval/veto, rather than CAB conference calls/meetings, and an agreed escalation process.
  • Systems that register/de-register themselves in a CMDB when changes happen, rather than relying on someone to do it manually, implementing some sort of heartbeat to age out hosts that die or are removed outside of the process.
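The first two ideas in that list, a pre-approved change list and a budget on how many such changes can go through, fit together quite naturally. Here is a minimal Python sketch; the `ChangePolicy` class, the change names and the decision strings are all hypothetical and not taken from any real ITIL tool.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ChangePolicy:
    pre_approved: frozenset   # change types that need no CAB
    budget: int               # max auto-approved changes per period
    used: int = 0
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def request(self, who: str, change: str) -> str:
        if change not in self.pre_approved:
            decision = "escalate: not pre-approved"
        elif self.used >= self.budget:
            # A sudden flood of requests may itself be a symptom of a problem.
            decision = "escalate: budget exhausted"
        else:
            self.used += 1
            decision = "auto-approved"
        # Every request is recorded, whatever the outcome.
        self.audit_log.append((who, change, decision))
        return decision
```

With a budget of two, a third vMotion request in the same period is escalated to a human rather than waved through, and the audit log still shows who asked for what and what the system decided.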

Applications are changing for the cloud, and application frameworks are freeing code from the underlying infrastructure. Great: maybe this means you don’t have to worry about infrastructure, servers, networks and storage in the great public cloud (it’s SEP), but you still leverage ITIL for things like release management and change control within the bits you manage/care about.

This doesn’t mean it isn’t the same old ITIL in the cloud – it’s just ITIL principles with tools/enlightened people.

Speed of change and instant gratification are among the much-touted benefits of cloud, but let’s put that into perspective: how often does your business really need a server/application NOW, i.e. in 3 mins? And if you do, how well thought out is that deployment, and how long before it becomes a critical but home-grown business app that you can’t un-weave from the rest of the business? (How often have you seen spreadsheet applications and Access DBs worm their way into your own business processes?)

If you implement the sort of lightweight change approval/control I discuss here, does it really matter if it takes an hour to go through an approval cycle, as long as everyone knows what’s going on? Approval could even be automated if you are given that level of pre-approved changes.
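For what it’s worth, an automated vote in place of a CAB call doesn’t need to be complicated. A minimal Python sketch, assuming a made-up rule set: any veto rejects the change, and a missing vote from a required stakeholder sends it to the agreed escalation process. The `tally` function and stakeholder names are hypothetical.

```python
from typing import Dict, Iterable


def tally(votes: Dict[str, str], required: Iterable[str]) -> str:
    """votes maps stakeholder -> 'approve' or 'veto'."""
    if "veto" in votes.values():
        return "rejected"
    # Anyone required who has not explicitly approved blocks auto-approval.
    missing = [s for s in required if votes.get(s) != "approve"]
    return "escalate" if missing else "approved"
```

You would run this when the approval window closes (say, after that hour), so silence never counts as consent.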

With that I’ll sign off with a simple warning: bear in mind that the more automated you make things, the easier it is for people to ignore them or feel disenfranchised from the activity. An electronic approval becomes a task rather than a face-to-face decision for which they were accountable in a meeting/CAB. People are still human after all, and it’s the stupid system’s fault, isn’t it?
