My ramblings on the stuff that holds it all together
Of ITIL and Cloud
I have been watching the conversation about ITIL, virtualization and cloud play out over the last year. Some very enthusiastic bloggers loudly bash ITIL as unsuitable for the modern cloud world: change control for vMotion? Do you update your CMDB when you vMotion? Lunacy? How? Tools? Methods? Consultancy? Snake-oil? £££?
I spent a good chunk of my time before VMware as a project-based architect with significant interfaces to an ITIL-heavy managed-services team. They embraced ITIL, took it to the core of the operations side of the business, and had the scars as well as the trophies to show for it.
I have seen it help and I have seen it hinder, but I think the core problem people have with ITIL is that they just don’t understand it, or they are afraid it’s hard work.
Wake up: it is hard work, but it’s hard work for a reason.
ITIL is not prescriptive: it doesn’t tell you exactly how to do things, so you shouldn’t contort your business to fit around it. The truly successful ITIL organizations are those that understand it’s a FRAMEWORK; they pick and choose the parts that apply to, or deliver benefit for, their business and discard those that don’t.
ITIL also comes at a cost. ITIL is about best practice, information sharing, planning and auditability/accountability, which means systems, software and people’s time to make it happen. But that cost also buys reduced risk and accountability/auditability. Yes, it slows down your reaction time by adding a layer of process and approval, but the trade-off is that when things do go wrong you know who did what, when, why, and who said it was OK to do it (accountability), rather than facing an unmanaged mess.
What does that deliver?
- More expensive operations (people’s time = £££, tools = £££)
- More informed operations and business (less downtime and retained intellectual property = £££)
The two functions of ITIL that I see raise the most hackles are Change Control (and the notorious Change Advisory Board (CAB) meeting) and the Configuration Management Database (CMDB), which I will tackle in turn.
Change Control is about communication and planning. CAB meetings are there to disseminate information about what is going to happen and to gain the buy-in of stakeholders. However obscure you may think your dependency on the change is, you have had your opportunity to air an opinion and contribute to the go/no-go decision based on your service requirements. It’s your responsibility to your service to engage with this process rather than see it as a hindrance; neither IT nor the business stands still, and nor should you.
ITIL also makes techies stand back and think about what they are going to do before they do it, because you make them document it and explain it in English (or $LOCALE) to the people that matter (the stakeholders), rather than just letting them go all Jackie Chan on the CLI. As techies, it’s all too easy to believe in your own command-line fu and forget that you are fallible: you may have missed a critical dependency, or failed to convey the gravity and risk of what you are going to do to that customer.
Sometimes, as a techie, the ITIL-induced CAB is your friend. It is your chance to convey the risk of something you have been asked to do; it’s your way of saying “you won’t spend £££ on redundant storage for this service migration, so if this goes wrong you will be down for X hours at a cost of £Y”. That’s a very useful and practical way to put things into perspective for the stakeholder and budget-holder, and to lubricate the flow of extra contingency budget to avert a potential disaster. And if it does go wrong, you’ve covered yourself (CYOA).
The CMDB is just a database (or in some cases many databases), so what if you don’t have a single all-seeing, all-knowing CMDB? There may be very valid reasons to maintain multiple CMDBs; for example, some equipment may be owned/managed by service providers and some by internal IT. This isn’t new; it’s an age-old business IT problem, and in the real world (i.e. business) it’s solved by building interfaces, APIs and views. Why not treat your mythical, so-hard-to-manage CMDB as a meta-database, an index of where to go to find the relevant info (or, better still, build an API to do it for you)?
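A minimal sketch of the meta-database idea, assuming nothing about any particular CMDB product: the index holds no configuration-item data itself, only a list of sources to ask. The class name `FederatedCMDB` and the plain-dict backends are hypothetical stand-ins; in practice each lookup would wrap a call to a service provider’s or internal IT’s own API.

```python
from typing import Callable, Dict, Optional

# A lookup takes a CI name and returns that CI's record, or None if this source doesn't hold it.
Lookup = Callable[[str], Optional[dict]]


class FederatedCMDB:
    """Hypothetical meta-database: stores no CI data itself, only which sources to ask."""

    def __init__(self) -> None:
        self._sources: Dict[str, Lookup] = {}

    def register_source(self, name: str, lookup: Lookup) -> None:
        """Add a backing CMDB, e.g. a service provider's or internal IT's."""
        self._sources[name] = lookup

    def find(self, ci_name: str) -> Optional[dict]:
        """Ask each source in registration order; the first match wins."""
        for source_name, lookup in self._sources.items():
            record = lookup(ci_name)
            if record is not None:
                return {"source": source_name, **record}
        return None


# Usage: two independently owned "CMDBs", here just dicts whose .get acts as the lookup API.
internal = {"web01": {"owner": "internal IT", "type": "vm"}}
provider = {"edge-fw": {"owner": "service provider", "type": "firewall"}}
cmdb = FederatedCMDB()
cmdb.register_source("internal", internal.get)
cmdb.register_source("provider", provider.get)
print(cmdb.find("edge-fw"))
# {'source': 'provider', 'owner': 'service provider', 'type': 'firewall'}
```

Asking sources in a fixed order keeps the sketch simple; a real index might instead route by CI type or owner so it only queries the one system that is authoritative.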
And stop relying on people to populate the CMDB correctly: build tools to do it automatically. Leverage that API and have hosts check themselves in and out of the cloud, between clouds, or between clouds and internal infrastructure. This isn’t a problem with ITIL; it’s a problem with doing things manually.
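As a rough sketch of hosts checking themselves in and out, using a hypothetical `SelfRegisteringCMDB` class rather than any real product’s API: each host calls `check_in` on boot, on migration and on every heartbeat, calls `check_out` on clean removal, and anything that stops sending heartbeats is aged out after a timeout (the heartbeat idea in the list that follows). The clock is injectable purely so the behaviour can be demonstrated deterministically.

```python
import time
from typing import Callable, Dict, List, Optional


class SelfRegisteringCMDB:
    """Hypothetical CMDB fed by the hosts themselves; stale entries age out after a TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self._clock = clock  # injectable so behaviour can be shown deterministically
        self._hosts: Dict[str, dict] = {}

    def check_in(self, host: str, location: str) -> None:
        """Called by the host on boot, after a migration, or as a periodic heartbeat."""
        self._hosts[host] = {"location": location, "last_seen": self._clock()}

    def check_out(self, host: str) -> None:
        """Called by the host when it is cleanly decommissioned."""
        self._hosts.pop(host, None)

    def expire_stale(self) -> List[str]:
        """Remove hosts whose last heartbeat is older than the TTL; return their names."""
        now = self._clock()
        stale = [h for h, rec in self._hosts.items() if now - rec["last_seen"] > self.ttl]
        for h in stale:
            del self._hosts[h]
        return stale

    def location_of(self, host: str) -> Optional[str]:
        rec = self._hosts.get(host)
        return rec["location"] if rec else None


# Usage with a fake clock: web02 dies silently, web01 migrates and keeps heartbeating.
now = [0.0]
cmdb = SelfRegisteringCMDB(ttl_seconds=300, clock=lambda: now[0])
cmdb.check_in("web01", "private-cloud")
cmdb.check_in("web02", "public-cloud")
now[0] = 200.0
cmdb.check_in("web01", "public-cloud")  # moved between clouds; heartbeat refreshed
now[0] = 400.0
print(cmdb.expire_stale())  # ['web02']
```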
I honestly don’t see ITIL as a blocker for cloud; systems and people just get smarter to support quicker change and deliver lower-cost operations. For example:
- A list of pre-approved automated changes, like adding more storage, adding hosts, vMotion and storage tiering, with a notification list when they are implemented and a detailed audit trail.
- A budget of pre-approved changes/actions based on typical usage; this allows systems to trap and manage explosions of requests that could be caused by a problem.
- Automated voting tools for change approval/veto, with an agreed escalation process, rather than CAB conference calls/meetings.
- Systems that register/de-register themselves in a CMDB when changes happen, rather than relying on someone to do it manually, with some sort of heartbeat to age out hosts that die or are removed outside of the process.
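The first two ideas above (a catalogue of pre-approved changes with notifications and an audit trail, plus a budget that traps bursts of requests) might be sketched like this. `PreApprovedChanges`, its outcome strings and the sliding-window budget are hypothetical choices for illustration, not a standard ITIL mechanism.

```python
import time
from collections import deque
from typing import Callable, Deque, FrozenSet, List, Tuple

# (timestamp, requester, change_type, detail, outcome)
AuditEntry = Tuple[float, str, str, str, str]


class PreApprovedChanges:
    """Hypothetical policy: auto-approve catalogued change types, up to a budget per window."""

    def __init__(self, catalogue: FrozenSet[str], budget: int, window_seconds: float,
                 notify: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.catalogue = catalogue      # change types the CAB has pre-approved
        self.budget = budget            # max auto-approvals per sliding window
        self.window = window_seconds
        self.notify = notify            # e.g. post to the notification list
        self._clock = clock
        self._recent: Deque[float] = deque()  # timestamps of recent auto-approvals
        self.audit: List[AuditEntry] = []      # every request is logged, whatever the outcome

    def request(self, requester: str, change_type: str, detail: str) -> str:
        now = self._clock()
        if change_type not in self.catalogue:
            outcome = "needs-approval"  # falls through to the normal change process
        else:
            # Forget auto-approvals that have slid out of the budget window.
            while self._recent and now - self._recent[0] >= self.window:
                self._recent.popleft()
            if len(self._recent) >= self.budget:
                outcome = "budget-exceeded"  # a burst may signal a problem: hold and escalate
            else:
                self._recent.append(now)
                outcome = "auto-approved"
                self.notify(f"{requester} auto-approved {change_type}: {detail}")
        self.audit.append((now, requester, change_type, detail, outcome))
        return outcome


# Usage: a budget of two automated changes per hour, with a frozen clock for clarity.
sent: List[str] = []
policy = PreApprovedChanges(frozenset({"vmotion", "add-storage"}), budget=2,
                            window_seconds=3600, notify=sent.append, clock=lambda: 0.0)
print(policy.request("drs", "vmotion", "web01 to host-b"))         # auto-approved
print(policy.request("ops", "add-storage", "DS01 +500GB"))         # auto-approved
print(policy.request("drs", "vmotion", "web02 to host-c"))         # budget-exceeded
print(policy.request("ops", "firmware-upgrade", "core switch"))    # needs-approval
```

Logging refused requests as well as approved ones is deliberate: the audit trail should show the storm of requests that tripped the budget, not just the changes that went through.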
Applications are changing for the cloud, and application frameworks are freeing code from the underlying infrastructure. Great: maybe this means you don’t have to worry about infrastructure, servers, networks and storage in the great public cloud (it’s SEP, Somebody Else’s Problem), but you still leverage ITIL for things like release management and change control within the bits you manage/care about.
It’s still the same old ITIL in the cloud; it’s just ITIL principles applied with better tools and enlightened people.
Speed of change and instant gratification are among the much-touted benefits of cloud, but let’s put that into perspective: how often does your business really need a server/application NOW, i.e. in 3 minutes? And if it does, how well thought out is that deployment? How long before it becomes a critical, home-grown business app that you can’t unweave from the rest of the business? (How often have you seen spreadsheet applications and Access DBs worm their way into your own business processes?)
If you implement the sort of lightweight change control/approval I discuss here, does it really matter if it takes an hour to go through an approval cycle, as long as everyone knows what’s going on? Approval could even be automated if you have that level of pre-approved changes.
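For changes that are not pre-approved, the automated voting idea might look something like this minimal sketch. The `ChangeVote` class, the rule that a single veto blocks the change, and escalation at a fixed deadline (here one hour) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ChangeVote:
    """Hypothetical electronic CAB: stakeholders vote on one change instead of meeting."""
    required: Set[str]   # stakeholders who must respond before the change can proceed
    deadline: float      # time after which missing votes trigger escalation
    votes: Dict[str, bool] = field(default_factory=dict)  # stakeholder -> approve (True) / veto (False)

    def cast(self, stakeholder: str, approve: bool) -> None:
        if stakeholder not in self.required:
            raise ValueError(f"{stakeholder} is not a stakeholder for this change")
        self.votes[stakeholder] = approve

    def outcome(self, now: float) -> str:
        if not all(self.votes.values()):
            return "vetoed"      # any single veto stops the change
        if self.required.issubset(self.votes):
            return "approved"    # everyone has responded and nobody vetoed
        return "escalate" if now >= self.deadline else "pending"


# Usage: times are seconds since the change was raised; the deadline is one hour.
vote = ChangeVote(required={"storage-team", "app-owner"}, deadline=3600.0)
vote.cast("storage-team", True)
print(vote.outcome(now=600.0))    # pending
print(vote.outcome(now=4000.0))   # escalate
vote.cast("app-owner", True)
print(vote.outcome(now=4000.0))   # approved
```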
With that, I’ll sign off with a simple warning: bear in mind that the more automated you make things, the easier it is for people to ignore them or to feel disenfranchised from the activity. An electronic approval becomes a task rather than a face-to-face decision for which they were accountable in a meeting/CAB. People are still human, after all, and it’s the stupid system’s fault, isn’t it?