Virtualization, Cloud, Infrastructure and all that stuff in-between

My ramblings on the stuff that holds it all together

Category Archives: vTARDIS

vTARDIS Cloud

Following on from my recent VMworld Europe user award, I have mentioned that I've been working on a scaled-out version of the vTARDIS. This post will act as the index for that project; there is a lot of ground to cover in terms of its configuration.

Disclosure/Disclaimer – I am a VMware employee, but this project is not an official VMware effort, project, fling or even a thing – it's my private-time work, documented for the community.

Very little of this is an officially supported configuration, particularly the use of nested ESX. To reiterate: this is not a VMware-supported, recommended or blessed configuration, but it works well enough for my own needs – your mileage may vary and no warranty is granted, expressly or otherwise.

This is not a solution for production use; it's suitable for lab/study work, and actual performance is limited by the laws of physics.

If you run into difficulties with any of this, please feel free to drop me a line via the comments section of this post; however, I do have a full-time day job at VMware, so I'll help where I am able.

What is vTARDIS? – see this post for details of the original vTARDIS project

What is vTARDIS.cloud?

A small, low-cost physical infrastructure capable of supporting several multi-node ESX clusters. It provides an environment representative of enterprise-grade vSphere/vCD deployments through heavy over-subscription of the physical hardware, as well as providing “production” home services like media streaming, data storage, DNS, DHCP etc.

Why?

My original home lab has been scaled out to support my new position at VMware and my VCDX/VCAP studies. A core part of my work is VMware vCloud Director (herein referred to as vCD), so my lab reflects that. Additionally, my wife is trying to continue her IT studies, so it's helpful to have a self-service portal for building out virtual machines for learning.

You very rarely have a large number of ESX hosts and shared storage to experiment with – testing scripts, rebuilding hosts, changing configurations. This lab provides a representation of a large vSphere/vCD deployment so you can carry out that kind of work to support studies or pre-production testing.

What does it look like? – photo

The vTARDIS.cloud lives in my geek-cabin which is my home office (more info on that here) and now takes up most of a full rack.

image

What does it look like? – High-level architecture

The following diagram illustrates the layout of the vTARDIS.cloud

image

The key configurations and components of the design, which I will post further details on, are as follows (more to follow):

  • Stateless ESXi deployment – Using autodeploy VM to PXE boot and configure large numbers of {virtual} ESXi hosts
  • Script to deploy large numbers of VMs and create DHCP reservations
  • Using the Distributed Virtual Switch with nested ESX – share a single dvSwitch between physical and nested ESX hosts (complicated virtual wiring!)
  • Remote Access to your home lab with a virtual appliance
  • vMotion between nested VMs
  • vMotion between nested ESX and physical hosts
  • Configuring the Cisco 3500 XL switch with VLAN, trunk ports for ESX
  • HA Layer 3 routing for the lab using Vyatta virtual appliance and FT
  • Using Distributed Power Management (DPM) with your home lab
  • Enabling Self-service with vCD
  • Backup on a budget

How much did you spend?

I cannot say, as my wife will probably kill me 🙂 I've acquired most of the hardware over the years from eBay/factory-outlet stores, so it's been a gradual expansion rather than an upfront cost. But still, it's all been paid for out of my own pocket – there are no sponsors or generous donations of kit (if you are reading this and would like to donate some equipment, read the disclaimer at the start and, if you'd still like to talk, drop me a line).

| Item | Approx Cost (£GBP) | Status |
| --- | --- | --- |
| Cisco 3500 XL 100Mb switch (48 ports) | £100 (eBay) | in-use – VLAN trunks from ESX hosts and office workstation connectivity |
| Netgear GS487T 48-port gigabit switch | £100 (eBay) | spares (decent switch but too noisy for use in the office) |
| Linksys SLM2008 8-port gigabit switch | £90 (Amazon) | in-use – vMotion/vStorage networks |
| Iomega IX4-200d 8TB NAS in RAID5 configuration | £1,000 (online, ouch!) | in-use, critical – like it a lot but very expensive |
| Multiple USB2 drives, 500GB–1TB | varies | in-use – plugged into the IX4 for backup |
| 2 x HP ML110 G4, Intel Xeon, 8GB RAM | £200 each (special online deals, now defunct 🙁) | in-use (management cluster) |
| 3 x HP ML115 G5, AMD quad-core, 8GB RAM, dual-port Intel GbE NIC | £200–300 per server with RAM (varying deals); £80–100 for 8GB RAM; £40 for the dual-port Intel GbE NIC (job lot on eBay) | in-use (resource cluster); now EoL – hopefully they won't die! |
| 42U rack (no-brand) | free | holding up servers 🙂 |
| 1 x HP D530 SFF desktop PC, 4GB RAM, 500GB SATA | £90 (eBay) | in-reserve – was an ESX 3.5 host (non-x64 CPU) |
| HP TFT 15” rack-mount monitor | free, from a skip at a customer | in-use |
| HP 4-port PS/2 KVM | free, from a skip at a customer | in-use |
| 128GB Kingston SSD | £200 (Amazon) | UberVSA virtual SAN storage (was in the original vTARDIS project; since cannibalised) |
| 64GB Transcend SSD | £100 (Amazon, a while ago) | UberVSA virtual SAN storage |
| Compaq ML570 G1, quad Xeon CPUs, 12GB RAM, external disk array with multiple 18GB SCSI disks, SmartArray | £400 (eBay, 4 or 5 years ago) | retired – non-x64 and too power-hungry (was the power-sucking cluster) [open to offers!]; spider refuge |
| Compaq DL360 G1, single Xeon CPU, 4GB RAM, 2 x 18GB HDD | £500 (eBay, a long time ago) | retired – non-x64 and too power-hungry (was the power-sucking cluster) [open to offers!]; spider refuge |
| Compaq DL320 G1, unknown spec | free, from a customer refresh a long time ago | retired and faulty; spider refuge |
| Sun Netra | free, from a customer refresh a long time ago | retired – was the old firewall; spider refuge |
| Compaq 2-drive DLT tape loader | free, from a customer refresh a long time ago | retired, and probably faulty by now; spider refuge |

How much does it cost to run?

This uses approximately 600W of power 24/7 – it's not that cheap here in the UK, and I estimate about £600–700 per year. DPM certainly helps to reduce the power consumption of the resource cluster when it's less busy, and as a side-benefit the vTARDIS acts as passive heating for my garden office during the winter – that's “green”, right?
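As a rough sanity check on that figure (the electricity tariff is my assumption): 0.6 kW × 24 hours × 365 days ≈ 5,256 kWh per year, which at around 12p per kWh works out to roughly £630 – right in the middle of the £600–700 estimate above.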

vTARDIS wins Best of Show at VMworld Europe 2010

 

Wow, what can I say – my vTARDIS project has won two awards in the user categories at VMworld Europe 2010, including Best of Show.

There is some good coverage of the VMworld event on the searchVirtualDataCentre.co.uk site here

image

I’d like to thank <#insert <paltro/gwenneth.h>.. 🙂

But seriously, I appreciate this recognition for the vTARDIS project, which has burnt many of my brain cells and much of my personal time over the last 12 months, as well as causing airport stress when I had to convince the TSA that I wasn't some 24-inspired nut-job shipping a suitcase nuke around the US with me for BriForum, the Charlotte (US) VMUG and various London, UK VMUGs.

Here is a picture of it in its off-the-shelf Marks & Spencer shipping container (a.k.a. suitcase):

image image

Note the cool “my datacenter is bigger than yours” sticker, courtesy of SolarWinds

Understanding what vTARDIS is can be hard for many people, and it's sometimes even harder to explain, but the concept is basically to build a complex, enterprise-type vSphere implementation on as little hardware as possible for testing/training. Hopefully the following diagram (and the original post) explain it better at a technical level

image

That said, I particularly like how TechTarget (who sponsor the awards) phrase it…

"This is the kind of bonkers-crazy stuff that has made the virtualisation community the bedrock of innovation. The only limitation is people’s imagination, and Gallagher’s vTARDIS demonstrates imagination in spades."

Winner: vTARDIS (Transportable Awfully Revolutionary Data Centre of Invisible Servers)
IT project owner: Simon Gallagher
Vendors and technology used: VMware Inc. vSphere 4.0 and 4.1
Vyatta Core
Openfiler
Microsoft Windows Server 2008 R2
Hewlett-Packard Co. ML115 G5
Advanced Micro Devices Inc. (AMD) quad-core processors
IT project: Gallagher’s lab features low storage latency and solid performance. Gallagher’s configuration also pushes beyond the "official" use of VMware technology by using solid-state drives to reduce disk I/O and "nested VMware ESX" instances, which give the appearance of owning many ESX hosts when the entire infrastructure actually sits on one physical box. His configuration runs eight virtual ESX hosts and nearly 60 virtual machines on just the one physical server, rather than multiple PCs and storage appliances.
What the judges said: "No other entry showed the same degree of doing a lot with so little."

I hope it stands as an example of how flexible VMware technology is and what you can do with a bit of imagination and some good, hard graft.

But things don't stand still in the IT world, and nor do they in my mad-scientist home lab – look out soon for posts on further developments which are running now:

vTARDIS v2: a 20-node, PXE-booting, DHCP-configured ESXi cluster with a PowerShell provisioning script, on a single physical 500GBP server.

vTARDIS.cloud: 3 x 20-node ESXi clusters, DPM enabled, VMware vCloud Director, Chargeback and the EMC Celerra VSA, on 3 physical 300GBP hosts plus an Iomega IX4-200d, with a 2-node management cluster pod.

Whilst in the last couple of weeks I have started working directly for VMware in the cloud practice, the vTARDIS project was started about a year ago and has been demonstrated at many VMUGs and events (including VMworld SF 2010) in that time.

All of the equipment, power, space, brainpower and cooling for this project have been paid for entirely out of my own pocket/cranium; I do not receive any kind of sponsorship for this work from my current or previous employers, and it has been completed on my own (personal) time. So, to invoke the Paltro convention, I'd definitely like to thank my family for their tolerance and patience whilst I have gnashed my teeth at PowerShell and danced way beyond the edges of supportability, and in many cases physics!

Stay tuned, so much more arcane geekery to come…!

Come see the vTARDIS at VMworld on Monday

 

I am presenting a joint session on affordable lab/SMB environments with Eric Siebert and Simon Seagrave on Monday at 12:00pm, Moscone West room 2007 (V18328: Building an affordable vSphere environment for a lab or small business).

I am covering the nested ESX functionality. Whilst I haven't physically transported the vTARDIS all the way to the US this time, I am doing demos (hopefully live), so if you want to see how to build an 8-node cluster with shared storage and layer 3 networking on a single low-cost server, this is the session for you.

This nested ESX functionality in vSphere 4 (unsupported as far as I know… but it works) is what enables most of the hands-on labs.

vTARDIS screenshot – each vmesxi-nn.lab node is really a virtual machine (see the manufacturer field below), but vCenter doesn't care, and they are all running on a single $600 PC server with just 8GB of physical RAM (overcommit – yeah!)

image

image

If you want to see how to do this cool stuff and a whole lot more, come to the session 🙂

8 Node ESXi Cluster running 60 Virtual Machines – all Running from a Single 500GBP Physical Server

 

I am currently presenting a follow-up to my previous vTARDIS session for the London VMware User Group, where I demonstrated a 2-node ESX cluster on cheap PC-grade hardware (ML115 G5).

The goal of this build is to create a system you can use for VCP- and VCDX-type study without spending thousands on normal production-type hardware (see the slides at the end of this page for more info on why this is useful) – TechHead and I have a series of joint postings in the pipeline about how to configure the environment and the best hardware to use.

As a bit of a tangent, I have been seeing how complex an environment I can get out of a single server (which I have dubbed v.T.A.R.D.I.S: Nano Edition) using virtualized ESXi hosts. The goals were:

  • Distributed vSwitch and/or Cisco Nexus 1000V
  • Cluster with HA/DRS enabled
  • Large number of virtual machines
  • Single cheap server solution
  • No External hardware networking (all internal v/dvSwitch traffic)

The main stumbling block I ran into with the previous build was the performance of the SATA hard disks I was using; SCSI was out of my budget, and SATA soon gets bogged down with concurrent requests, which makes it slow, so I started to investigate solid-state storage (previous posts here).

By keeping the virtual machine configurations light and using thin provisioning I hoped to squeeze a lot of virtual machines onto a single disk; previous findings suggest that cheaper consumer-grade SSDs can support a massive number of IOPS compared to SATA (Eric Sloof has a similar post on this here).

So, I voted with my credit card and purchased one of these from Amazon – it wasn't “cheap” at c.£200, but it lets me scale my environment bigger than I could previously manage, which means less power, cost, CO2 and all the other usual arguments you use to convince yourself that a gadget is REQUIRED.

So the configuration I ended up with is as follows;

1 x HP ML115 G5, 8GB RAM, 144GB SATA HDD – c.£300 (see here), but with more RAM
1 x 128GB Kingston 2.5” SSDNow V-Series SSD – c.£205

I installed ESX 4 Update 1 (“classic”) on the physical hardware, then installed 8 x ESXi 4 Update 1 instances as virtual machines inside that ESX installation.

image

This diagram shows the physical server’s network configuration

image

In order for virtualized ESXi instances to talk to each other you need to update the security setting on the physical host’s vSwitch only as shown below;

image
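If you'd rather script that change than click through the vSphere Client, here's a minimal PowerCLI sketch, assuming a PowerCLI version that includes Get-/Set-SecurityPolicy (host name and vSwitch name are placeholders):

```powershell
# Connect to the physical ESX host (placeholder name/credentials)
Connect-VIServer -Server physicalesx.lab -User root -Password 'changeme'

# Allow promiscuous mode on the physical host's vSwitch so the
# virtualized ESXi instances can see each other's traffic
Get-VirtualSwitch -VMHost (Get-VMHost -Name physicalesx.lab) -Name vSwitch0 |
    Get-SecurityPolicy |
    Set-SecurityPolicy -AllowPromiscuous $true
```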

This diagram shows the virtual network configuration within each virtualized ESXi VM, with the vSwitch and dvSwitch configurations side-by-side.

image

I then built a Windows 2008 R2 virtual machine running vCenter 4 Update 1 and added all the hosts to it to manage.

I clustered all the virtual ESXi instances into a single DRS/HA cluster (turning off admission control, as we will be heavily oversubscribing the resources of the cluster and this is just a lab/PoC setup).

image
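For reference, that clustering step can also be scripted. A minimal PowerCLI sketch, assuming vCenter is reachable and the virtualized hosts answer on vmesxi-01.lab to vmesxi-08.lab (server names, datacenter name and credentials are placeholders):

```powershell
# Connect to the vCenter VM (placeholder name/credentials)
Connect-VIServer -Server vcenter.lab -User administrator -Password 'changeme'

# Create a DRS/HA cluster with admission control disabled,
# since the lab will be heavily oversubscribed
$cluster = New-Cluster -Name 'vTARDIS-Lab' -Location (Get-Datacenter -Name 'Lab') `
    -DrsEnabled -HAEnabled -HAAdmissionControlEnabled:$false

# Add the eight virtualized ESXi hosts to the new cluster
1..8 | ForEach-Object {
    Add-VMHost -Name ('vmesxi-{0:00}.lab' -f $_) -Location $cluster `
        -User root -Password 'changeme' -Force
}
```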

Cluster summary – 8 x virtualized ESXi instances. Note the heavy RAM oversubscription: this server only has 8GB of physical RAM, but the cluster thinks it has nearly 64GB.
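To put numbers on that: eight virtualized hosts, each apparently configured with around 8GB of virtual RAM (my inference from the ~64GB total), give the cluster nearly 64GB of apparent capacity against 8GB of physical RAM – roughly 8:1 memory oversubscription before a single nested VM is powered on.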

image

image

I then built an OpenFiler virtual machine and hooked it up to the internal vSwitch so that the virtualized ESXi VMs can access it via iSCSI. It has a virtual disk stored on the SSD, presenting a 30GB VMFS volume over iSCSI to the virtual cluster nodes (and all the iSCSI traffic is essentially in-memory, as there is no physical networking for it to traverse).

image
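Pointing the virtual cluster nodes at that iSCSI target can be scripted too. A rough PowerCLI sketch, assuming the software iSCSI initiator is used and with the OpenFiler address as a placeholder:

```powershell
# For each virtualized ESXi host: enable the software iSCSI initiator,
# point it at the OpenFiler VM and rescan so the shared VMFS volume appears
foreach ($esx in Get-Cluster -Name 'vTARDIS-Lab' | Get-VMHost) {
    Get-VMHostStorage -VMHost $esx | Set-VMHostStorage -SoftwareIScsiEnabled $true

    # The software iSCSI adapter shows up once the initiator is enabled
    $hba = Get-VMHostHba -VMHost $esx -Type IScsi
    New-IScsiHbaTarget -IScsiHba $hba -Address '10.0.0.10' -Type Send   # OpenFiler VM - placeholder IP

    # Rescan HBAs and VMFS so the 30GB volume is visible on every node
    Get-VMHostStorage -VMHost $esx -RescanAllHba -RescanVmfs | Out-Null
}
```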

Each virtualized ESXi node then runs a number of nested virtual machines (VMs running inside VMs).

In order to get nested virtual machines to work, you need to enable this setting on each virtualized ESXi host (the nested VMs themselves don't need any special configuration):

image
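The screenshot above shows the actual setting. From my reading of the vSphere 4-era nested ESX write-ups I believe it is the monitor_control.restrict_backdoor configuration parameter on each virtualized ESXi VM, but treat that name as an assumption and verify it against the screenshot. A PowerCLI sketch of applying such a parameter to all of the host VMs:

```powershell
# Add the advanced configuration parameter to every virtualized ESXi VM
# (parameter name assumed from vSphere 4-era nested ESX notes - verify before use;
#  the value takes effect the next time each ESXi VM is power-cycled)
Get-VM -Name 'vmesxi-*' | ForEach-Object {
    New-AdvancedSetting -Entity $_ -Name 'monitor_control.restrict_backdoor' -Value 'true' -Confirm:$false
}
```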

Once this was done and all my ESXi nodes were running and had settled down, I have a script that builds out a whole bunch of nested virtual machines on the 8-node cluster. The VMs aren't anything special – each has 512MB allocated to it and won't actually boot past the BIOS, because my goal here is just to simulate a large number of virtual machines and their configuration within vCenter rather than meet an actual workload – remember, this is a single-server configuration and you can't override the laws of physics: there is only really 8GB of RAM and 4 CPU cores available.
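The script itself isn't reproduced here, but the gist of it in PowerCLI looks something like the sketch below (VM names, the datastore and the portgroup are placeholders):

```powershell
# Deploy 60 small, thin-provisioned dummy VMs across the 8-node virtual cluster;
# DRS picks the host for each one because the cluster is used as the resource pool
$cluster   = Get-Cluster -Name 'vTARDIS-Lab'
$datastore = Get-Datastore -Name 'openfiler-iscsi-01'   # the 30GB VMFS volume - placeholder name

1..60 | ForEach-Object {
    New-VM -Name ('labvm-{0:00}' -f $_) `
        -ResourcePool $cluster `
        -Datastore $datastore `
        -NumCpu 1 `
        -MemoryGB 0.5 `
        -DiskGB 2 `
        -DiskStorageFormat Thin `
        -NetworkName 'VM Network'   # placeholder portgroup
}
```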

Each of the virtual machines was connected to a dvSwitch for VM traffic – which you can see here in action (the dvUplink is actually a virtual NIC on the ESXi host).

image

image
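For completeness, the dvSwitch plumbing can also be scripted with the PowerCLI distributed-switch cmdlets (which arrived in later PowerCLI releases); all names here are placeholders, and attaching each ESXi VM's spare vNIC as the dvUplink is left out:

```powershell
# Create a distributed switch in the lab datacenter and join the virtualized ESXi hosts to it
$vds = New-VDSwitch -Name 'dvSwitch-Lab' -Location (Get-Datacenter -Name 'Lab')
Get-Cluster -Name 'vTARDIS-Lab' | Get-VMHost | ForEach-Object {
    Add-VDSwitchVMHost -VDSwitch $vds -VMHost $_
}

# A portgroup for nested VM traffic, then move the lab VMs onto it
# (uplink assignment via Add-VDSwitchPhysicalNetworkAdapter is a separate step)
$pg = New-VDPortgroup -VDSwitch $vds -Name 'dvPG-LabVMs'
Get-VM -Name 'labvm-*' | Get-NetworkAdapter | Set-NetworkAdapter -Portgroup $pg -Confirm:$false
```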

I power up the virtual machines in batches of 10 to avoid swamping the host, but the SSD is holding up very well against the I/O
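Powering them on in batches is simple to script as well; a sketch of the approach, with the batch size and pause as the only tunables:

```powershell
# Start the nested VMs ten at a time, pausing between batches so the single host isn't swamped
$all = Get-VM -Name 'labvm-*' | Sort-Object -Property Name
for ($i = 0; $i -lt $all.Count; $i += 10) {
    $all[$i..([Math]::Min($i + 9, $all.Count - 1))] | Start-VM -Confirm:$false | Out-Null
    Start-Sleep -Seconds 120   # let each batch settle before starting the next ten
}
```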

With all 60 of the nested VMs and the virtualized ESXi instances loaded, these are the load stats:

image

I left it to idle overnight and these are the performance charts for the physical host; the big spike @15:00 was the scripts running to deploy the 60 virtual machines

image

Disk Latency

image

Physical memory consumption – still a way to go to get it to 8GB – who says oversubscription has no use? 🙂

image image

So, in conclusion: this shows that you can host a large number of virtual machines for a lab setup. It obviously isn't of much use in a production environment, because as soon as those 60 VMs actually start doing something they will consume real memory and CPU and you will run out of raw resources.

The key to making this usable is the solid-state disk – in my previous experiments I found SATA disks just got saturated under load, which caused things like access to the VMFS to fail (see this post for more details).

Whilst not a production solution, this sort of setup is ideal for VCP/VCDX study, as it allows you to play with all the enterprise-level features like the dvSwitch and DRS/HA that really need more than just a couple of hosts and VMs to understand how they work. For example, you can power off one of the virtual ESXi nodes to simulate a host failure and invoke the HA response; similarly, you can disconnect the virtual NIC from the ESXi VM to simulate the host-isolation response.
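Both of those failure tests can be driven from PowerCLI against the ESXi VMs themselves (the vmesxi host names are the same placeholders as above):

```powershell
# Simulate a host failure: hard power-off one virtualized ESXi node and watch HA restart its VMs
Stop-VM -VM 'vmesxi-03' -Confirm:$false

# Simulate host isolation: leave another ESXi VM running but disconnect its virtual NIC(s)
Get-NetworkAdapter -VM 'vmesxi-04' | Set-NetworkAdapter -Connected:$false -Confirm:$false
```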

Whilst this post has focused on non-production/lab scenarios, this setup could be used to test VMware patch releases for production services if you are short on hardware, and you can quite happily run Update Manager in this solution.

If you run this lab at home it's also very power-efficient and quiet; there are no external cables or switches other than a cross-over cable to a laptop to run the VI Client and administer it, so you could comfortably have it in your house without it bothering anyone – and with an SSD there is no hard disk noise under load either 🙂

Thin provisioning also makes good use of an SSD in this situation, as this screenshot from the 30GB virtual VMFS volume shows.

image
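If you want the same picture without the screenshot, PowerCLI exposes the provisioned vs. used figures directly; a quick sketch:

```powershell
# Compare what the nested VMs think they have against what they actually consume on the SSD-backed VMFS
Get-VM -Name 'labvm-*' |
    Select-Object Name,
        @{N = 'ProvisionedGB'; E = { [Math]::Round($_.ProvisionedSpaceGB, 1) } },
        @{N = 'UsedGB';        E = { [Math]::Round($_.UsedSpaceGB, 1) } } |
    Sort-Object -Property UsedGB -Descending
```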

The only thing you won't be able to play around with seriously in this environment is the new VMware FT feature – it is possible to enable it using the information in this post and learn how to enable/disable it, but it won't remain stable and the secondary VM will lose sync with the primary after a while, as it doesn't seem to work very well in a nested VM. If you need to use FT, for now you'll need at least 2 physical FT-capable servers (as shown in the original vTARDIS demo).

If you are wondering how noisy it is at power-up/down, TechHead has this video on YouTube showing the scary-sounding start-up noise, and how quiet it gets once the fan control kicks in.

ML115 G5 Start-up Noise

Having completed my VCP4 (and VCP3), I'm on the path to my VCDX; next up is the Enterprise exam, so this lab is going to be key to my study when the vSphere exams are released.