My ramblings on the stuff that holds it all together
Now that VMware are moving away from ESX classic (with service console) to the ESXi model, I have experienced a couple of issues recently that got me wondering if NFS will be a more appropriate model for VM storage going forward. In recent versions of ESX (3.5 and 4) NFS has moved beyond just being recommended for .ISO/template storage and has some big names behind it for production VM storage.
I’m far from a storage expert, but I know enough to be dangerous… feel free to comment if you see it differently.
“out of band” Access speed
Because VMFS is a proprietary block-storage file system you can only access it via an ESX host; you can’t easily go direct (VCB… maybe, but it’s not easy). In the past this hasn’t been too much of an issue. However, whilst building a new ESXi lab environment on standard hardware I found excessive transfer times using the Datastore browser in the VI Client: 45mins+ to copy a 1.8GB .ISO file to a VMFS datastore, and much the same when importing virtual machines and appliances. Even using Veeam FastSCP didn’t make a significant difference.
I spent ages checking out network/duplex issues but in desperation I tried it against ESX classic (based on this blog post I found) installed on the same host and that transfer time was less than 1/2 (22mins) – which still wasn’t brilliant – but I cranked out Veeam FastSCP and did it in 6mins!
So, lesson learnt? Relying on the VI client/native interfaces to transfer large .ISO files or VMs into datastores is slow because you have to go via the hypervisor layer, which oddly doesn’t seem optimized for this sort of access. Veeam FastSCP fixes most of this – but only on ESX classic, as it relies on some service-console cleverness that just isn’t possible on ESXi.
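To put those timings in perspective, here is a rough sketch that turns them into average throughput. It assumes the whole 1.8GB file moved in each case, treats 1GB as 1024MB, and takes "45mins+" as exactly 45 minutes, so the first figure is an upper bound.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average transfer rate in MB/s, taking 1 GB as 1024 MB."""
    return (size_gb * 1024) / (minutes * 60)


ISO_SIZE_GB = 1.8  # size of the .ISO file quoted in the post

# (method, minutes) pairs taken from the timings described in the post
timings = [
    ("ESXi, Datastore browser", 45),
    ("ESX classic, Datastore browser", 22),
    ("ESX classic, Veeam FastSCP", 6),
]

for method, minutes in timings:
    rate = throughput_mb_per_s(ISO_SIZE_GB, minutes)
    print(f"{method}: {rate:.2f} MB/s")
```

That works out at roughly 0.7, 1.4 and 5.1 MB/s respectively, which shows how far below wire speed even the best case sits.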
With ESX classic going away in favour of ESXi, there will need to be an alternative for out of band access to datastores – either direct access or an improved network stack for datastore browsing.
This is important where you manage standalone ESX hosts (SME), or want to perform a lot of P2V operations as all of those transfers use this method.
In the case of using NFS, given appropriate permissions you can go direct to the volume holding the VMs using a standard network protocol which is entirely outside of ESX/vCenter. Upload/download transfers thus run at the speed of the data mover or server hosting the NFS mount point, so are not constrained by ESX.
To me, Fibre Channel was always more desirable for VM storage as it offered lossless bandwidth up to 4Gb/s (now 8Gb/s), but Ethernet (which is obviously required to serve NFS) now has 10Gb/s bandwidth and lossless technology like FCoE. Some materials put NFS about 10% slower than VMFS; considering the vast cost difference between dedicated FC hardware and commodity Ethernet/NAS storage, I think that’s a pretty marginal difference when you factor in the simplicity of managing NFS vs. FC (VLANs and IPs vs. zoning, masking etc.).
FCoE maybe redresses the balance and provides the best compromise between performance and complexity, but it doesn’t really address the out of band access issue I’ve mentioned here as it’s a block-storage protocol.
If you have a problem with your vCenter/ESX installation you are essentially locked out of access to the virtual machines; it’s not easy to just mount the VMFS volume on a host with a different operating system and pull out/recover the raw virtual machines.
With NFS you have more options in this situation, particularly in small environments.
Storage Host Based Replication
For smaller environments SAN-SAN replication is expensive, and using NFS presents some interesting options for data replication across multiple storage hosts using software solutions.
I’d love to hear your thoughts..
I am currently presenting a follow-up to my previous vTARDIS session for the London VMware Users Group where I demonstrated a 2-node ESX cluster on cheap PC-grade hardware (ML-115g5).
The goal of this build is to create a system you can use for VCP and VCDX type study without spending thousands on normal production type hardware (see the slides at the end of this page for more info on why this is useful..) – Techhead and I have a series of joint postings in the pipeline about how to configure the environment and the best hardware to use.
As a bit of a tangent I have been seeing how complex an environment I can get out of a single server (which I have dubbed v.T.A.R.D.I.S: Nano Edition) using virtualized ESXi hosts, the goals were;
The main stumbling block I ran into with the previous build was the performance of the SATA hard disks I was using, SCSI was out of my budget and SATA soon gets bogged down with concurrent requests which makes it slow; so I started to investigate solid state storage (previous posts here).
By keeping the virtual machine configurations light and using thin-provisioning I hoped to squeeze a lot of virtual machines onto a single disk; previous findings suggest that cheaper consumer-grade SSDs can support a massive number of IOPS when compared to SATA (Eric Sloof has a similar post on this here).
So, I voted with my credit card and purchased one of these from Amazon – it wasn’t “cheap” at c.£200 but it will let me scale my environment bigger than I could previously manage which means less power, cost, CO2 and all the other usual arguments you try to convince yourself that a gadget is REQUIRED.
So the configuration I ended up with is as follows;
| 1 x HP ML115G5, 8Gb RAM, 144Gb SATA HDD | c.£300 (see here) but with more RAM |
| 1 x 128Gb Kingston 2.5” SSDNow V-Series SSD | c.£205 |
I installed ESX4U1 classic on the physical hardware then installed 8 x ESXi 4U1 instances as virtual machines inside that ESX installation
This diagram shows the physical server’s network configuration
In order for virtualized ESXi instances to talk to each other you need to update the security setting on the physical host’s vSwitch only as shown below;
This diagram shows the virtual network configuration within each virtualized ESXi VM, with vSwitch and dvSwitch config side-by-side.
I then built a Windows 2008R2 virtual machine running vCenter 4 Update 1 and added all the hosts to it to manage.
I clustered all the virtual ESXi instances into a single DRS/HA cluster (turning off admission control, as we will be heavily oversubscribing the resources of the cluster and this is just a lab/PoC setup).
Cluster Summary – 8 x virtualized ESXi instances – note the heavy RAM oversubscription, this server only has 8Gb of physical RAM – the cluster thinks it has nearly 64Gb
I then built an OpenFiler virtual machine and hooked it up to the internal vSwitch so that the virtualized ESXi VMs can access it via iSCSI. It has a virtual disk on the SSD and presents a 30Gb VMFS volume over iSCSI to the virtual cluster nodes (all the iSCSI traffic is essentially in-memory, as there is no physical networking for it to traverse).
Each virtualized ESXi node then runs a number of nested virtual machines (VM’s running inside VMs)
In order to get Nested virtual machines to work; you need to enable this setting on each virtualized ESXi host (the nested VM’s themselves don’t need any special configuration)
Once this was done and all my ESXi nodes were running and had settled down, I used a script to build out a whole bunch of nested virtual machines to run on my 8-node cluster. The VMs aren’t anything special – each has 512Mb allocated to it and won’t actually boot past the BIOS, because my goal here is just to simulate a large number of virtual machines and their configuration within vCenter rather than meet an actual workload. Remember this is a single server configuration and you can’t override the laws of physics; there is only really 8Gb of RAM and 4 CPU cores available.
Each of the virtual machines was connected to a dvSwitch for VM traffic – which you can see here in action (the dvUplink is actually a virtual NIC on the ESXi host).
I power up the virtual machines in batches of 10 to avoid swamping the host, but the SSD is holding up very well against the I/O
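The batching idea can be sketched as below. This is not the script used for the build; the `power_on` callback, the VM names and the pause between batches are all hypothetical stand-ins for whatever actually starts a VM in your environment.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def power_on_in_batches(
    vm_names: Sequence[str],
    power_on: Callable[[str], None],
    batch_size: int = 10,
    pause_seconds: float = 0.0,
) -> int:
    """Call `power_on` for every VM, one batch at a time; return the batch count."""
    count = 0
    for group in batches(vm_names, batch_size):
        for name in group:
            power_on(name)  # hypothetical hook: replace with a real power-on call
        count += 1
        time.sleep(pause_seconds)  # give the host time to settle between batches
    return count


# Hypothetical VM names, with a stub that only records what would be started.
started: List[str] = []
vms = [f"nested-vm-{i:02d}" for i in range(60)]
print(power_on_in_batches(vms, started.append), "batches,", len(started), "VMs")
```

In a real run you would set `pause_seconds` to something non-zero so the disk and host get a breather between batches.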
With all 60 of the nested VMs and virtualized ESXi instances loaded these are the load stats
I left it to idle overnight and these are the performance charts for the physical host; the big spike @15:00 was the scripts running to deploy the 60 virtual machines
Physical memory consumption – still a way to go to get it to 8Gb – who says oversubscription has no use? 🙂
So, in conclusion – this shows that you can host a large number of virtual machines for a lab setup. This obviously isn’t of much use in a production environment, because as soon as those 60 VMs actually start doing something they will consume real memory and CPU and you will run out of raw resources.
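The memory arithmetic behind that conclusion is quick to sketch. The figures are the ones quoted in this post; the cluster total is rounded to 64Gb because the post only says "nearly 64Gb".

```python
PHYSICAL_RAM_GB = 8        # physical RAM in the ML115 G5
CLUSTER_REPORTED_GB = 64   # approximate: the cluster reports "nearly 64Gb"
NESTED_VM_COUNT = 60
NESTED_VM_RAM_MB = 512


def oversubscription_ratio(allocated_gb: float, physical_gb: float) -> float:
    """How many times over the physical RAM has been promised."""
    return allocated_gb / physical_gb


# RAM promised to the nested VMs if they all tried to use their full allocation
nested_allocated_gb = NESTED_VM_COUNT * NESTED_VM_RAM_MB / 1024

print(f"Nested VMs allocated: {nested_allocated_gb:.0f} GB")
print(f"Nested VMs vs physical: {oversubscription_ratio(nested_allocated_gb, PHYSICAL_RAM_GB):.2f}:1")
print(f"Cluster view vs physical: {oversubscription_ratio(CLUSTER_REPORTED_GB, PHYSICAL_RAM_GB):.0f}:1")
```

So the nested VMs alone are promised 30GB against 8GB of real memory, which only holds up because none of them gets past the BIOS.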
The key to making this usable is the solid state disk – in my previous experiments I found SATA disks just got soaked under load and caused things like access to the VMFS to fail (see this post for more details)
Whilst not a production solution, this sort of setup is ideal for VCP/VCDX study as it allows you to play with all the enterprise level features like dvSwitch and DRS/HA that really need more than just a couple of hosts and VMs to understand how they work. For example, you can power off one of the virtual ESXi nodes to simulate a host failure and invoke the HA response; similarly, you can disconnect the virtual NIC from the ESXi VM to simulate the host isolation response.
Whilst this post has focused on non-production/lab scenarios it could be used to test VMware patch releases for production services if you are short on hardware and you can quite happily run Update manager in this solution.
If you run this lab at home it’s also very power-efficient and quiet, there are no external cables or switches other than a cross-over cable to a laptop to run the VI Client and administer it; you could comfortably have it in your house without it bothering anyone – and with an SSD there is no hard disk noise under load either 🙂
Thin-provisioning also makes good use of an SSD in this situation as this screenshot from a 30Gb virtual VMFS volume shows.
The only thing you won’t be able to play around with seriously in this environment is the new VMware FT feature. It is possible to enable it using the information in this post and learn how to enable/disable it, but it won’t remain stable and the secondary VM will lose sync with the primary after a while, as it doesn’t seem to work very well as a nested VM. If you need to use FT for now you’ll need at least 2 physical FT servers (as shown in the original vTARDIS demo).
If you are wondering how noisy it is at power-up/down, TechHead has this video on YouTube showing the scary sounding start-up noise and how quiet it gets once the fan control kicks in.
Having completed my VCP4 and 3 I’m on the path to my VCDX and next up is the enterprise exam so this lab is going to be key to my study when the vSphere exams are released.
Following on from my recent blog posts about the various ways to configure ML115 G5 servers to run ESX, I thought I would do some further experimenting on some older hardware that I have.
I have a Dell D620 laptop with a dual-core CPU and 4Gb of RAM which is no longer my day-to-day machine. Because of the success I had with SSD drives, I installed a 64Gb SSD in this machine.
I followed these instructions to install ESXi 4 Update 1 to a USB Lego brick flash drive (freebie from EMC a while ago and plays nicely to my Legogeekdom). I can then boot my laptop from this USB flash drive to run ESXi.
I am surprised to say it worked 1st time, booted fully and even supports the on-board NIC!
So, there you go – another low-cost ESXi server for your home lab that even comes with its own hot-swappable built-in battery UPS 🙂
The on-board SATA disk controller was also detected out of the box
A quick look on eBay and D620’s are going for about £250, handy!
Here is a screenshot of the laptop running a nested copy of ESXi, interestingly I also told the VM it had 8Gb of RAM, when it only has 4Gb of physical RAM.
Ok, so vSphere (ESX4) has only just been released, but what would you like to see in the next major version? Hyper V R2 will be out soon, and I would expect its successor within a further 18 months. Whilst vSphere is a technically better product now, Microsoft are going to be throwing a significant amount of resource at building up the Hyper V product line, so VMware need to keep innovating to stay significantly ahead.
As the VMware vendor and partner ecosystem grows will it stifle growth in the core product? – I see this happening with Microsoft – they don’t want to produce an all singing and dancing core product as there are literally thousands of ISV’s that they don’t necessarily want to put out of business; so Microsoft core products are “good-enough” but for more advanced features you turn to an ISV (think Terminal Services & Citrix)
So, open question really – here’s my starter for 10 – What would you like to see in ESX 5?
Host Based Replication
SAN storage brings a single point of failure; even with all the best HA controllers and disk arrangements, it’s still one unit – human error or a bad firmware could corrupt all your disks. You can buy a 2nd one and do replication, but that’s expensive (twice as expensive, in fact) and failover can require downtime (automated with SRM etc.). And what if you need to physically move it to another datacentre? That’s a lot of risk.
In this previous post I proposed a slightly different architecture, leveraging the FT features for a branch office solution – that same model could mean a more distributed architecture with n+1, 2 or 3 x ESX nodes running FT’d VMs for high availability on cheap, commodity hardware – using DAS storage and replicating over standard IP networks.
If you look at companies like Amazon, Google etc., their cloud platforms leverage virtualization (Xen) but I would bet they don’t rely on enormous SANs to run them. They use DAS storage and replication; they expect individual (or even datacentre) failures and can work around them by keeping multiple copies of everything. They don’t have an expensive storage model – they use cheap commodity kit and provide the HA in the software. With some enhancements the FT feature could provide an equivalent.
Host based replication also makes long-distance clustering more realistic – relying on plain old IP to do the replication, rather than proprietary SAN-SAN replication (previous thoughts on this here)
Microsoft have already moved in this direction with core products like Exchange and SQL, Exchange CCR and SQL Mirroring are pure-IP based replication technologies that address the issues with traditional single copy clusters
Now, with VMware being owned by EMC I could see this as being something of a problem but I hope they can see the opportunity here, you can achieve some of this using storage virtual machines (like Openfiler+Replication in a VM, or Datacore).
Stateless ESX Nodes
A mode where nodes can be PXE booted (or from firmware like ESXi) and have their configurations assigned/downloaded – no manual installs, all DHCP (or reserved DHCP) addressing
When combined with cheap, automatically provisioned and managed virtualization nodes with commodity DAS storage, you could envisage the following scenario.
You can imagine a policy-driven intelligent load and availability controller (vCenter 5) which ensures there are always copies of a VM on at least 2 or 3 physical machines in more than one location
This is getting a bit sci-fi, but the foundations in infrastructure and technology are being laid now with high-speed interconnects like Infiniband…
With more operating systems and applications starting to optimize for multi-core and hot-add CPU and memory, a very advanced hypervisor scheduler combined with very fast host interconnects like Infiniband or 10GbE could see actual CPU load and memory access being distributed across multiple physical hypervisors;
For example, imagine a 24 vCPU SQL Server virtual machine with 1Tb of vRAM having its code executed across 10 quad-CPU physical hosts: effectively multi-core processing, but across multiple physical machines – moving what currently happens within a single physical CPU and bus across the network between disparate machines.
The advantage of this is that developers would only have to write apps that work within current SMP technology – the hypervisor masks the complexity of doing this across multiple hosts, CPUs and networks with a high degree of caching and manages concurrency between processes.
You could combine this with support for hot-add CPU and memory features for apps that could scale massively on-demand and then down again, without having to engineer complex layer 7 type solutions.
Anyway, and please note this is pure personal conjecture rather than anything I have heard from VMware or elsewhere – enough from me; what would YOU like to see…?
In the lab I am currently working with I have a set of vSphere 4 ESXi installations running as virtual machines and configured in an HA cluster – this is a great setup for testing VM patches and general ops procedures, or learning about VMware HA/DRS/FT etc. (this lab is running on a pair of ML115 g5 servers but would work equally well on just one).
Everything installed ok and I can ping the virtual ESX servers from the vCenter host that manages the cluster (the warning triangle is because there is no management network redundancy – I can live with that in this lab).
All ESX hosts (physical and virtual) are connected via iSCSI to a machine running OpenFiler and the storage networking works ok. However, when I configured the vMotion & FT private networks between the VM ESX hosts I could not ping the vMotion/FT IP addresses using vmkping, indicating a communication problem. Normally this would be a VLAN issue or some routing, but in this instance all the NICs and IP addresses for my lab reside on a flat 10.0.0.0/8 network (it’s not production, just a lab).
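The "it can't be routing" reasoning can be checked mechanically. In this minimal sketch the port names and addresses are hypothetical; only the 10.0.0.0/8 network comes from the lab described here. If every VMkernel address falls inside the one flat network, no router is involved and the fault has to sit at layer 2.

```python
import ipaddress

LAB_NETWORK = ipaddress.ip_network("10.0.0.0/8")  # the flat lab network


def on_lab_network(address: str) -> bool:
    """True if the address sits inside the flat lab network."""
    return ipaddress.ip_address(address) in LAB_NETWORK


# Hypothetical vMotion/FT VMkernel addresses for the two virtual ESXi hosts
vmkernel_ports = {
    "esxi-vm-1 vMotion": "10.0.1.11",
    "esxi-vm-2 vMotion": "10.0.1.12",
    "esxi-vm-1 FT": "10.0.2.11",
    "esxi-vm-2 FT": "10.0.2.12",
}

for port, address in vmkernel_ports.items():
    status = "same subnet" if on_lab_network(address) else "needs routing"
    print(f"{port}: {address} ({status})")
```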
After some digging I came across this post for running ESX full as a VM, and noted the section on setting the vSwitch to promiscuous mode so I tried that with the vSwitch on the physical ESX host that the two ESXi VMs were running on;
And now the two Virtual ESXi nodes can communicate via vmkping
Problem solved and I can now vMotion nested VMs between each virtual ESX host – very clever!
Getting ESX (in its various versions) to run under VMware Workstation has proven to be a very popular article on this blog. If you are a consultant who has to do product demos of VI3/vSphere, or are studying for your VCP, it’s a very useful thing to be able to do on your own laptop rather than relying on remote connections or lugging around demo kit.
Good news; the RC build of vSphere will boot under the latest VMware Workstation build (6.5.2) without any of the .vmx hackery you had to do in previous versions and it seems quite fast to boot.
Bad news: the RC build of vSphere needs at least 2GB of RAM to boot, this is a problem for a laptop with 4GB of RAM as it means you can only really run one at a time.
Luckily: Duncan Epping (or VCDX 007; licenced to design :)) has discovered how you can hack the startup script to allow it to run in less than 2GB of RAM – details here, this isn’t officially supported – but it does work.
In the interests of science I experimented with VMs with steadily decreasing amounts of RAM, to find the bare minimum you can get away with for a VM’d version of vSphere RC.
The magic number seems to be 768Mb of RAM; if you allocate less than this to the VM it results in a Purple Screen of Death (PSOD) at boot time.
Note – this may change for the GA/RTM final version – but these are my findings for RC
The relevant section of my /etc/vmware/init/init.d/00.vmnix file looks like the following (note it won’t actually boot with 512mb assigned to the VM)
Some screen captures of the vSphere RC boot process below
And finally the boot screen once it’s finished – it takes 2-3 mins with 768Mb of RAM on my laptop to get to this boot screen.
I am doing this on a Dell D620 with 4Gb RAM and Intel VT enabled in the BIOS, running Vista x86 and VMware Workstation v6.5.2 build 156735
I haven’t tried, but I assume I can’t power on VM’s under this instance of vSphere but I can connect them to a vCenter 4 machine and practice with all the management and configuration tools.
I have had a lab/test setup at home for over 15 years now, it’s proven invaluable to keep my skills up to date and help me with study towards the various certifications I’ve had to pass for work, plus I’m a geek at heart and I love this stuff 🙂
Over the years it’s grown from a BNC based 10mbit LAN running Netware 3/Win 3.x, through Netware 4/NT4, Slackware Linux and all variants of Windows 200x/RedHat.
Around 2000 I started to make heavy use of VMware Workstation to reduce the amount of hardware I had (from 8 PCs in various states of disrepair to 2 or 3 homebrew PCs). In latter years there has been an array of cheap server kit on eBay, and last time we moved house I consolidated all the ageing hardware into a bargain eBay find – a single Compaq ML570G1 (Quad CPU/12Gb RAM and an external HDD array) – which served fine until I realised just how high our home electricity bills were becoming!
Note the best practice location of my suburban data centre, beer-fridge providing hot-hot aisle heating, pressure washer conveniently located to provide fine-mist fire suppression; oh and plenty of polystyrene packing to stop me accidentally nudging things with my car. 🙂
I’ve been using a pair of HP D530 SFF desktops to run ESX 3.5 for the last year and they have performed excellently (links here here and here), but I need more power and the ability to run 64 bit VMs (D530s are 32-bit only). I also need to start work on vSphere, which unfortunately doesn’t look like it will run on a D530.
So I acquired a 2nd-hand ML110 G4 and added 8Gb RAM – this has served as my vSphere test lab to date, but I now want to add a 2nd vSphere node and use DRS/HA etc. (looks like no FT for me unfortunately though). Techhead put me onto a deal that Servers Plus are currently running, so I now have 2 x ML110 servers 🙂 They are also doing quad-core AMD boxes for even less money here – see Techhead for details of how to get free delivery here.
In the past my labs have grown rather organically as I’ve acquired hardware or components have failed; as this time round I’ve had to spend a fair bit of my own money buying items, I thought it would be a good idea to design it properly from the outset 🙂
The design goals are:
The design challenges are:
Luckily I’m looking to start from scratch in terms of my VM-estate (30+) most of them are test machines or something that I want to build separately, data has been archived off so I can start with a clean slate.
The 1st pass at my design for the ESX 3.5 cluster looks like the following
I had some problems with the iSCSI VLAN, and after several days of head scratching I figured out why: in my network the various VLANs aren’t routable (my switch doesn’t do Layer 3 routing). For iSCSI to work the service console needs to be accessible from the iSCSI VMkernel port. I resolved this by adding an extra service console on the iSCSI VLAN, and discovery worked fine immediately.
I also needed to make sure the Netgear switch had the relevant ports set to T (Tag egress mode) for the VLAN mapping to work – there isn’t much documentation on this on the web but this is how you get it to work.
The vSwitch configuration looks like the following – note these boxes only have a single GbE NIC, so all traffic passes over them – not ideal but performance is acceptable.
iSCSI SAN – OpenFiler
In this instance I have implemented 2 OpenFiler VMs, one on each D530 machine, each presenting a single 200Gb LUN which is mapped to both hosts
Techhead has a good step-by-step guide on how to set up an OpenFiler here that you should check out if you want to know how to set up the volumes etc.
I made sure I set the target name in Openfiler to match the LUN and filer name so it’s not too confusing in the iSCSI setup – as shown below;
If it helps, my target naming convention was vm-filer-X-lun-X, which means I can have multiple filers presenting multiple targets with a sensible naming convention – the target name is only visible within iSCSI communications but does need to be unique if you will be integrating with real-world stuff.
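As a small illustration of that convention, the sketch below builds and parses names of the form vm-filer-X-lun-X. It assumes the X placeholders are plain numbers, which the post doesn’t actually state, and it only handles the short name, not any IQN prefix the filer may put in front of it.

```python
import re
from typing import Tuple

# Assumption: both X placeholders in vm-filer-X-lun-X are decimal numbers.
TARGET_PATTERN = re.compile(r"^vm-filer-(\d+)-lun-(\d+)$")


def target_name(filer: int, lun: int) -> str:
    """Build a target name following the vm-filer-X-lun-X convention."""
    return f"vm-filer-{filer}-lun-{lun}"


def parse_target_name(name: str) -> Tuple[int, int]:
    """Return (filer, lun) from a conforming name, or raise ValueError."""
    match = TARGET_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a vm-filer-X-lun-X name: {name!r}")
    return int(match.group(1)), int(match.group(2))


# Two filers presenting one LUN each, as in this lab
names = [target_name(filer, 1) for filer in (1, 2)]
print(names)
print(len(set(names)) == len(names))  # uniqueness check across all targets
```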
Storage Adapters view from an ESX host – it doesn’t know the iSCSI target is a VM that it is running 🙂
Because I have a non routed L3 network my storage is all hidden in the 103 VLAN; to administer my OpenFiler I have to use a browser in a VM connected to the storage VLAN. I did play around with multi-homing my OpenFilers but didn’t have much success getting iSCSI to play nicely. It’s not too much of a pain to do it this way, and I’m sure my storage is isolated to a specific VLAN.
The 3.5 cluster will run my general VMs like Windows domain controllers, file servers and my SSL VPN, and they will vMotion between the nodes perfectly. HA won’t really work as the back-end storage for the VMs lives inside an OpenFiler, which is itself a VM – but it suits my needs, and storage vMotion makes online maintenance possible with some advanced planning.
Performance from VM’d OpenFilers has been pretty good and I’m planning to run as many of my VMs as possible on iSCSI – the vSphere cluster running on the ML110s will likely use the OpenFilers as its SAN storage.
This is the CPU chart from one of the D530 nodes in the last 32hrs whilst I’ve been doing some serious storage vMotion between the OpenFiler VM’s it hosts.
That’s it for now, I’m going to build out the vSphere side of the lab shortly on the ML110’s and will post what I can (subject to NDA, although GA looks to be close)
As a result of a power outage last week my home lab needed a reboot as my 2 x ESX D530 boxes didn’t have auto-power on setting set in BIOS, so I dutifully braved the snow to get to the garage and power them on manually.
However nothing came back online.. ESX started but my VMs didn’t auto-restart as it couldn’t find them.
The run up to xmas was a busy month and I had vague recollections of being in the midst of using storage vMotion to move all my VMs away from local storage to an OpenFiler VM in preparation for some testing.
However, in my rush to get things working the OpenFiler box didn’t have a static IP address set and was using DHCP (see where this is going…?)
So my domain controller/DNS/DHCP and Virtual Centre server were stored on the OpenFiler VM which my ESX box was running and accessed over iSCSI. As such when ESX started it couldn’t locate the iSCSI volume hosting the VM and couldn’t start anything.
OpenFiler couldn’t start its web admin GUI if it couldn’t get an IP address, nor would it mount the shared volumes.
Once I’d figured out what was going on, it was simple enough to get things going again;
At this point I would have expected to be able to set a static IP address and resolve the issue for the future; however, I couldn’t see any NICs in the OpenFiler config screen (see screenshot below).
I thought this was a bit odd, and maybe I was looking in the wrong part of the UI, but sure enough it was the correct place.
I tried updating it to the most recent software releases via the handy system update feature, which completed ok (no reboot required – beat that Windows Storage Server! :)) but still no NICs showing up, even after a couple of reboots to be absolutely sure.
Then I stumbled across this thread, and it seems this may be a bug (tracker here). Following Jason’s suggestion, I used the nano text editor via the VI remote console to edit the /opt/openfiler/var/www/includes/network.inc file on the OpenFiler VM as follows;
I then refreshed the system tab in my browser session and the NICs show up;
Note: as part of my initial troubleshooting I added a 2nd virtual NIC to the VM, but the principle should apply regardless.
And I can now set a static IP etc.
I had to reboot my ESX host to get all my VM’s back from being inaccessible, I’m sure there is a cleverer way to do that, but in my case I wanted to test that the start-up procedure worked as expected now that I’ve set a static IP and re-jigged the start-up sequence so that OpenFiler starts before any other VMs that are dependent on it for their storage.
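The start-up dependency problem can be sketched as a small ordering check. The VM names and dependencies below are hypothetical stand-ins for this lab, and the sketch uses Python's standard `graphlib` module (Python 3.9+), not anything ESX provides. The second graph models the original fault, where OpenFiler needed DHCP from a VM that was itself stored on OpenFiler.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def startup_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every VM starts after the VMs it depends on.

    Raises graphlib.CycleError if the dependencies form a loop.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical names; each VM maps to the set of VMs it needs running first.
fixed_lab = {
    "openfiler": set(),                  # static IP, so no dependencies
    "domain-controller": {"openfiler"},  # stored on the OpenFiler iSCSI volume
    "virtual-center": {"openfiler"},
}
print(startup_order(fixed_lab))

# The original fault: OpenFiler waited on DHCP, served by a VM stored on OpenFiler.
broken_lab = {
    "openfiler": {"domain-controller"},
    "domain-controller": {"openfiler"},
}
try:
    startup_order(broken_lab)
except CycleError:
    print("circular dependency: nothing can start first")
```

Giving OpenFiler a static IP is what removes the loop, after which the storage VM can simply be placed first in the start-up sequence.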
As noted here and here, VMware have had ESX 3.5u2 certified under Microsoft’s SVVP programme. This is excellent news and will knock down one of the long standing barriers to greater adoption of virtualisation that I wrote about here – support.
Most notably for me this means blessed support of Exchange 2007 SP1 running under ESX!
Excellent work to get this done so quickly – MS only announced the SVVP programme a short while ago.
Official list of MS products supported under VMware is here.
Just in case you ever wondered what it looks like, here is a screendump.
This is the VMware equivalent of Microsoft’s BSOD (Blue Screen of Death).
I got this whilst running ESX 3.5 under VMware Workstation 6.5 build 99530. It happened because I was trying to boot my ESX installation from a SCSI hard disk – which it didn’t like, I assume because of driver support – so I swapped it for an IDE one and it worked fine…
update – actually the VM had 384Mb of RAM allocated and that’s what actually stopped it from booting.. upped to 1024Mb and it runs fine.
It’s the first time I’ve seen one – all the production ESX boxes I’ve worked with have always been rock-solid (touch wood).
I was preparing a blog post about unattended installations of ESX when I hit this, in case you were wondering.