vSphere – Allocate 100% CPU to 1 VM / 88% efficiency

Cody Smith asked:

EDIT 2: My application benefits from hyper-threading

A. Yes, I know what the technology is and what it does.

B. Yes, I know the difference between a physical core and a logical one.

C. Yes, turning HT off made the render run slower; this is expected!

D. No, I am not overprovisioning when I assign all the logical (yes, logical) cores to one VM. If you read VMware's white papers, you will know that the scheduler builds a topology map of the physical hardware and uses that map when allocating resources. Assigning ALL the logical cores to one VM gives Windows 16 logical processors, the same as if I had installed it directly on the physical hardware. And lo and behold, after 5 tests this arrangement has produced the fastest (and most efficient) render times.

E. The application in question is 3ds Max 2014 using Backburner and the Mental Ray renderer.

TL;DR: I (sometimes) want to run one VM on vSphere with as much CPU efficiency as possible. How?

I’m hoping to use VMware’s ESXi / vSphere hypervisor in a somewhat non-standard way.

Normally people use a hypervisor to run multiple VMs simultaneously on one system. I want to use the hypervisor to let me quickly switch between applications, but only ever really run one VM / app at a time.

It’s actually a pet project. I have a 5-node render farm (each node has 2x Intel Xeon E5540) that for the most part stays off (when I’m not rendering I have no need to run these machines). That seems like a waste of valuable compute time, so I was hoping to use the machines for other things when not rendering (kind of a general-purpose 40-core / 80-thread compute cluster).
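As a quick sanity check on the core counts above, the sketch below multiplies out the farm's topology. It assumes each Xeon E5540 has 4 physical cores with 2 hardware threads per core (Intel's published spec for that part); the constant and function names are hypothetical.

```python
# Hypothetical sanity check of the render farm's CPU topology.
# Assumes each Xeon E5540 has 4 physical cores and 2 hardware threads
# (Hyper-Threading) per core, per Intel's published spec for that part.
CORES_PER_CPU = 4
THREADS_PER_CORE = 2
SOCKETS_PER_NODE = 2
NODES = 5


def node_topology(sockets: int, cores_per_cpu: int, threads_per_core: int) -> tuple[int, int]:
    """Return (physical cores, logical processors) for one host."""
    cores = sockets * cores_per_cpu
    return cores, cores * threads_per_core


node_cores, node_threads = node_topology(SOCKETS_PER_NODE, CORES_PER_CPU, THREADS_PER_CORE)
cluster_cores = NODES * node_cores
cluster_threads = NODES * node_threads

print(f"per node: {node_cores} cores / {node_threads} logical processors")
print(f"cluster:  {cluster_cores} cores / {cluster_threads} logical processors")
```

This prints 8 cores / 16 logical processors per node and 40 / 80 for the whole cluster, matching both the 16 logical processors Windows sees in the single render VM and the 40-core / 80-thread figure.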

I was hoping that vSphere would let me spin up render node VMs when rendering and other things when not. The problem is that I really need high CPU efficiency while the render VM is running.

I’m using a render job as a benchmark and getting about 88% of the speed on the VM that I get on a non-VM setup. I was hoping for closer to 95%. Any ideas how I could get there?
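To put the 88% and 95% figures in render-time terms: if relative speed is taken as bare-metal time divided by VM time, then the VM's render time is the native time divided by that fraction. The sketch below assumes a hypothetical 60-minute bare-metal render; the function names are illustrative only.

```python
def relative_speed(native_seconds: float, vm_seconds: float) -> float:
    """Fraction of bare-metal speed achieved in the VM (1.0 == native)."""
    return native_seconds / vm_seconds


def vm_time_for(native_seconds: float, speed: float) -> float:
    """Render time in the VM if it runs at `speed` times bare-metal speed."""
    return native_seconds / speed


native = 60 * 60  # hypothetical 60-minute bare-metal render, in seconds
for speed in (0.88, 0.95):
    minutes = vm_time_for(native, speed) / 60
    print(f"{speed:.0%} of native -> {minutes:.1f} min in the VM")
```

At 88% that render takes about 68.2 minutes in the VM and at 95% about 63.2 minutes, so the gap being chased is roughly five minutes per hour of native render time.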

EDIT: Details:

Resources being used by the render VM (I don’t fully understand why this bar is not full):

Resource settings for that VM:

Even though the VM doesn’t show as using 100% of the resources, the host does:

I don’t entirely understand the % shares here. Do they apply only when all of these VMs are running? Also, I didn’t configure the other VMs to reserve 10%:

Finally, the host does show as fully utilized, although the MHz utilization (not shown here) is lower, i.e. not 100%:

VM Config:

I understand this is an interesting case, but I nevertheless feel the question is valid and may help others in a similar situation down the line (although I admit this case is quite specific).

My answer:

I think you’ve reached about the maximum of what you’re likely to get with those old Xeons, though unlike ewwhite I do not believe hyperthreading is causing you any sort of problem. Indeed, at least since ESXi 5.0, VMware has recommended using hyperthreading for most workloads, and your own testing seems to confirm that you are benefiting from HT. As ewwhite correctly notes, though, using HT will make some metrics in vSphere appear strangely.

I think you have one obvious issue and possibly one non-obvious issue here:

First is the obvious issue: virtualization itself incurs overhead that you can never fully eliminate. In the case of the CPU, certain instructions must be virtualized so that the hypervisor can correctly isolate one virtual machine from another. Instead of such an instruction executing directly, as on bare metal, the hypervisor intercepts it and executes several instructions in its place. From prior experience, 87-90% of native performance is about what you should expect for the CPU, and getting much past that would require a significant advance in hardware. If you’re now seeing 91% of native CPU performance, it’s probably about as good as it’s going to get.

Second is the non-obvious issue of NUMA. This is an issue with multiprocessor systems, where part of the memory is faster when accessed by the nearest CPU, and slower when accessed by other CPUs. Depending on how your rendering job handles memory, you might see some benefit by running two parallel renderers in two VMs, each of which is pinned to a specific CPU and always accesses the slightly faster memory. (If you run two VMs on a single host, each using half the available vCPUs, ESXi should sort this out automatically for you.) Though if you aren’t seeing this issue on bare metal, you probably will gain little benefit by trying this.
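The sizing arithmetic behind the two-VM suggestion can be sketched as follows. This is a hypothetical helper, not a vSphere API: it assumes one NUMA node per socket (true of these Nehalem-era Xeons, which have on-die memory controllers) and counts hyper-threaded logical CPUs toward each node, matching how the question assigns vCPUs. ESXi's NUMA scheduler makes the actual placement decisions, so treat this only as a check that an even split keeps every VM inside a single node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    sockets: int           # assumption: one NUMA node per socket
    cores_per_socket: int
    threads_per_core: int

    @property
    def logical_per_node(self) -> int:
        return self.cores_per_socket * self.threads_per_core


def numa_plan(host: Host, vms: int) -> list[tuple[int, int]]:
    """Split the host's logical CPUs evenly across `vms` VMs.

    Returns one (numa_node, vcpus) pair per VM, filling node 0 first.
    Raises ValueError if an even split would make any VM span nodes.
    """
    total = host.sockets * host.logical_per_node
    if vms <= 0 or total % vms != 0:
        raise ValueError(f"{total} logical CPUs cannot be split evenly across {vms} VMs")
    vcpus = total // vms
    # A VM stays on one node only if it is no wider than a node and
    # an exact number of such VMs fills each node.
    if vcpus > host.logical_per_node or host.logical_per_node % vcpus != 0:
        raise ValueError(
            f"{vcpus} vCPUs per VM would not fit cleanly inside one "
            f"{host.logical_per_node}-CPU NUMA node"
        )
    return [(i * vcpus // host.logical_per_node, vcpus) for i in range(vms)]


e5540_node = Host(sockets=2, cores_per_socket=4, threads_per_core=2)
print(numa_plan(e5540_node, 2))  # [(0, 8), (1, 8)]
try:
    numa_plan(e5540_node, 1)  # the single 16-vCPU render VM from the question
except ValueError as err:
    print(err)
```

Two 8-vCPU VMs land one per node, while the single 16-vCPU VM is flagged as spanning both nodes, which is exactly the case where remote memory accesses can appear.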

View the full question and any other answers on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.