I am currently waiting for some code to compile and found a bit of time to type up a quick blog.
As described in a previous post, I have set up my laptop to run Windows as a virtual machine guest, with Mac OS X (Mountain Lion) as the host.
Today, I needed to compile some code and, thinking I was being clever, I put the code on the host OS (OS X) and shared the folder via VMware to the Windows guest OS.
Compiling from inside VMware
I ran my msbuild process from the shared folder, and it took FOREVER to compile. The obvious choice here is of course to blame virtualisation itself: after all, I only have 2 cores allocated to the virtual machine.
But not so fast! Our tuning knowledge comes in quite handy here. Have a look at the CPU pattern while I am compiling:
Just like with SQL Server, I start my tuning at a very high level (in this case, task manager) and dig in from there.
The first question we ask ourselves as tuners is: Does what I see make sense?
In this case, it obviously doesn’t. MSBUILD is set up to build in a highly parallelised fashion, so it should be using both of my cores, and there is no obvious I/O bottleneck in the system. Having 50% of two cores busy (and with high kernel times) looks a lot like a single-threaded bottleneck to me. The build was taking over 15 minutes, which was much longer than expected.
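To make the "50% of two cores" reasoning concrete, here is a small sketch of the arithmetic. The function name is my own invention for illustration, and it assumes the simplest possible model: each busy thread saturates exactly one core and nothing else is running.

```python
def expected_total_cpu_percent(busy_threads: int, cores: int) -> float:
    """Total CPU utilisation (percent) if each busy thread saturates one core.

    Simplified model (an assumption): no other load, no hyper-threading effects.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    if busy_threads < 0:
        raise ValueError("busy_threads must not be negative")
    # A thread can occupy at most one core at a time, so cap at the core count.
    return 100.0 * min(busy_threads, cores) / cores


# One serialised thread on my 2-core guest: the pattern I saw in task manager.
print(expected_total_cpu_percent(1, 2))  # 50.0
# What a properly parallel build should look like on the same guest.
print(expected_total_cpu_percent(2, 2))  # 100.0
```

A total that sits flat at 100/N percent on an N-core box is a useful hint that something is serialising the work.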
Diagnosing the Problem – our friend Xperf
Normally, I use xperf to troubleshoot servers. But it sure comes in handy for misbehaving client machines too.
Task manager only shows that the time is spent in the process d.exe, which is part of the build process. Is the compiler bad or must we look elsewhere? It sure would be surprising if the compiler used all the kernel time, wouldn’t it?
Here is the quick and dirty CPU “zoom in” xperf command to get the details we need:
- xperf -on latency -stackwalk profile
- …wait a bit
- xperf -d <myfile>.etl
This captures a sample of the stack and CPU usage of each process and kernel module. From here, it is quite clear what is going on – let me walk you through the analysis.
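If you find yourself doing this capture often, the start/wait/stop sequence above can be wrapped in a small script. This is only a sketch: the function names and the trace file name are hypothetical, it assumes xperf is on the PATH, and it will most likely need to run from an elevated prompt.

```python
import subprocess
import time


def start_command() -> list[str]:
    """The xperf command that starts kernel tracing with CPU sample stacks."""
    return ["xperf", "-on", "latency", "-stackwalk", "profile"]


def stop_command(etl_path: str) -> list[str]:
    """The xperf command that stops tracing and writes the trace to etl_path."""
    return ["xperf", "-d", etl_path]


def capture(etl_path: str, seconds: float) -> None:
    """Start tracing, wait for `seconds`, then always stop and save the trace."""
    subprocess.run(start_command(), check=True)
    try:
        time.sleep(seconds)  # the "wait a bit" step while the build is running
    finally:
        # Stop even if we are interrupted, so tracing is not left switched on.
        subprocess.run(stop_command(etl_path), check=True)


# Example (hypothetical file name): trace 30 seconds of the build.
# capture("build_trace.etl", seconds=30)
```

The try/finally is the only real design decision here: a trace session that is left running keeps costing CPU and memory until someone stops it.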
First, open the trace with xperfview. I recommend staying with the Win7 version of xperfview, as the Win8 interface is… well… a Win8 interface.
Pick the CPU Sampling per CPU, right click and choose Summary Table:
From here, pick the columns Module, CPU and % Weight, which allows you to summarise by module. On my box, it looks like this:
Aha! Most of the CPU burn goes into vmci.sys (just ignore intelppm.sys). This isn’t a part of Windows. It’s relatively easy to trace this file back to VMware.
So, who calls into this kernel module? Adding the stack column after the module, we can see that too:
Eureka: It is file system access that is causing the slowdown. See the call stack? It starts from GetFileAttributesW and ends up inside vmci.sys.
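A quick way to sanity check this finding without a full build is to time per-file metadata lookups directly. The sketch below is my own illustration, not part of the build: it uses Python's os.stat as a stand-in for the attribute query (similar in spirit to GetFileAttributesW, not necessarily the same underlying call), and the paths in the usage comment are hypothetical.

```python
import os
import time


def time_metadata_lookups(root: str) -> tuple[int, float]:
    """Walk `root`, stat every file, and return (files seen, seconds in os.stat)."""
    count = 0
    elapsed = 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            start = time.perf_counter()
            try:
                os.stat(path)  # one metadata query per file
            except OSError:
                continue  # skip files that vanish or cannot be read
            elapsed += time.perf_counter() - start
            count += 1
    return count, elapsed


# Example with hypothetical paths: compare the shared folder with a local copy.
# for root in (r"Z:\src", r"C:\src"):
#     files, seconds = time_metadata_lookups(root)
#     print(root, files, "files,", round(seconds, 3), "seconds in stat calls")
```

If the shared folder shows a much higher cost per file than the local copy, that points in the same direction as the xperf stack.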
Fixing the problem
Now, before you go ahead and conclude that VMware adds a horrible overhead to I/O, let’s just try to move the source files into the guest OS itself. Recall that my machine is using VMware shared folders to access the source code. It might simply be the sharing framework that is acting strange…
The results of using the guest OS’s file system are staggering. Running the build process now looks like this at the CPU level:
And the total build time is down from over 15 minutes to less than 3 minutes.
Thank you xperf…
At this point, we are reduced to guessing what is going on