Been struggling for a while with a high CPU load (around 0.40) while CPU usage was at best 10%.

Setup

The machine is an Intel Xeon E3-1240 v3 running Fedora 26, with an mdadm RAID 5 array. It hosts a few virtual machines, which at some point I had to move onto the RAID because they no longer fit on the boot disk.
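
The array itself can be inspected with the usual mdadm tooling; /dev/md0 is an assumption based on the md0_raid5 kernel thread mentioned later:

    cat /proc/mdstat            # overview of the md arrays and their state
    mdadm --detail /dev/md0     # RAID level, member disks and sync status (md0 assumed)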

One of the virtual machines runs Graphite / Carbon, receiving about 600 measurements per minute from the local network. The other two VMs were running but idle.

Once the number of messages increased, so did the CPU load of the host machine, although top showed the CPU being used at less than 10%.
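
This is roughly how the mismatch shows up on the command line; generic commands, not output captured from this box:

    uptime                     # load average creeping towards 0.40
    top -b -n 1 | head -n 5    # the %Cpu(s) line stays low; the "wa" (IO wait) field is the hint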

Analysis

It was obvious that the system was busy waiting for IO (network or disk), BUT how to go about finding the culprit?

  1. iotop reported some disk activity (from the VM receiving the collectd messages), but far less than what the system could handle. I couldn’t see how those writes would block.
  2. network traffic was quiet, except for a small spike every minute when the collectd messages came in; again, nothing that should block.
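
Neither check pointed at an obvious culprit. A more generic way to confirm the IO-wait suspicion is to look at time spent waiting and at processes stuck in uninterruptible sleep, which count towards the load average without burning CPU (again, generic commands, not the originals):

    vmstat 1 5                            # the "wa" column is CPU time spent waiting for IO
    ps -eo state,pid,comm | grep '^D'     # processes in uninterruptible (D) sleep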

The highest consumers of CPU (percentage, not load) were the three qemu-system-x86_64 processes, so I decided to strace the biggest consumer.
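
A sketch of the kind of invocation (the PID is a placeholder; the trace file name collectd.strace is the one referenced below):

    pidof qemu-system-x86_64                    # three PIDs, one per VM; pick the busiest
    strace -f -tt -p <PID> -o collectd.strace   # -f also follows the vCPU / iothread threads
    # let it run for a minute or so, then stop it with Ctrl-C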

Checking collectd.strace, it was littered with repeated calls to the same file descriptors.
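
The raw trace is not reproduced here; a quick way to see which syscalls dominate such a trace is strace’s own summary mode, or a rough tally of the saved file (the awk/cut line assumes the -f -tt line format used above):

    strace -c -f -p <PID>       # prints a per-syscall count/time table when stopped with Ctrl-C
    awk '{print $3}' collectd.strace | cut -d'(' -f1 | sort | uniq -c | sort -rn | head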

Next, checking the file descriptors the trace kept referring to.
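
They can be listed from /proc (the PID is again a placeholder); eventfd descriptors show up as anon_inode symlinks:

    ls -l /proc/<PID>/fd | grep eventfd
    # entries look like:  25 -> anon_inode:[eventfd]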

At first glance, many of the calls were to file descriptors backed by eventfd. I started googling to understand what these are needed for and very quickly found this presentation: Applying Polling Techniques to QEMU. In short, the old qemu polling is ancient, but there is a new kid on the block, aio=native, which solves the old problems. Hmmmm

aio=native is available since qemu 2.9.0, so the first thing to check was my installed version.
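
The exact version string reported on this machine is not reproduced here, but checking it is a one-liner:

    qemu-system-x86_64 --version    # prints "QEMU emulator version x.y.z ..." for the installed build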

Going over the virtual machines, I found that all the disks were defined as IDE and that, under the Performance tab, both “Cache mode” and “IO mode” were set to “Hypervisor default”.
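
The same information is visible outside the UI via libvirt (the domain name is a placeholder): a <driver> element without cache= / io= attributes is what virt-manager displays as “Hypervisor default”, and bus='ide' corresponds to the IDE disks.

    virsh dumpxml graphite | grep -E '<(disk|driver|target)'
    # e.g.  <driver name='qemu' type='qcow2'/>     <- no cache= / io= set
    #       <target dev='hda' bus='ide'/>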

I decided to switch them to “Cache mode”: none and “IO mode”: native.
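
Outside the virt-manager UI, the same change maps to the cache= and io= attributes on each disk’s <driver> element (the domain name is a placeholder; type= should match the actual image format):

    virsh edit graphite
    # inside each <disk> element:
    #   <driver name='qemu' type='qcow2' cache='none' io='native'/>
    # a full shutdown and start of the VM is needed for the new settings to take effect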

Lo and behold, the average load dropped from 0.40 down to 0.20. Very nice!

Aftermath

Besides qemu-system-x86_64, one of the other CPU hogs was md0_raid5. In theory, the RAID disks should be kept as idle as possible, so I decided to buy an SSD and move the VM images off the RAID onto it. A few hours later, the system had stabilized at almost 0 for the 15-minute load average. Yey!
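
For reference, the move itself is nothing fancy; all domain names and paths below are placeholders, not the ones actually used:

    virsh shutdown graphite
    mv /raid/vm-images/graphite.qcow2 /ssd/vm-images/graphite.qcow2
    virsh edit graphite      # point <source file='...'/> at the new location
    virsh start graphite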

Conclusion

Problem solved by aio=native,cache=none for each virtual machine.
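
For completeness, the same two options expressed directly in qemu’s -drive syntax; libvirt generates an equivalent line from the XML above, and the path and format here are placeholders:

    qemu-system-x86_64 ... -drive file=/ssd/vm-images/graphite.qcow2,format=qcow2,cache=none,aio=native ...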

Further investigations

Try out other types of caching (instead of none) and measure the differences.
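
A sketch of how such a comparison could be run; the domain name and the fio job are hypothetical, not taken from this post. Note that io='native' needs an O_DIRECT cache mode (none or directsync), so the other cache modes imply io='threads' instead.

    # libvirt cache modes: none, writethrough, writeback, directsync, unsafe
    virsh edit graphite      # set e.g. cache='writethrough' on the <driver> element, then restart the VM
    # then run the same benchmark inside the guest for each mode and compare, e.g.:
    fio --name=cachetest --rw=randwrite --bs=4k --size=1G --direct=1 \
        --ioengine=libaio --runtime=60 --time_based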