We recently spent some time troubleshooting network performance issues with VMware ESX Server running on our IBM x336 servers and thought that what we learned might be interesting to others. Network performance within a VM has always been only so-so, but generally good enough. With RHEL3/4 VMs on ESX 2.5 we were generally able to get 400-500Mb/s out of a VM using iperf, about half what the physical hardware could do, which I guess isn't too bad. With ESX 3 we had seen that number rise significantly, with RHEL4 VMs running the vmxnet driver reaching 800-900Mb/s, within about 10% of the speed we were seeing on physical hardware.
Unfortunately, this performance increase didn't occur on all hardware platforms. Specifically, our IBM HS20 and Dell 1850 servers saw a huge boost in network performance with ESX 3, while our IBM x336 systems continued to perform at about 1/3 the speed of the physical hardware. This really showed up after we switched one of our remote sites from a Fiber Channel based SAN to an iSCSI solution.
When we first decided to switch from our Fiber Channel SAN (EMC Clariion CX series) to a simpler iSCSI solution (Equallogic PS series) at our remote sites, we did so based on performance numbers that showed we could actually get better performance out of the iSCSI solution, even with software iSCSI, than with the aging and far more complex CX solution. When we brought the PS300E into the lab for testing against an HS20 blade we were amazed that we could achieve 85-90MB/s from a single VM over a single GigE link, near 90% utilization. Running two VMs using two GigE connections got us to 180MB/s pretty easily, significantly faster than we had ever managed to get our CX400 to achieve.
We purchased our PS300E and installed it at the remote site where ESX Server was running on IBM x336 systems. We then re-ran the same benchmarks against the PS300E using the production systems. This time the performance numbers were not so good, with transfer rates in the 30-40MB/s range. We brought the array back to the lab site and ran our reference benchmark against the new PS300E. Performance was around 90MB/s, right about what we expected.
We really couldn't understand what was going on. The specs of the two hardware platforms were nearly identical: dual Intel Xeon 3.2GHz processors, 8GB RAM, Broadcom GigE NICs. Why would there be such a disparity? We suspected a network layer problem, but after looking at switch configs over and over we just didn't see any of the usual suspects (duplex mismatch, increasing error counts, etc).
Since we were moving from Fiber Channel to iSCSI we had purchased an Intel Pro/1000 GT Quad card to replace the Qlogic FC HBA (yes, we could have purchased a hardware iSCSI HBA, but our testing had indicated that this really would not be required to achieve the level of performance we needed and, based on forum feedback, the VMware support for Qlogic iSCSI adapters seemed a little flaky). We installed the Intel Pro/1000 into one of the IBM x336 systems and re-ran the test. Boom, performance was back in the 90MB/s range.
We could have probably stopped there, since we didn't absolutely need to use the onboard ports. However, now we were curious why the onboard Broadcom ports on an x336 would perform so much worse than the onboard Broadcom ports on an HS20. We began to investigate possible scenarios and ran across something interesting: when looking at the various files in /proc we came across the /proc/vmware/interrupts file and noticed it looked like this (edited slightly to fit on the page):
Vector PCPU 0 PCPU 1 PCPU 2 PCPU 3
0x21: 2 0 0 0 COS irq 1 (ISA edge), <VMK device>
0x29: 0 0 0 0 <COS irq 4 (ISA edge)>
0x31: 0 0 0 0 <COS irq 6 (ISA edge)>
0x39: 0 0 0 0 <COS irq 7 (ISA edge)>
0x41: 0 0 0 0 <COS irq 8 (ISA edge)>
0x49: 0 0 0 0 <COS irq 9 (ISA edge)>
0x51: 15 0 0 0 COS irq 12 (ISA edge)
0x59: 0 0 0 0 <COS irq 13 (ISA edge)>
0x61: 1 0 0 0 COS irq 14 (ISA edge)
0x69: 0 0 0 0 <COS irq 15 (ISA edge)>
0x71: 68505248 0 0 0 COS irq 16 (PCI level), VMK vmnic6, VMK vmnic1
0x79: 108 0 0 0 COS irq 18 (PCI level)
0x81: 0 0 0 0 COS irq 19 (PCI level)
0x91: 155710 649201 289501 732347 <COS irq 20 (PCI level)>, VMK ips
0x99: 21577461 60183980 21031867 51667695 <COS irq 21 (PCI level)>, VMK vmnic2
0xa1: 106442 348866 139914 365738 <COS irq 22 (PCI level)>, VMK vmnic3
0xa9: 9503152 27310088 9603280 24230277 <COS irq 23 (PCI level)>, VMK vmnic4
0xb1: 106439 348786 140019 365653 <COS irq 24 (PCI level)>, VMK vmnic5
0xdf:417335059 417060518 418055868 416873770 VMK timer
0xe1: 3739008 1903562 4602326 2230321 VMK monitor
0xe9: 92395997 68149255 95110679 69560265 VMK resched
0xf1: 15628 55045 42367 53573 VMK tlb
0xf9: 214270 0 0 0 VMK noop
0xfc: 0 0 0 0 VMK thermal
0xfd: 0 0 0 0 VMK lint1
0xfe: 0 0 0 0 VMK error
0xff: 0 0 0 0 VMK spurious
Of note is vector 0x71, which shows two things. First, notice that interrupts for this vector are not distributed evenly across the CPUs; they all land on PCPU 0. Second, looking further to the right you can see why: the Console OS (COS) has at least one driver with ownership of this interrupt (note the lack of angle brackets around "COS irq 16 (PCI level)" in the far right column), while the VMkernel (VMK) owns both vmnic1 and vmnic6, which match up with the two Broadcom onboard adapters. Since the Console OS runs only on the first processor, all interrupts on a shared vector must be handled by the first processor, which then has to execute the interrupt handler for the Console OS and then the one for the VMkernel. That requires context switches and a fair amount of overhead. This is actually fairly well documented for ESX 2.x in VMware KB article 1290, and it would seem that it still applies to ESX 3.
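As a rough sketch of that check, the function below scans text in the format of /proc/vmware/interrupts and prints every vector that has both a claimed Console OS IRQ (no angle brackets) and at least one VMkernel device. The name find_shared_vectors is made up for illustration, and the parsing assumes the line layout shown in the listing above rather than any documented format.

```shell
#!/bin/sh
# find_shared_vectors (hypothetical helper name)
# Reads a /proc/vmware/interrupts style listing on stdin and prints the
# vectors shared between a loaded Console OS driver and a VMkernel device.
find_shared_vectors() {
    awk '
        # Only data rows start with a hex vector such as "0x71:".
        $1 !~ /^0x[0-9a-fA-F]+:/ { next }
        {
            line = $0
            # Angle-bracketed entries have no driver attached, so drop them.
            gsub(/<[^>]*>/, "", line)
            if (line ~ /COS irq/ && line ~ /VMK /) {
                vec = $1
                sub(/:.*$/, "", vec)
                print vec
            }
        }
    '
}
```

On a host this would be run as "find_shared_vectors < /proc/vmware/interrupts". Against the listing above it should report only 0x71.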
Our task, then, is to figure out which driver in the Console OS is actually using the interrupt on the same vector as the VMkernel. This is pretty easy with the following command:
# cat /proc/vmware/pci | grep 0x71
Bus:Sl.F Vend:Dvid Subv:Subd Type Vendor ISA/irq/Vec P M Module Name
000:29.0 8086:24d2 1014:02dc USB Intel 11/ 16/0x71 A C
007:00.0 14e4:1659 1014:02c6 Ethernet Broadcom 11/ 16/0x71 A V tg3 vmnic1
008:00.0 14e4:1659 1014:02c6 Ethernet Broadcom 11/ 16/0x71 A V tg3 vmnic6
Of course you should replace 0x71 with the specific interrupt vector that is shown as shared on your system. Note that the "V" or "C" in the column labeled "M" (just before the "Module" and "Name" columns) shows whether a device is owned by the VMkernel (V) or the Console OS (C). In my example the Console OS is attached to the USB controller while the two onboard NICs are assigned to the VMkernel. So now we know that IRQ 16, which according to the output above corresponds to vector 0x71, is shared by the USB hardware owned by the Console OS and the Broadcom NICs owned by the VMkernel. If we want to know specifically which driver the Console OS is using to control the device we can simply run the following:
# cat /proc/interrupts
CPU0
0: 41270280 vmnix-edge timer
1: 4 vmnix-edge keyboard
2: 3075039 vmnix-edge VMnix interrupt
12: 15 vmnix-edge PS/2 Mouse
14: 5 vmnix-edge ide0
16: 76567 vmnix-level usb-uhci
18: 156 vmnix-level usb-uhci
19: 0 vmnix-level ehci-hcd
This shows that IRQ 16 is being used by the usb-uhci driver in the Console OS.
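The two lookups above can be wrapped in a pair of small helpers, sketched here under the assumption that both files look like the samples in this article. The function names owners_on_vector and cos_driver_for_irq are invented for this example, and the second one assumes the single-CPU /proc/interrupts layout of the Console OS.

```shell
#!/bin/sh
# Hypothetical helpers; the parsing is based only on the sample output above.

# Print "<bus address> <owner letter>" for each device in a
# /proc/vmware/pci style listing (stdin) that uses vector $1.
owners_on_vector() {
    awk -v vec="$1" '
        {
            for (i = 1; i <= NF; i++) {
                # The irq/vector pair looks like "16/0x71"; test its last part.
                n = split($i, parts, "/")
                if (n > 1 && parts[n] == vec) {
                    # Two fields later is the "M" column: C or V.
                    print $1, $(i + 2)
                    break
                }
            }
        }
    '
}

# Print the driver name the Console OS has attached to IRQ $1, reading a
# single-CPU /proc/interrupts listing on stdin.
cos_driver_for_irq() {
    awk -v irq="$1" '
        $1 == (irq ":") {
            # Fields: irq, count, controller type, then the driver name.
            name = $4
            for (i = 5; i <= NF; i++) name = name " " $i
            print name
        }
    '
}
```

For example, "owners_on_vector 0x71 < /proc/vmware/pci" followed by "cos_driver_for_irq 16 < /proc/interrupts" reproduces the manual steps for the system described here.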
Ideally at this point you would simply modify your system's BIOS settings to move the USB hardware onto a different interrupt than the onboard NICs, or perhaps disable USB altogether if you don't need it. Unfortunately, in the case of the IBM x336 there does not appear to be a way to do this. You can assign the onboard NIC and USB to a different IRQ, but that just moves the problem from one vector to another, since they still both share the same one. At first I thought of just disabling the onboard USB as I really didn't need it for anything, but I couldn't find such an option in the IBM BIOS.
So, thwarted by the system hardware, I decided to look at other options. First, I was pretty sure I didn't need USB support for normal operation, so I thought I would simply disable the USB driver in the Console OS. This is actually pretty simple: just run "modprobe -r usb-uhci" and it will unload the USB driver. As soon as you do this the output of the above commands changes. Look at the following:
# cat /proc/vmware/interrupts
Vector PCPU 0 PCPU 1 PCPU 2 PCPU 3
0x59: 0 0 0 0 <COS irq 13 (ISA edge)>
0x61: 1 0 0 0 COS irq 14 (ISA edge)
0x69: 0 0 0 0 <COS irq 15 (ISA edge)>
0x71: 68708836 1024 9813 7738 <COS irq 16 (PCI level)>, VMK vmnic6, VMK vmnic1
0x79: 156 0 0 0 <COS irq 18 (PCI level)>
0x81: 0 0 0 0 COS irq 19 (PCI level)
0x91: 157878 658589 292915 741902 <COS irq 20 (PCI level)>, VMK ips
I edited this, but the important part to note is that the line with vector 0x71 still shows the VMkernel claiming the interrupts for vmnic1 and vmnic6, and that the Console OS has a device that it could claim. However, the presence of the angle brackets indicates that no driver is loaded by the Console OS for this device. You can also see that the interrupts have started being distributed across all CPUs. Also, the output of "cat /proc/interrupts" now shows no sign of interrupt 16 being claimed by any driver:
CPU0
0: 41365483 vmnix-edge timer
1: 4 vmnix-edge keyboard
2: 3083573 vmnix-edge VMnix interrupt
12: 15 vmnix-edge PS/2 Mouse
14: 5 vmnix-edge ide0
19: 0 vmnix-level ehci-hcd
NMI: 0
LOC: 0
ERR: 0
MIS: 0
After removing the driver we reran our benchmark and, sure enough, performance with the onboard Broadcom NICs was roughly equal to the performance of the Intel Pro/1000 adapter. If we ran "modprobe usb-uhci" performance would return to its previously poor state, while unloading the driver with "modprobe -r usb-uhci" always returned the system to good performance.
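The counters in /proc/vmware/interrupts look cumulative (the 0x71 row still carries most of its total on PCPU 0 right after the unload), so the change in distribution is easier to see by comparing two saved copies of the file. The following is a minimal sketch of that comparison; vector_delta is a hypothetical name and the field layout is assumed from the listings above.

```shell
#!/bin/sh
# vector_delta (hypothetical helper name)
# Print the per-CPU increase in interrupt counts for vector $1 between two
# saved copies of /proc/vmware/interrupts ($2 = earlier, $3 = later).
vector_delta() {
    awk -v vec="$1:" '
        # Some rows lack a space after the vector ("0xdf:4173..."); add one.
        { sub(/:/, ": ") }
        $1 != vec { next }
        FNR == NR {
            # First file: remember the counts, stopping at the first non-number.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) before[i] = $i
            next
        }
        {
            # Second file: print the difference for each CPU column.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                printf "%s%d", (i > 2 ? " " : ""), $i - before[i]
            printf "\n"
        }
    ' "$2" "$3"
}
```

A typical use would be to copy the file to /tmp/before, run the benchmark, copy it again to /tmp/after, and then run "vector_delta 0x71 /tmp/before /tmp/after". The /tmp file names are only placeholders.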
The benchmark results proved that the usb-uhci driver was the culprit. We initially simply modified modules.conf to remove the usb-uhci driver completely, but that turned out to have a negative side effect: the IBM Remote Supervisor Adapter (RSA) card would no longer allow me to use the keyboard and mouse. Apparently the RSA emulates a USB keyboard and mouse, and thus removing the driver removed the ability to use the remote keyboard. This really isn't that big of a problem since I usually just SSH into the system anyway, and if the system has crashed so hard that I can't SSH into it, it's unlikely it will respond to the keyboard either. Plus, I can still use the RSA remote power features to power cycle the system, and the keyboard works fine during the boot process.
Still, I decided to write a small script that simply unloads the usb-uhci driver 5 minutes after VMware ESX boots. That way if I'm attempting to troubleshoot something like a network connectivity issue, I can simply reboot the system, log in with the RSA and kill the script before the five minutes are up. It's a cheap trick, but it actually works pretty well.
Another option I've thought about is to simply use the onboard NICs as standby adapters, so that they are only used if the Intel NIC malfunctions or loses its network connection. Not a bad option really; even at only 300Mb/s they will still likely survive a normal workday without any major issues.
I've even considered a more complex approach, something like running mon in the Console OS: if it can't ping the default gateway, thus indicating a network problem, it could load the usb-uhci driver so that the RSA console would work, and then, once the network is restored, it could unload the driver again.
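I never built this, but the idea can be sketched with a plain shell function in place of mon. Everything here is an assumption for illustration: the function name usb_watchdog_step, the PING and MODPROBE variables (there so the commands can be swapped out when trying it), and the gateway address in the usage comment.

```shell
#!/bin/sh
# Hypothetical watchdog step, using plain ping instead of mon.
# PING and MODPROBE can be overridden for dry runs.
PING=${PING:-"ping -c 3"}
MODPROBE=${MODPROBE:-/sbin/modprobe}

# usb_watchdog_step GATEWAY
# If the gateway answers, keep usb-uhci unloaded so the NIC interrupt is not
# shared with the Console OS. If it does not, load usb-uhci so the RSA
# remote keyboard works while the network is being fixed.
usb_watchdog_step() {
    gateway=$1
    if $PING "$gateway" >/dev/null 2>&1; then
        $MODPROBE -r usb-uhci
    else
        $MODPROBE usb-uhci
    fi
}

# Example loop (the address is a placeholder for your default gateway):
#   while :; do usb_watchdog_step 192.0.2.1; sleep 60; done
```

A real deployment would probably want to track the current state so that modprobe is only called when something changes.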
Anyway, lots of options, but, in the end, it was an interesting experience and I learned a good bit about ESX server and how the VMkernel and Console OS interact with the hardware. Hopefully you'll find it interesting too.
Saturday, April 21. 2007 at 00:09 (Reply)
There is a note on the VMware KB about problems with certain Broadcom 5700/5701 NICs.
http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=2242
Interesting about the USB driver. I recently saw a VMware PSOD get fixed by upgrading the firmware on an HP DL38x.
The listed fix was the USB.
Thanks again.
Tuesday, October 23. 2007 at 14:43 (Reply)
THANK YOU.
Wednesday, November 28. 2007 at 11:56 (Reply)
I just couldn't figure out why we had such a difference in performance on 3 identical (hardware wise) servers. I started looking into context switching problems and found your article by dumb luck.
I found that two of them had USB loaded, with the problems you described. And to make the situation worse, they collided with RAID also (we use local discs)!
Again, thanks for a great article.
Wednesday, May 13. 2009 at 14:05 (Reply)
If your NICs share interrupts with your RAID card, that's probably OK since both of those devices will be owned by the VMkernel. The problem occurs when an interrupt is shared by two or more physical devices and some of those devices are controlled by the Console OS while others are controlled by the VMkernel.
The most common scenario seems to be the USB drivers, because the VMkernel doesn't use the USB hardware but the Console OS does. If your USB controller (a device controlled by the Console OS) shares an interrupt with your NIC or your storage controller (devices controlled by the VMkernel), then you're likely to see at least some performance penalty.
Thursday, June 25. 2009 at 18:46 (Link) (Reply)
A very good posting.
Tuesday, January 22. 2008 at 17:22 (Reply)
This article helped me to solve my problem with network throughput completely! We were getting 10MB/sec until I unloaded the USB drivers. Thanks!
Monday, May 4. 2009 at 18:20 (Reply)
I pretty much assumed that anyone charged with administering VMware ESX servers would be able to write a two line script. Heck, is two lines really even a script? You could just as easily throw these two commands in rc.local, but we typically create scripts with names and comments so we get hinted as to what they do, our actual script has a more detailed comment explaining why we even need such a stupid script in the first place.
#!/bin/sh
#
# stopusb
#
# This is a stupid script to simply sleep for 300 seconds
# after boot and then unload the usb-uhci driver from the
# service console.
sleep 300
/sbin/modprobe -r usb-uhci
Tuesday, May 12. 2009 at 20:14 (Link) (Reply)
I mean, can that command affect working users if I run it from a remote site?
Wednesday, May 13. 2009 at 13:37 (Reply)
/sbin/modprobe -r usb-uhci
Wednesday, May 13. 2009 at 17:58 (Link) (Reply)
It's an IBM x3500
I know that if I go into the Infrastructure utility, I can see that a USB device has been installed for the VM, but it says it's not supported. Is it possible that this causes the slow transfers even if I did the modprobe -r usb-uhci?
Or do you have any other suggestions?
Wednesday, May 13. 2009 at 22:27 (Reply)
This article is written to teach you how to determine if you are having an interrupt sharing conflict between the Console OS and the VMkernel and, if so, determine which driver is causing the conflict. If you are having an interrupt sharing issue you must then determine how to resolve it, either by unloading a driver or perhaps reconfiguring your hardware, sometimes simply moving cards around will work. If you're not having an interrupt conflict then nothing in this article is going to help.
So, in summary, did you first determine that you were having an interrupt problem before you rushed to trying to unload the driver? If not, you should do that first.
Friday, May 15. 2009 at 11:38 (Link) (Reply)
http://kb.vmware.com/selfservice/viewContent.do?externalId=1003710&sliceId=2#determine
This explains the exact same problem, but updated for ESX 3.5 and with some new info.
I have the same problem with a couple of Intel servers with quad NICs and HBA cards, but my conflict is with the COS keyboard! So I can't just deactivate it... I'll try disabling stuff in the BIOS, along with moving NIC and HBA cards to different slots. I'm sure it will be resolved somehow. Thanks anyway for pointing my strange problem in the right direction!
Tuesday, June 9. 2009 at 21:14 (Reply)
Instead of scripting the unloading of the device driver I simply commented out the load line in /etc/modules.conf.
Cheers.
Monday, June 29. 2009 at 06:30 (Reply)
and how to apply this on esxi? AFAIK there is no modprobe... thanks!
seb
Wednesday, July 1. 2009 at 20:55 (Reply)
That doesn't mean that ESXi couldn't have a network performance problem, but it wouldn't be caused by an interrupt problem with the service console.
Friday, July 3. 2009 at 14:57 (Reply)
See following from a HP DL380 G5/ESX 4(Hope it is true):
Vector PCPU 0 PCPU 1 PCPU 2 PCPU 3 PCPU 4 PCPU 5 PCPU 6 PCPU 7
0x21: 0 0 0 0 0 0 0 0 VMK ACPI Interrupt
0x22: 2246551 1584882 473506 1230378 2258795 1698829 2965905 3903666 VMK vmnic6
0x29: 1 0 0 0 0 0 0 0 COS irq 1 (ISA edge)
0x2a: 92310 197625 37995 65280 421543 91545 117300 51510 VMK vmnic7
0x31: 3 0 0 0 0 0 0 0
0x32: 181077 216495 173145 136425 78285 65792 89505 134385 VMK vmnic8
0x39: 5 0 0 0 0 0 0 0
0x3a: 49470 155550 41849 116535 372300 134130 49980 155295 VMK vmnic9
0x41: 1 0 0 0 0 0 0 0
0x49: 1 0 0 0 0 0 0 0 COS irq 8 (ISA edge)
0x51: 0 0 0 0 0 0 0 0
0x52: 2903018 1793928 1995905 1969202 929731 1062336 1156449 1099319 VMK cciss0
0x59: 94 0 0 0 0 0 0 0 COS irq 12 (ISA edge)
0x61: 0 0 0 0 0 0 0 0
0x69: 75888 72676 58396 74970 65537 53041 56100 71910 , VMK libata
0x71: 0 1 0 0 0 0 0 0 , VMK libata
0x79: 0 0 0 0 0 0 0 0 , VMK uhci_hcd:usb2, VMK ehci_hcd:usb1
0x81: 18987345 11599593 21037269 23424302 22144712 22630377 21186424 21527883 , VMK lpfc820, VMK uhci_hcd:usb3
0x89: 89505 125775 12623 76118 39015 63368 64388 28560 , VMK lpfc820, VMK uhci_hcd:usb4
0x91: 0 0 0 0 0 0 0 0 , VMK uhci_hcd:usb5
0x99: 0 0 0 0 0 0 0 0
0xa1: 0 0 0 0 0 0 0 0
0xa9: 76 0 0 0 0 0 0 0 , VMK uhci_hcd:usb6
0xb1: 0 0 424671144 0 0 0 0 0 VMK vmnic0
0xb9: 0 0 15887664 0 0 0 0 0 VMK vmnic1
0xc1: 17150540 7276352 17066888 14817717 17940338 18078803 92458769 91250704 VMK vmnic2
0xc9: 2963036 2342438 75735 632658 187425 198394 1087830 1232417 VMK vmnic3
0xd1: 4119015 3178263 385306 1056721 188955 260355 539332 622710 VMK vmnic4
0xd9: 96900 378165 116790 117839 73950 126225 69360 95880 VMK vmnic5
0xdf: 1140618460 1220555952 1257201109 1260270679 1283699480 1285351766 1284983403 1286730325 VMK timer
0xe1: 393180 2941670 3891808 4389559 3531838 3701197 3222568 3274408 VMK monitor
0xe9: 570608175 737958186 354096437 378296109 372263598 372007246 382907516 382532491 VMK resched
0xf1: 46980 262720 283068 267401 299996 285801 280138 273801 VMK tlb
0xf9: 65053074 0 0 0 0 0 0 0 VMK noop
0xfc: 0 0 0 0 0 0 0 0 VMK thermal
0xfd: 0 0 0 0 0 0 0 0 VMK lint1
0xfe: 0 0 0 0 0 0 0 0 VMK error
0xff: 0 0 0 0 0 0 0 0 VMK spurious
Friday, July 3. 2009 at 18:05 (Reply)
That being said, something sure is suspicious about vmnic0 and vmnic1. There's no interrupt sharing going on there at all, which would imply the problem might still exist. The output of /proc/vmware/pci and /proc/interrupts would be interesting.
Saturday, July 4. 2009 at 17:12 (Reply)
This is strange - I encounter exactly the same problems (very low iSCSI speed).
May I link to my thread with screenshots?
I still have no idea what to do.
http://communities.vmware.com/thread/218231?tstart=0
Saturday, July 4. 2009 at 18:19 (Reply)
Still, in perusing your thread you seem to be trying to connect your ESXi system to some non-certified desktop iSCSI box because you say "it's iSCSI and ESXi supports iSCSI". Well, ESXi does support iSCSI, but it expects iSCSI devices to support enterprise features like SCSI reservations and battery backed write back cache (so that synchronous writes don't cause full cache flushes which hammer performance, especially on writes). These desktop style boxes do "support iSCSI" but they're not enterprise iSCSI solutions and are not on the ESXi HCL thus they are not certified or supported. They might work, but probably not well. Even many enterprise iSCSI arrays have suffered from performance problems when used with iSCSI because of high overhead for synchronous write or poor performance with iSCSI reservations.
You can work around many of these issues by using the iSCSI initiator from within your guest VM's.
Tuesday, July 7. 2009 at 13:06 (Reply)
See the /proc/interrupts:
# cat /proc/interrupts
CPU0
0: 248973570 timer
1: 10 i8042
2: 519808144 VMnix interrupt
8: 1 rtc
12: 104 i8042
NMI: 0
LOC: 0
ERR: 0
MIS: 0
On ESX 4 (vSphere), they moved /proc/vmware/pci (at least I couldn't find it in /proc/vmware/), and they also removed modules.conf from /etc.
I could not find any documentation for this. It looks like they changed a lot of the code in ESX 4.