Choosing an Ethernet NIC for Linux 2.4

A few years ago, choosing an Ethernet NIC for Linux meant choosing one that works at all. Luckily, mostly thanks to Don Becker's work, the situation has improved to the point where you can normally buy any NIC, plug it into your Linux box, and there will be a driver for it. Finding the driver is as easy as running cat /proc/pci to identify the chip.

This document is not about finding a NIC that will work. This document is about finding the best NIC for you. There are several advanced features that may or may not be important for your decision, and this document tries to help you weigh them.

The information on this page is probably also helpful if you need a NIC for an OS other than Linux, but that's not my focus. Only PCI Fast or Gigabit Ethernet NICs are mentioned here.

Bugs

If you find information on this page to be in error, please submit a bug report to felix-linuxeth@fefe.de!

Features Of Modern NICs

To put it simply, a NIC is expected to send and receive as many packets and bytes as possible while keeping CPU and bus utilization as low as possible.

Scatter/Gather

To achieve gigabit throughput, it is important that the operating system does not copy the data in the packets before sending them (this is called zero-copy IP). Unfortunately, the kernel has to put a header in front of the data in each packet. If the data is not copied into a kernel buffer right behind that header, the NIC has to be able to fetch the header from one place in memory and the user data from another. This is called scatter/gather, and it is necessary for zero-copy IP.
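Scatter/gather has a user-space cousin in writev(2), which writes several separate buffers as one stream without first copying them together. The sketch below only illustrates the idea by analogy; it is not how a driver programs a NIC (a driver fills in a list of DMA descriptors, one per buffer, which the card then walks). send_gathered() and gather_demo() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write a header and a payload that live in two separate buffers with a
 * single gathered write, without copying them into one buffer first. */
static ssize_t send_gathered(int fd, const void *hdr, size_t hdr_len,
                             const void *payload, size_t payload_len)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)hdr;      /* e.g. protocol headers */
    iov[0].iov_len  = hdr_len;
    iov[1].iov_base = (void *)payload;  /* e.g. user data, left where it is */
    iov[1].iov_len  = payload_len;
    return writev(fd, iov, 2);
}

/* Demo: gather "HDR:" + "data" into a pipe and read it back as one stream.
 * Returns 0 if the reader sees the two pieces back to back. */
static int gather_demo(void)
{
    static const char hdr[] = "HDR:", payload[] = "data";
    const size_t hdr_len = sizeof hdr - 1, payload_len = sizeof payload - 1;
    char buf[16];
    int p[2];
    int rc = -1;

    assert(hdr_len + payload_len <= sizeof buf);
    if (pipe(p) != 0)
        return -1;
    if (send_gathered(p[1], hdr, hdr_len, payload, payload_len) ==
            (ssize_t)(hdr_len + payload_len) &&
        read(p[0], buf, sizeof buf) == (ssize_t)(hdr_len + payload_len) &&
        memcmp(buf, "HDR:data", hdr_len + payload_len) == 0)
        rc = 0;
    close(p[0]);
    close(p[1]);
    return rc;
}
```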

Hardware Checksumming

IP and TCP have checksums in their headers. The checksums are cheap to calculate, so they do not normally use up many CPU cycles, but at gigabit speeds you need every CPU cycle you can get for the NFS, Samba, FTP or HTTP server software, so it is a good idea to offload the checksum calculation to the NIC. Checksum offloading also goes hand in hand with zero-copy: if the CPU has to compute the TCP checksum, it has to read every byte of the data anyway.
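For reference, this is the computation being offloaded: the RFC 1071 Internet checksum used by IP, TCP and UDP. A minimal C sketch; the function names and the sample header (a textbook IPv4 header from 192.168.0.1 to 192.168.0.199) are illustrative and not taken from any driver. The "Tx" case computes the checksum over a header whose checksum field is zero; the "Rx" case verifies a received header, which sums to zero when it is intact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: one's-complement sum of the data taken as
 * big-endian 16-bit words, then complemented. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;                        /* wide, so carries are not lost */

    assert(data != NULL || len == 0);
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;       /* odd length: pad with a zero byte */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);  /* fold carries back into 16 bits */
    return (uint16_t)~sum;
}

/* Rx side: a header is intact if summing it *including* its checksum
 * field gives 0 after complementing. */
static int inet_checksum_ok(const uint8_t *data, size_t len)
{
    return inet_checksum(data, len) == 0;
}

/* Illustrative IPv4 header, checksum field (bytes 10-11) zeroed for Tx. */
static const uint8_t example_hdr_tx[20] = {
    0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00, 0x40, 0x11,
    0x00, 0x00, 0xc0, 0xa8, 0x00, 0x01, 0xc0, 0xa8, 0x00, 0xc7,
};

/* The same header as received, with the checksum 0xb861 filled in. */
static const uint8_t example_hdr_rx[20] = {
    0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00, 0x40, 0x11,
    0xb8, 0x61, 0xc0, 0xa8, 0x00, 0x01, 0xc0, 0xa8, 0x00, 0xc7,
};
```

inet_checksum(example_hdr_tx, 20) yields 0xb861, and inet_checksum_ok(example_hdr_rx, 20) is true. A NIC listed as "Rx" in the table below does the second step in hardware, one listed as "Tx" the first.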

Alignment Issues

Many NICs can only transfer data from or to aligned memory addresses (page, cache line, dword or whatever the chip requires). This is bad because the standard Ethernet header is 14 bytes long, so when the frame starts at an aligned address, the IP and TCP headers behind it start at unaligned addresses. The kernel has to access the IP and TCP headers, and unaligned accesses are either much slower (on x86) or not possible at all (RISC architectures need to copy the header to an aligned memory location first).
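The arithmetic behind this, as a small C sketch: with the frame at offset 0 of an aligned buffer, the IP header sits at offset 14, which is 2 bytes off a 4-byte boundary. A NIC that can DMA to an address 2 bytes into the buffer puts the IP header at offset 16 instead (Linux drivers for such NICs typically do this with skb_reserve(skb, 2)). ETH_HDR_LEN, ip_header_offset() and is_aligned() are names made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HDR_LEN 14  /* destination MAC (6) + source MAC (6) + EtherType (2) */

/* Offset of the IP header in a receive buffer whose start is aligned, when
 * the NIC writes the Ethernet frame `pad` bytes into the buffer. */
static size_t ip_header_offset(size_t pad)
{
    return pad + ETH_HDR_LEN;
}

/* Nonzero if `offset` is a multiple of `align`, which must be a power of two. */
static int is_aligned(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset & (align - 1)) == 0;
}
```

ip_header_offset(0) is 14, which is not 4-byte aligned; ip_header_offset(2) is 16, which is. A NIC that can only DMA to aligned addresses cannot use the 2-byte pad, which is why "no" in the Align column means Linux has to copy each packet.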

Multicast Support

All modern NICs have multicast support in form of either a MAC list or a hash table or both. If you want to use a NIC as a multicast router, be sure to get one with a hash table.
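A MAC list holds a small, fixed number of exact addresses, while a hash table maps every address to one bit of a filter, so any number of groups can be accepted at the cost of some false positives that the kernel discards in software. A minimal sketch of such a filter, assuming a hypothetical 64-bin table indexed by the low 6 bits of the Ethernet CRC-32 of the destination address; real chips differ in table size and in which CRC bits they use, so this is not any particular NIC's scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 as used for the Ethernet FCS (reflected polynomial
 * 0xEDB88320, initial value and final XOR 0xFFFFFFFF). */
static uint32_t crc32_le(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(p != NULL || len == 0);
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical 64-bin hash filter: one bit per bin. */
struct mc_filter {
    uint64_t bins;
};

/* Bin index: low 6 bits of the CRC (an assumption; chips differ). */
static unsigned mc_bin(const uint8_t mac[6])
{
    return (unsigned)(crc32_le(mac, 6) & 0x3Fu);
}

/* Joining a group sets its bin. */
static void mc_add(struct mc_filter *f, const uint8_t mac[6])
{
    f->bins |= (uint64_t)1 << mc_bin(mac);
}

/* The NIC accepts a frame if its bin is set. Other groups that land in the
 * same bin are accepted too; the kernel filters those out in software. */
static int mc_accepts(const struct mc_filter *f, const uint8_t mac[6])
{
    return (int)((f->bins >> mc_bin(mac)) & 1u);
}

/* Ethernet address of the IPv4 all-hosts group 224.0.0.1. */
static const uint8_t ipv4_allhosts_mac[6] = { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x01 };
```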

Interrupt Mitigation

Especially for gigabit NICs, the number of packets per second is very large. Each IRQ costs several hundred CPU cycles even if the handler returns immediately, so a NIC that generates one IRQ per packet can eat up most of the CPU. It is important to be able to tell the NIC to combine IRQs in times of heavy traffic.
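To put numbers on "very large": on the wire, a full-size frame occupies 1538 bytes (1500 bytes of payload plus header, FCS, preamble and inter-frame gap) and a minimum-size frame 84 bytes, so a gigabit link carries somewhere between about 81,000 and 1,488,000 frames per second. A small C sketch of that arithmetic and of the IRQ rate when the NIC batches a fixed number of frames per IRQ; the batching model and the function names are simplifications made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes one frame occupies on the wire, including the 8-byte preamble and
 * the 12-byte inter-frame gap (the "+ 20" below). */
#define WIRE_BYTES_FULL 1538u  /* 1500 payload + 14 header + 4 FCS + 20 */
#define WIRE_BYTES_MIN    84u  /* 64-byte minimum frame + 20 */

/* Frames per second a link can carry when every frame takes `wire_bytes`. */
static uint64_t frames_per_sec(uint64_t link_bps, uint64_t wire_bytes)
{
    assert(wire_bytes > 0);
    return link_bps / (wire_bytes * 8);
}

/* IRQs per second if the NIC raises one IRQ per `frames_per_irq` frames
 * (rounded up). Real NICs also fire on a timer so that a lone frame is not
 * held back indefinitely; this simple model ignores that. */
static uint64_t irqs_per_sec(uint64_t fps, uint64_t frames_per_irq)
{
    assert(frames_per_irq > 0);
    return (fps + frames_per_irq - 1) / frames_per_irq;
}
```

With batches of 8 frames, the 81,274 full-size frames per second of a gigabit link need only 10,160 IRQs per second. At the minimum frame size and one IRQ per frame, even an assumed 500 cycles per IRQ add up to roughly 744 million cycles per second.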

Hardware Fragmentation

Every hardware medium has a maximum packet size it can handle. This is called the link MTU (Maximum Transmission Unit). The kernel cannot send packets larger than that. When talking to targets behind a router, the path MTU may be even lower. So if the application sends 100k, the kernel splits that data into roughly 70 packets, generates a header for each of them, and sends each of them to the NIC.

Some modern NICs are able to generate the headers for you (for TCP, this is usually called TCP segmentation offload). They still have a maximum packet size, but it's much larger (for example 16k). The operating system then hands a single 16k packet to the NIC, and the NIC splits it into about a dozen packets and sends those out.
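The splitting step as a small C sketch: with a 1500-byte MTU, each TCP segment carries at most 1460 bytes of payload (1500 minus 20 bytes each of IP and TCP header, without options), so 100,000 bytes become 69 segments and a 16,384-byte send becomes 12. The struct and function names are made up for illustration; a NIC that does this in hardware also fills in per-segment header fields such as lengths and checksums, which the sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One on-the-wire segment cut from a large send. */
struct segment {
    uint32_t seq;   /* TCP sequence number of the segment's first byte */
    size_t   len;   /* payload bytes in this segment */
};

/* Split `len` payload bytes starting at sequence number `seq` into segments
 * of at most `mss` bytes. Writes up to `max` entries to `out` and returns
 * the total number of segments needed. */
static size_t split_segments(uint32_t seq, size_t len, size_t mss,
                             struct segment *out, size_t max)
{
    size_t n = 0;

    assert(mss > 0);
    for (size_t done = 0; done < len; done += mss, n++) {
        size_t chunk = (len - done < mss) ? len - done : mss;
        if (n < max) {
            out[n].seq = seq + (uint32_t)done;  /* wraps mod 2^32 like TCP */
            out[n].len = chunk;
        }
    }
    return n;
}
```

For a 16,384-byte send starting at sequence number 1000, the first 11 segments carry 1460 bytes each and the last one carries the remaining 324 bytes, starting at sequence number 17060.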

Hardware Encryption

A few NICs can do encryption in hardware. This could be used to accelerate IPsec a great deal, but the vendors are unfortunately not very forthcoming with information that would make an open source driver possible.

Legend

Scatter/Gather means that the card can assemble a single packet from several non-contiguous buffers.

HW csum means that the card can calculate or verify checksums. Can be yes, no, Rx (verify receiving frames only) or Tx (calculate checksums over outgoing frames only).

Align means that the card can do unaligned accesses. "no" in this category means that all packets will have to be copied once by Linux.

Multicast has two ratings, each good (+) or bad (-). The first is for client-side use, the second for use in a multicast router.

IRQMit means that the card can batch IRQs (i.e. one IRQ per batch of packets rather than one per packet). Can be yes, no, Rx (receiving only) or Tx (sending only).

HW frag means that the card can fragment packets.

HW crypt means that the card has hardware encryption support.

The Contenders

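Chipset              | Scatter/Gather | HW csum | Align | Multicast | IRQMit | HW frag | HW crypt | Remarks
---------------------+----------------+---------+-------+-----------+--------+---------+----------+--------
3com 9xx             | yes            | yes[8]  | yes   | --/+[2]   | Tx     | no      | no       | vendor driver[4]
Intel eepro100       | -(+)           | -(+)    | yes   | -[9]      | ?      | ?       | no?      | [3], vendor driver[5]
Intel Tulip          | yes            | no      | no    | ++/++     | yes[7] | no      | no       | good back-off[1]
RealTek RTL8139      | no             | no      | no[6] | +/-       | no     | no      | no
Starfire             | ?              | yes     | ?     | +         | yes    | no      | no       | "Adaptec DuraLAN"
AMD PCnet            | ?              | ?       | ?     | ?         | ?      | ?       | no?      | good back-off[1]
SMSC EPIC/100 83C170 | ?              | ?       | ?     | ?         | ?      | ?       | no?
Winbond 83c840       | no             | no      | ?     | +         | ?      | ?       | no?
TI Thunderlan        | no?            | no?     | ?     | +/+       | ?      | no?     | no?
VIA Rhine            | no?            | no?     | ?     | +/+       | ?      | no?     | no?
Davicom              | no?            | no?     | ?     | +/+       | ?      | no?     | no?
SysKonnect           | yes            | Rx      | yes   | +/+       | yes    | no      | no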

[1] AMD PCnet and Tulip use a special patented back-off scheme that, according to Andi Kleen, greatly improves performance under Linux on half-duplex links.

[2] According to Don Becker, the 3com EtherLink III family has really bad multicast support. You can set a bit and then you get all multicast traffic, or you don't get any multicast traffic at all. This is basically only usable in environments with smart switches or for multicast routers. However, the driver initializes a multicast hash table but does not use it afterwards, so this may be a driver issue after all, or perhaps only newer hardware revisions support it.

[3] Intel has always had a very bad documentation policy, but they recently released a manual. The chip appears to be quite capable, although Don Becker had some complaints about its quirky design.

[4] 3com has a vendor driver, but I have read several reports of it being really crappy. Use the one from the kernel instead.

[5] Intel has a proprietary driver which apparently supports more advanced chipset features, but it has a bad license and lags behind kernel development. Entries like "-(+)" mean that the kernel driver does not support this feature but the vendor driver does. Andi Kleen told me that these features need special firmware support which is not in the standard firmware; the Intel driver uploads its own firmware. Newsflash: Intel changed the driver license to GPL!

[6] RealTek cards use a fixed ring of buffers for receiving packets, which basically means that the kernel always has to copy the incoming data. :-(

[7] Tulip hardware supports IRQ mitigation only for sending. There is a patch that implements "IRQ mitigation" for receiving by reverting to polling instead of IRQs. This is obviously only useful if your machine is heavily loaded and receives all the time.

[8] According to Dan Hollis, 3c905B/3c905C have hardware checksumming.

[9] The Intel eepro is very, very slow at updating its multicast lists.

Testimonials

Dan Hollis says that the 3c905C is the most efficient card for 100BaseT at the moment.

I'm happy with my Tulip card, although it won't do zero-copy IP. It easily saturates a 100BaseT link.

The RealTek chip is very popular because it is used on many ultra-low-cost $10 NICs, and Mark Hahn told me that despite its flawed design it can easily saturate a 100BaseT link under heavy (though not huge) load on modern machines. It's just not a sane choice for servers. Doobee sent me a big warning about RealTek NICs, with a quote from the FreeBSD driver.

Dan Hollis wrote in to tell me that the rtl8139, Rhine, and Davicom are all horrible chipsets, each with major design flaws and performance problems. He also told me that older Tulips (21140) do not support NWay (autonegotiation) and that the TI Thunderlan is a reasonable chipset to have.

Sun Happy Meal and AceNIC-based cards are also zero-copy capable, according to Ingo Molnar.

About Zerocopy IP in Linux 2.4

It basically works. To get any benefit from zero-copy TCP, the server software has to use sendfile() instead of read()/write(). Alternatively, it can mmap() the data to be sent and then send it with one big write().
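A minimal sketch of the sendfile() path in C, assuming Linux and its <sys/sendfile.h> interface; send_file_zerocopy() and sendfile_selftest() are hypothetical names made up for this example. The self-test sends a temporary file into one end of a UNIX-domain socketpair and reads it back; a real server would pass its TCP socket instead. The loop matters because sendfile() may transfer fewer bytes than requested.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send the first `len` bytes of `file_fd` to `sock_fd` with sendfile(),
 * so the data is not copied through a user-space buffer.
 * Returns 0 on success, -1 on error or if the file is too short. */
static int send_file_zerocopy(int sock_fd, int file_fd, size_t len)
{
    off_t offset = 0;   /* sendfile() advances this, not the file position */

    assert(sock_fd >= 0 && file_fd >= 0);
    while ((size_t)offset < len) {
        ssize_t n = sendfile(sock_fd, file_fd, &offset, len - (size_t)offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;              /* hit end of file early */
    }
    return (size_t)offset == len ? 0 : -1;
}

/* Self-test: put a message in a temporary file, sendfile() it into one end
 * of a socketpair and read it back from the other end.
 * Returns 0 if the bytes arrive unchanged. */
static int sendfile_selftest(void)
{
    static const char msg[] = "zero-copy payload";
    const size_t len = sizeof msg - 1;
    char buf[64];
    size_t got = 0;
    int sv[2];
    int rc = -1;
    FILE *tmp = tmpfile();

    if (tmp == NULL)
        return -1;
    int file_fd = fileno(tmp);
    if (write(file_fd, msg, len) != (ssize_t)len ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        fclose(tmp);
        return -1;
    }
    if (send_file_zerocopy(sv[0], file_fd, len) == 0) {
        while (got < len) {
            ssize_t n = read(sv[1], buf + got, sizeof buf - got);
            if (n <= 0)
                break;
            got += (size_t)n;
        }
        if (got == len && memcmp(buf, msg, len) == 0)
            rc = 0;
    }
    close(sv[0]);
    close(sv[1]);
    fclose(tmp);
    return rc;
}
```

Whether the kernel then really avoids copying still depends on the NIC: it needs scatter/gather (and, in practice, hardware checksumming) for the data to go from the page cache to the card directly.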

NFS used to do as many as 5 copies of incoming data.

Credits

The first batch of information on this page came from Andi Kleen.

Jeff Garzik sent in corrections for Tulip and RealTek.

Thomas Ruf from SysKonnect provided the data for their chipset.

See also