Aug 24

A TCP packet on Ethernet is normally 1500 bytes, 40 bytes of that being header information and 1460 of it being data. Trouble is, sometimes you don’t have 1460 bytes of data to send. In a telnet session, for example, hitting enter is 1 byte, but we still need to send an entire packet for that single byte, meaning 41 bytes on the wire. This isn’t efficient, and too many small packets like this can bring a firewall to its knees. A solution was put in place years ago called Nagle’s algorithm, which holds off on sending small chunks of data until a larger packet can be assembled. Good for general use, not good for an iSCSI SAN, where that waiting shows up as latency on small writes. Here are some worthless and misleading numbers.
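
To put a number on that inefficiency, here is a minimal sketch that works out what share of a packet is header for a given payload size. It assumes the 40-byte header figure used above, and `header_overhead_pct` is just a name made up for this example.

```shell
# Share of each packet (in percent, rounded down) that is header,
# assuming the 40-byte TCP/IP header figure quoted above.
header_overhead_pct() {
    payload=$1
    header=40
    echo $(( header * 100 / (header + payload) ))
}

header_overhead_pct 1      # the 1-byte telnet keystroke: prints 97
header_overhead_pct 1460   # a full packet: prints 2
```

So a single keystroke is about 97% overhead, while a full packet is about 2%.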

Test setup: a single SATA drive in a P4 running OpenSolaris build 91, shared via iSCSI over a dedicated gigabit network with the MTU at 1500.

while true; do dd if=/dev/zero of=test.img bs=1024 count=100000; rm test.img; done
102400000 bytes (102 MB) copied, 3.84045 s, 26.7 MB/s
102400000 bytes (102 MB) copied, 4.24423 s, 24.1 MB/s

Now with Nagle disabled

while true; do dd if=/dev/zero of=test.img bs=1024 count=100000; rm test.img; done
102400000 bytes (102 MB) copied, 2.13878 s, 47.9 MB/s
102400000 bytes (102 MB) copied, 2.14086 s, 47.8 MB/s
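
For the curious, the MB/s figure dd prints is just bytes divided by seconds, counted in millions of bytes. A small sketch reproduces it from the first line of each run; the `mbps` helper is my own name for this example, not part of dd.

```shell
# Throughput in MB/s (1 MB = 1,000,000 bytes, as dd reports it).
# Arguments: bytes copied, elapsed seconds.
mbps() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

mbps 102400000 3.84045   # Nagle on:  prints 26.7
mbps 102400000 2.13878   # Nagle off: prints 47.9
```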

For the statisticians among you, I did several more tests and used other utilities like cp, but this is a blog and blogs are quick so work with me here.

A bug was opened with the OpenSolaris folks (6621560), but in the meantime we can take care of this ourselves with one little command. To check the status of Nagle, run the following as root:

ndd -get /dev/tcp tcp_naglim_def

If that comes back with anything other than 1, Nagle is kicking you in the shorts. So let’s fix that.

ndd -set /dev/tcp tcp_naglim_def 1

Tada. Things should be a fair bit speedier now. Unfortunately, this is system wide and may impact things like webservers, but if you are running Apache on your SAN you pretty much get what you deserve. Enjoy!
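
If you want the check and the fix in one place, something like the following might do. It is only a sketch: `nagle_enabled` and `disable_nagle` are names invented here, and the only real commands used are the two ndd calls above. Keep in mind that values set with ndd do not survive a reboot, so a wrapper like this would have to be run again at boot.

```shell
# Succeeds (exit 0) when the reported value means Nagle is still active,
# that is, anything other than 1.
nagle_enabled() {
    [ "$1" != "1" ]
}

# Hypothetical wrapper: read the tunable and only set it when needed.
disable_nagle() {
    current=$(ndd -get /dev/tcp tcp_naglim_def) || return 1
    if nagle_enabled "$current"; then
        ndd -set /dev/tcp tcp_naglim_def 1
    fi
}

# Usage, as root on the storage box:
#   disable_nagle
```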

Apr 15

To get the speed and duplex of a NIC in Solaris, use the ndd command.

ndd -get /dev/bge0 link_speed
ndd -get /dev/bge0 link_duplex

Duplex values are 0 (disconnected), 1 (half) and 2 (full).
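
If you would rather not remember the codes, a tiny helper can translate them. This is a sketch using the three values listed above; `duplex_name` is a made-up name, and anything outside 0 to 2 is reported as unknown.

```shell
# Translate the numeric link_duplex value into a word.
duplex_name() {
    case "$1" in
        0) echo "disconnected" ;;
        1) echo "half" ;;
        2) echo "full" ;;
        *) echo "unknown" ;;
    esac
}

# Usage, as root:
#   duplex_name "$(ndd -get /dev/bge0 link_duplex)"
```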

Apr 8

NICs like to talk to each other at the same speed and duplex. In fact, bad things happen if you have a mismatch in either of those, so a way to view and then set those parameters is important. Let’s start with the viewing. As root, run

ethtool $DEV

where $DEV is the network interface. You’ll get output similar to the following:

root@Seraph:~# ethtool eth0
Settings for eth0:
Supported ports: [ TP MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Advertised auto-negotiation: Yes
Speed: 10Mb/s
Duplex: Half
Port: MII
PHYAD: 32
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pumbg
Wake-on: d
Current message level: 0x00000007 (7)
Link detected: no
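
That is a lot of output when all you want is the current speed, duplex and auto-negotiation state. A minimal sketch for pulling out just those three lines, assuming the field labels shown above (`link_summary` is a name made up for this example):

```shell
# Read ethtool output on stdin and print only the current
# Speed, Duplex and Auto-negotiation settings as key=value lines.
link_summary() {
    awk -F': *' '
        /^[ \t]*(Speed|Duplex|Auto-negotiation):/ {
            sub(/^[ \t]+/, "", $1)   # drop any leading indentation
            print $1 "=" $2
        }'
}

# Usage, as root:
#   ethtool eth0 | link_summary
```

The pattern is anchored to the start of the line so that "Advertised auto-negotiation" and "Supports auto-negotiation" are not picked up by mistake.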

The fields to look at are the supported link modes, Speed, Duplex and Auto-negotiation. They tell you the capabilities of your interface, the current speed, the current duplex and whether the NIC is set to try to auto-negotiate its speed and duplex settings with the switch. While auto-negotiation will work in most cases for clients, when you start talking about servers you really need to hardcode the NIC. Do that with

ethtool -s $DEV speed 100 duplex full

And now $DEV is set to 100Mb/s full duplex. You can verify that with another run of ethtool. An important tip to remember is to never run this command if you are connected to the server through the interface you are changing. If a speed or duplex mismatch occurs, the server will become pretty much unavailable over that interface. Always use the serial console, the attached keyboard or a different interface than the one you are working on to connect to the server.
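
If you don’t trust yourself to remember that tip, a guard along these lines might help. It is a rough sketch built on assumptions: it relies on the SSH_CONNECTION variable that sshd sets, on the iproute2 `ip` command being installed, and on the simple idea that a session is "on" an interface when the server-side address belongs to it. The function names are invented for this example, and it will not catch every case (bonded interfaces or odd routing, for instance).

```shell
# Server-side IP of the current SSH session, empty if not in one.
# SSH_CONNECTION holds: client_ip client_port server_ip server_port
ssh_server_ip() {
    echo "$SSH_CONNECTION" | awk '{ print $3 }'
}

# Hypothetical wrapper around the ethtool command above: refuse to
# change an interface that carries the address we are logged in on.
safe_set_speed() {
    dev=$1; speed=$2; duplex=$3
    ip_addr=$(ssh_server_ip)
    if [ -n "$ip_addr" ] && ip -o addr show dev "$dev" | grep -qF " $ip_addr/"; then
        echo "refusing: this SSH session arrives on $dev" >&2
        return 1
    fi
    ethtool -s "$dev" speed "$speed" duplex "$duplex"
}

# Usage, as root:
#   safe_set_speed eth0 100 full
```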