Feb 17

By default, when you enable iSCSI sharing within ZFS, the share is created and bound to all available Ethernet interfaces. This isn’t necessarily a bad thing, but if you can reach your iSCSI share via two paths, you run the chance of sending iSCSI traffic over a non-optimized path and really messing with your performance. Fortunately, you can bind iSCSI to specific interfaces using interface groups (target portal groups, the tpgt in the commands below).

First, we need to create the interface group. This assumes that 192.168.1.1 is the IP address assigned to the interface (or, in the case of bonded channels, multiple interfaces) that you want a specific share to use.

iscsitadm create tpgt 1
iscsitadm modify tpgt -i 192.168.1.1 1

A quick

iscsitadm list tpgt -v 1

Will let you know if this worked.

Now that your interface group is created, all you have to do is bind it to a specific share.

iscsitadm modify target -p 1 zpool/iscsiTarget

Done! This opens up some interesting opportunities for using the same iSCSI SAN to service connections on different networks in a relatively secure manner. Have fun!
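
If you do this for more than one share, the steps above can be collected into a small dry-run helper. This is only a sketch: tpgt_commands is a made-up name, it prints the commands rather than running them, and it assumes that create takes the same group number that modify and list use.

```shell
#!/bin/sh
# Dry-run sketch: prints the iscsitadm commands from this post.
# tpgt_commands is a hypothetical helper, not part of iscsitadm.
tpgt_commands() {
    tag=$1      # interface group (tpgt) number, e.g. 1
    ip=$2       # IP address of the interface the share should use
    target=$3   # target name, e.g. zpool/iscsiTarget

    echo "iscsitadm create tpgt $tag"               # create the group
    echo "iscsitadm modify tpgt -i $ip $tag"        # add the IP to it
    echo "iscsitadm list tpgt -v $tag"              # check that it worked
    echo "iscsitadm modify target -p $tag $target"  # bind the share to it
}

tpgt_commands 1 192.168.1.1 zpool/iscsiTarget
```

Look the output over first, then pipe it to sh as root on the SAN if it is what you want.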

Sep 3

How many of you saw this one coming? In my haste, I accidentally ran a zpool destroy -r on the wrong pool this morning and offlined my SAN. Not good. I walked around the rest of the day bummed about everything I’d lost since my last backup to tape, because all of my snapshots were gone. I spent my lunch hour browsing some docs over on sun admin and came across what ended up being my salvation: how to undestroy a destroyed pool. To save you the time, you need to know two commands.

zpool import -D

That will show you which pools have been destroyed but are still around to be recovered.

zpool import -Df tank

That command does the legwork and brings the good old tank pool back online. Hopefully you didn’t lose any drives during this process, but in theory you should be able to recover a degraded pool. Best bet, though: don’t try to clean up the SAN at 5:30am before you go to work.

Sep 2

The other night, I moved from a ZFS pool on 3 IDE drives to a pool on 3 SATA drives on the same box. The volumes that I moved, though, were exported via iSCSI to a VMware server, so they had no file system that I could just run a cp on. Instead, I used the versatile ZFS send and receive commands. The source pool is called tank and the target is vault.

zfs send tank/Linux | zfs receive vault/Linux

Seriously. That’s it. The new volume is created and we’re good to go. Notice that “|”. The pipe character gets any creative thinking geek a little excited because it means we’re working in the realm of STDIN and STDOUT. So let’s say for a minute that my pools were on different machines. I could perform the same function but use ssh to transport the data between servers, like so:

zfs send tank/Linux | ssh new-san "zfs receive vault/Linux"
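
One caveat worth sketching: on the ZFS releases I know of, zfs send wants a snapshot as its source, so a snapshot step usually comes first. The helper below is a dry run that only prints the commands. The name send_commands and the snapshot name migrate are made up for illustration. The pool, volume and host names come from the examples above.

```shell
#!/bin/sh
# Dry-run sketch: prints a snapshot-then-send sequence.
# send_commands is a hypothetical helper; "migrate" is an arbitrary snapshot name.
send_commands() {
    src=$1        # source volume, e.g. tank/Linux
    dst=$2        # destination volume, e.g. vault/Linux
    snap=$3       # snapshot name to create and send
    host=${4:-}   # optional remote host; empty means same box

    echo "zfs snapshot ${src}@${snap}"
    if [ -n "$host" ]; then
        # Remote case: ship the stream over ssh.
        echo "zfs send ${src}@${snap} | ssh ${host} \"zfs receive ${dst}\""
    else
        # Local case: pipe straight into the other pool.
        echo "zfs send ${src}@${snap} | zfs receive ${dst}"
    fi
}

send_commands tank/Linux vault/Linux migrate
send_commands tank/Linux vault/Linux migrate new-san
```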

Aug 24

A TCP packet is normally 1500 bytes large, 40 bytes of that being header information and 1460 of it being data. Trouble is, sometimes you don’t have 1460 bytes of data to send. In a telnet session, for example, hitting enter is 1 byte, but we still need to send an entire packet for that single byte, meaning 41 bytes on the wire. This isn’t efficient, and too many small packets like this can bring a firewall to its knees. A solution was put in place years ago called Nagle’s algorithm, which holds off on sending data until a larger packet can be assembled. Good for general use, not good for an iSCSI SAN. Here are some worthless and misleading numbers.

Single SATA drive, P4 OpenSolaris Build 91 shared via iSCSI on a dedicated gig network with MTU at 1500.

while true; do dd if=/dev/zero of=test.img bs=1024 count=100000; rm test.img; done
102400000 bytes (102 MB) copied, 3.84045 s, 26.7 MB/s
102400000 bytes (102 MB) copied, 4.24423 s, 24.1 MB/s

Now with Nagle disabled

while true; do dd if=/dev/zero of=test.img bs=1024 count=100000; rm test.img; done
102400000 bytes (102 MB) copied, 2.13878 s, 47.9 MB/s
102400000 bytes (102 MB) copied, 2.14086 s, 47.8 MB/s

For the statisticians among you, I did several more tests and used other utilities like cp, but this is a blog and blogs are quick so work with me here.
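
While we are doing arithmetic, the packet overhead from the top of this post can be worked out in a few lines of shell. This is a rough sketch that assumes a 1500 byte packet with 40 bytes of header and no TCP options. The function name payload_percent is made up.

```shell
#!/bin/sh
MTU=1500     # bytes per packet
HEADER=40    # header bytes per packet, assuming no TCP options

# Percentage of the bytes on the wire that are actual data, rounded down.
payload_percent() {
    data=$1
    echo $(( 100 * data / (data + HEADER) ))
}

echo "max payload: $(( MTU - HEADER )) bytes"
echo "full packet: $(payload_percent $(( MTU - HEADER )))% data"
echo "one keystroke: $(payload_percent 1)% data"
```

A full packet is about 97% data, while a single keystroke is about 2% data and 98% header. That waste is what Nagle was invented to avoid.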

A bug was opened with the OpenSolaris folks (6621560), but in the meantime we can take care of this ourselves with one little command. To check the status of Nagle, run the following as root:

ndd -get /dev/tcp tcp_naglim_def

If that comes back with anything other than 1, Nagle is kicking you in the shorts. So let’s fix that.

ndd -set /dev/tcp tcp_naglim_def 1

Tada. Things should be a fair bit speedier now. Unfortunately, this is system wide and may impact things like web servers, but if you are running Apache on your SAN you pretty much get what you deserve. Enjoy!
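
As far as I know, ndd settings do not survive a reboot, so you may want the check and the fix together in something you can rerun from a startup script. Here is a minimal sketch of the decision. The name nagle_action is made up, and 4095 is just a sample value to show the logic.

```shell
#!/bin/sh
# nagle_action is a hypothetical helper: given the current tcp_naglim_def
# value, it prints the command to run, or nothing if the value is already 1.
nagle_action() {
    current=$1
    if [ "$current" != "1" ]; then
        echo "ndd -set /dev/tcp tcp_naglim_def 1"
    fi
}

# On the SAN itself, as root, you would feed it the live value:
#   nagle_action "$(ndd -get /dev/tcp tcp_naglim_def)"
nagle_action 4095   # sample value for illustration only
```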