Sep 2

The other night, I moved from a ZFS pool on 3 IDE drives to a pool on 3 SATA drives on the same box. The volumes that I moved, though, were exported via iSCSI to a VMware server, so they had no file system that I could just run cp on. Instead, I used the versatile zfs send and zfs receive commands. The source pool is called tank and the target is vault.

zfs send tank/Linux | zfs receive vault/Linux

Seriously. That’s it. The new volume is created and we’re good to go. Notice that “|”. The pipe character gets any creative-thinking geek a little excited because it means we’re working in the realm of STDIN and STDOUT. So let’s say for a minute that my pools were on different machines. I could perform the same function but use ssh to transport the data between servers, like so.

zfs send tank/Linux | ssh new-san "zfs receive vault/Linux"
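A note for anyone replaying this: on the ZFS releases I know of from this era, zfs send works from a snapshot rather than a live volume, so a snapshot step usually comes first. Here is a minimal sketch that only prints the commands so you can eyeball them before running anything. The snapshot label (migrate) and the function name are assumptions for illustration, not part of the original move.

```shell
# Print (not run) the commands for a snapshot-based copy of one dataset.
# Args: source dataset, target dataset, snapshot label (the label is hypothetical).
migrate_cmds() {
    src=$1 dst=$2 label=$3
    echo "zfs snapshot $src@$label"
    echo "zfs send $src@$label | zfs receive $dst"
}

migrate_cmds tank/Linux vault/Linux migrate
```

Once the two printed lines look right, run them by hand as root. For the two-machine case, the receive half goes inside the ssh quotes exactly as above.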

Aug 24

A TCP packet is normally 1500 bytes, 40 bytes of that being header information and 1460 of it being data. Trouble is, sometimes you don’t have 1460 bytes of data to send. In a telnet session, for example, hitting enter is 1 byte, but we still need to send an entire packet for that single byte, meaning 41 bytes on the wire. This isn’t efficient, and too many small packets like this can bring a firewall to its knees. A solution was put in place years ago called Nagle’s algorithm that holds off on sending data until a larger packet can be assembled. Good for general use, not good for an iSCSI SAN. Here are some worthless and misleading numbers.
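Before the benchmark numbers, a quick back-of-the-envelope on that overhead. A 1500 byte packet minus 40 bytes of header leaves 1460 bytes of payload, and the sketch below compares that with the single-keystroke case. It ignores Ethernet framing, and the function name is made up for this example.

```shell
# Percentage of a packet that is payload, given payload and header sizes in bytes.
# Integer arithmetic, so the result is rounded down.
payload_pct() {
    payload=$1 header=$2
    echo $(( payload * 100 / (payload + header) ))
}

echo "full packet:   $(payload_pct 1460 40)% payload"
echo "one keystroke: $(payload_pct 1 40)% payload"
```

That gap is the waste Nagle tries to avoid by batching small writes. The catch is that the batching adds latency, which is what hurts a SAN.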

Single SATA drive, P4 OpenSolaris Build 91 shared via iSCSI on a dedicated gig network with MTU at 1500.

while true; do dd if=/dev/zero of=test.img bs=1024 count=100000; rm test.img; done
102400000 bytes (102 MB) copied, 3.84045 s, 26.7 MB/s
102400000 bytes (102 MB) copied, 4.24423 s, 24.1 MB/s

Now with Nagle disabled

while true; do dd if=/dev/zero of=test.img bs=1024 count=100000; rm test.img; done
102400000 bytes (102 MB) copied, 2.13878 s, 47.9 MB/s
102400000 bytes (102 MB) copied, 2.14086 s, 47.8 MB/s
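If you want to sanity check the MB/s figures dd prints, the arithmetic is just bytes over seconds, with dd counting 1 MB as 1,000,000 bytes. A small sketch using awk on the first run from each test; the function name is mine.

```shell
# Throughput in decimal MB/s from a byte count and elapsed seconds.
mb_per_sec() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

mb_per_sec 102400000 3.84045   # first run with Nagle on
mb_per_sec 102400000 2.13878   # first run with Nagle off
```

The results should match what dd reported for those two runs.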

For the statisticians among you, I did several more tests and used other utilities like cp, but this is a blog and blogs are quick so work with me here.

A bug was opened with the OpenSolaris folks (6621560) but in the meantime, we can take care of this ourselves with one little command.   To check the status of Nagle, run the following as root

ndd -get /dev/tcp tcp_naglim_def

If that comes back with anything other than 1, Nagle is kicking you in the shorts. So let’s fix that.

ndd -set /dev/tcp tcp_naglim_def 1

Tada. Things should be a fair bit speedier now. Unfortunately, this is system wide and may impact things like web servers, but if you are running Apache on your SAN you pretty much get what you deserve. Enjoy!
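If you would rather script the check than eyeball it, here is a rough sketch. The function takes the current tcp_naglim_def value as an argument and prints either an all-clear or the command to run. The function name is an assumption of mine, and 4095 is only an example value.

```shell
# Decide what to do from the current tcp_naglim_def value (passed as an argument).
# A value of 1 means Nagle is effectively off, per the check above.
nagle_action() {
    if [ "$1" = "1" ]; then
        echo "ok: Nagle already disabled"
    else
        echo "ndd -set /dev/tcp tcp_naglim_def 1"
    fi
}

nagle_action 4095   # example value only
```

On the SAN itself you would feed it the live value, something like nagle_action "$(ndd -get /dev/tcp tcp_naglim_def)". As far as I know, ndd changes do not survive a reboot, so this likely needs to be reapplied from a startup script.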

Aug 22

By the release of Solaris 9, Solaris had seen better days. No groundbreaking innovations had occurred, the SPARC architecture had started to lose its place as the data center chip of choice, and Linux was really kicking it in the teeth with its ease of access for younger sysadmins. An x86 version existed, but it was really just a hobby OS and no data center in its right mind would deploy it in production. Things looked bleak, and then came Solaris 10. Solaris 10, and the CoolThreads/Niagara CPUs, helped to put the shine back on Sun. Zones and containers helped to virtualize server hardware, giving a bit more return on investment, but what really did it for the geeks was ZFS. ZFS is billed as “the last word in file systems” and I gotta say, I believe it. It combines LVM, RAID and a transactional, copy-on-write file system, and manages to increase performance all at the same time. Add to the equation that VMware recently released ESXi for free (the bare metal hypervisor that they had been charging 3500 per node for) and you have a really sweet SAN-backed virtualization solution in the making.

First things first, install OpenSolaris and immediately patch it. You can find instructions on how to do that here, but the condensed version is

pfexec pkg refresh
pfexec pkg image-update
pfexec mount -F zfs rpool/ROOT/opensolaris-2 /mnt
pfexec /mnt/boot/solaris/bin/update_grub -R /mnt

Depending on your internet connection, this may take an hour or a few. The reason for the upgrade is that the shipping version of OpenSolaris (2008.05) has a bug with the serial number generation that prevents VMware from using volumes exported via iSCSI. Once you’ve upgraded Solaris, we need to create our pool. We’re going to assume three drives, c0t0d0, c0t0d1 and c0t0d2, and we’re going to put them into a raidz (better look this up; think RAID 5 but better).

zpool create tank raidz c0t0d0 c0t0d1 c0t0d2

And you can check your handiwork by running

zpool status -v tank
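If “RAID 5 but better” is too hand-wavy on the capacity side: single-parity raidz spends roughly one drive’s worth of space on parity, so usable space is about (drives - 1) times the drive size. A rough sketch of that arithmetic follows. The 500 GB drive size is a made-up example, the function name is mine, and a real pool loses a bit more to metadata.

```shell
# Approximate usable capacity of a single-parity raidz vdev.
# Args: number of drives, size of each drive in GB.
raidz_usable_gb() {
    drives=$1 size_gb=$2
    echo $(( (drives - 1) * size_gb ))
}

echo "3 x 500 GB in raidz: about $(raidz_usable_gb 3 500) GB usable"
```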

So, we now have a ZFS pool called tank that is made of 3 drives. Next, we’re going to create a 100 gig volume that we’ll use in the SAN.

zfs create -V 100g tank/iscsi-vol

We now have a 100 gig volume in the tank pool called iscsi-vol. Next step is to share that bugger out via iSCSI.

zfs set shareiscsi=on tank/iscsi-vol

And we’re done. You can verify with

iscsitadm list target -v

Now that we have the volume shared out, we need to get access to it with VMware. I’m assuming here that you have a single ESXi 3.5 Update 2 node to play with, with the Virtual Center client connected to that single node. This is a pretty simple operation. In the VMware console, click on Configuration and go to Networking. Add a VMkernel port, then click Properties and enable iSCSI for that adapter. Back on the main Configuration tab, click on Storage Adapters and select Properties for the iSCSI software adapter. You’ll need to enable the device, then click OK and close the window. Open that properties window again and go to Dynamic Discovery. Here you’ll add the IP of the Solaris box and then click OK.

Right click on the iSCSI adapter and select Rescan; this may take a minute. When it’s done, go into Storage and click Add Storage. Looky what shows up in your VMFS storage pools: our new 100 gig volume.

Aug 21

I’m going to have a write-up on this shortly, but just to give you a heads up: I’ve built a VMware ESXi node with an old P4 (socket 478, baby) that I’ve connected to another P4 running OpenSolaris acting as an iSCSI SAN with ZFS-backed storage. So far, I’ve been really pleased with both ESXi and the Solaris iSCSI target support, but I’ve been flat out blown away by ZFS. Watch this space.

Jun 25

It couldn’t get much easier than this. Assuming that your SAN is on 192.168.168.168 and that your LUN doesn’t need CHAP authentication, give these commands a run.

iscsiadm add discovery-address 192.168.168.168:3260
iscsiadm list target
devfsadm -i iscsi

Now you can run format and access that disk. Enjoy!
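One thing to check if list target comes back empty: on the Solaris initiators I’ve seen, SendTargets discovery has to be switched on before the discovery address does anything. Here is a sketch that prints the fuller sequence for review, with that extra step included. Whether your box needs it is an assumption about your setup, and the function name is made up.

```shell
# Print (not run) initiator setup commands for a given target portal.
# Args: target IP, optional port (defaults to 3260, as used above).
initiator_cmds() {
    ip=$1 port=${2:-3260}
    echo "iscsiadm add discovery-address $ip:$port"
    echo "iscsiadm modify discovery --sendtargets enable"
    echo "iscsiadm list target"
    echo "devfsadm -i iscsi"
}

initiator_cmds 192.168.168.168
```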
