iSCSI-Root mini-HOWTO

Britt Bolen bb@eludicate.com

V2.0.1, 22 January 2004


This mini-HOWTO explains how to set up a ``diskless'' Linux workstation that mounts its root filesystem via Cisco's software iSCSI initiator. The newest version of this mini-HOWTO can always be found at http://eludicate.com/~bolen/iscsi/

0. Current Status

This doc is due for a refresh, but in the meantime, here are some updates...

This system has been in daily use for over a year on multiple systems, and has been tested and placed into 24/7 production in at least one environment. People are currently booting Red Hat 7.3, Red Hat 9, and SuSE Enterprise this way, using the latest versions of the iSCSI software initiator from SourceForge. iSCSI-Root continues to work and be tested with the latest targets from Network Appliance, including Data ONTAP 6.5 and 6.4.*

Britt Bolen -- 1/22/04

1. General Overview

2. Setting up basic iscsi on linux

3. Setting up the filer

4. Setting up linux to use iscsi for root

5. Netbooting linux

6. Working with lun snapshots

7. Things that don't work yet!


1. General Overview

This HOWTO describes how to set up a diskless Linux box that mounts its root filesystem with the Cisco software iSCSI initiator, accessing a LUN on a filer over iSCSI.

1.1 Why?

Why not? It's cool. It is faster than local disks. Your data is on a filer, so it's more reliable (think how cool it would be to just snap restore your desktop machine when you accidentally type rm -rf /). If you're setting up a cluster of machines, they can all use LUNs created from a single starting LUN in a snapshot, which lets them share all but the blocks that change on each machine (most of this could be done with NFS, but not as easily; sharing /etc/ is tricky, etc.). Did I mention it's cool? No disk noise in your office. It's cool.

1.2 How?

The trick to getting your root filesystem on iscsi is all in the initial ramdisk. You need a way to get the iscsi module, the iscsi daemon (which finds the disks) and the iscsi config files early enough in the linux boot cycle to be able to mount your root filesystem.

By default, the iSCSI software is set up as a late-boot feature for mounting additional filesystems. This HOWTO tells you how to tweak what is in the package for root filesystem use.

By taking the module and the daemon (recompiled statically), plus the config files and a helper 'sleep' command, and putting them in the ramdisk, we have all the parts we need to find an iSCSI disk and mount it.

That is, once we have a working TCP/IP route to the filer. This comes from the earlier Linux NFS-Root work, which lets you tell the kernel to configure the network interfaces very early in the boot sequence. You can use DHCP, BOOTP, RARP, or manual IP configuration.

Combine this with the iSCSI module and voilà, you've got a disk.

2. Setting up basic iscsi on linux

2.1 getting it

You want the iSCSI software package that Cisco wrote and distributes on SourceForge: http://sourceforge.net/projects/linux-iscsi

This has only been tested with version 3.2.0.1 against Data ONTAP 6.4R1.

2.2 compiling it

make
make install
That's it. You might want to edit the Makefile to change the install directory for the iSCSI tools.

2.3 setting it up

2.3.1 /etc/iscsi.conf

This file just needs a line listing the IP address of the filer you'll be using:
DiscoveryAddress=10.60.152.25

I have only tested this using a single filer with a single lun setup.

2.3.2 /etc/initiatorname.iscsi

This file contains the name that the linux box will use when it connects to the iSCSI target. The name is created using the iscsi-iname command that was installed as part of building the iSCSI software.

bolen@bolen /etc % /opt/sbin/iscsi-iname
iqn.1987-05.com.cisco.01.a1ad5447905f8e93a778ddfefedb233

The file should look something like this:

bolen@bolen /etc % r cat /etc/initiatorname.iscsi
## DO NOT EDIT OR REMOVE THIS FILE!
## If you remove this file, the iSCSI daemon will not start.
## If you change the InitiatorName, existing access control lists
## may reject this initiator. The InitiatorName must be unique
## for each iSCSI initiator. Do NOT duplicate iSCSI InitiatorNames.
InitiatorName=iqn.1987-05.com.cisco.01.6727f456fe3f50c8274f4484bd7862d2

2.4 testing it

First, set up the filer (see section 3). :)

You should then be able to restart the iSCSI daemon (iscsid), which is what finds LUNs. The best ways are:
Red Hat: service iscsi reload
or: /etc/rc.d/init.d/iscsi reload
or: send a HUP signal to the iscsid process
(which is all the first two do anyway)

If you've got everything set up, the disks should have been found. Check /proc/scsi/scsi for the disks; they'll show up as NETAPP LUN. You should also see the filer's initiator name in /proc/scsi/iscsi/0. dmesg and /var/log/messages should mention which device the disk was added as, e.g. /dev/sda or /dev/sdb.

Once you know where the disk is, you can partition it with fdisk, put a filesystem on it, and so on. If you can run mkfs and mount it, you've got it working.

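Now would be a good time to tar your filesystems over to the iSCSI disk for future use.
I like this method:
mount your iSCSI filesystem (on /iscsifs here)
tar clf - / | (cd /iscsifs; tar xvf -) or something like that...
(With the GNU tar of this era, the l flag means "stay on this filesystem", so /proc and /iscsifs itself are not copied; newer GNU tar spells this --one-file-system.)
Also, don't forget to create a partition for swap on the iSCSI disk.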

3. Setting up the filer

This is a nice, quick overview of using sledgehammer. Most of this is covered in the SAN Storage Admin Guide for FCP, and a Windows iSCSI doc is coming for fullsail.0

3.1 installing iscsi

Install Data ONTAP 6.4R1 or later, license iSCSI, and run iscsi start.

3.2 setting up igroups

Get the initiator name from /etc/initiatorname.iscsi and use it like this:
igroup create -i -t linux my_igroup initiator_name
Initiator groups (igroups) are groups of host identifiers that individual LUNs (vdisks) are mapped (aka exported) to, sort of like an NIS netgroup. -i means iSCSI and -t linux means Linux. The igroup OS types don't mean anything yet...

3.3 setting up luns

You can use lun setup, or run lun create and lun map manually, to create LUNs. Here is a quick example of creating a 10 GB LUN and mapping it as LUN 0 to the igroup from section 3.2:

lun create -s 10g /vol/vol0/iscsi_lun
lun map /vol/vol0/iscsi_lun my_igroup 0
Note: LUNs can only be created at volume or qtree roots.

3.4 other useful filer stuff

iscsi show initiators should list your Linux box if it has connected. Other useful commands:
lun show
lun show -v
lun show -m
lun show -m -g my_igroup
lun stats
sysstat -i

4. Setting up linux to use iscsi for root

4.1 Overview

See section 1.2 for the overview.

4.2 kernel considerations

You'll want to make sure a handful of options and modules are enabled. I usually just compile things into the kernel and avoid modules when I know I need the code all the time. In general, you'll need module support for the iSCSI module. You'll also need your Ethernet driver compiled into the kernel rather than as a module, because the early IP configuration code seems to run before modules are loaded. The filesystem you'll be using should be compiled into the kernel too, or else make sure its module is in the ramdisk.

Here are the specific features you'll need for iSCSI-Root:

CONFIG_BLK_DEV_LOOP Loopback device support is very helpful for building the initial ramdisk.
CONFIG_BLK_DEV_RAM you'll want ram disks to enable initial ramdisks
CONFIG_BLK_DEV_INITRD you'll want initial ramdisks
CONFIG_IP_PNP you'll want IP: kernel level autoconfiguration
CONFIG_IP_PNP_DHCP you'll want ip config dhcp
CONFIG_IP_PNP_BOOTP you'll want ip config bootp
CONFIG_IP_PNP_RARP you might want ip config rarp
CONFIG_SCSI you'll want scsi.
CONFIG_BLK_DEV_SD you'll want scsi disks
CONFIG_SCSI_MULTI_LUN you'll want multiple luns

Once you've got your kernel configured, build it, install it and boot it so that you're running on the kernel you plan to use with iscsi.

4.3 rebuilding the iscsi tools

Now that you're running your new kernel, you'll probably want to rebuild the iscsi tools to match the kernel. You'll also need to recompile iscsid statically, so you might as well rebuild it all.

In your iSCSI source code, add this line to the Makefile:

DAEMONFLAGS=-static
Then do a make clean and a make install.

4.4 building the ramdisk

This is the important part! This is what makes it all work. First make sure you're running your new kernel.

New better way

I have a modified version of mkinitrd that does all the work of building the ramdisk for you. You can find it at http://eludicate.com/~bolen/iscsi/mkinitrd.iscsi. It is based on the Red Hat 7.3 mkinitrd, so I make no promises about it working on anything else...

Before running it, you need to have done the following:

  1. identified a filer to use
  2. picked an iscsi initiator name
  3. built your new kernel
  4. built the iscsi software to match the new kernel
  5. built a statically linked iscsid
  6. built a static sleep program like http://eludicate.com/~bolen/iscsi/sleep.c
Once all that is done, you can run the command like so:
mkinitrd.iscsi -v --iscsi_iname=iqn.1987-05.com.cisco.01.[unique stuff] \
    --iscsid=[path to iscsid] \
    --iscsi_sleep=[path to sleep] \
    --iscsi_target=[ip addr of filer] \
    [ramdisk name] \
    [kernel version]
Specifically, on my system I would run this command:
mkinitrd.iscsi -v --iscsi_iname=iqn.1987-05.com.cisco.01.6727f456fe3f50c8274f4484bd7862d2 --iscsid=/opt/sbin/iscsid --iscsi_sleep=/u/bolen/tmp/sleep --iscsi_target=10.60.132.21 /tmp/iscsi.bolen.img.gz 2.4.19

Old Way

mkinitrd handles most of the work for us (eventually I'll make a new mkinitrd that does all of it). Assuming you've compiled everything you need into the kernel rather than as modules, do this:
mkinitrd -v --with=iscsi_mod /tmp/iscsi.img.gz <kernel-version>

Now mount that ramdisk image on your Linux box:
1. gunzip /tmp/iscsi.img.gz
2. mkdir /tmp/initrd
3. mount -o loop /tmp/iscsi.img /tmp/initrd

Now add the things mkinitrd didn't: the iSCSI discovery daemon, the config files, a sleep command, and some extra directories. I'm assuming you installed your iSCSI tools under /opt.
1. cp /opt/sbin/iscsid /tmp/initrd/sbin
2. build the sleep tool

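#include <stdlib.h>	/* atoi() */
#include <unistd.h>	/* sleep() */

/* Minimal sleep(1) replacement for the initial ramdisk.
 * usage: sleep SECONDS */
int main (int argc, char ** argv)
{
	int secs;
	
	if (argc != 2) 
		return 1;

	secs = atoi(argv[1]);
	if (secs < 0)		/* sleep() takes an unsigned count */
		return 1;

	sleep(secs);
	return 0;
}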

gcc -static -o sleep sleep.c
strip sleep
cp sleep /tmp/initrd/sbin

3. Copy in the config files. You should already have set up /etc/iscsi.conf to point to the filer that will hold the root disk, and only that filer.
cp /etc/iscsi.conf /tmp/initrd/etc/iscsi.conf
cp /etc/initiatorname.iscsi /tmp/initrd/etc/initiatorname.iscsi
4. Create a directory for the iscsid lock
mkdir -p /tmp/initrd/var/run
5. Start iscsid from linuxrc by adding these lines to /tmp/initrd/linuxrc, after the /proc filesystem is mounted:
echo "Starting iscsid"
iscsid -d -d -l /dev/iscsi -m 755
echo "iscsid started"
echo "sleeping for iscsid"
sleep 5 


The '-d' arguments turn on some debugging messages.
6. Unmount the initial ramdisk:
umount /tmp/initrd
7. gzip the ramdisk back up:
gzip /tmp/iscsi.img

4.5 building the iSCSI fs

By now you have your kernel and you have iSCSI built. Start the iSCSI service with the script in /etc/rc.d/init.d/iscsi. You should see your disk as /dev/sda (unless you have other SCSI disks, of course).

This disk needs to be partitioned with fdisk into two partitions: a small one for swap and a large one for the filesystem. The rest of this HOWTO assumes /dev/sda1 holds the filesystem and /dev/sda2 is swap.

The filesystem should be a journaling one (ext3, for example), because the iSCSI shutdown path can be ugly: you may need to hit the reset button to get a reboot to work, and a journaled filesystem copes with that much more gracefully.

4.6.X Describe how to clean up the shutdown path!!!!

4.6 booting the iscsi kernel and ramdisk

First, boot off the local disk and mount an iSCSI disk to test whether it works.

Make sure your LUN is created and mapped on the filer and that you've put a filesystem on it. Make sure you've added that filesystem to your /etc/fstab.

Add the new kernel and ramdisk, plus config info, to your bootloader. This example is for Red Hat 7.3 and GRUB; add this to /etc/grub.conf:

title Linux (2.4.19) iscsi
root (hd0,0)
kernel /linux-2.4.19 ro root=/dev/hda2 ip=dhcp
initrd /iscsi.img.gz

This tells GRUB to load the iSCSI initial ramdisk and tells the kernel to configure the network with DHCP.

Now reboot with the ramdisk! This should work, and you should have your iSCSI disk mounted.

To use it as root, change /iscsifs/etc/fstab so that the root (/) entry points to /dev/sda1.

Change the kernel's root= argument in /etc/grub.conf to root=/dev/sda1.

Set up a swap partition on the iSCSI disk: mkswap /dev/sda2

Disable the start and stop of the iSCSI service in /etc/rc.d/rc3.d and /etc/rc.d/rc6.d (and maybe rc5.d if you boot into X Windows), since iscsid is already running from the initial ramdisk.

That's it! Reboot and you should get your root filesystem over iSCSI. If it doesn't work, email me and I'll add a troubleshooting section...

5. Netbooting linux

This is a quick overview of netbooting with an Intel PXE BIOS and PXELINUX.

Get PXELINUX.

Put pxelinux.0 in your TFTP server's /tftpboot directory. Be sure your TFTP server supports the 'tsize' option; I'm using /u/bolen/src/tftp-hpa-0.31.tar.gz, running as a replacement for the tftpd under Solaris.

The filer's TFTP service has been tested and can be used to netboot PXE systems. This was tested against 6.4.

Create a DHCP entry that includes the next-server option (the IP of the TFTP server) and the filename to boot, which is "pxelinux.0".

Follow the PXELINUX documentation to set up the files in the /tftpboot/pxelinux.cfg directory. The file I'm booting from is at http://eludicate.com/~bolen/iscsi/pxelinux.cfg/0A3C0839

That's about it! There is also a nice summary in the October 2002 issue of Linux Magazine, starting on page 16.

6. Working with lun snapshots

You can make multiple Linux boxes share most of their iSCSI blocks by creating more LUNs, all backed by a snapshot of a single good LUN.

  1. first build a working iscsi root filesystem lun.
  2. halt the machine that is using that lun.
  3. map the lun to another linux box with iscsi
  4. mount the lun as another device on a linux box
  5. fsck the filesystem.
  6. remove the hostname from /mountpoint/etc/sysconfig/network
  7. remove the dhcp caches from /mountpoint/etc/dhcpc/
  8. umount and unmap that lun.
  9. create a snapshot containing the lun
  10. create luns based on that snapshot.
      Assuming the lun was called "/vol/iscsi/root_fs" and the snapshot is called "good_root".

    1. snap create iscsi good_root
    2. lun create -b /vol/iscsi/.snapshot/good_root/root_fs /vol/iscsi/root_fs_2
    3. lun map /vol/iscsi/root_fs_2 new_linux_box 0
  11. create a new ramdisk image and kernel for the system, making sure the iscsi initiator name is unique.
  12. boot the new system
  13. fix the hostname in /etc/sysconfig/network

7. Things that don't work yet!

Lots of things about this don't work all that well, here's a list in no particular order.

The normal iSCSI startup script tweaks the TCP/IP stack for better performance; the ramdisk's linuxrc file doesn't.

The initial ramdisk can't be unmounted because iscsid is running from it. Kiss that 3 MB of RAM goodbye.

You need a different initial ramdisk for each Linux box, since each system needs a unique iSCSI initiator name. It might be possible to just use the fully qualified host name. Either way, iscsid needs to be enhanced so it doesn't depend on that file. I wish Linux had a 'hostid' command like Solaris.

The iscsi script in /etc/rc.d/init.d can't be used to rescan for new LUNs, since there isn't a pid file in /var/run. This is easily fixed with a better startup script, which could also tweak TCP/IP for us.

A general Linux complaint: device names can change depending on response times from filers, so there is little guarantee that /dev/sda will always be the same disk. This is not iSCSI-specific.

Early IP configuration seems to work only if the NIC's driver is compiled into the kernel; it doesn't work with modules. :(

Linux ignores the filer's disk geometry, which makes using lun resize plus fdisk a little trickier.

$Id: iscsi_root_HOWTO.html,v 1.4 2004/01/22 17:41:50 bolen Exp $