This doc is due for a refresh, but in the meantime some updates...
This system has been in daily use for over a year on multiple systems, and has been tested and placed into 24/7 production in at least one environment. Currently people are booting RH 7.3, RH 9, and SuSE Enterprise in this manner, using the latest versions of the iSCSI software initiator from SourceForge. iSCSI-Root continues to work and be tested with the latest targets from Network Appliance, including DataONTAP 6.5 and 6.4.*
Britt Bolen -- 1/22/04
This HOWTO describes how to set up a diskless linux box which mounts its root file system via the Cisco software iscsi initiator, which accesses iscsi on a filer.
rm -rf /). If you're
setting up a cluster of machines they can use luns all created from a single
starting lun in a snapshot which allows you to share all but the blocks that
are changed on each machine (most of this could be done with nfs, but not as
easily; sharing /etc/ is tricky, for one). Did I mention it's cool? No disk noise
in your office. It's cool.
The trick to getting your root filesystem on iscsi is all in the initial ramdisk. You need a way to get the iscsi module, the iscsi daemon (which finds the disks) and the iscsi config files early enough in the linux boot cycle to be able to mount your root filesystem.
By default the iscsi software works as a late-in-boot feature for mounting other filesystems. This HOWTO tells you how to tweak what is in the package for root-fs use.
By taking the module and daemon (recompiled statically) plus the config files and a helper 'sleep' command and sticking them in the ramdisk we have all the parts we need to get an iscsi disk and mount it.
That is, once we have a working tcp/ip network route to the filer. This comes from the earlier linux NFS-Root work, which lets you tell the kernel to configure the network interfaces very early on in the boot sequence. You can use dhcp, bootp, rarp, or manual ip configuration.
Combine this with the iscsi module and voila, you've got a disk.
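For the manual ip configuration case, here is a minimal sketch of how the kernel's ip= boot argument is put together. The field order is the one from the kernel's NFS-Root documentation; the function name and all addresses below are made up for illustration, so substitute your own.

```shell
#!/bin/sh
# Build the kernel's ip= boot argument for a static configuration.
# Field order (from the kernel nfsroot documentation):
#   ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
# build_ip_arg is a hypothetical helper, not part of any package.
build_ip_arg() {
    client=$1
    gateway=$2
    netmask=$3
    hostname=$4
    device=$5
    # server-ip is left empty (it only matters for NFS-Root);
    # autoconf "off" tells the kernel not to try dhcp/bootp/rarp.
    printf 'ip=%s::%s:%s:%s:%s:off\n' \
        "$client" "$gateway" "$netmask" "$hostname" "$device"
}

# Placeholder addresses, for illustration only
build_ip_arg 10.60.8.57 10.60.8.1 255.255.255.0 bolen eth0
```

The resulting string goes on the kernel line of the boot loader in place of ip=dhcp.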
This currently is only tested with version 3.2.0.1 against DataONTAP 6.4R1
make
make install
That's it. You might want to change the Makefile to change
the install directory for the iSCSI tools.
/etc/iscsi.conf
DiscoveryAddress=10.60.152.25
I have only tested this using a single filer with a single lun setup.
/etc/initiatorname.iscsi
The initiator name comes from the iscsi-iname
command that was installed as part of building the iSCSI software.
bolen@bolen /etc % /opt/sbin/iscsi-iname
iqn.1987-05.com.cisco.01.a1ad5447905f8e93a778ddfefedb233
The file should look something like this
bolen@bolen /etc % cat /etc/initiatorname.iscsi
## DO NOT EDIT OR REMOVE THIS FILE!
## If you remove this file, the iSCSI daemon will not start.
## If you change the InitiatorName, existing access control lists
## may reject this initiator. The InitiatorName must be unique
## for each iSCSI initiator. Do NOT duplicate iSCSI InitiatorNames.
InitiatorName=iqn.1987-05.com.cisco.01.6727f456fe3f50c8274f4484bd7862d2
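A rough sketch of producing a file in that shape from a name you already have (for example the output of iscsi-iname). The helper name write_initiatorname is hypothetical, and the example writes to a scratch file rather than /etc so it is safe to try.

```shell
#!/bin/sh
# Write an initiatorname.iscsi-style file with a single InitiatorName line.
#   $1 = initiator name, $2 = destination file
# write_initiatorname is a hypothetical helper for illustration.
write_initiatorname() {
    iname=$1
    dest=$2
    # Refuse anything that does not at least start with "iqn."
    case $iname in
        iqn.*) ;;
        *) echo "not an iqn name: $iname" >&2; return 1 ;;
    esac
    {
        echo "## DO NOT EDIT OR REMOVE THIS FILE!"
        echo "InitiatorName=$iname"
    } > "$dest"
}

# Example: write to a scratch file and show the result
scratch=$(mktemp)
write_initiatorname iqn.1987-05.com.cisco.01.6727f456fe3f50c8274f4484bd7862d2 "$scratch"
cat "$scratch"
rm -f "$scratch"
```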
You should be able to restart the iscsi daemon (iscsid), which is what
finds luns. The best way is via
redhat: service iscsi reload
or: /etc/rc.d/init.d/iscsi reload
or: send a HUP signal to the iscsid process
(which is all the first 2 do anyway)
If you've got everything set up, you should have found disks. Check
/proc/scsi/scsi for the disks; they'll say NETAPP LUN. You
should also see the filer's initiator name in /proc/scsi/iscsi/0.
dmesg and /var/log/messages should mention where
the disk was added, e.g. /dev/sda, /dev/sdb.
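The /proc/scsi/scsi check can be scripted. This is only a sketch: the helper name is made up, and the sample text in the example is an illustration of the usual Vendor/Model layout, not output captured from a real filer.

```shell
#!/bin/sh
# Count devices whose vendor string is NETAPP in a /proc/scsi/scsi style file.
#   $1 = file to scan (defaults to /proc/scsi/scsi)
# count_netapp_luns is a hypothetical helper for illustration.
count_netapp_luns() {
    scsi_file=${1:-/proc/scsi/scsi}
    # Each attached device contributes one "Vendor:" line;
    # grep -c prints how many of them match.
    grep -c 'Vendor: *NETAPP' "$scsi_file"
}

# Example against an illustrative sample instead of the live /proc file
sample=$(mktemp)
printf '%s\n' \
    'Attached devices:' \
    'Host: scsi0 Channel: 00 Id: 00 Lun: 00' \
    '  Vendor: NETAPP   Model: LUN              Rev: 0.2' \
    '  Type:   Direct-Access                    ANSI SCSI revision: 04' \
    > "$sample"
count_netapp_luns "$sample"
rm -f "$sample"
```

On a live system, call it with no argument to scan /proc/scsi/scsi itself.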
once you know where the disk is, you can format it with fdisk and
put a filesystem on it, etc, etc, etc. If you can do mkfs and
mount it, you've got it working.
now would be a good time to tar over your filesystems to the iscsi disk for
future use...
I like this method:
mount your iscsi fs
tar clf - / | (cd /iscsifs; tar xvf -)
or something like that...
also don't forget to create a partition for swap on the iscsi disk.
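Here is a sketch of the tar copy as a reusable function, assuming a tar that accepts --one-file-system (GNU tar does; it is the long spelling of what the l flag meant on the tar of that era, and newer GNU tar uses l for something else). The name copy_tree and the scratch directories in the example are placeholders.

```shell
#!/bin/sh
# Copy a directory tree with a tar pipe, staying on one filesystem.
#   $1 = source directory, $2 = destination directory
# copy_tree is a hypothetical helper for illustration.
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # --one-file-system keeps tar from wandering into /proc or into the
    # freshly mounted iscsi filesystem; -p preserves permissions on extract.
    tar -C "$src" --one-file-system -cf - . | tar -C "$dst" -xpf -
}

# Example on scratch directories rather than / and /iscsifs
demo_src=$(mktemp -d)
demo_dst=$(mktemp -d)
mkdir -p "$demo_src/etc"
echo "hello" > "$demo_src/etc/motd"
copy_tree "$demo_src" "$demo_dst"
cat "$demo_dst/etc/motd"
rm -rf "$demo_src" "$demo_dst"
```

For the real thing the call would be something like copy_tree / /iscsifs, run as root.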
iscsi start.
/etc/initiatorname.iscsi and use it as such.
igroup create -i -t linux my_igroup initiator_name
lun setup or manually run lun create
and lun map to create luns. quick example of creating a 10g lun
and mapping it to the igroup in section 3.2 as lun 0.
lun create -s 10g /vol/vol0/iscsi_lun
note: luns can only be at volume or qtree roots.
lun map /vol/vol0/iscsi_lun my_igroup 0
iscsi show initiators should list your linux box if it has
connected.
lun show
lun show -v
lun show -m
lun show -m -g my_igroup
lun stats
sysstat -i
you'll want to make sure a handful of options / modules are enabled. I usually just compile things into the kernel, and avoid modules if i know i need the code all the time. In general you'll need to have module support for the iscsi module. You'll also need to have your ethernet driver compiled into the kernel, and not a module because the early IP config code seems to run before modules are loaded. You'll want to have the filesystem you'll be using compiled into the kernel, or make sure it's in the ramdisk.
Here are some specific features you'll need for iscsi-root
CONFIG_BLK_DEV_LOOP Loopback device support is very helpful for building the initial ramdisk.
CONFIG_BLK_DEV_RAM you'll want ram disks to enable initial ramdisks
CONFIG_BLK_DEV_INITRD you'll want initial ramdisks
CONFIG_IP_PNP you'll want IP: kernel level autoconfiguration
CONFIG_IP_PNP_DHCP you'll want ip config dhcp
CONFIG_IP_PNP_BOOTP you'll want ip config bootp
CONFIG_IP_PNP_RARP you might want ip config rarp
CONFIG_SCSI you'll want scsi.
CONFIG_BLK_DEV_SD you'll want scsi disks
CONFIG_SCSI_MULTI_LUN you'll want multiple luns
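A quick way to sanity check a kernel .config against the list above, sketched as a shell function. It accepts either =y or =m for every option, which is looser than the advice to compile things in; the helper name and the sample config in the example are made up.

```shell
#!/bin/sh
# Report which of the options listed above are not enabled in a kernel .config.
#   $1 = path to the .config file
# missing_kernel_options is a hypothetical helper for illustration.
REQUIRED_OPTS="CONFIG_BLK_DEV_LOOP CONFIG_BLK_DEV_RAM CONFIG_BLK_DEV_INITRD
CONFIG_IP_PNP CONFIG_IP_PNP_DHCP CONFIG_IP_PNP_BOOTP CONFIG_IP_PNP_RARP
CONFIG_SCSI CONFIG_BLK_DEV_SD CONFIG_SCSI_MULTI_LUN"

missing_kernel_options() {
    config=$1
    for opt in $REQUIRED_OPTS; do
        # The "=" right after the name keeps CONFIG_IP_PNP from matching
        # CONFIG_IP_PNP_DHCP and so on.
        grep -q "^${opt}=[ym]" "$config" || echo "$opt"
    done
}

# Example against a small illustrative config
sample=$(mktemp)
printf '%s\n' \
    'CONFIG_BLK_DEV_LOOP=m' \
    'CONFIG_BLK_DEV_RAM=y' \
    'CONFIG_BLK_DEV_INITRD=y' \
    'CONFIG_IP_PNP=y' \
    '# CONFIG_IP_PNP_DHCP is not set' \
    'CONFIG_SCSI=y' \
    'CONFIG_BLK_DEV_SD=y' \
    > "$sample"
missing_kernel_options "$sample"
rm -f "$sample"
```

Anything it prints still needs to be turned on before you build.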
Once you've got your kernel configured, build it, install it and boot it so that you're running on the kernel you plan to use with iscsi.
Now that you're running your new kernel, you'll probably want to rebuild
the iscsi tools to match the kernel. You'll also need to recompile
iscsid statically, so you might as well rebuild it all.
in your iscsi source code, add this line to the Makefile
DAEMONFLAGS=-static
do a make clean and a make install
This is the important part! This is what makes it all work. First make sure you're running your new kernel.
I have a modified version of mkinitrd that will do all the work of building the ram disk for you. You can find it at http://eludicate.com/~bolen/iscsi/mkinitrd.iscsi. This is based on the redhat 7.3 mkinitrd, so I make no promises about it working on anything else...
prior to running this you need to have done the following...
mkinitrd.iscsi -v --iscsi_iname=iqn.1987-05.com.cisco.01.[unique stuff] \
    --iscsid=[path to iscsid] \
    --iscsi_sleep=[path to sleep] \
    --iscsi_target=[ip addr of filer] \
    [ramdisk name] \
    [kernel version]
specifically, i would run this command for my system:
mkinitrd.iscsi -v \
    --iscsi_iname=iqn.1987-05.com.cisco.01.6727f456fe3f50c8274f4484bd7862d2 \
    --iscsid=/opt/sbin/iscsid --iscsi_sleep=/u/bolen/tmp/sleep \
    --iscsi_target=10.60.132.21 /tmp/iscsi.bolen.img.gz 2.4.19
mkinitrd handles most of the work for us, and eventually i'll make
a new mkinitrd that does all the work. Assuming you've compiled everything
you need into the kernel and not as modules, do this
mkinitrd -v --with=iscsi_mod /tmp/iscsi.img.gz <kernel-version>
now mount that ramdisk image on your linux box.
1. gunzip /tmp/iscsi.img.gz
2. mkdir /tmp/initrd
3. mount -o loop /tmp/iscsi.img /tmp/initrd
now add the stuff you need that mkinitrd didn't add. This is the iscsi
discovery daemon, the config files, a sleep command, and some extra
directories. I'm assuming you installed your iscsi stuff on /opt
1. cp /opt/sbin/iscsid /tmp/initrd/sbin
2. build the sleep tool (sleep.c)
    #include <stdlib.h>   /* atoi */
    #include <unistd.h>   /* sleep */

    int main (int argc, char ** argv)
    {
        int secs;
        if (argc != 2)
            return 1;
        secs = atoi(argv[1]);
        sleep(secs);
        return 0;
    }
   gcc -static -o sleep sleep.c
   strip sleep
   cp sleep /tmp/initrd/sbin
3. Copy the iscsi config files
   cp /etc/iscsi.conf /tmp/initrd/etc/iscsi.conf
   cp /etc/initiatorname.iscsi /tmp/initrd/etc/initiatorname.iscsi
4. Create a directory for the iscsid lock
   mkdir -p /tmp/initrd/var/run
5. Add iscsid to the linuxrc config file by adding these lines to
   /tmp/initrd/linuxrc after the /proc filesystem
   is mounted.
    echo "Starting iscsid"
    iscsid -d -d -l /dev/iscsi -m 755
    echo "iscsid started"
    echo "sleeping for iscsid"
    sleep 5
6. Unmount and recompress the ramdisk
   umount /tmp/initrd
   gzip /tmp/iscsi.img
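The linuxrc edit can also be scripted. What follows is only a sketch: it assumes the linuxrc has a line beginning with "mount" that mentions proc (true of the redhat 7.3 style linuxrc as far as I know, but check yours), and the function name is made up. The inserted lines are the ones from the text.

```shell
#!/bin/sh
# Insert the iscsid startup lines into a linuxrc right after /proc is mounted.
#   $1 = path to linuxrc (e.g. /tmp/initrd/linuxrc)
# add_iscsid_to_linuxrc is a hypothetical helper for illustration.
add_iscsid_to_linuxrc() {
    linuxrc=$1
    # Do nothing if iscsid is already mentioned, so reruns are harmless
    if grep -q 'iscsid' "$linuxrc"; then
        return 0
    fi
    work=$(mktemp) || return 1
    awk '
        { print }
        # Assumption: the first line starting with "mount" that mentions
        # proc is the one that mounts /proc
        !inserted && /^mount .*proc/ {
            print "echo \"Starting iscsid\""
            print "iscsid -d -d -l /dev/iscsi -m 755"
            print "echo \"iscsid started\""
            print "echo \"sleeping for iscsid\""
            print "sleep 5"
            inserted = 1
        }
    ' "$linuxrc" > "$work" && cat "$work" > "$linuxrc"
    status=$?
    rm -f "$work"
    return $status
}

# Example on a scratch copy that imitates a small linuxrc
sample=$(mktemp)
printf '%s\n' '#!/bin/nash' 'mount -t proc /proc /proc' 'echo Mounting root filesystem' > "$sample"
add_iscsid_to_linuxrc "$sample"
cat "$sample"
rm -f "$sample"
```

Writing back with cat keeps the original file's permissions, so linuxrc stays executable.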
by now you have your kernel and you have iscsi built. start the iscsi
service with the script in /etc/rc.d/init.d/iscsi.
You should see your disk as /dev/sda1 (unless you have other
scsi disks of course).
This disk needs to be formatted with 2 partitions. 1 small one for swap and a large one for the fs. use fdisk.
The filesystem should be journaling because the iscsi shutdown path can be ugly. You may need to hit the reset button to get a reboot to work. Thus you want a journaled FS.
First we're going to start by booting off the local disk, and mount an iscsi disk to test if it works
Make sure you've got your lun created on the filer, and it is mapped, and
you've put a file system on it. Make sure you've added that FS to your
/etc/fstab.
Add the new kernel and ramdisk plus config info to your bootloader. This
example is for Redhat 7.3 and grub.
add this to /etc/grub.conf
title Linux (2.4.19) iscsi
root (hd0,0)
kernel /linux-2.4.19 ro root=/dev/hda2 ip=dhcp
initrd /iscsi.img.gz
This says use the iscsi initial ramdisk and configure the system with dhcp.
now reboot with the ramdisk! This should work. You should have your iscsi disk mounted.
to use it as root you need to change the /iscsifs/etc/fstab to point to /dev/sda1.
change the root= argument to the kernel in /etc/grub.conf
set up a swap partition on the iscsi disk. mkswap /dev/sda2
Disable start and stop of the iscsi service in /etc/rc.d/rc3.d and /etc/rc.d/rc6.d, and maybe rc5.d if you boot to X Windows.
That's it, reboot and you should get your rootfs over iscsi. If it doesn't work, email me! I'll add a troubleshooting section...
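Changing the root= argument by hand is easy to get wrong, so here is a sketch that rewrites it with sed and prints the result for review instead of editing the file in place. The function name is hypothetical. Note that it touches every kernel line it is given, so feed it only the iscsi stanza, or check the output before installing it.

```shell
#!/bin/sh
# Print a grub config with root= on the kernel lines pointed at a new device.
#   $1 = new root device (e.g. /dev/sda1), $2 = grub config file
# set_grub_root is a hypothetical helper; it writes to stdout only.
set_grub_root() {
    new_root=$1
    grub_conf=$2
    # Only "kernel" lines are touched; # is the sed delimiter because
    # device paths contain slashes.
    sed "/^[[:space:]]*kernel /s#root=[^[:space:]]*#root=${new_root}#" "$grub_conf"
}

# Example using the stanza from this HOWTO in a scratch file
sample=$(mktemp)
printf '%s\n' \
    'title Linux (2.4.19) iscsi' \
    'root (hd0,0)' \
    'kernel /linux-2.4.19 ro root=/dev/hda2 ip=dhcp' \
    'initrd /iscsi.img.gz' \
    > "$sample"
set_grub_root /dev/sda1 "$sample"
rm -f "$sample"
```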
This is a quick overview of netbooting with Intel PXE bios and PXELinux
get PXELinux.
put pxelinux.0 in your tftp server's /tftpboot directory. Be sure your tftpserver supports the 'tsize' option,
I'm using /u/bolen/src/tftp-hpa-0.31.tar.gz running as a replacement for the tftpd under solaris.
The tftp service in the filer has been tested and can be used to netboot PXE systems. This was tested against 6.4
create a dhcp entry that includes the options for next-server (ip of the tftpserver) and filename to boot which is "pxelinux.0"
follow the docs at the pxelinux site above to setup the files in the /tftpboot/pxelinux.cfg directory. The file i'm booting from is at http://eludicate.com/~bolen/iscsi/pxelinux.cfg/0A3C0839
That's about it! There is also a nice summary in the October 2002 issue of Linux Magazine starting on page 16.
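The odd-looking file name in pxelinux.cfg is just the client's IP address written as uppercase hex, which is one of the names PXELinux searches for. A small sketch of the conversion (the function name is made up); the name 0A3C0839 used above corresponds to 10.60.8.57.

```shell
#!/bin/sh
# Convert a dotted-quad IP address to the uppercase hex file name that
# PXELinux looks for in pxelinux.cfg.
# ip_to_pxelinux_name is a hypothetical helper for illustration.
ip_to_pxelinux_name() {
    # Split the address on "." into the positional parameters
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    # Each octet becomes two uppercase hex digits
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

ip_to_pxelinux_name 10.60.8.57   # prints 0A3C0839
```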
Lots of things about this don't work all that well, here's a list in no particular order.
the normal iscsi startup script tweaks the tcp/ip stack for better
performance. the ramdisk's linuxrc file doesn't.
the initial ramdisk can't be unmounted because iscsid is
running from it. Kiss that 3M of ram goodbye.
you need a different initial ramdisk for each linux box since you need a unique iscsi initiator name for each system. might just be able to use the fully qualified host name. either way iscsid needs to be enhanced to not depend on that file. i wish linux had a 'hostid' command like solaris.
iscsi command in /etc/rc.d/init.d can't be used to rescan for new luns since there isn't a pid file in /var/run. easily fixed with a better startup script. also a better script could tweak tcp/ip for us.
general linux complaint. device names can change depending on response times from filers. there is little guarantee that /dev/sda is always going to be the same disk. not iscsi specific.
early ip config stuff seems to only work if the nic has a driver compiled into the kernel. doesn't work with modules. :(
linux ignores filer disk geometry. makes using lun resize + fdisk a little trickier.
$Id: iscsi_root_HOWTO.html,v 1.4 2004/01/22 17:41:50 bolen Exp $