Edubuntu LTSP - How it works

Recently (OK, a while ago now) I blogged about building a DNS round-robin cluster. The problem with that solution was that whichever server in the cluster responded to DHCP was also the server responsible for TFTP and the server that would serve the nbd blocks to the client. It was, and is, too much like a single point of failure. I would rather that any of the other machines could take over from the TFTP stage onwards. I thought it would be simple to create that scenario using the variables on offer in DHCP... but it didn't work the way I anticipated. So I thought I should start by making sure I completely understand what happens during the LTSP client boot. This entry dissects how LTSP thin clients boot in the Edubuntu implementation (hopefully I have this right!).

Step 1: the PXE client (the primary loader) running from the thin client's PXE network card sends out a DHCP request. The DHCP server responds with values from its config, which include a number of things, but mainly the IP address for the client and the name of the file that the PXE client needs to request from the server, via TFTP, in order to begin booting. The client receives its IP and the file name, then fetches pxelinux.0 from the TFTP server running on the server. The file is transferred over the network to the client (TFTP runs over UDP, with requests going to port 69), and the client loads it into memory and executes it.
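To make the TFTP step a bit more concrete, here is a minimal Python sketch of the read request (RRQ) that a TFTP client sends, following the packet layout in RFC 1350. The function name build_rrq is my own, and the bare file name pxelinux.0 is only illustrative; the real PXE client asks for whatever path the DHCP response gave it.

```python
import struct

TFTP_PORT = 69   # well-known UDP port that TFTP requests are sent to
OP_RRQ = 1       # opcode for a read request (RFC 1350)


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode, filename, NUL, mode, NUL."""
    return (
        struct.pack("!H", OP_RRQ)        # 2-byte opcode, network byte order
        + filename.encode("ascii") + b"\x00"
        + mode.encode("ascii") + b"\x00"
    )


packet = build_rrq("pxelinux.0")  # illustrative file name only
```

The packet would be sent as a single UDP datagram to the TFTP server on port 69; the server then replies with numbered data blocks that the client acknowledges one at a time.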

The now-running pxelinux (a secondary loader) loads its config file by fetching it from the TFTP server (these are the files found in the pxelinux.cfg directory under the TFTP server's root directory). The config file contains the names of the kernel and the initial RAM disk (initrd) that need to be fetched via TFTP from the server, as well as the necessary kernel arguments. Once retrieved by the client, they are loaded into memory and the kernel begins executing.
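As I understand the PXELINUX documentation, the loader does not ask for a single config file; it tries a series of names in pxelinux.cfg until one exists. The Python sketch below lists those candidates for a given client. It is a simplification: it assumes Ethernet (hence the "01-" prefix) and leaves out the UUID-based name that some PXE stacks allow pxelinux to try first. The function name and the example addresses are mine.

```python
def pxelinux_config_candidates(mac: str, ip: str) -> list[str]:
    """Return the config paths pxelinux tries, in order (UUID step omitted)."""
    # 1. ARP hardware type (01 = Ethernet) plus the MAC, lower case, dash separated.
    names = ["01-" + mac.lower().replace(":", "-")]

    # 2. The client IP as upper-case hex, dropping one trailing digit per attempt.
    hex_ip = "".join(f"{int(octet):02X}" for octet in ip.split("."))
    for length in range(len(hex_ip), 0, -1):
        names.append(hex_ip[:length])

    # 3. Finally the catch-all file.
    names.append("default")
    return ["pxelinux.cfg/" + name for name in names]


candidates = pxelinux_config_candidates("88:99:AA:BB:CC:DD", "192.0.2.91")
```

This matters for the cluster idea because every one of those lookups goes to the same TFTP server that supplied pxelinux.0.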

The now-running kernel detects and initialises all the hardware. Because no root is specified, the kernel mounts the ramdisk image as the root file system. Next, the kernel runs the init script, which sets up environment variables, creates the necessary files and ultimately loads a new root file system.

More specifically, what init seems to do is that somehow the script ltsp_nbd gets run, which sends out another DHCP request and saves the response to a file. Next, the nbd_root_server variable is set, either from the value of nbdroot (possibly set in the initial RAM disk) or, failing that, to the IP of the DHCP server, taken from the DHCP response. In addition, nbd_root_port is set either from nbdroot or to 2000. Then the nbd_swap variable is set, which influences the values of the nbd_swap_server and port. If the client doesn't have enough memory (< 48M), it uses nbdswapd to create extra swap on the server for use by the client.
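The fallback logic above can be sketched in a few lines of Python. This is my reading of the behaviour, not the actual ltsp_nbd script (which is shell). The "server:port" form for nbdroot is an assumption on my part, and the function names are hypothetical; the only values taken from the description above are the default port 2000 and the 48M memory threshold.

```python
DEFAULT_NBD_ROOT_PORT = 2000   # default port from the description above
SWAP_THRESHOLD_MB = 48         # below this the client asks for network swap


def resolve_nbd_root(nbdroot, dhcp_server_ip):
    """Pick the nbd root server and port (assumes nbdroot looks like 'server:port')."""
    server, port = dhcp_server_ip, DEFAULT_NBD_ROOT_PORT
    if nbdroot:
        host, sep, port_text = nbdroot.partition(":")
        if host:
            server = host              # explicit server overrides the DHCP server
        if sep and port_text:
            port = int(port_text)      # explicit port overrides the default
    return server, port


def needs_nbd_swap(memory_mb):
    """True when the client has too little RAM and should use swap over nbd."""
    return memory_mb < SWAP_THRESHOLD_MB
```

Seen this way, the reason the DHCP responder ends up serving the image is clear: unless nbdroot says otherwise, the client simply goes back to whichever server answered its DHCP request.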

Next, /cow (copy on write) gets created and has a tmpfs file system mounted on it. The nbd client sets up a link to the nbd server and associates /dev/nbd0 with that link. On the server side, /etc/inetd.conf has a line that allows nbdrootd to receive connections on port 2000 and then serve the nbd image file (only the blocks needed at the time). Next, on the client side, /dev/nbd0 gets mounted at /rofs (read-only file system, which was created earlier along with /cow) as a squashfs file system. Then /cow and /rofs are mounted together at /root as a unionfs file system. /cow is then mounted into the new root as /root/cow, and similarly /rofs is mounted into the new root as /root/rofs.
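The way /cow and /rofs combine is easier to picture with a toy model. The Python sketch below is not how unionfs is implemented; it only illustrates the copy-on-write idea, with dictionaries standing in for the two file systems. The class name and the example file are mine, and file deletion is left out entirely.

```python
class UnionView:
    """Toy model of a union mount: a writable layer stacked over a read-only one."""

    def __init__(self, rofs):
        self.rofs = dict(rofs)   # stands in for the squashfs image at /rofs
        self.cow = {}            # stands in for the tmpfs at /cow

    def read(self, path):
        # The writable layer wins; otherwise fall through to the read-only image.
        if path in self.cow:
            return self.cow[path]
        return self.rofs[path]

    def write(self, path, data):
        # Writes never touch the read-only layer; they land in the RAM-backed one.
        self.cow[path] = data


root = UnionView({"/etc/hostname": "ltsp"})
root.write("/etc/hostname", "client1")
```

After the write, root.read("/etc/hostname") gives the client's own copy, while the shared image is unchanged, which is why many clients can boot from one image and why every change is lost on reboot.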

Then the lts.conf file gets fetched via TFTP to the client (if it exists; it's only created if clients need to differ from the basic image). A script called run_init is then executed, which changes /root to / and then calls /sbin/init. Now we have a proper operating system and /sbin/init begins to execute. /sbin/init then starts all the other services of the thin client machine (ssh, networking, etc.), much like a normal Linux operating system booting.

From what I have learnt, it still looks to me as though the way to make the system more robust (fewer single points of failure) is with DHCP. I need to get the DHCP server to tell the client that it can use another server for its communication. But it didn't work the way I thought it would last time I tried that, so I think the next step is to read the PXE specification, to see what the PXE client expects from the DHCP server, and then use Wireshark to see what is happening when the client boots and requests IPs etc.

