This document details the work I put into creating an optimal computer lab setup at a high school in Borlänge. The idea was to have PXE-booted machines that downloaded disk images of the operating systems needed for the courses given. Since the hard drives were wiped at each boot and a new image was written to them, restoring the machines to their last good state was as simple as pressing the power button.
Apart from this instant reinstall, user accounts and files would be handled by a central server, so that user settings would still be saved even though the systems were wiped. I will log my continued effort for each day below.
Day 1
Before I actually started to work on this I had searched the Internet for information, since I thought that somebody else must have come up with the same idea before. Apart from a couple of commercial solutions that might do what I wanted, I found nothing. There was lots of information on how to run diskless systems via PXE etc., but since I wanted the students to have full access to a working operating system, this was not what I needed.
I started out by installing DHCP and TFTP on the Debian server. This might sound like a trivial task, but it wasn't. First off, I started out with the basic DHCP package, not knowing that there was a DHCP3 package available that was a lot more powerful than the basic one. Once I discovered that, it was time for some trial and error with the TFTPD server. The basic package doesn't support the TSIZE option, and therefore doesn't work with PXELINUX, the boot loader I was going to use (more on this later). So I tried both TFTPD-HPA and ATFTPD before settling on ATFTPD.
PXELINUX is a boot loader that can be loaded via PXE and TFTP (as the name implies). Getting this to run while sending the configuration data in the DHCP packet took some time, even if most of the problems were due to differences between the TFTPD implementations (TFTPD-HPA supports chroot, which I wanted, so it took some time before I decided to scrap it in favour of ATFTPD). Once I got around the last file permission error (make sure that others can read the PXELINUX files, since the TFTP daemon checks the permissions for others) I finally got PXELINUX to boot.
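The permission problem above can be checked before rebooting a client. The following is a minimal sketch, not part of the original setup: it walks a TFTP root and lists everything that "others" cannot read. The function name and the /tftpboot default are assumptions made for illustration.

```python
import os
import stat


def unreadable_by_others(root):
    """Return sorted paths under root that lack read permission for others.

    Directories must also be searchable (x bit) by others, otherwise the
    files inside them cannot be reached by an unprivileged TFTP daemon.
    """
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            needed = stat.S_IROTH
            if stat.S_ISDIR(mode):
                needed |= stat.S_IXOTH
            if mode & needed != needed:
                problems.append(path)
    return sorted(problems)


if __name__ == "__main__":
    # "/tftpboot" is an assumed location; adjust to the real TFTP root.
    for path in unreadable_by_others("/tftpboot"):
        print("not readable by others:", path)
```

Anything the script prints is a candidate for chmod o+r (and o+x for directories).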
Getting the kernel to load had to wait until tomorrow.
Day 2
So, the system boots, but PXELINUX fails to load the kernel, and again TFTPD is to blame. It seems PXELINUX can only access files in its own directory, so I had to remove the nice /pxelinux directory that I had in my /tftpboot directory and store the PXELINUX files directly in the root directory; this way it could find the kernel files in /kernel. So, the system then loads the kernel file, but fails to mount the file system via NFS.
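For reference, PXELINUX also looks for its configuration relative to the directory it was loaded from. As far as the PXELINUX documentation describes it, the loader tries a series of file names under pxelinux.cfg/: first one based on the client's MAC address, then the client's IP address in uppercase hexadecimal with one digit dropped per attempt, and finally "default". The sketch below lists those candidates; the function name and the sample addresses are made up for illustration, and the optional UUID-based name is left out.

```python
import ipaddress


def pxelinux_config_candidates(mac, ip):
    """List the config file names PXELINUX would try, in order.

    mac is a colon-separated Ethernet address, ip a dotted IPv4 address.
    The "01-" prefix is the ARP hardware type for Ethernet.
    """
    mac_name = "01-" + mac.lower().replace(":", "-")
    hex_ip = "%08X" % int(ipaddress.IPv4Address(ip))
    names = [mac_name]
    # Drop one hex digit at a time: C0A8025B, C0A8025, ..., C
    names += [hex_ip[:n] for n in range(8, 0, -1)]
    names.append("default")
    return ["pxelinux.cfg/" + name for name in names]


if __name__ == "__main__":
    # Sample addresses only; not taken from the lab network.
    for name in pxelinux_config_candidates("88:99:AA:BB:CC:DD", "192.168.2.91"):
        print(name)
```

Because shorter hex prefixes match whole address ranges, one file can serve an entire subnet of lab machines.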
Yesterday I installed NFS-Server to allow the kernel to read its files from the server, amongst them the scripts to write the image to the hard drive. After fixing some problems in /etc/exports I thought that everything should be fine, but I was wrong. Then I realised that the kernel didn't have any support for the NIC used in the client machines, since it was configured to load all the needed drivers as modules. I downloaded the kernel source and compiled a version with static support for the 3Com NIC.
After spending hours configuring and compiling the new kernel I came to the point where it would boot, just to throw a kernel panic when trying to mount the NFS partition. I decided to scrap all things NFS and take the custom initrd route, i.e. I'll have a RAM drive loaded via TFTP that contains the tools and scripts that I need. I downloaded an existing initrd to use as a proof of concept, but booting that was all there was time for before I had to leave.
This day was a frustrating one, since a lot of time went into compiling and testing the kernel, a lot of work that was later discarded. Well, that's life I guess.
Day 3
So, I needed a small Linux distribution that I could modify to do the cloning. I found Nanobox, a small Busybox-based distribution. It fitted the bill quite nicely, so after ensuring that it would net boot I set out to add the tools I needed.
To limit the amount of hassle I would have to go through with dependencies etc., I decided to compile the tools I needed statically, and after figuring out how to do this I used apt-get to fetch the source for gzip and ncftp (since I wanted to use ncftpget and ncftpput). After compiling them I added them to the disk image along with two scripts to clone and restore hda.
Once this was tested and the system booted as expected I installed vsftpd on the Debian server and added an account for the net booting clients. When this was done I crossed my fingers and started a cloning from one of the clients (there were old Windows installations on all of them, and these would have to do to test it). And it worked. It wasn't fast, and this is something that I'll have to have a look at, but it worked!
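The two scripts themselves are not reproduced in this log. As a rough illustration of the idea only, the sketch below reads a disk in chunks and writes it out as a gzip-compressed image, and does the reverse for a restore. It is written in Python rather than as the shell scripts used on the boot image, it leaves out the FTP transfer entirely, and the function names and the /dev/hda path in the usage comment are assumptions.

```python
import gzip
import shutil

CHUNK = 1024 * 1024  # copy in 1 MiB pieces so the whole disk never sits in RAM


def clone(device_path, image_path):
    """Read the raw device and store it as a gzip-compressed image."""
    with open(device_path, "rb") as src, gzip.open(image_path, "wb") as dst:
        shutil.copyfileobj(src, dst, CHUNK)


def restore(image_path, device_path):
    """Decompress the image and write it back over the device."""
    with gzip.open(image_path, "rb") as src, open(device_path, "wb") as dst:
        shutil.copyfileobj(src, dst, CHUNK)


# Hypothetical usage on a client (needs root, and overwrites the disk on restore):
#   clone("/dev/hda", "/tmp/hda.img.gz")
#   restore("/tmp/hda.img.gz", "/dev/hda")
```

Since the copy is byte for byte, unused disk space is compressed and transferred as well, which is one plausible reason such a clone is slow.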
So, tomorrow will see me finishing up the installation and then installing one copy of each of the client operating systems to use as master images, but that is all for tomorrow.
Day 4
All that was left on the server now was to install Samba and configure it to act as a Primary Domain Controller. When this was done I started to install the clients, to get a master copy to use as disk image. I had gotten about halfway through updating Windows when the net died. It seems the people handling the Internet connection were supposed to do some work today and tomorrow, so the last parts of this will have to wait until the fall.
Day 4, part 2
So, the fall is here, and I am back in school. I went back to Samba, rechecked the configuration and added support for roaming profiles. After this I restarted the Windows install and installed Microsoft Office (since the school already had a license for it; if this isn't the case I think OpenOffice.org would do just as well).
After this I let the automatic update run until Windows was up to date. I also installed Mozilla Firefox and redid the settings and layout for the start menu. After this I joined the Windows machine to the Samba-controlled domain, and it worked like a charm. The only thing I couldn't get to work was the default user setup, but since this will only be used once it's of no major concern. So, I PXE booted the Windows box and started the disk cloning process. The image for the 10 GB partition was 6 GB compressed. This is obviously too large and I intend to reduce the size of the partition. I have decided to focus on getting the Windows install to work before adding any more operating systems to the mix.
Day 5
Today was devoted to getting the automatic cloning of the lab machines to work. My first idea was to use Wake-on-LAN to start the machines, combined with a cron job on the server that ensured that the right pxelinux.cfg file was active. This would mean that each box would be woken by a Wake-on-LAN signal, issued by the server as a cron job, and it would then boot, get the pxelinux.cfg file for loading the disk, download the image and write it to disk. I configured the BIOS on my test box to allow this, and installed wakeonlan on the Debian server. At first it didn't work, but I then realised that the box needed to be in hibernation for it to wake up. Booting Windows (and, while I was there, activating Wake-on-LAN for my NIC) and then putting it into hibernation solved that. The problem now, apart from the fact that the boxes needed to hibernate rather than shut down, was how I would get all of them into hibernation each night. There are lots of tools that do this, but they are all OS dependent (i.e. there is no shutdown-on-LAN), and since my machines might be running a multitude of OSes this wasn't a practical solution.
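To show what a tool like wakeonlan sends, here is a minimal sketch of the Wake-on-LAN "magic packet": six 0xFF bytes followed by the target MAC address repeated sixteen times, sent as a UDP broadcast. This is an illustration rather than the tool used on the server. The function names, the broadcast address and the port (9 is a common choice) are assumptions, and the MAC address in the usage comment is a placeholder.

```python
import socket


def magic_packet(mac):
    """Build the 102-byte Wake-on-LAN payload for a MAC like 00:11:22:33:44:55."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("expected a 6-byte MAC address")
    return b"\xff" * 6 + raw * 16


def wake(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet on the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


# Hypothetical usage from a cron job on the server:
#   wake("00:11:22:33:44:55")
```

The packet only works if the NIC is still powered and listening, which fits the observation that the machine had to be hibernated before it would wake.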
Then I realised that since this is an old electronics lab, I have wall sockets that are connected to an emergency halt system; if I hook the boxes up to this I can shut them all down at once. Looking in the BIOS (these are all Dell OptiPlex GX110 machines), I found an automatic wake-up feature. This could be used to turn the machines back on to start the cloning. Well, it turns out the auto power-on feature doesn't work if you cut the power with a switch or strip. So, the solution I have to use is to start the cloning via a timer that turns on the power, and then the machines should have finished by the time I get there in the morning. It's good to have a lot of electricians around =).
I finished the day by running a test cloning of six of the boxes I had built.
Day 6
After yesterday everything worked as intended, but there are still some minor issues that could be solved to make the solution more stable. First of all, the Windows boxes complained that they all had the same NetBIOS name. This is due to Windows not updating the NetBIOS name from the supplied hostname (which is generated from each machine's unique IP address). I found Workstation Name Changer, and combined with a service wrapper it allowed me to set the NetBIOS name on the machines at boot time, solving that problem.
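The exact naming scheme is not spelled out here, so the following is only a sketch of one way a unique name could be derived from an IP address. The "lab" prefix and the use of the last two octets are hypothetical choices; the 15-character limit is the usual maximum length for a NetBIOS computer name.

```python
import ipaddress

NETBIOS_MAX = 15  # usable characters in a NetBIOS computer name


def hostname_from_ip(ip, prefix="lab"):
    """Derive a name such as LAB-2-91 from the last two octets of an IPv4 address.

    The prefix and the octet-based layout are illustrative assumptions,
    not the scheme used in the lab.
    """
    octets = ipaddress.IPv4Address(ip).packed
    name = "%s-%d-%d" % (prefix, octets[2], octets[3])
    if len(name) > NETBIOS_MAX:
        raise ValueError("name too long for NetBIOS: " + name)
    return name.upper()


if __name__ == "__main__":
    print(hostname_from_ip("192.168.2.91"))
```

Any scheme works as long as it maps each address to a distinct name that fits the NetBIOS length limit.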
The cloning process could also be improved by allowing anonymous access for downloading the disk images, while requiring a username and password to upload to that directory. This way, if any of the users figured out how the cloning is done, the setup would be more secure, since the images could still not be overwritten without the password.
There are probably more things that could be improved, but for now I am content with the setup as it is. If I ever do any updates I'll create a new version of this document.