<!doctype linuxdoc system>
<linuxdoc>
  <article>
    <titlepag>
      <title>Building Linux Beowulf Clusters</title>
      <author>
        <name>Mike Perry (mikepery@fscked.org)</name>
      </author>
      <date>Updated 09/26/00</date>
      <abstract>This document gives an overview of Beowulf clusters, from
cluster design through cluster use and maintenance. I wrote it while working
for <htmlurl
url="http://www-illigal.ge.uiuc.edu" 
name="The Illinois Genetic Algorithms Lab (Illigal)">, so it spends a good
deal of time describing how to maintain and use the specific type of cluster
I built for them.
      </abstract>
    </titlepag>
    <toc>
    <sect>
      <heading>Introduction</heading>
      <sect1>
        <heading>What is a Beowulf Cluster?</heading>
        <p>
	Essentially, any group of Linux machines dedicated to a single purpose
	can be called a cluster. To be called a Beowulf cluster, all you need
is a central node that does some coordination. (Technically, even an IP
Masq'ed network fits loosely into the Beowulf definition given by the <htmlurl 
url="http://www.supercomputer.org:8080/FAQ/beowulf-faq.html" 
name="Beowulf mailing list">.)
        </p>
      </sect1>
      <sect1>
        <heading>What can a Beowulf Cluster do?</heading>
	<p>
	A Beowulf cluster is capable of many things that are applicable in
areas ranging from data mining to research in physics and chemistry, all the
way to the movie industry. Essentially, any workload that can be split into
several semi-independent jobs that run concurrently can benefit from running
on a Beowulf cluster. There are two classes of these parallel programs.
	</p>
	<sect2>
	  <heading>Embarrassingly Parallel Computation</heading>
	  <p>A Beowulf cluster is best suited for "embarrassingly parallel"
tasks, that is, tasks that require very little communication between nodes to
complete, so that adding more nodes results in a near-linear increase in
throughput with little extra synchronization work. Embarrassingly parallel
programs are also described as linearly scalable: each additional machine
adds roughly the same amount of performance, with no sign of diminishing
returns.
         </p>
         <p><htmlurl 
url="http://surf.de.uu.net/encore/www/" name="Genetic Algorithms"> 
are one example of an embarrassingly parallel application, since each added
node can simply act as another "island" on which to evaluate the fitness of a
population. Rendering an animation sequence is another, because each node can
be given an individual image, or even a section of an image, to render, and
this is quite easy even with standard programs like <htmlurl
url="http://www.povray.org" name="POV-Ray">.
          </p>
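          <p>To make the rendering example a bit more concrete, here is a
rough sketch of how a launching script might carve an animation into one
contiguous slice of frames per node. The function name and its arguments are
made up for this illustration and do not come from any particular render
farm.
<tscreen><verb>
#!/bin/sh
# frame_range NODE_INDEX NODE_COUNT TOTAL_FRAMES
# Prints "first last" (1-based, inclusive): the slice of frames that node
# NODE_INDEX (0-based) should render. Slice sizes differ by at most one
# frame. With more nodes than frames, surplus nodes get an empty slice
# (last is smaller than first).
frame_range() {
    idx=$1; nodes=$2; total=$3
    first=$(( idx * total / nodes + 1 ))
    last=$(( (idx + 1) * total / nodes ))
    echo "$first $last"
}
</verb></tscreen>
With 4 nodes and 10 frames, node 0 gets frames 1 to 2 and node 3 gets frames
8 to 10. Each node can then hand its range to the renderer, and no node ever
has to talk to another one.
          </p>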
        </sect2>
	<sect2>
          <heading>Explicitly Parallel Computation</heading>
          <p>Also applicable are explicitly parallel programs, that is,
programs with explicitly coded parallel sections that can run independently.
These differ from fully parallelizable programs in that they contain one or
more interdependent serial or atomic components that must be synchronized
across all instances of the parallel application. This synchronization is
usually done through libraries such as <htmlurl
url="http://www.faqs.org/faqs/mpi-faq/" name="MPI">, <htmlurl
   url="http://netlib2.cs.utk.edu/pvm3/faq_html/faq.html" name="PVM">, or even
extended versions of POSIX threads and <htmlurl
url="http://wallybox.cei.net/dipc/" name="System V shared memory">.
	  </p>
        </sect2>
      </sect1>
      <sect1>
        <heading>What can't a Beowulf Cluster do?</heading>
	<p>
	A common misconception is that there is a specific set of Beowulf
software that you can run on a group of Linux machines and, voila, everything
runs faster. This is not the case. While there is software to turn a cluster
of Linux workstations into what seems like a single computer with a single
process ID space, this is impractical for a single-user setup, as even the
most hardcore power user can only run (otherwise unparallelizable)
CPU-intensive jobs for a short period of time before letting the machine fall
idle. The way these systems are implemented also makes them impractical even
for speeding up a simple "make -j". 
	</p>
	<p>So essentially, when thinking about implementing a Beowulf cluster,
the questions to ask are: "Are lots of independent programs going to be
running?", "Is the extra speed worth rewriting my programs? Are they even
parallelizable?", and "Are my programs CPU bound?" All Beowulf configurations
will slow down disk- or memory-bound applications.
	</p>
      </sect1>
    </sect>
    <sect>
      <heading>Types of Beowulf Clusters</heading>
      <p>
       There are several ways to design a Beowulf. I'm going to try to
       break down the types by network, hardware, and software configuration, 
       since you really can pick one from each. Under each section will be a
       quick blurb about the best price/performance technology to use. Please
       note that the terms I use here are my own, so that I can refer to them
easily. (Also, I made most of them up because they sound cool ;)
       </p>
       <sect1>
	 <heading>Classification by Network Architecture</heading>
	 <p>
         For all cluster types, I would recommend switched 100BaseT as the
price/performance sweet spot. Gigabit switches and cards are currently roughly
10 times more expensive than 100BaseT, and for embarrassingly parallel jobs
you won't need the bandwidth, as your communication should be minimal. If you
are implementing a <htmlurl url="http://www.mosix.org" name="MOSIX">-based
cluster, Gigabit networking might be something to consider, as MOSIX is more
network bound than your typical GA.
	 <itemize>
	   <item><bf>NetType I - Single External Node</bf>
	   <p>In this configuration, you basically have a single entry point to
	   the cluster, i.e., one monitor and keyboard, and one external IP
	   address. The rest of the cluster is usually then behind a normal 
       	   <htmlurl
	    url="http://www.linuxdoc.org/HOWTO/IP-Masquerade-HOWTO.html" 
	    name="IP masqed setup">. Users are then usually encouraged to 
           log in only to the main node, and to spawn remote jobs only with ssh.
	   <p><item><bf>NetType II - Multinode/Nondedicated</bf>
           <p>In this configuration all nodes are equivalent from a network
           standpoint. They all have external IP addresses, and usually all have
           keyboards and monitors.
           Usually this configuration is chosen so that nodes can also double
           as desktop workstations, and thus have external IP addresses in
           their own right. If you don't need to do this, doing a NetType I 
           configuration is recommended.
	   </itemize>
	 </p> 
       </sect1>
       <sect1>
	 <heading>Classification by System Architecture</heading>
	 <p>For all cluster types, I recommend x86 hardware. As shitty as it
is, it really does have the best price/performance ratio, especially when
you're dealing with near-linearly scalable algorithms. For example, even if a
DEC Alpha 21264 were 3 times faster than an AMD K7 Thunderbird (which it is
not), you can buy about 4 Thunderbirds for the price of one Alpha. (Trust me,
I looked. Thunderbirds run as low as $900 apiece with 256 MB of RAM, while a
21264 will run you at LEAST $3600, and that's if you find a half dozen stolen
ones on eBay.)
	 <itemize>
	   <item><bf>ArchType I - Local Synchronized disk</bf>
	   <p>
	   In this configuration, all nodes have local disks. In addition,
these disks are kept in sync nightly by an <htmlurl
url="http://rsync.samba.org/rsync/index.html" name="rsync"> job that 
updates pretty much everything except /var, /tmp, and /etc/sysconfig. In this
configuration, extra scratch space and /home can optionally be NFS mounted
across all nodes.
	   <p><item><bf>ArchType II - Local Nonsynchronized disk</bf>
	   <p>
	   In this configuration, all nodes have local disks, but they are not
kept in sync. This is most useful for disk-independent embarrassingly parallel
setups that merely do number crunching and need no disk-based synchronization.
	   <p><item><bf>ArchType III - NFS root</bf>
	   <p>
           This configuration is most useful for those who wish to save money
on disks for all nodes, and wish to avoid the headaches of having to keep a
few dozen to a few hundred disks in sync. This option is actually quite a
reasonable choice, especially for programs that need some disk synchronization
but aren't otherwise disk-bound. 
	 </itemize>
         </p> 
       </sect1>
       <sect1>
         <heading>Classification by Synchronization Software</heading>
	<p>
	Now this is the important part. Choosing your software essentially
determines the use of the cluster. For example, you don't want to go
full-fledged with MOSIX if all you are doing is setting up a rendering farm
or a GA.
	<itemize>
	  <item><bf>SoftType I - Batch System</bf>
	  <p>
          Basically, a batch system is one where you just send it a job, and it
runs it. Usually only one job is run at a time, and job scheduling is left to
the programmer/job runner. Sometimes a queue, or at least a launching script,
is provided through some simple bash scripts; otherwise remote processes are
launched manually, one at a time on each node, or remotely through ssh.
Needless to say, this is the easiest software type to set up, and it also has
the least overhead. It is the Recommended Buy (tm) for embarrassingly parallel
apps.
	  <p><item><bf>SoftType II - Preemptive Scheduler/Migrator</bf>
	  <p>
	  This next class contains systems that automatically schedule and
migrate processes based on cluster status. This kind of setup really is geared
more towards those who just want to set up a cluster as a general mass-login
system, rather than those who want to do distributed programming. The two
software packages available that provide this ability are <htmlurl
url="http://www.cs.wisc.edu/condor/" name="Condor"> and <htmlurl
url="http://www.mosix.org" name="MOSIX">. Condor isn't officially Open Source
yet, and MOSIX seems far more full featured for this purpose. In fact, MOSIX
even allows you to build a NetType II/ArchType II style cluster and still use
each node for cluster jobs. In addition, Condor places a lot of limitations on
the types of jobs that can be run across the entire cluster, whereas MOSIX is
meant to be entirely transparent. Plus MOSIX is Open Source :) Technically,
neither of these technologies is part of the traditional "Beowulf" scheme, but
they are significant enough that they should not go without mention in any
Beowulf document.
	  <p><item><bf>SoftType III - Fine Grained Control</bf>
	  <p>
	  The last Beowulf software implementation class actually has some
overlap with SoftType I. This is the Fine Grained Control class, where
individual programs themselves control the synchronization, load balancing,
etc. Often these jobs are launched through a SoftType I method, and then
synchronized using one of the standardized source-level libraries already
available. These include the industry standard <htmlurl
url="http://www.faqs.org/faqs/mpi-faq/" name="MPI"> and <htmlurl
   url="http://netlib2.cs.utk.edu/pvm3/faq_html/faq.html" name="PVM">. Also 
available are an extension to SysV IPC called 
<htmlurl url="http://wallybox.cei.net/dipc/" name="DIPC">, and a more
control-oriented remote-forking system called <htmlurl
url="http://www.beowulf.org/software/bproc.html" name="bproc">. Of course, to
use this method, you must (re)write your software to use one of these
libraries.
	</itemize>
	</p>
       </sect1>
    </sect>
    <sect>
      <heading>Building the System</heading>
      <p>I've built two clusters so far. The NCSA cluster was a NetType I,
ArchType I, SoftType I cluster with 8 nodes and 12 processors, which I built
with the help of <htmlurl url="http://wuollett.net"
name="Kristopher Wuollett">. The Illigal
cluster is a NetType I, ArchType III, SoftType I/III. Unfortunately, the NCSA
reinstalled the cluster (with NT, ugh.. Is that place ever going down the
tubes :() and deleted all our scripts after I left, so the best I
can do is talk about what we did there. I think I like the Illigal setup better
anyway. It's cheaper and feels cleaner (plus synchronization is automatic).
<p>Some assumptions: I
based both my clusters on <htmlurl url="http://www.redhat.com" name="RedHat">
Linux 6.2. Red Hat takes a lot of flak from people, but out of all the
distributions I've seen, they actually have the cleanest setup. In addition,
you will find it very helpful to use a secondary package manager, such as
<htmlurl
url="http://encap.cso.uiuc.edu" name="encap"> or <htmlurl
url="http://www.gnu.org/software/stow/" name="GNU Stow">. These
programs allow you to package third party programs (such as scripts you write)
into their own directory (under /usr/local/encap in the case of encap), and
then maintain the symbolic links into /usr/local. This really comes in handy
when you do maintenance.
      <sect1>
        <heading>Building the NetType</heading>
	<p>This is pretty straightforward. The only method that needs any
explaining is the NetType I.
Read the <htmlurl
url="http://www.linuxdoc.org/HOWTO/IP-Masquerade-HOWTO.html" name="IP MASQing HOWTO">, and
have a look at my <htmlurl url="scripts/ipmasq" name="scripts">. Essentially,
I determine the kernel version in <htmlurl url="scripts/ipmasq/ipmasq.init" name="ipmasq.init"> and run the appropriate scripts
for either kernel 2.4 or 2.2. 
<p>The ipmasq.init script is run only on the world node (the main node that
holds the external IP).
The eth0 interface is configured as normal with the external IP, and the
scripts then use <htmlurl
url="http://www.linuxdoc.org/HOWTO/mini/IP-Alias.html" name="IP aliasing"> to
put the world node on both the external and internal (10.x.x.x) network
segments. The subnodes are given IPs in the internal network. The method for
doing this, however, depends on the ArchType.
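<p>For readers who just want the general shape of such a setup, here is a
minimal sketch. It is not the ipmasq.init linked above. It assumes that eth0
is the external interface, that the internal network is 10.0.0.0/8, and that
the world node's internal address is 10.0.0.1. The function only prints the
commands so that you can read them over before running anything as root.
<tscreen><verb>
#!/bin/sh
# masq_commands KERNEL_RELEASE
# Prints (does not run) the commands for an aliased internal interface
# plus masquerading: ipchains on 2.2 kernels, iptables on 2.4 kernels.
# Returns 1 for any other kernel series.
masq_commands() {
    case "$1" in
        2.2.*|2.4.*) ;;
        *) echo "# unsupported kernel: $1"; return 1 ;;
    esac
    echo "ifconfig eth0:0 10.0.0.1 netmask 255.0.0.0 up"
    echo "echo 1 > /proc/sys/net/ipv4/ip_forward"
    case "$1" in
        2.2.*)
            echo "ipchains -P forward DENY"
            echo "ipchains -A forward -s 10.0.0.0/8 -j MASQ"
            ;;
        2.4.*)
            echo "iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -o eth0 -j MASQUERADE"
            ;;
    esac
}
</verb></tscreen>
Something like <tt>masq_commands `uname -r`</tt> shows what would be run on
the current kernel. Once it looks right, the output can be piped to
<tt>sh</tt>.
</p>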
      </sect1>
      <sect1>
        <heading>Building the ArchType</heading>
	 <p>
	 <itemize>
	   <item><bf>ArchType I - Local Synchronized disk</bf>
	   <p>
	   Achieving local synchronized disks is a bit of a pain. We did it,
but unfortunately, like I said, our work was deleted. 
	   <p>To install, we essentially set up 8 floppies, each holding a mini
root filesystem with a kernel and bash, and replaced /sbin/init with a shell
script that fetched several tarballs that together comprised a complete RH6.2
system (one for each major directory hierarchy) and allowed you to tune some
parameters (or use preset values from a saved config file). 
Essentially, we did something like <htmlurl
url="http://www.linuxdoc.org/HOWTO/KickStart-HOWTO.html" name="KickStart">
('cept we wrote it, so it was cooler ;)
	   <p>We then NFS exported /home from the world node, as well as each
machine's /scratch partition. Each machine had all the others' /scratch
partitions mounted on /scratch/machinename, and its own scratch was
/scratch/machinename.
	   <p> <htmlurl
url="http://rsync.samba.org/rsync/index.html" name="Rsync"> was run nightly from cron to keep non-NFS stuff synced up,
so you only had to install an RPM on the root node, and it would propagate to
the subnodes overnight.
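	   <p>Since the original scripts are gone, the following is only a
guess at what such a nightly job could look like, not a copy of what we ran.
The exclude list mirrors the directories mentioned for ArchType I. The
function prints one rsync command per subnode instead of running it.
<tscreen><verb>
#!/bin/sh
# sync_commands NODE...
# Prints one rsync command per subnode; nothing is executed here.
# -a keeps permissions, ownership, times and symlinks; -x stays on the
# root filesystem, so /proc and NFS mounts such as /home are skipped;
# --delete removes files that no longer exist on the world node.
sync_commands() {
    for node in "$@"; do
        echo "rsync -ax --delete -e ssh" \
             "--exclude=/var/ --exclude=/tmp/ --exclude=/etc/sysconfig/" \
             "/ $node:/"
    done
}
</verb></tscreen>
A cron job on the world node could pipe the output of, say,
<tt>sync_commands node2 node3</tt> to <tt>sh</tt> (the node names here are
placeholders). Because excluded directories are left alone on the receiving
side, each subnode keeps its own /etc/sysconfig.
	   </p>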
	   <p><item><bf>ArchType II - Local Nonsynchronized disk</bf>
	   <p>
	   This configuration is easy to set up, of course. Just install the
machines however you want, and don't worry about it! :)
	   <p><item><bf>ArchType III - NFS root</bf>
	   <p>This is the configuration used in the Illigal cluster.
Essentially, the way I recommend setting things up is to use DHCP, and then
either an NFS boot floppy or, if you're lucky, PXE to boot off of an Intel
EtherExpress Pro/100+.
           <p>This actually caused me quite a bit of grief. Floppies proved so
unreliable that they would stop booting after one or two boots, and PXE
documentation is sparse and hard to follow. Eventually, I settled on using <htmlurl
url="http://freshmeat.net/projects/syslinux/?highlight=syslinux"
name="SysLinux">. Syslinux contains a package called <htmlurl
url="pxelinux.txt" name="PXElinux"> that you use in
conjunction with Intel EtherExpress Pro/100+ ethernet cards to do remote
booting.
	   <p>To set up a boot floppy, you basically create a <htmlurl
url="floppy" name="mini root filesystem"> like you would for ArchType I, except
that you pass some special nfsroot arguments to the kernel, as you can see in my
<htmlurl url="floppy/etc/lilo.conf" name="lilo.conf">. You also have to
<tt>mknod &lt;nfs,boot255&gt; c 0 255</tt> for lilo to not complain, as well as
<tt>cp -a</tt> those other couple of devices from /dev. You can then install
lilo on this floppy with <tt>lilo -r /mnt/floppy</tt>. 
	   <p>Booting off a PXE capable bootrom is a bit harder, but much more
reliable. Essentially, if you follow the <htmlurl
url="pxelinux.txt" name="PXElinux documentation file">, you should be alright.
You're going to need to fetch <htmlurl 
url="ftp://www.kernel.org/pub/software/network/tftp/" name="tftpd-hpa">, 
because the default tftpd supplied
with RH Linux 6.2 does not support the tsize option. Installing tftpd-hpa is
straightforward. Just replace in.tftpd in /etc/inetd.conf with the full path
to the new tftpd. I would also recommend changing "wait" to "wait.600" or so,
because when all the machines boot at once, they will trigger inetd's flood
protection and shut down the tftp service.
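	   <p>If you would rather script that edit than do it by hand, a
small filter along these lines might do. It is only a sketch: the function
name is made up, and it assumes the stock entry names the daemon as a bare
<tt>in.tftpd</tt> field on a line whose first field is <tt>tftp</tt>.
<tscreen><verb>
#!/bin/sh
# fix_tftp_line NEW_TFTPD
# Filter: reads inetd.conf on stdin, writes the edited file on stdout.
# On the tftp line it swaps the bare "in.tftpd" for NEW_TFTPD and turns
# "wait" into "wait.600". Every other line passes through untouched.
fix_tftp_line() {
    awk -v new="$1" '
        $1 == "tftp" {
            for (f = 1; f <= NF; f++) {
                if ($f == "in.tftpd") $f = new
                if ($f == "wait")     $f = "wait.600"
            }
        }
        { print }
    '
}
</verb></tscreen>
Feed it /etc/inetd.conf on standard input, write the result to a new file,
look it over, move it into place, and then send inetd a HUP. Commented-out
tftp lines are left alone, so uncomment the entry first if your distribution
ships it disabled.
	   </p>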
	   <p>After this is done, you're going to want to write a 
<htmlurl url="dhcpd.conf" name="dhcpd.conf">. The PXElinux documentation is a little unclear on how to
set up tftp serving. Basically you copy pxelinux.bin into /tftpboot, and then
put your <htmlurl url="pxelinux.cfg/default" name="configuration files"> 
in /tftpboot/pxelinux.cfg/. You can have a separate config file for each IP,
but unless you have a heterogeneous cluster, there isn't much point. However,
I would recommend hard coding the IPs in the dhcpd.conf file like I did,
because if any other machines on the network try to DHCP, they could steal
your cluster IPs, and that's no good. The best way to start is to dynamically
allocate IPs from the subnet, and then once you get every machine's ethernet
address from arp, just cut and paste them into dhcpd.conf.
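	   <p>The cutting and pasting can also be scripted. The sketch below
assumes you have collected the addresses into a plain text file with one
"hostname MAC IP" triple per line. That file format is an invention of this
example, not something dhcpd knows about.
<tscreen><verb>
#!/bin/sh
# dhcp_hosts FILE
# FILE holds one node per line: "hostname MAC IP". Blank lines and lines
# starting with # are skipped. Prints a dhcpd.conf host block for each
# node, pinning that MAC address to that IP.
dhcp_hosts() {
    while read -r name mac ip; do
        case "$name" in ''|'#'*) continue ;; esac
        echo "host $name {"
        echo "    hardware ethernet $mac;"
        echo "    fixed-address $ip;"
        echo "}"
    done < "$1"
}
</verb></tscreen>
The output goes inside the subnet declaration of dhcpd.conf, after which the
dynamic range can be commented out again.
	   </p>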
	   <p>Of course, neither floppy nor PXE will work properly unless
you configure your kernel to support these options. The main options you
need for <htmlurl 
url="nfsroot.txt" name="nfsroot">
are kernel level autoconfiguration (under networking options), NFS
filesystem (under filesystems; built into the kernel, not as a module),
and root filesystem on NFS (which appears under NFS filesystem only if kernel
level autoconfiguration is chosen).
In addition, if
you are going to use PXE-style booting, I have noticed that kernels older than
2.2.17 seem to give CRC errors. Either comment out the CRC check in
./lib/inflate.c:gunzip(), or get 2.2.17. 
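	   <p>A quick way to double check a kernel tree before building is to
grep its .config for the three options. The symbol names below are the ones
these menu entries normally map to in 2.2 and 2.4 kernels. Treat them as an
assumption and compare against your own .config if the check complains
unexpectedly.
<tscreen><verb>
#!/bin/sh
# check_nfsroot_config CONFIG_FILE
# Reports every option needed for an NFS root that is not built in (=y).
# Returns 0 only if all three are built in.
check_nfsroot_config() {
    missing=0
    for opt in CONFIG_IP_PNP CONFIG_NFS_FS CONFIG_ROOT_NFS; do
        if ! grep -q "^$opt=y" "$1"; then
            echo "missing or modular: $opt"
            missing=1
        fi
    done
    return $missing
}
</verb></tscreen>
For example, <tt>check_nfsroot_config /usr/src/linux/.config</tt> prints
nothing and returns success when the kernel is ready for nfsroot.
	   </p>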
	   <p>As you can see from the kernel append options, the boot disk
attempts to NFS mount /export/&lt;IP address&gt; off of 10.0.0.1. Technically,
you can have all machines mount a single root directory in /export. I did try
this, and things seemed to work. But be warned: locking in /var will become
unreliable, as will shutting down and certain other system scripts, since
Red Hat keeps a list of running subsystems in /var/lock/subsys. So if you have
multiple systems adding to and deleting from that
directory, well, it ain't good. The way to create those root filesystem mirrors
is to <tt>mount -o remount -r /</tt> (and likewise <tt>/var</tt>), and then
<tt>dd if=/dev/&lt;root partition&gt; of=rootdev</tt>,
and likewise for vardev. You can then <tt>mount -o loop</tt> these files, and
then <tt>cp -a</tt> the mounted directories to each of the IP directories in
/export/. This is the easiest way I can think of to copy only / and /var.
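	   <p>The final copying step is easy to wrap in a loop. This sketch
assumes the two images are already loop mounted somewhere. The mount points
and the function name are placeholders of mine.
<tscreen><verb>
#!/bin/sh
# populate_exports ROOT_SRC VAR_SRC EXPORT_BASE IP...
# Copies the loop-mounted root and /var images into EXPORT_BASE/IP for
# each IP given. cp -a preserves ownership, permissions and device nodes
# (run it as root for that to work).
populate_exports() {
    root_src=$1; var_src=$2; base=$3
    shift 3
    for ip in "$@"; do
        mkdir -p "$base/$ip/var" || return 1
        cp -a "$root_src/." "$base/$ip/" || return 1
        cp -a "$var_src/." "$base/$ip/var/" || return 1
    done
}
</verb></tscreen>
With the images mounted on /mnt/rootimg and /mnt/varimg, a call such as
<tt>populate_exports /mnt/rootimg /mnt/varimg /export 10.0.0.2 10.0.0.3</tt>
gives every node its own private / and /var.
	   </p>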
           <p><htmlurl
url="http://www.instmath.rwth-aachen.de/~heine/nfs-swap/nfs-swap.html#Pitfalls"
name="DO NOT PUT SWAP ON NFS">. It's slow, insecure, and unstable, and there
are a lot of kernel layers to go through (i.e., it takes much more memory to
swap over NFS than to swap normally. And guess when you swap..). Simply leave
diskless nodes swapless.
	   <p>Now there's just one more step to actually get these nodes to
boot. You have to change some things in the standard RH6.2 init scripts to work 
with NFS root filesystems. You need to <htmlurl url="rc.sysinit.patch" 
name="remove the bootup fsck check"> in
<htmlurl url="rc.sysinit" name="rc.sysinit">, and then you have to
change the order of startup scripts slightly in <htmlurl url="rc3.d"
name="rc3.d"> and <htmlurl url="rc6.d" name="rc6.d"> in order to leave the
networking and RPC stuff up. Finally, the <htmlurl url="halt" 
name="halt script"> needs <htmlurl url="halt.patch" name="some tweaking"> in
order to not kill off RPC in the killall phase, and to successfully unmount
the NFS filesystems.
	   <p>Your final fstab file should look <htmlurl url="fstab" name="something like this">.
	   Additionally, you will want to modify each of the
/export/&lt;IP address&gt;/etc/sysconfig/network files to have the correct hostname for the export IP.
	 </itemize>
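	 <p>Editing the hostname in a dozen export directories by hand gets
old fast, so here is a small sketch of how it might be scripted. It assumes
each network file already contains a HOSTNAME= line, as a stock Red Hat
install does. The function name is mine.
<tscreen><verb>
#!/bin/sh
# set_export_hostname EXPORT_DIR HOSTNAME
# Rewrites the HOSTNAME= line of EXPORT_DIR/etc/sysconfig/network,
# leaving every other line alone.
set_export_hostname() {
    file="$1/etc/sysconfig/network"
    sed "s/^HOSTNAME=.*/HOSTNAME=$2/" "$file" > "$file.new" || return 1
    mv "$file.new" "$file"
}
</verb></tscreen>
For example, <tt>set_export_hostname /export/10.0.0.2 node2</tt>, where the
node name is whatever naming scheme you picked.
	 </p>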
      </sect1>
      <sect1>
        <heading>Building the SoftType</heading>
	<p>I have only set up SoftType I clusters. If you have any experience
with MOSIX, Condor, MPI, or PVM, please mail the completed sections (in SGML
if possible) to <htmlurl url="mailto:mikepery@fscked.org" name="me">.
	<itemize>
	 <item><bf>SoftType I - Batch System</bf>
	 <p>In order to set up a batch system, you first need some method of
farming out processes to all nodes in the cluster. Most books mention using 
rsh. If you use rsh on your cluster, I will personally hunt you down and skin 
you alive. With the death of the RSA patent, and the availability of <htmlurl
url="http://www.openssh.com" name="openssh">, there is no reason why you
shouldn't use ssh to run remote jobs, other than pure laziness. Setting up ssh
to do passwordless (host-based) authentication requires 3 steps. First, you
must edit your sshd_config file to set RhostsRSAAuthentication to yes. Then,
you have to take each host's key from its ssh_host_key.pub, prefix it with
the hostname,IP pair, and put the result into ssh_known_hosts. Finally, you
must add each hostname to /etc/shosts.equiv, and optionally to /root/.shosts
if you want passwordless root login and launch. Making /root/.shosts a symlink
to /etc/shosts.equiv should work for all implementations.
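	  <p>Steps two and three are mechanical, so they can be generated.
The sketch below assumes you have gathered every host's ssh_host_key.pub into
one directory as hostname.pub, and that you keep a file with one
"hostname IP" pair per line. Both conventions are made up for this example.
<tscreen><verb>
#!/bin/sh
# build_trust NODES_FILE KEY_DIR OUT_DIR
# NODES_FILE: one "hostname IP" pair per line.
# KEY_DIR/hostname.pub: a copy of that host's ssh_host_key.pub.
# Writes OUT_DIR/ssh_known_hosts ("hostname,IP key") and
# OUT_DIR/shosts.equiv (one hostname per line).
build_trust() {
    : > "$3/ssh_known_hosts"
    : > "$3/shosts.equiv"
    while read -r name ip; do
        [ -n "$name" ] || continue
        echo "$name,$ip $(cat "$2/$name.pub")" >> "$3/ssh_known_hosts"
        echo "$name" >> "$3/shosts.equiv"
    done < "$1"
}
</verb></tscreen>
The two generated files then get copied into place on every node, next to
the rest of the ssh configuration.
	  </p>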
	  <p>From this point, it is relatively easy to write 
<htmlurl url="scripts/softtype" name="a couple scripts"> that 
will launch a specific command on every node, give the status of the cluster,
or shut it down. On Illigal, I wrote <htmlurl
url="scripts/softtype/sbin/beodown" name="beodown">, which shuts down all
nodes of the cluster but the main one, <htmlurl
url="scripts/softtype/bin/allnodes" name="allnodes">, which executes a command
on all nodes of the cluster, and <htmlurl
url="scripts/softtype/bin/status" name="status">, which just runs uptime on
each node.
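	  <p>For those who can't reach the linked scripts, the idea behind
something like allnodes fits in a few lines. This is a simplified sketch and
not the Illigal script itself. The node list file and the REMOTE_SHELL
variable are inventions of this example.
<tscreen><verb>
#!/bin/sh
# allnodes COMMAND...
# Runs COMMAND on every host listed in $NODES_FILE, one node at a time.
# NODES_FILE (one hostname per line) and REMOTE_SHELL are made-up knobs
# for this sketch; set REMOTE_SHELL=echo for a dry run.
NODES_FILE=${NODES_FILE:-/etc/nodes}
REMOTE_SHELL=${REMOTE_SHELL:-ssh}

allnodes() {
    for node in $(cat "$NODES_FILE"); do
        echo "--- $node ---"
        $REMOTE_SHELL "$node" "$@"
    done
}
</verb></tscreen>
A status report is then just <tt>allnodes uptime</tt>. The loop reads the node
list up front with <tt>for</tt> rather than <tt>while read</tt>, because ssh
would otherwise swallow the rest of the list from standard input.
	  </p>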
	  <p><item><bf>SoftType II - Preemptive Scheduler/Migrator</bf>
	  <p>Again, I don't have any experience with setting up this type of
cluster, other than to point you at the <htmlurl
url="http://www.cs.wisc.edu/condor/" name="Condor"> and <htmlurl
url="http://www.mosix.org" name="Mosix"> sites. If you end up
building this type of cluster and would like to contribute a blurb for use
here describing the pitfalls and gotchas of such a setup, please <htmlurl
url="mailto:mikepery@fscked.org" name="do so">.
	  <p><item><bf>SoftType III - Fine Grained Control</bf>
	  <p>Likewise with fine grained control methods. I actually did try
out  <htmlurl
url="http://www.beowulf.org/software/bproc.html" name="bproc"> on the NCSA
cluster, but I can't remember what was involved. I do know that bproc is meant
only as a method of remote forking (from within C programs). It does provide a
unified process space, but does not do load balancing, locking, or
synchronization. So in my opinion, ssh probably supersedes bproc's features
(although not necessarily in convenience) at the moment. In addition, bproc
hasn't had a version update since I installed that cluster, which was over a
year ago.
 	</itemize>
      </sect1>
      <sect1>
	 <heading>Other Notes</heading>
	 <p>In addition to the architectures described above, you may find it
handy to install some method of password synchronization among all the nodes,
such as kerberos or NIS.
I installed <htmlurl
url="http://www.suse.de/~kukuk/nis-howto/HOWTO/NIS-HOWTO.html" name="YP/NIS"> 
on the Illigal cluster. One stumbling block I came across was that I had to
use 10.0.0.1 as the address of the domain server in the yp.conf files 
of the subnodes, since for
some reason it wouldn't use /etc/hosts to resolve the node hostname.
	 <p>Another method is to mount the root node's /etc directory on each
node as well, and then simply symlink its passwd, shadow, and group files into
/etc. This might be a bad idea though, because tools might refuse to use
symlinked versions of those files for security reasons.
      </sect1>
    </sect>
<!--
    <sect>
      <heading>Using the System</heading>
	<p>FIXME: Is anything even needed here? I would hope that this part
would be obvious to someone planning on building a cluster.a
	<sect1>
	<heading></heading>
	</sect1>
	<sect1>
	<heading></heading>
	</sect1>
	<sect1>
	<heading></heading>
	</sect1>
    </sect>
-->
    <sect>
      <heading>System Maintenance</heading>
	<sect1>
	<heading>Planning for the Future</heading>
         <p>System maintenance is a very important part of cluster design that
most books leave out. When you build your cluster, as a rule of thumb, 
don't do anything "by hand" to any of the nodes when setting them up. Usually, 
following this rule will force you to do things in a formal scripted or 
packaged manner, thus ensuring easy maintenance.
	</sect1>
	<sect1>
	  <heading>Always Use Package Managers</heading>
          <p>As I said before, I recommend installing every non-distribution
package via <htmlurl url="http://encap.cso.uiuc.edu" name="encap">, including
scripts you write. This will make it easy to handle multiple versions of
programs, and allow you to rapidly &quot;roll back&quot; if something goes
wrong.
          <p>In an ArchType I system, you just install the RPM or encap, and
then wait a day for rsync to take care of things. In an ArchType III system
like Illigal, I recommend putting the RPM somewhere that is exported to all
nodes of the cluster, like /usr/src/redhat/RPMS/i386/, and then running 
"allnodes rpm -Uvh /usr/src/redhat/RPMS/i386/rpmname.rpm". Technically you are
reinstalling most of the same files over and over again, but you do need to
update the rpmdb on each node, and you also want to make sure that files in
/lib and /bin get updated too.
	</sect1>
        <sect1>
	  <heading>Adding a Node</heading>
  	  <p>
          <itemize>
           <item><bf>Adding an ArchType I (Synchronized disk) node</bf>
	    <p>FIXME: TODO
	   <p><item><bf>Adding an ArchType III (NFS root) node</bf>
	    <p>To add an NFS root node, assuming that the main node is all
prepared already, your changes should be minimal, but tedious. 
	    <p>First, you need to give the
machine a root directory to mount in /export/&lt;IP address&gt;. Just copying
an existing directory will be fine. Just be sure to modify its etc/fstab and
etc/sysconfig/network to have the proper entries for the hostname and IP of
the new node, and add an entry to /etc/exports on the root node for this 
new directory.
	   <p>Next, you need to determine its ethernet hardware address and 
give it an IP address in <htmlurl url="dhcpd.conf" name="dhcpd.conf">. (One way to determine the hardware address 
is to uncomment the &quot;range&quot; section so that the machine gets an IP
dynamically, then boot it and run ifconfig.)
	   <p>Finally, you have to tell all the other machines about the node.
This is the tedious part. You have to add the hostname of the new node to the
etc/shosts.equiv file in each export directory, as well as to the main node's
/etc/shosts.equiv. You have to do the same for the etc/hosts files.
          </itemize>
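	  <p>The tedious last step of adding an NFS root node is a good
candidate for a script. The sketch below is one possible way to do it. The
function name and its argument order are mine. It takes the list of etc
directories explicitly, so the same function covers both the main node and
the exports.
<tscreen><verb>
#!/bin/sh
# register_node IP HOSTNAME ETC_DIR...
# Adds the new node to hosts and shosts.equiv in each ETC_DIR (for
# example /etc on the main node and /export/*/etc for the subnodes).
# Entries that are already present are not duplicated.
register_node() {
    ip=$1; name=$2
    shift 2
    for etc in "$@"; do
        grep -qxF "$name" "$etc/shosts.equiv" 2>/dev/null ||
            echo "$name" >> "$etc/shosts.equiv"
        grep -q "[[:space:]]$name\$" "$etc/hosts" 2>/dev/null ||
            echo "$ip $name" >> "$etc/hosts"
    done
}
</verb></tscreen>
On a cluster laid out like Illigal's, that would be something like
<tt>register_node 10.0.0.16 node16 /etc /export/*/etc</tt>, with the IP and
node name adjusted to match the new machine.
	  </p>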
	</sect1>
	<sect1>
	  <heading>Other Notes</heading>
	  <p>On the Illigal cluster, I found it handy to keep the kernel
config files of the nodes in /usr/src/MachineType.kconfig. This makes
upgrading the kernels easier, since you just copy the appropriate config file
to .config, and run make oldconfig.
	  <p>In addition, I also wrote a script called 
<htmlurl url="allexports" name="allexports"> 
which is basically a simple while loop that iterates over the NFS export
directories, setting a vol variable to each one in turn. So to do
something like copy all the kernel modules into each directory, you would do
<tt>allexports cp -a /lib/modules/2.2.17 \$vol/lib/modules/</tt>. You have
to escape that $, or your shell will substitute $vol before the command runs.
It also comes in handy if you need to edit a file on each of the nodes, or even
launch another script (in which case you don't have to escape the $ from 
inside the script). $i is exported as well, and is simply the node number,
from 2 to 15.
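	  <p>In case the linked script is unavailable, here is a
reconstruction of the idea. It is a sketch, not the original. It assumes the
export directories are named by IP under a common base, and the EXPORT_BASE,
FIRST, and LAST variables are additions of this example.
<tscreen><verb>
#!/bin/sh
# allexports COMMAND...
# Runs COMMAND once per export directory, with $vol and $i exported into
# its environment. EXPORT_BASE, FIRST and LAST are assumptions of this
# sketch: exports are EXPORT_BASE/10.0.0.FIRST through 10.0.0.LAST.
EXPORT_BASE=${EXPORT_BASE:-/export}
FIRST=${FIRST:-2}
LAST=${LAST:-15}

allexports() {
    i=$FIRST
    while [ "$i" -le "$LAST" ]; do
        vol="$EXPORT_BASE/10.0.0.$i"
        export vol i
        sh -c "$*"
        i=$(( i + 1 ))
    done
}
</verb></tscreen>
The command is handed to a child shell, which is what expands the escaped
$vol on each pass. That is also why a script launched this way can use $vol
and $i directly.
	  </p>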
	</sect1>
    </sect>
    <sect>
      <heading>Other Sources of Info</heading>
      <sect1>
        <heading>Where to Find this Document</heading>
	<p>The latest version of this document is at <htmlurl
url="http://fscked.org/writings/clusters/cluster.html">. It is written in
<htmlurl url="http://fscked.org/writings/clusters/cluster.sgml" name="SGML">. 
      </sect1>
      <sect1>
        <heading>Books</heading>
	<p>I've read most of <htmlurl
url="http://shop.barnesandnoble.com/booksearch/isbnInquiry.asp?userid=3MUPB9K7GN&amp;mscssid=GHFQ26F4C0W58K1LPSBKCQBKT91C8JVB&amp;isbn=026269218X"
name="How to Build a Beowulf"> and I was left mostly unsatisfied. My goal in
writing this paper was to do a better job. If you still want to buy this book,
be my guest.
      </sect1>
      <sect1>
        <heading>URLs</heading>
	<p>If most of this document was over your head, perhaps you should
consult the <htmlurl url="http://www.linuxdoc.org/LDP/sag/book1.html"
name="Linux System Administrator's Guide"> and the 
<htmlurl url="http://www.linuxdoc.org/LDP/nag/node1.html" 
name="Linux Network Administrator's Guide">.
	<p>For more info on Beowulfs, consult the 
 <htmlurl url="http://www.linuxdoc.org/HOWTO/Beowulf-HOWTO.html" 
name="Beowulf HOWTO">, the 
<htmlurl url="http://www.supercomputer.org:8080/FAQ/beowulf-faq.html" 
name="Beowulf Mailinglist FAQ">, and <htmlurl url="http://www.beowulf-underground.org/" name="Beowulf Underground">.
	<p>For info on parallel processing in general, consult the 
<htmlurl url="http://www.linuxdoc.org/HOWTO/Parallel-Processing-HOWTO.html"
name="Linux Parallel Processing HOWTO">.
      </sect1>
    </sect>
    
  </article>
</linuxdoc>
