There are several ways to design a Beowulf. I'm going to try to break down the types by network, hardware, and software configuration, since you really can pick one from each. Under each section is a quick blurb about the best price/performance technology to use. Please note that the terms I use here are my own, so I can refer to them easily. (Also, I made most of them up because they sound cool ;)
For all cluster types, I would recommend switched 100BaseT as the price/performance sweet spot. Gigabit switches and cards currently cost roughly 10 times more than 100BaseT, and for embarrassingly parallel jobs you won't need that bandwidth, since your communication should be minimal. If you are implementing a MOSIX-based cluster, Gigabit networking might be something to consider, as MOSIX is more network-bound than your typical GA.
In this configuration, you basically have a single entry point to the cluster, i.e. one monitor and keyboard, and one external IP address. The rest of the cluster then sits behind a normal IP-masqueraded setup. Users are encouraged to log in only to the main node, and to spawn remote jobs via ssh.
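Spawning jobs from the main node might look something like the following sketch. The node names and job path are illustrative assumptions, not standard names; the dry-run argument just prints what would be executed:

```shell
#!/bin/sh
# Hypothetical sketch: launching a job on each internal node from the
# head node. NODES and JOB are made-up examples.
NODES="node01 node02 node03 node04"
JOB="/home/user/bin/crunch"

launch_all() {
    # $1 is the remote-shell command to use: "ssh" in production,
    # "echo ssh" for a dry run that only prints the commands.
    for node in $NODES; do
        $1 "$node" "$JOB --node $node" &
    done
    wait    # block until every remote job (or echo) has finished
}

launch_all "echo ssh"   # dry run; use `launch_all ssh` for real
```

Backgrounding each ssh with `&` lets all nodes start at once, while `wait` keeps the script from exiting before the remote jobs are done.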
In this configuration all nodes are equivalent from a network standpoint. They all have external IP addresses, and usually all have keyboards and monitors. This configuration is usually chosen so that nodes can double as desktop workstations, and thus need external IP addresses in their own right. If you don't need that, a NetType I configuration is recommended.
For all cluster types, I recommend x86 hardware. As shoddy as it is, it really does have the best price/performance ratio, especially when you're dealing with near-linearly scalable algorithms. For example, even if a DEC Alpha 21264 were 3 times faster than an AMD Athlon Thunderbird (which it is not), you could buy about 4 Thunderbirds for the price of one Alpha. (Trust me, I looked. Thunderbirds run as low as $900 apiece with 256MB of RAM, while a 21264 will run you at LEAST $3600, and that's if you find a half dozen stolen ones on eBay.)
In this configuration, all nodes have local disks, which are kept in sync nightly by an rsync job that updates pretty much everything except /var, /tmp, and /etc/sysconfig. Extra scratch space and /home can optionally be NFS-mounted across all nodes.
In this configuration, all nodes have local disks, but they are not kept in sync. This is most useful for disk-independent, embarrassingly parallel setups that merely do number crunching and need no disk-based synchronization.
This configuration is most useful for those who wish to save money on disks for all nodes, and to avoid the headache of keeping a few dozen to a few hundred disks in sync. This option is actually quite a reasonable choice, especially for programs that need some disk synchronization but aren't otherwise disk-bound.
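The NFS plumbing for a diskless setup might look roughly like this sketch, assuming a head node named head exporting a shared root; all paths and hostnames here are illustrative, not prescribed:

```
# /etc/exports on the head node (the NFS server):
/export/root   node*(ro,no_root_squash)
/home          node*(rw)

# /etc/fstab on each diskless node:
head:/export/root   /       nfs   ro,hard,intr   0 0
head:/home          /home   nfs   rw,hard,intr   0 0
```

The shared root is exported read-only so one careless node can't corrupt everyone's system files, while /home stays writable.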
Now this is the important part. Choosing your software essentially determines the use of the cluster. For example, you don't want to go full-fledged MOSIX if all you are doing is setting up a rendering farm or a GA.
Basically, a batch system is one where you just send it a job and it does it. Usually only one job runs at a time, and job scheduling is left to the programmer/job runner. Sometimes a queue, or at least a launching script, is provided through some simple bash scripts; otherwise remote processes are launched manually, one at a time on each node, or through ssh. Needless to say, this is the easiest software type to set up, and also the one with the least overhead. It is the Recommended Buy (tm) for embarrassingly parallel apps.
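One of those "simple bash scripts" might amount to no more than this sketch: jobs listed one per line, run to completion one at a time. The queue file and its contents are illustrative assumptions:

```shell
#!/bin/sh
# Hypothetical sketch of a bare-bones batch queue: each line of the
# queue file is a shell command, executed in order, one at a time.
QUEUE=$(mktemp)
printf 'echo job-one\necho job-two\n' > "$QUEUE"

run_queue() {
    while IFS= read -r job; do
        # run each job to completion before starting the next
        sh -c "$job"
    done < "$1"
}

run_queue "$QUEUE"
```

That's the whole scheduler: no daemons, no migration, just serial execution, which is all an embarrassingly parallel workload usually needs per node.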
This next class contains systems that automatically schedule and migrate processes based on cluster status. This kind of setup is really geared more towards those who just want to set up a cluster as a general mass-login system, rather than those who want to do distributed programming. The two software packages that provide this ability are Condor and Mosix. Condor isn't officially Open Source yet, and Mosix seems far more full-featured for this purpose. In fact, Mosix even allows you to build a NetType II/ArchType II style cluster and still use each node for cluster jobs. In addition, Condor places a lot of limitations on the types of jobs that can be run across the cluster, whereas Mosix is meant to be entirely transparent. Plus, Mosix is Open Source :) Technically, neither of these technologies is part of the traditional "Beowulf" scheme, but they are significant enough that they must not go without mention in any Beowulf document.
The last Beowulf software implementation class actually has some overlap with SoftType I. This is the fine-grained control section, where individual programs themselves control the synchronization, load balancing, etc. Oftentimes these jobs are launched through a SoftType I method and then synchronized using one of the standardized source-level libraries already available. These include the industry standards MPI and PVM. Also available are an extension to SysV IPC called DIPC, and a more control-oriented remote-forking system called bproc. Of course, to use this method, you must (re)write your software to use one of these libraries.
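With MPI, for instance, the launch step typically looks something like the following sketch. mpicc and mpirun are the standard MPI compiler wrapper and launcher; the program name and hostfile are assumptions:

```
# Hypothetical sketch: build against the MPI library, then start one
# process per requested slot across the nodes listed in "nodes".
mpicc -o crunch crunch.c
mpirun -np 4 -hostfile nodes ./crunch
```

The program itself then calls the library's communication routines to coordinate; the launcher only gets the processes started on the right machines.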