
1. Introduction

1.1 What is a Beowulf Cluster?

Essentially, any group of Linux machines dedicated to a single purpose can be called a cluster. To be called a Beowulf cluster, all you need is a central node that does some coordination (i.e., technically an IP Masq'ed network fits loosely into the Beowulf definition given by the Beowulf mailing list).

1.2 What can a Beowulf Cluster do?

A Beowulf cluster is capable of many things that are applicable in areas ranging from data mining to research in physics and chemistry, all the way to the movie industry. Essentially, any program that can perform several semi-independent jobs concurrently can benefit from running on a Beowulf cluster. There are two classes of these parallel programs.

Embarrassingly Parallel Computation

A Beowulf cluster is best suited for "embarrassingly parallel" tasks, that is, tasks that require very little communication between nodes to complete, so that adding more nodes results in a near-linear increase in computing throughput with little extra synchronization work. Embarrassingly parallel programs may also be known as linearly scalable. Essentially, you get a linear performance increase for each additional machine with no sign of diminishing returns.

Genetic algorithms are one example of an application of embarrassingly parallel computation, since each added node can simply act as another "island" on which to evaluate the fitness of a population. Rendering an animation sequence is another, because each node can easily be given an individual image, or even sections of an image, to render, even using standard programs like POV-Ray.
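As a rough illustration of the rendering case, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, `render_frame` is a stand-in for a real renderer invocation, and threads on one machine stand in for cluster nodes. The point is only that each batch of frames is processed without any communication between workers.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_frames(num_frames, num_nodes):
    """Deal frame numbers out to nodes round-robin."""
    return [list(range(node, num_frames, num_nodes)) for node in range(num_nodes)]


def render_frame(frame):
    """Hypothetical stand-in for an expensive render of one frame."""
    return frame, sum(i * i for i in range(1000 + frame))


def render_on_node(frames):
    """Work done by one node: render its own frames, talking to nobody."""
    return [render_frame(frame) for frame in frames]


def render_animation(num_frames, num_nodes):
    batches = assign_frames(num_frames, num_nodes)
    # Threads stand in for cluster nodes (an assumption of this sketch);
    # on a real cluster each batch would be shipped to a separate machine.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        per_node = list(pool.map(render_on_node, batches))
    # The only coordination is collecting the finished frames at the end.
    return sorted(item for batch in per_node for item in batch)
```

Because no batch depends on any other, adding a node just means dealing the frames into one more pile.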

Explicitly Parallel Computation

Also applicable are explicitly parallel programs, that is, programs with explicitly coded parallel sections that can run independently. These differ from fully parallelizable programs in that they contain one or more interdependent serial or atomic components that must be synchronized across all instances of the parallel application. This synchronization is usually done through libraries such as MPI, PVM, or even extended versions of POSIX threads and System V shared memory.
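The shape of such a program can be sketched in Python as follows. This is only an illustration under stated assumptions: threads and a `threading.Barrier` on one machine stand in for cluster nodes and a message-passing library, and the function name and the "move every value halfway toward the global mean" task are made up for the example. Each worker computes on its own slice in parallel, but every step contains a serial component (combining the partial sums) that all workers must wait for.

```python
import threading


def smooth_toward_mean(data, num_workers, steps):
    """Move every value halfway toward the global mean, `steps` times."""
    chunks = [list(data[rank::num_workers]) for rank in range(num_workers)]
    partial_sums = [0.0] * num_workers
    shared = {"mean": 0.0}

    def combine():
        # Serial component: runs once per step, after every worker has arrived.
        shared["mean"] = sum(partial_sums) / len(data)

    barrier = threading.Barrier(num_workers, action=combine)

    def worker(rank):
        for _ in range(steps):
            # Parallel section: each worker touches only its own slice.
            partial_sums[rank] = sum(chunks[rank])
            # Synchronization point: nobody continues until all sums are in.
            barrier.wait()
            mean = shared["mean"]
            # Parallel section that depends on the globally agreed value.
            chunks[rank] = [(x + mean) / 2 for x in chunks[rank]]

    threads = [threading.Thread(target=worker, args=(rank,))
               for rank in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Put the slices back into their original order.
    result = [0.0] * len(data)
    for rank in range(num_workers):
        result[rank::num_workers] = chunks[rank]
    return result
```

On a real cluster the barrier-plus-combine step would be a network operation (for example a collective call such as `MPI_Allreduce`), which is why the cost of each synchronization matters far more there than it does in this single-machine sketch.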

1.3 What can't a Beowulf Cluster do?

A common misconception is that there is a specific set of Beowulf software that you can run on a group of Linux machines, and voilà, everything runs faster. This is not the case. While there is software to turn a cluster of Linux workstations into what seems like a single computer with a single process ID space, this is impractical for a single-user setup, as even the most hardcore power user can only run (otherwise unparallelizable) CPU-intensive jobs for a short period of time before letting the machine fall idle. The way these systems are implemented also makes them impractical even for speeding up a simple "make -j".

So essentially, when thinking about implementing a Beowulf cluster, the questions to ask are "Are lots of independent programs going to be running?" or, if not, "Is the extra speed worth rewriting my programs?", "Are they even parallelizable?", and "Are my programs CPU bound?" All Beowulf configurations will slow down disk-bound or memory-bound applications.
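One rough way to put a number on "is the extra speed worth it" is Amdahl's law, a standard model that is not part of the original text and is brought in here only as an aid. It assumes a fixed fraction of the run time can be spread perfectly across nodes and ignores communication cost entirely, so treat the result as an optimistic upper bound. The function name below is hypothetical.

```python
def estimated_speedup(parallel_fraction, num_nodes):
    """Upper bound on speedup from Amdahl's law.

    parallel_fraction: share of single-machine run time that can be
                       parallelized (0.0 to 1.0), an estimate you supply.
    num_nodes:         number of machines the parallel part is spread over.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many nodes exist.
    return 1.0 / (serial_fraction + parallel_fraction / num_nodes)
```

For instance, a program whose run time is only half parallelizable can never run more than twice as fast under this model, however many nodes are added, which may not justify a rewrite.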

