bit-tech.net

Legacy content from www.custompc.co.uk

Distributed computing

You might just use your bog-standard 2.4GHz PC for writing inane messages to your friends, but it could become the most powerful computer in the world, or at least part of it. Ian Betteridge explains how.

How much data each client handles depends on the project. In the SETI@home project, for example, each chunk of data sent to you is around 340KB in size, which takes less than five minutes to download even over a modem (usefully, the SETI@home software will even disconnect your modem if you're not using it for anything else). Each chunk requires roughly three trillion floating-point operations to complete, which means that, if you devoted your entire CPU to it, a 2.4GHz Pentium 4 would finish a chunk in around three hours. Of course, in the real world you'll be working on your machine for some of that time, and it will have other processes to handle, so how fast you process that single chunk depends on what else you're doing.
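To make the arithmetic concrete, here is a rough Python sketch built on the figures quoted above. The 56kbit/s modem speed, the reading of 1KB as 1,000 bytes and the `cpu_share` parameter are assumptions added for illustration; none of this comes from the SETI@home software itself.

```python
# Figures quoted in the article for one SETI@home chunk.
CHUNK_FLOP = 3e12          # roughly three trillion floating-point operations
CHUNK_BYTES = 340 * 1000   # around 340KB (assumption: 1KB = 1,000 bytes)


def implied_flops(hours):
    """Sustained operations per second needed to finish one chunk in `hours`."""
    return CHUNK_FLOP / (hours * 3600)


def hours_per_chunk(sustained_flops, cpu_share=1.0):
    """Hours to finish one chunk when only `cpu_share` of the CPU is free."""
    return CHUNK_FLOP / (sustained_flops * cpu_share) / 3600


def download_seconds(link_bits_per_second):
    """Idealised transfer time for one chunk, ignoring protocol overhead."""
    return CHUNK_BYTES * 8 / link_bits_per_second


if __name__ == "__main__":
    p4 = implied_flops(3)  # the three-hour Pentium 4 figure from the text
    print(f"Implied throughput: {p4 / 1e6:.0f} Mflops")
    print(f"With half the CPU free: {hours_per_chunk(p4, cpu_share=0.5):.1f} hours")
    print(f"Download at 56kbit/s: {download_seconds(56_000):.0f} seconds")
```

The three-hour figure implies a sustained rate of a little under 280Mflops, and halving the CPU time available simply doubles the wait.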

Once the client has done its job, the completed chunk is passed back to the main servers, but that isn't the end of the story. All distributed projects require a measure of post-client processing to weed out potential errors and collate the data. Data errors can actually be a real problem for distributed projects; although computers are incredibly reliable, distributed projects use so much CPU time that errors will creep in. Even if an average CPU makes a calculation error only once every 10^18 operations, a distributed project uses many years of CPU time every day, so several errors will still be returned by the thousands of CPUs involved. Add in the inevitable errors from broken Internet connections, proxy problems and disk failures, and there's potential for a major problem.
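A back-of-the-envelope sketch in Python shows how a tiny per-operation error rate still adds up across a whole project. Every input here is an illustrative assumption: 100,000 CPUs (borrowed from the Folding@home figure quoted in this article), about 278Mflops per CPU (the rate implied by the three-hour chunk time), and one error per 10^18 operations.

```python
SECONDS_PER_DAY = 86_400


def cpu_years_per_day(cpus):
    """CPU time used per calendar day, in years, if every CPU runs all day."""
    return cpus / 365.25


def expected_errors_per_day(cpus, flops_per_cpu, ops_per_error):
    """Expected number of miscalculations per day across the whole project."""
    ops_per_day = cpus * flops_per_cpu * SECONDS_PER_DAY
    return ops_per_day / ops_per_error


if __name__ == "__main__":
    # Illustrative inputs only; see the assumptions listed above.
    cpus, flops, ops_per_error = 100_000, 278e6, 1e18
    print(f"CPU-years used per day: {cpu_years_per_day(cpus):.0f}")
    print(f"Expected errors per day: "
          f"{expected_errors_per_day(cpus, flops, ops_per_error):.1f}")
```

Under those assumptions the project uses a couple of hundred CPU-years every day and should expect a few bad results daily from calculation errors alone, before any network or disk problems are counted.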

The simplest and cheapest way to reduce errors is redundancy, so it's no surprise that this is the method that distributed projects most commonly use. Every chunk of data is sent out to several clients, and the values are then crosschecked to ensure accuracy.
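The crosschecking step can be sketched as a simple vote. This is a minimal Python illustration, not how any particular project does it: the `crosscheck` function and its `quorum` parameter are hypothetical, and results are assumed to be directly comparable values (real projects may need to compare floating-point results within a tolerance).

```python
from collections import Counter


def crosscheck(results, quorum=2):
    """Return the value that at least `quorum` clients agree on, else None.

    `results` holds the values returned by different clients for the same
    chunk. A tie for first place counts as no agreement.
    """
    counts = Counter(results).most_common(2)
    if not counts:
        return None
    top_value, top_count = counts[0]
    tied = len(counts) > 1 and counts[1][1] == top_count
    if top_count < quorum or tied:
        return None  # no agreement: the chunk would need to be sent out again
    return top_value


if __name__ == "__main__":
    print(crosscheck([42, 42, 41]))  # two clients agree, one is the odd one out
    print(crosscheck([1, 2, 3]))     # nobody agrees
```

Raising `quorum` trades extra CPU time for more confidence in each accepted result.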

Add enough clients, and you have a very powerful computer. According to Vijay Pande, assistant professor of chemistry at Stanford University and a key figure in both the Folding@home and genome@home distributed projects, the 100,000 CPUs active in the Folding@home project at any one time are roughly ten times more powerful than the world's fastest supercomputer, Japan's Earth Simulator. And, of course, they only cost Stanford the price of developing the software instead of hundreds of millions of dollars, which is the main appeal for university researchers. In fact, according to Pande, 'For us, this was the only way to get our work done. We don't do distributed computing for the coolness factor.'

Different setups
Whether you want to max out your contribution to science or just get to the top of the leaderboard, you'll want to maximise your PC's performance for distributed computing tasks, so how do you make sure you get the most from your setup?

As we've seen, all distributed computing projects are essentially number-crunching tasks, so the biggest factor involved is your CPU speed. And, unsurprisingly, this is an area where 64-bit CPUs really come into their own. For example, using SETI@home with the excellent SETISpy benchmarking application (www.cox-internet.com/setispy), an Itanium 2 achieves a sustained throughput of 798Mflops, easily beating a Pentium 4 overclocked to 4GHz, which manages only 753Mflops. By comparison, a Pentium III running at 1.33GHz achieves a mere 217Mflops.
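To put those throughput figures in more practical terms, the following Python sketch converts them into chunks completed per day and speed relative to the Pentium III. It assumes every chunk costs the roughly three trillion operations quoted earlier in the article and that the CPU does nothing else, so treat the output as an upper bound rather than a benchmark.

```python
SECONDS_PER_DAY = 86_400
CHUNK_FLOP = 3e12  # per-chunk workload quoted earlier in the article

# Sustained throughput figures quoted in the article, in Mflops.
BENCHMARKS = {
    "Itanium 2": 798,
    "Pentium 4 @ 4GHz": 753,
    "Pentium III @ 1.33GHz": 217,
}


def chunks_per_day(mflops):
    """Chunks finished in 24 hours at the given sustained throughput."""
    return mflops * 1e6 * SECONDS_PER_DAY / CHUNK_FLOP


def relative_speed(mflops, baseline_mflops):
    """How many times faster one CPU is than the baseline."""
    return mflops / baseline_mflops


if __name__ == "__main__":
    baseline = BENCHMARKS["Pentium III @ 1.33GHz"]
    for name, mflops in BENCHMARKS.items():
        print(f"{name:24s} {chunks_per_day(mflops):5.1f} chunks/day  "
              f"{relative_speed(mflops, baseline):4.2f}x")
```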
