Tapping the Grid
Interview with David Anderson
One of the favorite quotes of David Anderson, Project Leader for the SETI@home distributed computing program, comes from the German-language poet Rilke, who describes how difficult it can be to see the future. "As people were long mistaken about the motion of the sun, so they are even yet mistaken about the motion of that which is to come. The future stands firm . . . but we move in infinite space. How should it not be difficult for us?"
When the screensaver project was conceived at a Christmas party in 1994, the future of large-scale, distributed computing seemed entirely unproven. In the ensuing nine years, the SETI@home team's task was to prove it could be done.
"In the 90s there was a gradual transition from where computers were busy ... all of the time ... to where they were almost never busy." --David Anderson
To do it, they had to answer naysayers, wait for the explosive rise of personal computing to displace the mainframe with the network itself, and finally raise the needed funds. The project has raised approximately two million dollars thus far--from the Planetary Society and a variety of private and public sources--whereas equivalent supercomputing mainframes cost hundreds of millions of dollars. The core of the project's success has been a vast network of untapped resources on ordinary computers--the Grid.
Backing up this grid are 4.7 million volunteers who own those PCs. The volunteers now routinely donate their spare processing time to the search for interesting radio signals from deep space. As of August, the volunteers had processed their billionth job (called a work unit). At first glance, the project's path might seem to cut against the predictions of Rilke: the SETI@home team, despite the challenges, has seen the future.
Distributed from U.C. Berkeley's Space Sciences Laboratory, the SETI@home screensavers process in a single day what would otherwise take thousands of years to uncover in the search for extraterrestrial intelligence (SETI). Anderson describes the search as looking for the 'faint whispers of another civilization.'
Since its public launch in May 1999, computer owners in more than 226 countries have chipped in. [The United Nations includes only 191, and nearly two-thirds are developing countries, with few spare computers lying around unused]. Anderson has explored parts of the globe, and more than a few civilizations, just via contributors to his project. On the very day SETI@home launched in 1999, their file server was overwhelmed by just this kind of unexpected, high-demand burst: Hello World! Since then, SETI@home has accumulated 1.6 million years of computer processing time. After four and a half years, this computing capacity continues to double every year.
Among the many technical ways to describe how much data has been analyzed, one aspect of their network approach is seldom mentioned--its apparent human efficiency. For instance, ask any system administrator today to staff an untried network of 4.7 million computing nodes, and they may give you a rough labor figure equivalent to a battalion of programmers and network specialists. Anderson's most remarkable feat may therefore have gone unnoticed: the team manages to sustain about 0.1% of the world's total computing capacity with as few as six programmers and system administrators.
Fittingly for his task with SETI@home, Anderson is a mountain climber. He is also a pianist who occasionally hosts house concerts (Chopin and Hugo Wolf) and has a long-time interest in computer music. As if deciphering telescope signals from radio noise were not enough of a problem-solving challenge, he has completed all of the Atlantic Monthly Puzzlers for the last several years--in ink.
Astrobiology Magazine had the opportunity to talk with Dr. David Anderson about the hugely successful model that the SETI@home project has demonstrated and about his own future plans in scientific computing.
Astrobiology Magazine (AM): David Gedye was a graduate student of yours at Berkeley when the opportunity to look deeper into the practicalities of distributed computing came to the fore. Can you give a synopsis of how the screensaver itself took hold as an interface, how PCs became the virtual supercomputer, and how the project began from your perspective?
David Anderson (DA): In the 80s and early 90s there was a gradual transition from a situation where computers were busy most or all of the time - like shared mainframes, or early slow PCs that labored to keep up with basic tasks - to a situation where they were almost never busy. Typing 90 words per minute in Microsoft Word uses less than 1% of even a slow Pentium chip. Economy of scale led to microprocessors getting faster at a greater rate than "big" computers, and in the 90s they took the lead. Almost every "supercomputer" built since then has been based on Pentium-type chips.
David Gedye came up with the idea of the SETI application during a conversation at a Christmas party in 1994 in Seattle. He's primarily responsible for getting the project off the ground - it took several years, and it was discouraging, knowing that we had a great idea but not being able to get any support for it.
AM: Do you have a particular moment from the history of SETI@home, when you were astonished at how active the network had become?
DA: The first time that we got 1,000 years of CPU time in one day. That was mind-boggling, and still is.
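To put that figure in perspective, the arithmetic can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and the conversion assumes each machine contributes a full 24 hours of CPU time per day, which real volunteer PCs do not.

```python
# Rough conversion: CPU-years delivered per day -> machines running around the clock.
# Assumption (not from the interview): every machine contributes 24 CPU-hours per day.
DAYS_PER_YEAR = 365.25


def equivalent_full_time_machines(cpu_years_per_day):
    """Return how many always-busy machines would deliver this much CPU time in one day."""
    return cpu_years_per_day * DAYS_PER_YEAR


machines = equivalent_full_time_machines(1_000)
print(f"{machines:,.0f} machines working full time")
```

Since volunteer machines are busy only part of the day, the number of participating computers needed to reach 1,000 CPU-years in a day would be larger than this full-time equivalent.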
AM: What is a bad computer day at the Berkeley home server, something like a hard disk or memory failure? Are there bad days?
DA: The worst days have involved database corruption. This requires, in some cases, restoring the database from a backup tape and attempting to reconstruct the missing data; a very labor-intensive operation, and one that puts us off-line for a long period.
AM: You gave a lecture around 2001 at a peer-to-peer (P2P) computing conference in San Francisco, just around the time SETI@home (and to some degree, the music exchange, Napster) started to demonstrate the power of networks in different ways. You mentioned two things that stuck out about your experiences. First, you were surprised at how competition for the most data units processed became fuel for software adoption. Could you comment about how that contest component really took off?
[Image: Work units move from the radio telescope, Arecibo's computer room (top) to Berkeley's unit distribution server (middle) to some large corporate users (bottom), in this case processing data on a large Beowulf supercomputer cluster (Annexion).]
DA: Credit (work unit totals) is important for many reasons. A large class of users is not competitive as such, but they want evidence that their help is making a difference. If our software has a bug that occasionally keeps them from getting credit for a completed work unit, that's a big deal to them, and we give such bugs top priority.
The "top users" leader board has always been dominated by organizations (e.g. computer companies) that own lots of computers and run them all under the same account. For this reason, teams became the focus of competition; it gives every user a chance to belong to a "top 10" entity.
Teams have also played a role in adoption - team members recruit new users to boost their team's standings.
AM: You also mentioned something about operating systems, that one of the most challenging early problems was getting the Mac screensaver going--and that one of the specialized programmers [Charlie Fenton] who jumped in was able to make SETI@home cross-platform for PCs and Macs. Is that correct, and any further comments on the challenges from a computer standpoint of interoperability in today's network projects?
DA: There's a standard called Portable Operating System Interface (POSIX) that provides a common operating-system interface, so that you don't have to write a separate version for each system. Unfortunately, Windows and pre-OS X Mac implement only part of POSIX, and do a poor job of it. Also unfortunately, POSIX includes only low-level things like file and network communication; higher-level features like the graphical user interface (GUI) and screensaver are totally system-specific.
AM: And that will change with the Berkeley Open Infrastructure for Network Computing, BOINC, which is designed to allow other science projects to access the unique SETI@home architecture?
DA: There's a major improvement in BOINC: the application graphics are done using OpenGL, which is standardized across platforms and also is able to make use of graphics co-processors. So SETI@home on BOINC will have much nicer looking graphics, and they'll use less CPU time.
AM: What is a good day on the Grid? When a bunch of new data gets pushed out for analysis?
DA: Normally data gets generated and distributed at a constant rate. The good days are when we get some new software working. We've had some good days recently involving BOINC.
AM: Can you speak about your plans for rolling out BOINC? In particular the possibilities with biology such as protein folding and gene mapping or climate prediction. Any others that have been suggested?
DA: There's another project involving gravity-wave detection and LIGO. [The Laser Interferometer Gravitational-Wave Observatory (LIGO), which consists of two completely independent detectors located in Washington State and Louisiana, is essentially a giant strain gauge designed to detect gravitational waves for the first time].
I'm personally extremely excited about the possibilities of BOINC, both for scientists (who now have a chance to do previously unfeasible computations) and the public (who will soon have a smorgasbord of projects available to them).
AM: You have taken the concept demonstrated on SETI@home to a start-up (United Devices). What kinds of projects are planned or ongoing there, that are different from what a volunteer network can provide?
DA: United Devices and similar companies (most notably Entropia) found that there wasn't a business in reselling public computer time. There aren't enough paying customers for it - pharmaceutical companies, for example, aren't interested because of security concerns - and it's hard to convince people to volunteer their computers for profit-making activities. So these companies have switched to the "corporate intranet" market - letting technology companies use their own desktop PCs for their own R&D computing.
AM: Gateway, Inc. has some kind of network set-up (with United Devices) for selling processing on computers hooked up between retail stores. Another highly successful company in Houston was using a computer warehouse to run jobs on pre-sold computers prior to shipment. For the ten or so days--while the computers were just inventoried--they tapped them to process geophysical data for the oil industry--mainly seismic data used to find oil wells to drill. Their project solved two problems: stale inventory in a rapidly changing technology business, and state-of-the-art calculations otherwise unobtainable--so two 'wins'. Can you comment on the whole nature of how to use PCs like movable components, borrowing cycles, and how that changes the world of supercomputers as big monolithic devices, costing hundreds of millions of dollars?
[Image: Serving from the big screen at the O'Reilly 2001 Peer-to-Peer conference, in San Francisco, to many human peers in the audience.]
DA: There's a huge spectrum of supercomputing problems. Some of them require frequent, high-bandwidth communication between processors. For these, supercomputers (like the IBM Accelerated Strategic Computing Initiative, ASCI, series) will always be best.
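One way to see why communication-heavy problems favor dedicated supercomputers is to compare the time a job spends moving data with the time it spends computing. The sketch below is illustrative only: the function name, job sizes, link speed, and run times are hypothetical values chosen for the example, not measurements from SETI@home.

```python
def transfer_fraction(data_bytes, bandwidth_bytes_per_s, compute_seconds):
    """Fraction of a job's total time spent transferring data instead of computing."""
    transfer_seconds = data_bytes / bandwidth_bytes_per_s
    return transfer_seconds / (transfer_seconds + compute_seconds)


# Hypothetical home connection of about 5 kilobytes per second.
SLOW_LINK = 5_000

# Small input, long computation: suited to volunteer PCs on slow, high-latency links.
small_job = transfer_fraction(
    data_bytes=350_000, bandwidth_bytes_per_s=SLOW_LINK, compute_seconds=10 * 3600
)

# Large input, short computation: the link becomes the bottleneck.
chatty_job = transfer_fraction(
    data_bytes=1_000_000_000, bandwidth_bytes_per_s=SLOW_LINK, compute_seconds=60
)

print(f"small job spends {small_job:.2%} of its time on transfer")
print(f"chatty job spends {chatty_job:.2%} of its time on transfer")
```

Under these assumed numbers the first job is almost entirely computation, while the second is almost entirely waiting on the network, which is the kind of workload that needs tightly connected processors.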
"Public distributed computing" (my name for what SETI@home does) only works when the data-to-computing ratio is low, and high latencies are tolerable.
AM: Are you surprised by some of the advantages of doing these kinds of piggy-back cycles in this way--was this sort of bazaar ever dreamed of when researchers first began to understand the power of network sharing, or did it just sort of evolve?
DA: When I started doing computer science research, in 1985 or so, I was mystified by the research world's obsession with shared-memory multiprocessors, which were expensive to build and hard to program.
Why build new hardware when we already have a network (the Internet) and processors (desktop PCs)?
So working on SETI@home - and now BOINC - is especially fun for me because it lets me put into practice the ideas I was thinking about 15-20 years ago.
The SETI@home team has plans to end the current incarnation of SETI@home in a year or so, to be followed by an enhanced capability. They summarized the project's plan for 2003 and beyond: "Due to the incredible response we will be able to extend SETI@home past its initial two year life span. We're planning for SETI@home II now. We may increase our radio band coverage at Arecibo by adding another recorder system. We may add a recording system to a telescope in the southern hemisphere so we can see an entirely different part of the sky. We'll also add new features to our web site showing more details of the process of the analysis process, and show in more detail your personal contribution to SETI@home".
The Planetary Society has been a primary sponsor of SETI@home since 1998.