Appendix B: Processing Resource Applet
Appendix C: Control Server Source Code
Distributed computing is the practice of dividing computational tasks among resources on a computer network. It has long been used to solve extremely difficult problems or process complex data. Examples include the Search for Extraterrestrial Intelligence at Home (SETI@home), the Great Internet Mersenne Prime Search (GIMPS), distributed.net, and the cracking of two of the last three Enigma machine encrypted messages, which a distributed computing network accomplished in February and March of 2006[1]. Distributed computing has two bare essentials: networking, to communicate, and persistence, to store results and keep state.
Networking requires physical links between computers, as well as software drivers and protocols to communicate over them. Persistence must be a software-based solution, with one or more processes running at all times on an accessible computer. In our project, networking is provided by the Internet and the TCP/IP protocol suite. Persistence is provided by both a database and the control server.
Distributed computing began in the 1960s and 1970s, when the Internet began[2]. In fact, it was one of the key reasons TCP/IP and the Internet were created: to link disparate and expensive military computer systems and get more use out of them[2]. As networking technology advanced, distributed computing became easier, more reliable, and more rewarding. Some of the early pioneers of distributed computing were, in fact, network worms[2]. The Creeper worm used idle CPU cycles to find its next target and attack. Later, in 1973, programmers at Xerox's Palo Alto Research Center used the same idea to render graphics, by having a virus infect almost 100 computers on their local area network[2].
It's harder to implement distributed computing over a public network, as we are doing, for all the reasons that make any action on the Internet more difficult. For a start, we don't know how long a resource will be able to function. Since we don't own or operate the remote host, we have no control over its uptime or downtime. We must therefore implement a way for each resource to confirm that its connection is still active by posting an update.
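To illustrate the update mechanism described above, here is a minimal Python sketch of how a server might track which resources are still active. The class name, method names, and the 60-second timeout are assumptions made for illustration; they are not taken from the project's control server.

```python
import time


class HeartbeatMonitor:
    """Tracks the last update time of each resource (hypothetical helper)."""

    def __init__(self, timeout_seconds=60.0):
        # Assumed timeout: a resource silent for longer than this is
        # treated as inactive.
        self.timeout_seconds = timeout_seconds
        self._last_seen = {}

    def record_update(self, resource_id, now=None):
        # Called whenever a resource posts an update.
        self._last_seen[resource_id] = time.time() if now is None else now

    def inactive_resources(self, now=None):
        # Returns the sorted IDs of resources whose last update is older
        # than the timeout.
        now = time.time() if now is None else now
        return sorted(
            rid
            for rid, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

Work assigned to a resource reported by inactive_resources could then be handed to another resource, though how reassignment is handled depends on the design of the control server.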
Since we also can't control the configuration of the remote hosts' routers and gateways, we must use protocols and methods that are almost always allowed by networking hardware.
While these pose major challenges that aren't faced in conventional distributed computing, the advantages generally outweigh the disadvantages: nearly endless available computing power, extremely low cost, and ease of use, for instance.
On a conventional distributed computing network, you don't have to worry about giving any one machine too much work. On our volunteer network, we've had to limit the jobs to something manageable.
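One way to keep jobs manageable is to cap how much work any single job contains. The sketch below assumes the overall task can be described as a range of integer work items, which may not match the project's actual workload. The function name and parameters are hypothetical.

```python
def split_into_jobs(start, stop, max_job_size):
    """Split the half-open range [start, stop) into bounded jobs.

    Each job is a (low, high) pair covering at most max_job_size items.
    Assumes the work can be indexed by consecutive integers.
    """
    if max_job_size <= 0:
        raise ValueError("max_job_size must be positive")
    return [
        (low, min(low + max_job_size, stop))
        for low in range(start, stop, max_job_size)
    ]
```

For example, split_into_jobs(0, 10, 4) yields three jobs: (0, 4), (4, 8), and (8, 10). A smaller cap means less work is lost when a volunteer machine disappears, at the cost of more requests to the server.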
Another challenge is firewall and gateway settings. We don't know what kinds of connections the networking hardware will allow. We can predict, however, that almost all of them will allow HTTP connections. That's why we have our resources interact with the control server, which, in turn, communicates with the database.
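As a rough illustration of a resource talking to the control server over HTTP, the following Python sketch builds a POST request using only the standard library. The URL path, field names, and function names are assumptions for illustration. The project's actual resource and control server may use a different format.

```python
import urllib.parse
import urllib.request


def build_update_request(server_url, resource_id, job_id, result):
    """Build an HTTP POST carrying a resource's update (hypothetical format)."""
    # Form-encode the fields; the field names here are assumed.
    body = urllib.parse.urlencode(
        {"resource": resource_id, "job": job_id, "result": result}
    ).encode("ascii")
    return urllib.request.Request(
        server_url.rstrip("/") + "/update",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send_update(request, timeout_seconds=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.status
```

Because the resource only ever makes outbound HTTP requests, it never needs to accept an incoming connection, which is what most firewalls and gateways would block.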