Appendix B: Processing Resource Applet
Appendix C: Control Server Source Code
There's no single way to implement a distributed computing network. To give ourselves guidelines for moving forward, we set the following requirements:
We will use the Python and Java programming languages.
The technologies and protocols used in our solution must be universally available, or nearly so.
The primary processing resources used by our solution will be ordinary desktop or laptop computers.
The components in our solution must be able to communicate over the Internet, without any requirement for custom configuration.
Our solution will solve a difficult computational problem, as proof of the concept.
We selected the Python programming language for its extensive libraries, its easy-to-read syntax, and its gentle learning curve. As novice programmers, we needed these benefits to complete our project on time. Also, Python is widely available on Apache web servers via the Common Gateway Interface (CGI), which makes it a natural fit for our project.
We also decided to use Java, specifically Java applets, in our solution. This way, a processing resource can run the calculations needed to solve our problem simply because its user is browsing our web pages. An applet is a small Java application that executes in a "sandbox" within a browser[3]. This environment helps protect the user from malicious code. Java and web browsers are nearly universal, and are most likely already installed on a computer when it is shipped by the original equipment manufacturer.
Other distributed computing projects, such as GIMPS and SETI@Home, generally use lower-level languages such as C or assembly language. Programs in these languages can be compiled for several different architectures and operating systems to achieve compatibility. However, when this code is highly optimized, fully porting it to different hardware can be very time-consuming and expensive; for these and other reasons, some of these systems haven't yet been compiled for all current operating systems. By using Java for our processing resources, we consciously trade some performance for broad portability.
Compatibility and interoperability are always concerns on heterogeneous networks such as the Internet. Different operating systems follow different standards and execute programs differently. In addition to using Java applets, we dealt with this by moving everything that isn't truly universal (e.g. the Python code) to the control server, where it runs as Apache CGI scripts. Because these scripts are reached over the web, any client can use them, regardless of its operating system.
Similarly, we implemented the persistence portion of our solution with the MySQL database (which has an open-source edition and is available on most server platforms in wide use), and we limited database access to the CGI scripts.
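As an illustration of the server-side arrangement described above, the following is a minimal sketch of a control-server CGI script that exposes work-queue functions over XML-RPC and is the only code that touches the database. It assumes Python 3 and the standard library's `xmlrpc.server.CGIXMLRPCRequestHandler`; the `WorkQueue` class, its `get_work_unit` and `submit_result` methods, and the `work_units` table are hypothetical names, not the project's actual code. To stay self-contained, it uses the standard `sqlite3` module as a stand-in for MySQL; a MySQL driver would follow a similar DB-API pattern, though its parameter placeholder style may differ.

```python
import sqlite3
from xmlrpc.server import CGIXMLRPCRequestHandler

# Hypothetical schema: one row per candidate exponent to be tested.
SCHEMA = """
CREATE TABLE IF NOT EXISTS work_units (
    exponent INTEGER PRIMARY KEY,
    status   TEXT NOT NULL DEFAULT 'pending',  -- pending, assigned or done
    is_prime INTEGER                           -- NULL until a result arrives
)
"""


def open_db(path):
    """Open (or create) the work-unit database; sqlite3 stands in for MySQL."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


class WorkQueue:
    """Hypothetical RPC interface; only this class touches the database."""

    def __init__(self, conn):
        # Leading underscore: XML-RPC clients cannot reach this attribute.
        self._conn = conn

    def get_work_unit(self):
        """Hand out the smallest pending exponent, or 0 if nothing is pending."""
        row = self._conn.execute(
            "SELECT exponent FROM work_units WHERE status = 'pending' "
            "ORDER BY exponent LIMIT 1"
        ).fetchone()
        if row is None:
            return 0  # plain XML-RPC has no null value, so 0 means "no work"
        self._conn.execute(
            "UPDATE work_units SET status = 'assigned' WHERE exponent = ?",
            (row[0],),
        )
        self._conn.commit()
        return row[0]

    def submit_result(self, exponent, is_prime):
        """Record a result; returns False if the exponent is unknown."""
        cur = self._conn.execute(
            "UPDATE work_units SET status = 'done', is_prime = ? WHERE exponent = ?",
            (int(is_prime), exponent),
        )
        self._conn.commit()
        return cur.rowcount == 1


if __name__ == "__main__":
    # Each CGI request runs in a fresh process, so the data must live in a
    # file (or, in the design described above, a MySQL server).
    handler = CGIXMLRPCRequestHandler()
    handler.register_instance(WorkQueue(open_db("work_units.db")))
    handler.handle_request()  # reads the POSTed XML from stdin, writes to stdout
```

One design caveat: the select-then-update in `get_work_unit` is not atomic across simultaneous CGI requests, so two clients could receive the same exponent; a production version would need a transaction or row lock around that step.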
We chose desktop and laptop machines because there is an almost unlimited number of them, and they get faster every year[4]. It is now possible (albeit still challenging) to build a distributed processing network that is, in many ways, equivalent to a supercomputer, made up entirely of consumer-level computers whose owners volunteer to participate.
For two-way communication between the processing resources and the server that would not be blocked by most networking hardware, we decided to use eXtensible Markup Language Remote Procedure Call (XML-RPC). XML is a way of serializing data independently of any programming language, enabling (among other things) the transmission and processing of data across application and system boundaries. A Remote Procedure Call[5] is a request asking a remote host to execute a function. XML-RPC is an XML-based protocol that uses HTTP as its transport. Since HTTP is the primary protocol used by web servers and browsers, XML-RPC requests and responses can pass through any networking hardware and software that allows web traffic.
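To make the wire format concrete, here is a small sketch using Python's standard `xmlrpc.client` module, which can serialize a call to XML and parse it back without any network connection. The method name `submit_result` and its arguments are hypothetical, chosen only for illustration. In a live system this XML would be the body of an HTTP POST to the server, which `xmlrpc.client.ServerProxy` sends automatically.

```python
import xmlrpc.client

# Serialize a call to a hypothetical remote method submit_result(31, True).
request_xml = xmlrpc.client.dumps((31, True), methodname="submit_result")
print(request_xml)

# The receiving side parses the XML back into native values.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)

# A response carries exactly one value and is wrapped in <methodResponse>.
response_xml = xmlrpc.client.dumps((True,), methodresponse=True)
(result,), _ = xmlrpc.client.loads(response_xml)
print(result)
```

Because both sides only have to agree on this XML vocabulary, a Java applet client and a Python CGI server can exchange calls without sharing any code.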
For our problem, we chose to search for Mersenne primes, because the topic interested us and is well suited to distribution. Mersenne primes are primes of the form $M_n = 2^n - 1$, where $n$ is a positive integer. Mersenne primes are used in random number generation, public-private key and rotor encryption, hash mapping (e.g. in databases), and various low-level storage functions in computer science[6]. Large Mersenne primes are easier to find than ordinary large primes, for two reasons. First, we can use known primes as seed exponents[6], since $2^n - 1$ can only be prime when $n$ is itself prime. Second, we can determine their primality more efficiently via the Lucas-Lehmer test (see Model).
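The Lucas-Lehmer test itself is short enough to sketch directly. For an odd prime $p$, start from $s_0 = 4$ and iterate $s_{k+1} = (s_k^2 - 2) \bmod M_p$; $M_p = 2^p - 1$ is prime exactly when $s_{p-2} = 0$. The minimal Python version below relies on Python's arbitrary-precision integers and only tests prime exponents, following the idea of using known primes as seeds. The function names are illustrative, and the sketch makes no attempt at the FFT-based multiplication that production searches such as GIMPS rely on.

```python
def is_prime(n):
    """Trial division; adequate for the small exponents used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def lucas_lehmer(p):
    """Return True if M_p = 2**p - 1 is prime. Assumes p is prime."""
    if p == 2:
        return True  # M_2 = 3; the iteration below requires an odd p
    m = (1 << p) - 1  # M_p = 2**p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def mersenne_exponents(limit):
    """Exponents p <= limit for which 2**p - 1 is prime."""
    # Composite exponents never give a Mersenne prime, so only prime
    # exponents are passed to the (more expensive) Lucas-Lehmer test.
    return [p for p in range(2, limit + 1) if is_prime(p) and lucas_lehmer(p)]


print(mersenne_exponents(130))
# → [2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127]
```

Each call to `lucas_lehmer` is independent of the others, which is what makes the search easy to split across many processing resources: the server only needs to hand out exponents and collect yes/no answers.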