In my study, I compared the performance of the two machines on identical code that solves the time-dependent heat equation (the transient counterpart of the Laplace equation) for two-dimensional heat conduction. The problem consists of a square piece of insulating material, one meter on a side, with boundary temperatures of 400 °C and 0 °C on opposite sides. Two separate methods were compared on the two machines: one an explicit solution of the system of linear equations, the other a simple implicit method known as simultaneous displacements. Both codes employed a 200×200 mesh and used a time step of 5000 seconds.
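To make the difference between the two methods concrete, here is a minimal sketch in plain Python. It is not the benchmarked code. Several things are assumptions made only for illustration: the two sides not mentioned above and the initial interior are held at 0 °C, the grid is a small 11×11 mesh, the dimensionless step `r` (diffusivity times time step over mesh spacing squared) is set to 0.2, and the implicit method is taken to be a backward-Euler step solved by simultaneous displacements (Jacobi iteration). All function names are hypothetical.

```python
def make_grid(n, hot=400.0, cold=0.0):
    """n x n mesh: left edge at `hot`; other edges and interior at `cold` (assumed)."""
    grid = [[cold] * n for _ in range(n)]
    for row in grid:
        row[0] = hot
    return grid


def neighbor_sum(t, i, j):
    """Sum of the four nearest neighbors of mesh point (i, j)."""
    return t[i - 1][j] + t[i + 1][j] + t[i][j - 1] + t[i][j + 1]


def explicit_step(t, r):
    """One forward-time, centered-space step; stable only for r <= 0.25."""
    n = len(t)
    new = [row[:] for row in t]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = t[i][j] + r * (neighbor_sum(t, i, j) - 4.0 * t[i][j])
    return new


def implicit_step(t, r, tol=1e-8, max_iter=10000):
    """One backward-Euler step solved by simultaneous displacements (Jacobi).

    Every interior point is updated from the previous sweep's values only,
    and sweeps repeat until the largest change drops below `tol`.
    Returns the new grid and the number of sweeps used.
    """
    n = len(t)
    guess = [row[:] for row in t]
    for iteration in range(1, max_iter + 1):
        new = [row[:] for row in guess]
        change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                new[i][j] = (t[i][j] + r * neighbor_sum(guess, i, j)) / (1.0 + 4.0 * r)
                change = max(change, abs(new[i][j] - guess[i][j]))
        guess = new
        if change < tol:
            break
    return guess, iteration


# Illustrative run: small mesh and step size chosen so the explicit method is stable.
n, r, steps = 11, 0.2, 50
explicit = make_grid(n)
implicit = make_grid(n)
total_iterations = 0
for _ in range(steps):
    explicit = explicit_step(explicit, r)
    implicit, used = implicit_step(implicit, r)
    total_iterations += used

mid = n // 2
print("explicit center:", explicit[mid][mid])
print("implicit center:", implicit[mid][mid])
print("Jacobi sweeps over all steps:", total_iterations)
```

The explicit step touches each mesh point once per time step, while the implicit step sweeps the mesh repeatedly until it converges, which is why an iterative method of this kind needs far more operations for the same number of time steps. In exchange, the implicit step in this sketch remains stable for values of `r` above 0.25.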
The Hardware Performance Monitor (hpm) on the Cray uses on-chip counters to report performance statistics such as the number of floating-point operations, CPU cycles, and the MFLOP/s rate (millions of floating-point operations per second). Using the operation counts from the hpm utility, I was able to estimate the performance of the Origin 2000 by dividing those counts by run times taken from the system clock.
The explicit method ran much faster on both machines because it requires fewer operations (~9.4 billion). The Cray executed the code in 47.4 seconds, running at 200.3 MFLOP/s. The Origin 2000 distributed the job over eight of its ten processors and ran it in just 9.0 seconds. Since the codes were identical, the same number of operations must have been performed, so the Origin 2000 was operating at over 1 GFLOP/s (billion floating-point operations per second).
The simultaneous displacements method required 256 billion operations to complete the job because of its iterative nature. As a result, the Cray needed 1158.2 seconds to complete the job, although its rate rose to 221.14 MFLOP/s. The new Origin 2000 completed the same job in just 207.0 seconds, which means that it was performing floating-point operations at a rate of 1.23 GFLOP/s.
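The Origin 2000 rates follow from simple arithmetic: the operation count is recovered from the Cray's measured rate and run time, then divided by the Origin 2000 run time. The short sketch below redoes that calculation with the figures quoted above. The function name and the speedup column are additions for illustration, and the sketch assumes, as the text does, that both machines perform the same number of operations.

```python
MEGA = 1.0e6
GIGA = 1.0e9

# (method, Cray run time [s], Cray rate [MFLOP/s], Origin 2000 run time [s])
runs = [
    ("explicit", 47.4, 200.3, 9.0),
    ("simultaneous displacements", 1158.2, 221.14, 207.0),
]


def origin_rate_gflops(cray_seconds, cray_mflops, origin_seconds):
    """Estimate the Origin 2000 rate from the Cray's operation count."""
    operations = cray_mflops * MEGA * cray_seconds  # count implied by hpm figures
    return operations / origin_seconds / GIGA


for name, cray_s, cray_mflops, origin_s in runs:
    rate = origin_rate_gflops(cray_s, cray_mflops, origin_s)
    speedup = cray_s / origin_s
    print(f"{name}: {rate:.2f} GFLOP/s on the Origin 2000, "
          f"{speedup:.1f}x shorter run time than the Cray")
```

The estimates land close to the figures quoted in the text; small differences come from rounding in the reported times and rates.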
A fundamental challenge of any numerical method is the accuracy of the answer, since in the end every numerical method is an approximation to the actual solution. Furthermore, different methods and different machines will produce slightly different answers depending on machine architecture and compiler implementation. To test the consistency between the machines, I checked the value at the center mesh point after the programs finished. The value obtained by the explicit method differed by 0.00652% between the machines, and the value from the simultaneous displacements method differed by 0.01088%.
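A consistency check of this kind can be sketched as a relative difference between the two center values. The text does not say which value served as the reference, so the sketch assumes the difference is taken relative to the first machine's result. The temperatures below are hypothetical placeholders, not the measured values.

```python
def percent_difference(reference, other):
    """Relative difference of `other` from `reference`, in percent."""
    return abs(other - reference) / abs(reference) * 100.0


# Hypothetical center-point temperatures, for illustration only.
center_machine_a = 100.0
center_machine_b = 100.01

print(f"{percent_difference(center_machine_a, center_machine_b):.5f}%")
```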