Achieving Good Performance
Step 1: Get the right answers: While this may seem obvious, it is quite easy to forget to check the answers along the way as one is making performance improvements. A lot of frustration can be avoided by making only one change at a time and verifying that the program is still correct after each change.
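One way to make the per-change check routine is to save the output of a trusted build and compare each tuned build against it. The following is a minimal sketch in C, not part of the original guide: the function name results_match and both tolerance parameters are hypothetical. It compares within a tolerance rather than bit for bit, on the assumption that aggressive optimization may reorder floating-point operations and change the last few bits of a result.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute difference without <math.h>; yields NaN if either input is NaN. */
static double abs_diff(double x, double y)
{
    return (x > y) ? (x - y) : (y - x);
}

static double abs_val(double x)
{
    return (x < 0.0) ? -x : x;
}

/* Hypothetical helper: returns 1 if every got[i] agrees with ref[i] to
 * within rel_tol (relative), with abs_tol as a floor for values near zero.
 * Returns 0 at the first mismatch and, if first_bad is non-NULL, stores
 * the index of that mismatch there. */
int results_match(const double *ref, const double *got, size_t n,
                  double rel_tol, double abs_tol, size_t *first_bad)
{
    assert(ref != NULL && got != NULL);

    for (size_t i = 0; i < n; i++) {
        if (ref[i] == got[i])
            continue;               /* exact match, including equal infinities */

        double diff  = abs_diff(ref[i], got[i]);
        double a     = abs_val(ref[i]);
        double b     = abs_val(got[i]);
        double scale = (a > b) ? a : b;
        double bound = (rel_tol * scale > abs_tol) ? rel_tol * scale : abs_tol;

        /* Written as !(diff <= bound) so that a NaN counts as a mismatch. */
        if (!(diff <= bound)) {
            if (first_bad != NULL)
                *first_bad = i;
            return 0;
        }
    }
    return 1;
}
```

A typical use would be to load the reference results and the new results into two arrays after each change and stop as soon as results_match returns 0. Suitable tolerances depend on the application.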
Step 2: Use existing tuned code: The quickest and easiest way to improve the performance of a program is to link it with libraries already tuned for the target hardware. In addition to the standard libraries, such as libc, whose hardware-specific versions are automatically linked in with programs compiled and run on the Origin2000 system, there are other libraries which may provide substantial performance benefits. They include: complib, complib sgimath and sgimath.
Step 3: Find out where to tune: When confronted with a program composed of hundreds of modules and thousands of lines of code, tuning the entire program would require a heroic and very inefficient effort. Tuning needs to be concentrated on those few sections of the code where the work will pay off with the biggest gains in performance. These sections of code are identified with the help of a profiler. The hardware counters in the R10000 CPU make it possible to profile the behavior of a program in many ways without modifying the code. Profiling tools include:
Because the various optimizations the compiler performs can interact with each other, it is impossible to provide a simple formula specifying which optimizations should be attempted in which order to achieve the best results in all cases. Nevertheless, we can make some recommendations which work well in general. A good set of compiler flags to use is the following:
-n32 -mips4 -Ofast=ip27 -OPT:IEEE_arithmetic=3
In addition, when linking in math routines, be sure to use the fast math library:
Step 5: Modify the code for better cache utilization: For a cache-based system, such as the Origin2000, the optimizations which have the greatest potential for significant performance gains are those which improve the program's utilization of the cache hierarchy. In the MIPSpro 7.x compilers, this class of optimizations is known as loop nest optimizations, and they are performed by the loop nest optimizer, or LNO.
Note that if you are using the flags recommended above, you are already using the LNO, since it is enabled whenever the highest level of optimization, -O3 or -Ofast, is used. For the majority of programs and users, this is a great benefit, since the LNO is capable of automatically solving many cache use problems.
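To illustrate the kind of cache problem involved, the sketch below shows a hand-written version of one well-known loop nest transformation, cache blocking (tiling), applied to a matrix transpose. It is an illustration only, not output from the LNO: the function names and the block parameter are hypothetical, and the best block size depends on the cache sizes of the machine.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward transpose of an n x n row-major matrix.
 * src is read contiguously, but dst is written with a stride of n
 * elements, so for large n each write tends to touch a different
 * cache line. */
void transpose_naive(size_t n, const double *src, double *dst)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked (tiled) transpose: the same assignments, reordered so that
 * the work is done one block x block tile at a time. A tile of src and
 * the matching tile of dst can then stay in cache while they are used. */
void transpose_blocked(size_t n, size_t block, const double *src, double *dst)
{
    assert(block > 0);

    for (size_t ii = 0; ii < n; ii += block) {
        size_t i_end = (ii + block < n) ? ii + block : n;
        for (size_t jj = 0; jj < n; jj += block) {
            size_t j_end = (jj + block < n) ? jj + block : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

Both functions compute the same result; only the order of memory accesses differs. Whether the blocked version is actually faster on a given system should be confirmed by profiling.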
Ranges and Precision
Here are the values for the double data type:
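On any given system, the limits of the double type can be read from the standard C header float.h. The sketch below collects and prints them; the struct and function names are hypothetical, while the DBL_* macros are defined by the C standard.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Hypothetical container for the limits of the double type. */
struct double_limits {
    double max;         /* largest finite value             (DBL_MAX)     */
    double min_normal;  /* smallest positive normal value   (DBL_MIN)     */
    double epsilon;     /* gap between 1.0 and next double  (DBL_EPSILON) */
    int    digits;      /* decimal digits preserved         (DBL_DIG)     */
};

struct double_limits get_double_limits(void)
{
    struct double_limits lim;

    /* The C standard guarantees at least 10 decimal digits for double. */
    assert(DBL_DIG >= 10);

    lim.max        = DBL_MAX;
    lim.min_normal = DBL_MIN;
    lim.epsilon    = DBL_EPSILON;
    lim.digits     = DBL_DIG;
    return lim;
}

/* Prints the limits. On platforms whose double is the IEEE 754 64-bit
 * format, these are roughly 1.8e+308, 2.2e-308, 2.2e-16 and 15. */
void print_double_limits(void)
{
    struct double_limits lim = get_double_limits();

    printf("largest finite value     : %e\n", lim.max);
    printf("smallest positive normal : %e\n", lim.min_normal);
    printf("machine epsilon          : %e\n", lim.epsilon);
    printf("decimal digits           : %d\n", lim.digits);
}
```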