Running Jobs on lakemead Using MPI

Users are allowed to run test jobs on the development node. To run a serial code (one that uses a single processor, i.e., not a parallel job):
time ./a.out < input >& output &

To run a test job using MPI:
time mpirun -np 2 my-mpi.x

You can also create your own file (my-hosts) with a list of hosts:
time mpirun -np 9 -machinefile my-hosts my-mpi.x

In this particular example, one process will run on lakemead (master) and the other eight processes (slaves) will run one on each node.
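
As a minimal sketch of the machine file used in the example above, the following creates my-hosts with one host name per line. The names node01 through node08 are placeholders, not actual lakemead node names; substitute the hosts available to you. Only the eight compute nodes are listed, on the assumption (as described above) that the first process starts on lakemead itself.

```shell
# Build a machine file with one host per line.
# The host names below are hypothetical placeholders.
for i in 1 2 3 4 5 6 7 8; do
  printf 'node%02d\n' "$i"
done > my-hosts

# Show the file and the number of hosts it lists.
cat my-hosts
wc -l < my-hosts
```

With -np 9, this leaves one process for lakemead and one for each of the eight listed hosts.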

You can also run jobs that create two processes per node. To maximize the amount of local memory that each process will use, set the variable:
P4_GLOBMEMSIZE

Then you can run your job using your own machine file or, in batch mode, let the SGE utility assign the processes.
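
For illustration, P4_GLOBMEMSIZE is an environment variable read by the MPICH ch_p4 device, and its value is given in bytes. The 64 MB figure below is an arbitrary example, not a recommended setting for lakemead, and the syntax assumes a Bourne-style shell (csh and tcsh users would use setenv instead).

```shell
# Set the p4 global memory size, in bytes.
# 64 MB is only an example value; choose one that suits your job.
P4_GLOBMEMSIZE=$((64 * 1024 * 1024))
export P4_GLOBMEMSIZE

# Confirm the setting before launching mpirun from this shell.
echo "P4_GLOBMEMSIZE=$P4_GLOBMEMSIZE"
```

The variable must be set in the same shell (or job script) that invokes mpirun so that the MPI processes inherit it.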

Using the mpirun flag -nolocal

MPICH usually assigns the "MPI driving" process to the master node and then creates (n-1) tasks on available nodes. However, it does not by default create a running process (task) on the master node. Thus, there will be one process running on the master node and (n-1) processes running on the slave nodes.

For interactive jobs (testing purposes), adding the flag -nolocal to the mpirun command will create all np processes on the slave nodes and none on the local node (the node where the mpirun command was typed).

mpirun -np 8 -machinefile my-machines -nolocal my-mpi.exe

For MPICH batch jobs, the -nolocal option has a different effect. By default, MPICH under the SGE utility will create an MPI driving process on the master node, and then SGE or the PE will create (n-1) tasks on the slave nodes. This has the effect of creating three processes on one dual node, which will reduce the parallel efficiency of your job. Adding the -nolocal flag corrects this problem.

mpirun -np 8 -machinefile $TMPDIR/machines -nolocal my-mpi.exe
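
A batch submission built around this command might look like the sketch below, which writes a small SGE job script to a file. The parallel environment name mpich, the job name, and the file name my-mpi.sge are assumptions for illustration; the actual settings for lakemead are covered in the section about batch jobs. NSLOTS is the variable SGE sets to the number of slots granted, used here so that -np matches the request.

```shell
# Write a job script. The quoted 'EOF' keeps $NSLOTS and $TMPDIR
# literal, so they are expanded when the job runs, not now.
cat > my-mpi.sge <<'EOF'
#!/bin/sh
#$ -N my-mpi
#$ -cwd
#$ -pe mpich 8
# "mpich" above is a placeholder parallel environment name.
mpirun -np $NSLOTS -machinefile $TMPDIR/machines -nolocal my-mpi.exe
EOF

# Review the script before submitting it.
cat my-mpi.sge
```

The script would then be submitted with qsub my-mpi.sge.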

For production jobs, please see the section about batch jobs.


©2010 NSCEE