Friday, October 28, 2011

How to compile CMAQ in parallel

Recently, Phil and I put together a parallel build of CMAQ on nitrate, the newly purchased Linux box. A direct comparison with a serial build on the same machine, using the same input data, shows that CMAQ scales reasonably well on a small number of cores: we observed a 7x increase in run speed on all 8 cores compared to a single core. However, some key modifications need to be made to the installation scripts for the compilation to succeed. These are listed, to the best of my knowledge, below. Please note that this is NOT meant to be a comprehensive guide on how to install CMAQ, nor can I guarantee that I've caught all the necessary changes, since I'm writing this up about a week after the actual install. That said, it may serve as a useful resource for someone who already knows how to compile CMAQ in serial and wants to get it running in parallel.

1. Before installing
Some libraries that work well for serial compilations don't play nicely with the parallel compilation. To save yourself grief later, link these libraries into the CMAQ libraries location:

libnetcdf.a - Make sure this is compiled without DAP support. I have no idea what DAP support is, but it breaks parallel compilations and we don't use it. Passing --disable-dap to the configure script when installing netCDF turns it off.

libioapi.a - Surprisingly, a standard version of IOAPI will work just fine with a parallel compilation. IOAPI has a bunch of parallel IO options that you can set when compiling, but CMAQ doesn't use them. CMAQ (at least 4.7.0) is only parallelized for processing, not for file IO, so just use whatever library you used for the serial compilation. Of course, make sure you've properly included the fixed_src folder as you'll need the contents throughout.

libmpich.a - This doesn't have an explicit folder the way IOAPI and netCDF do in the CMAQ installation, but you'll need it for a parallel installation. If it isn't on your system, download and install it.
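As a rough sketch of the netCDF step above, the build without DAP might be wrapped up as follows. The function name and the install prefix are hypothetical, and the function assumes it is run from the top of an unpacked netCDF source tree; any other configure options your serial build needed would still have to be added.

```sh
# Sketch: build netCDF with DAP support turned off.
# build_netcdf_no_dap is a hypothetical helper name; the argument is the
# install prefix you want (also hypothetical, pick your own location).
build_netcdf_no_dap() {
    prefix=$1
    # --disable-dap switches off DAP (remote data access) support
    ./configure --disable-dap --prefix="$prefix" &&
        make &&
        make install
}
```

For example, build_netcdf_no_dap "$HOME/netcdf" would leave libnetcdf.a under $HOME/netcdf/lib, ready to be linked into the CMAQ libraries location.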

2. pario
Installing the parallel IO library (pario) is not necessary for a serial installation, but it is necessary to build CMAQ in parallel. Install it as you would any other component (bcon, icon, etc.) by modifying the library paths and the compiler path/flags.

3. stenex
The stencil exchange library has both parallel and serial components. You can get away with just installing sef90_noop for a serial build (built from bldit.se_noop), but for parallel you'll also need to run bldit.se to generate se_snl. It may be possible to skip the installation of sef90_noop if you want to run strictly in parallel, but I haven't tried. In any case, the only difference in installing these two files is that se_snl needs the mpich header file location.

4. Other components
To the best of my knowledge, the installation of m3bld, jproc, icon, and bcon is unchanged from the serial installation. Build these as you normally would.

5. cctm
This is probably where the largest number of changes needs to occur. Let's break it down into two categories: building and running.

Building cctm:

Make the following changes to the bldit.cctm script:
  • Uncomment the line reading "set ParOpt"
  • Set the location of mpich in the MPICH variable. Note that this is the top-level directory, and it should have include, bin, and lib directories underneath it.
  • Change FC from whatever compiler you were using before to mpif90 (provided it is installed on your system). mpif90 is a wrapper compiler that adds in extra flags as needed for compiling parallel programs. Note that this may not be available for MPI implementations other than MPICH.
  • Add a flag to F_FLAGS reading -f90=oldCompiler, where "oldCompiler" is the compiler you were using before. This makes sure mpif90 wraps the correct compiler.
  • Find the line where the script sets the COMP variable. Comment it out and replace it with
    set COMP = "intel"
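Put together, the edited parts of bldit.cctm might look roughly like the sketch below. The MPICH path and the ifort compiler name are assumptions for illustration (ifort is only a guess based on COMP being set to "intel"), the F_FLAGS placeholder stands for whatever flags your serial build already used, and these lines will not sit next to each other in the real script.

```csh
set ParOpt                        # uncommented: turns on the parallel build

set MPICH = /usr/local/mpich      # hypothetical path; needs include/, bin/, lib/ beneath it

set FC = mpif90                   # MPI wrapper compiler replaces the serial compiler
# Keep your existing flags and append -f90=<oldCompiler>; ifort is an assumed example
set F_FLAGS = "<existing flags> -f90=ifort"

#set COMP = ...                   # whatever the script had before, now commented out
set COMP = "intel"
```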

Running cctm:

Make the following changes to the run.cctm script:
  • Change the variables NPCOL_NPROW and NPROCS to reflect the number of processors you would like to use and their organization. There should be an example commented out in the file already. Note that the two values for NPCOL_NPROW should multiply to give NPROCS
  • At the very bottom of the file, comment out the line "time $BASE/$EXEC"
  • Uncomment the four lines beginning "set MPIRUN", "set TASKMAP", "cat $TASKMAP", "time $MPIRUN"
  • Change the location of MPIRUN to reflect the actual path to the executable on your system (at the command line, run "which mpirun" to find the executable if you don't know where it is)
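As a sketch, the edited parts of run.cctm for 8 processors might end up looking something like this. The 4 x 2 layout and the mpirun path are assumptions, as is everything to the right of "set TASKMAP =" and "time $MPIRUN": keep whatever your copy of the script already has on those lines and only change the MPIRUN path.

```csh
# 4 columns x 2 rows = 8 processors; the two values must multiply to NPROCS
setenv NPCOL_NPROW "4 2"; set NPROCS = 8

# ... rest of run.cctm unchanged ...

#time $BASE/$EXEC                       # serial launch, now commented out
set MPIRUN = /usr/local/mpich/bin/mpirun    # hypothetical; use the output of "which mpirun"
set TASKMAP = $BASE/machines8               # assumed: the machines file described below
cat $TASKMAP
time $MPIRUN -machinefile $TASKMAP -np $NPROCS $BASE/$EXEC   # assumed arguments
```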
Make the following changes BEFORE RUNNING:

  • There should be a file in the cctm directory named "machines8". Open this file, erase the contents (they are meaningless), and enter "sysname:num" on each line, where sysname is the name of the system you're working on and num is a number that starts at 1 and increases by 1 on each line. Continue until you've reached the maximum number of processors. For example, on nitrate the machines8 file looks something like
    nitrate:1
    nitrate:2
    nitrate:3
    nitrate:4
    nitrate:5
    nitrate:6
    nitrate:7
    nitrate:8
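If you'd rather not type the file by hand, a small Bourne shell helper can generate the same layout. This is only a convenience sketch: make_machinefile is a made-up name, and it simply reproduces the "sysname:num" pattern shown above for a given host name and processor count.

```sh
# Print one "host:N" line per processor, with N running from 1 to nprocs.
# make_machinefile is a hypothetical helper, not part of CMAQ or MPICH.
make_machinefile() {
    host=$1
    nprocs=$2
    i=1
    while [ "$i" -le "$nprocs" ]; do
        printf '%s:%d\n' "$host" "$i"
        i=$((i + 1))
    done
}
```

Running make_machinefile nitrate 8 > machines8 from the cctm directory would recreate the file listed above.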
You should now be ready to run CMAQ in parallel!