Compiling
The parallelized version of the qg model is poorly documented, which made it considerably harder to compile and run. The model uses FFTW 2.1.5, so that library has to be built first. The configure line is something like
./configure --enable-mpi --prefix=/lustre/f1/unswept/${USER}/lib/fftw2.1.5_intel/ CC="cc" F77="ifort"
Note that I used cc (which invokes the Intel C compiler on this platform) instead of gcc. I also did not use MPI compiler wrappers such as mpicc or mpif90, because they are not usable on this platform. Two modules need to be loaded before running configure: PrgEnv-intel and xt-mpich2.
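The FFTW build steps above can be collected into a short script. This is only a sketch under a few assumptions: the `run` helper and the `DRY_RUN` switch are my own additions for illustration, the `module` command must be available in the shell, and the script is meant to be started from the unpacked FFTW 2.1.5 source directory. Depending on which programming environment is loaded by default, `module swap` may be needed instead of `module load`. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the FFTW 2.1.5 build. Commands are only printed unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
PREFIX="/lustre/f1/unswept/${USER}/lib/fftw2.1.5_intel"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_fftw() {
    run module load PrgEnv-intel      # Intel programming environment
    run module load xt-mpich2         # MPI library
    run ./configure --enable-mpi --prefix="$PREFIX" CC=cc F77=ifort
    run make
    run make install
}

build_fftw
```

Running it with `DRY_RUN=0` executes the five commands in order instead of printing them.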
The model itself can then be built. The include and library paths need to be added to the makefile:
INCLUDE += -I$(MPICH_DIR)/include
LDFLAGS += -L/lustre/f1/unswept/${USER}/lib/fftw2.1.5_intel/lib -lfftw_mpi -lfftw
The header file mpif.h shipped with the model does not work on this platform and caused an MPI initialization problem, so I replaced it with the platform's default mpif.h.
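One way to make the compiler fall back to the platform header is to move the bundled copy out of the way, since a Fortran `include 'mpif.h'` usually looks in the source file's own directory before the `-I` paths. The function below is a sketch of that idea, not necessarily how I did it at the time. The name `shelve_bundled_mpif` and the `.bundled` suffix are made up for illustration.

```shell
#!/bin/sh
# Rename every bundled mpif.h under a source tree so that the compiler
# picks up the copy found through -I$(MPICH_DIR)/include instead.
# Prints each file it renames. Argument: source directory (default: .)
shelve_bundled_mpif() {
    find "${1:-.}" -type f -name 'mpif.h' | while IFS= read -r f; do
        mv "$f" "$f.bundled" && echo "renamed $f"
    done
}

# Usage (hypothetical source path):
#   shelve_bundled_mpif ./qg_model_src
```

The rename is easy to undo, which helps when comparing the two headers later.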
I learnt a few more linking tricks along the way. For example, instead of letting the linker search for a library, one can name the exact file by giving its full path, without -l or -L, e.g.,
LIB = ${MPICH_DIR}/lib/libmpich.so
Another way to do this is
LIB = -L${MPICH_DIR}/lib -l:libmpich.so
A useful command for finding out which libraries are dynamically linked is ldd.
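As a small sketch of how ldd can be used to check a build, the function below lists the shared libraries of a binary and optionally filters them by a pattern. The function name and the executable name in the usage comment are hypothetical. Only dynamically linked libraries show up, so a library that was linked statically will not be listed.

```shell
#!/bin/sh
# Print the names of the shared libraries a binary is linked against.
# Arguments: path to binary, optional extended regex filter (default: all)
linked_libs() {
    ldd "$1" | awk '{ print $1 }' | grep -E "${2:-.}"
}

# Usage (hypothetical executable name):
#   linked_libs ./qg_model 'mpich|fftw'
```

If the filter matches nothing, the function prints nothing and returns a non-zero status, which makes it usable in a build check.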
Test runs and benchmark
Set kmax = 1023 (resolution 2048×2048), 1 layer, stochastically forced, integrated for 10,000 time steps, with output every 1000 time steps.
- 1 Core: > 6 hours
- 32 Cores: 64 min
- 64 Cores: 34 min
- 128 Cores: 24 min
Set kmax = 511 (resolution 1024×1024), 2 layers, integrated for 50,000 time steps, with output every 1000 time steps.
- 64 Cores: 69 min
- 128 Cores: 44 min
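To put the timings above in perspective, relative speedup and parallel efficiency can be computed against the smallest multi-core run of each test. The snippet below is a sketch of that arithmetic. The `scaling` function is my own helper, and the single-core run is left out because only a lower bound (> 6 hours) is known for it.

```shell
#!/bin/sh
# Relative speedup and parallel efficiency against a baseline run.
# Arguments: base_cores base_minutes cores minutes
scaling() {
    awk -v bc="$1" -v bt="$2" -v c="$3" -v t="$4" 'BEGIN {
        speedup = bt / t      # how many times faster than the baseline
        ideal   = c / bc      # speedup expected under perfect scaling
        printf "%d cores: speedup %.2f (ideal %.2f), efficiency %.0f%%\n",
               c, speedup, ideal, 100 * speedup / ideal
    }'
}

# 2048x2048, 1 layer, baseline 32 cores (64 min)
scaling 32 64 64 34
scaling 32 64 128 24
# 1024x1024, 2 layers, baseline 64 cores (69 min)
scaling 64 69 128 44
```

For the 2048×2048 case this gives about 94% efficiency going from 32 to 64 cores and about 67% going from 32 to 128 cores. For the 1024×1024 case it gives about 78% going from 64 to 128 cores.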