1.4.3.3. OpenMPI with CVMFS¶
This is similar to the previous example, but uses CVMFS instead.
1.4.3.3.1. Deploying MPI applications using the provided wrapper¶
Create a file mpi.ini:
universe = parallel
executable = /opt/simonsobservatory/cbatch_openmpi
arguments = env.sh mpi.sh
machine_count = 2
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = env.sh,mpi.sh
request_cpus = 16
request_memory = 32999
request_disk = 32G
# constraining CPU to match the environment used in env.sh
# Requirements = (Arch == "INTEL") && (Microarch == "x86_64-v4")
# currently the only attribute that is exposed at Blackett is
Requirements = Arch == "X86_64"
log = mpi.log
output = mpi-$(Node).out
error = mpi-$(Node).err
stream_error = True
stream_output = True
queue
Note that it calls a wrapper script cbatch_openmpi. This script takes 2 arguments, both of which are also scripts: the first sets up the software environment and the second runs the MPI application.
In the first file, env.sh:
#!/bin/bash -l
# helpers ##############################################################
COLUMNS=72
print_double_line() {
eval printf %.0s= '{1..'"${COLUMNS}"\}
echo
}
print_line() {
eval printf %.0s- '{1..'"${COLUMNS}"\}
echo
}
########################################################################
CONDA_PREFIX=/cvmfs/northgrid.gridpp.ac.uk/simonsobservatory/pmpm/so-pmpm-py310-mkl-x86-64-v3-openmpi-latest
print_double_line
echo "$(date) activate environment..."
source "$CONDA_PREFIX/bin/activate"
print_line
echo "Python is available at:"
which python
echo "mpirun is available at:"
which mpirun
We see that it essentially prepares the software environment from CVMFS.
Note
See Continuous Deployment (CD) for tips on which CVMFS environment to choose.
The reason this wrapper script has such an interface is that MPI is part of your software environment. Only after you have loaded this environment (you can switch to any OpenMPI installation you want, as long as it is OpenMPI) can the wrapper script start the OpenMPI launcher and prepare for you to run mpirun later.
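As a quick sanity check (a minimal sketch, not part of the provided scripts), you can verify after sourcing env.sh that mpirun resolves to the CVMFS installation under the CONDA_PREFIX set above:

# optional sanity check after env.sh has been sourced:
# mpirun should resolve to the CVMFS installation under $CONDA_PREFIX
if [[ "$(command -v mpirun)" != "$CONDA_PREFIX"* ]]; then
    echo "WARNING: mpirun is not from $CONDA_PREFIX" >&2
fi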
Then in mpi.sh:
#!/usr/bin/env bash
# helpers ##############################################################
COLUMNS=72
print_double_line() {
eval printf %.0s= '{1..'"${COLUMNS}"\}
echo
}
print_line() {
eval printf %.0s- '{1..'"${COLUMNS}"\}
echo
}
########################################################################
print_double_line
set_OMPI_HOST_one_slot_per_condor_proc
echo "Running mpirun with host configuration: $OMPI_HOST" >&2
print_double_line
echo 'Running TOAST tests in /tmp...'
cd /tmp
mpirun -v -host "$OMPI_HOST" python -c 'import toast.tests; toast.tests.run()'
Here set_OMPI_HOST_one_slot_per_condor_proc, a bash function provided within the wrapper script, is called to set OMPI_HOST. There are 2 such bash functions provided; be sure to read the cbatch_openmpi documentation to know which one to choose. The recommended setup for hybrid MPI such as MPI+OpenMP is to use set_OMPI_HOST_one_slot_per_condor_proc, such that each HTCondor process is one MPI process.
Note the use of mpirun -host "$OMPI_HOST" ..., which uses the prepared OMPI_HOST to launch the MPI processes.
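For illustration only, with machine_count = 2 the prepared OMPI_HOST would be a comma-separated host list with one slot per HTCondor process. The host names, the example value, and your_hybrid_app.py below are hypothetical:

# hypothetical value prepared by set_OMPI_HOST_one_slot_per_condor_proc
echo "$OMPI_HOST"    # e.g. wn001.example:1,wn002.example:1
# hybrid MPI+OpenMP: one MPI process per HTCondor process,
# OpenMP threads fill the CPUs requested via request_cpus
mpirun -v -host "$OMPI_HOST" -x OMP_NUM_THREADS=16 python your_hybrid_app.py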
Warning
When writing scripts such as env.sh and mpi.sh, note that in HTCondor's parallel universe the executable is run in the single program, multiple data (SPMD) paradigm. This is very different from, for example, what you would do in a SLURM batch script.
As a concrete example, if there is a line echo hello world in your scripts, it will be run once in each HTCondor process, and the corresponding mpi-?.out files will each contain a hello world.
So when env.sh is run, the software environment is prepared individually in each HTCondor process. This is important, as all processes should share exactly the same software environment to launch an MPI program.
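If you ever need a step to run only once rather than in every process, a minimal sketch (assuming the _CONDOR_PROCNO environment variable that HTCondor sets for parallel universe processes) is to guard it explicitly:

# run a one-time setup step only on the first HTCondor process
if [[ "${_CONDOR_PROCNO:-0}" -eq 0 ]]; then
    echo "one-time setup running on process 0 only"
fi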
Lastly, submit the job as usual with
condor_submit mpi.ini
After the job has finished, you can see what happened by reading the contents of the log, output, and error files specified in the ClassAd.
See Monitor your jobs for how to monitor the status of your job. For advanced use, use this command instead,
condor_submit mpi.ini; tail -F mpi.log mpi-0.out mpi-0.err mpi-1.out mpi-1.err
and see Streaming stdout & stderr with tail for an explanation on what it does.
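For a quick look at the queue itself, you can also use condor_q directly (the job ID below is hypothetical):

# list your jobs in the queue
condor_q
# explain why a specific job is idle or not matching
condor_q -better-analyze 1234.0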
Warning
It is known that running the TOAST3 test suite with OpenMPI in our provided software environment results in some failed unit tests. We are investigating, and this will be fixed in the future.