ARC and Singularity


Singularity

Singularity is a "mobility of compute" project from LBL: http://singularity.lbl.gov/

It uses containers and filesystem images to enable users to bring their own custom OS environment to HPC clusters.

WLCG use case

Since the LHC experiment code supports only a limited set of operating systems, Singularity gives sites a convenient way to provide the old RHEL derivatives it needs without restricting what the cluster nodes themselves run.

ARC integration

Work in progress (as of 2017-04-06); how best to integrate is still under active discussion.

The RTE (runtime environment) ENV/SINGULARITY/CENTOS6/ATLAS signals that the job wants a site-provided ATLAS image based on CentOS 6.

This is handled by adding the following line to the submit-SLURM-job script, after the memory requirements section but before the umask override section:

. ${pkgdatadir}/choose-slurm-chroot.sh >>  $LRMS_JOB_SCRIPT

The core of the choose-slurm-chroot.sh script is shown here; the list of RTEs, the bind paths and the image location need to be adjusted to local preferences:

# Candidate RTEs in increasing priority: if the job requests several of
# them, the one listed last here wins.
chr=""
for os in "ENV/SINGULARITY" "ENV/SINGULARITY/CENTOS6/ATLAS" "ENV/GENTOO" "ENV/SLC5-X86_64" "ENV/SL6-X86_64" "ENV/SL5-X86_64"
do
     joboption_num=0
     eval "var_is_set=\${joboption_runtime_$joboption_num+yes}"
     # Walk joboption_runtime_0, _1, ... until the first unset variable.
     while [ -n "${var_is_set}" ]; do
        eval "var_value=\${joboption_runtime_$joboption_num}"
        if [ "$os" = "$var_value" ] ; then
          chr=$os
        fi
        joboption_num=$(( joboption_num + 1 ))
        eval "var_is_set=\${joboption_runtime_$joboption_num+yes}"
     done
done

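case "$chr" in
  "ENV/SINGULARITY/CENTOS6/ATLAS" )
    # Emit a job-script preamble that re-executes the script inside the
    # container unless it is already running in one.
    echo 'if [ -z "$SINGULARITY_CONTAINER" ]; then'
    echo '  exec /usr/bin/singularity exec -B /arc,/etc/grid-security/certificates,/var/spool/slurmd,/cvmfs,/pfs/nobackup/,/afs/hpc2n.umu.se/,/sys/fs/cgroup,/import /pfs/nobackup/data/containers/centos6/atlas.img "$0"'
    echo 'fi'
    ;;
  * )
    # No container requested: the job runs directly on the node.
    ;;
esac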
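
To see how the selection loop behaves, the following minimal sketch wraps it in a function and feeds it hand-set joboption_runtime_N variables. The function name choose_rte and the example values are hypothetical and only serve as illustration; in a real submission the variables are set by ARC before the submit script runs.

```shell
#!/bin/bash
# Hypothetical helper (not part of ARC): prints the last entry of the
# candidate list that matches one of the job's requested runtime environments.
choose_rte() {
    local chr="" os num var_is_set var_value
    for os in "$@"; do
        num=0
        eval "var_is_set=\${joboption_runtime_$num+yes}"
        while [ -n "$var_is_set" ]; do
            eval "var_value=\${joboption_runtime_$num}"
            if [ "$os" = "$var_value" ]; then
                chr=$os
            fi
            num=$(( num + 1 ))
            eval "var_is_set=\${joboption_runtime_$num+yes}"
        done
    done
    printf '%s\n' "$chr"
}

# Example values only, standing in for what ARC would set for a job
# that requests two RTEs.
joboption_runtime_0="APPS/HEP/ATLAS"
joboption_runtime_1="ENV/SINGULARITY/CENTOS6/ATLAS"

choose_rte "ENV/SINGULARITY" "ENV/SINGULARITY/CENTOS6/ATLAS" "ENV/GENTOO"
```

Because a later match overwrites an earlier one, the order of the candidate list decides which RTE wins when a job requests more than one of them.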

Limitations

It is sometimes difficult to run a newer OS container on an older base OS, particularly if the newer glibc requires features from a newer kernel than the node is running. The obvious solution is to run a recent OS on the hardware itself and provide containers for the user programs that need old environments.
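
A node-side check for this mismatch could look roughly like the sketch below. It is only an illustration: the helper name kernel_at_least is hypothetical, the minimum version 2.6.32 is an assumed placeholder (the real value depends on the glibc inside the image), and the comparison relies on the -V option of GNU sort.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if kernel version $1 is at least $2.
# Anything after the first "-" in $1 (the distribution suffix) is ignored.
kernel_at_least() {
    local have="${1%%-*}" need="$2"
    # With version sort, the smaller of the two versions comes first;
    # the requirement is met when that smaller one is the needed version.
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Assumed minimum for illustration; adjust to the glibc in the image.
required="2.6.32"
if kernel_at_least "$(uname -r)" "$required"; then
    echo "node kernel $(uname -r) satisfies $required"
else
    echo "node kernel $(uname -r) is older than $required" >&2
fi
```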

Without stable overlayfs support, every bind mount point (the -B arguments above) must already exist inside the image, which severely limits the possibility of sharing the same image between sites. Once this is solved (all sites upgraded to an OS with a modern kernel, where modern means somewhere around 4.0-4.4), ATLAS might provide golden images centrally that everyone can just use.
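
Until then, a site has to create its own mount points inside the image. A minimal sketch of that step, assuming the image contents are available as a writable directory tree (how to get there depends on the image format and Singularity version and is not covered here); the helper name make_mountpoints is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: create an empty directory under image root $1 for
# every path in the comma-separated bind list $2 (same syntax as -B).
make_mountpoints() {
    local root="$1" binds="$2" path
    local IFS=,
    for path in $binds; do
        mkdir -p "${root}${path}"
    done
}

# Example: a scratch directory stands in for the unpacked image root.
root=$(mktemp -d)
make_mountpoints "$root" "/arc,/etc/grid-security/certificates,/cvmfs"
find "$root" -mindepth 1 -type d | sort
```

Since the bind list differs from site to site, an image prepared this way is site-specific, which is exactly the sharing problem described above.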

Also, on old kernels (CentOS 6 or Ubuntu 12.04, for example), autofs is tricky: all the filesystems need to be mounted on the node before Singularity is launched. This is another incentive to run a modern OS.
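
One possible workaround is to access every needed path on the node before the container starts, since accessing a path below an autofs-managed directory is what normally triggers the mount. The sketch below is an assumption about how such a pre-flight check might look, not part of ARC or Singularity; the helper name premount_check and the example paths are illustrative.

```shell
#!/bin/bash
# Hypothetical helper: access each path so autofs has a chance to mount it,
# then report every path that is still not a directory.
premount_check() {
    local path missing=0
    for path in "$@"; do
        # Listing the path is the access that should trigger an automount,
        # if one is configured for it.
        ls "$path" >/dev/null 2>&1
        if [ ! -d "$path" ]; then
            echo "not available: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call; the paths are placeholders for the site's own bind paths.
if premount_check /cvmfs/atlas.cern.ch /etc/grid-security/certificates; then
    echo "all bind paths available"
fi
```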