1. User guide

This guide is intended for typical users following our recommended usage. While it is written for a general audience, it also gives pointers specific to NERSC users on adapting their NERSC pipelines to the SO:UK Data Centre.

We start by pointing out the main differences between NERSC and the SO:UK Data Centre that have important implications for how workflows are deployed here.

| Facility | NERSC | SO:UK Data Centre |
|---|---|---|
| Nature | HPC | HTC |
| Configuration | Homogeneous within a pool | Heterogeneous by default |
| Workload Manager | SLURM | HTCondor |
| Job Classification Model | Different QoS can be selected, such as debug, interactive, regular, premium, etc., categorized by priority and charge factors. They all share exactly the same software environment. | Different "universes" can be selected, such as vanilla, parallel, docker, container, etc., based on software environments and job launch methods. Universes are mutually exclusive, so a job cannot be configured for multiple universes simultaneously. Interactive jobs are only available in the vanilla universe. |
| Login Node Designation | Login nodes are reachable via ssh with 2-factor authentication. Passwordless login can be achieved by using the sshproxy service to create temporary ssh keys. | Called a submit node in HTCondor. Tentatively, a special login node named vm77 is reachable via ssh. Users are required to submit their ssh public keys to the maintainer, and login is passwordless by default. |
| Compute Node Designation | Compute nodes | Worker nodes |
| Home Directory | Globally mounted home directory, backed up periodically | Not available on worker nodes |
| Archive Filesystem | HPSS | Not available |
| Scratch Filesystem | Parallel distributed file system (Lustre), all-SSD. Purged once every few months. | Local to each worker node. Data does not persist after job completion. |
| Software Distribution Filesystem | Read-only global common | Read-only CVMFS |
| Large Storage Pool | CFS with a filesystem interface | Grid storage system without a filesystem interface |
| Job Configuration | SLURM directives within the batch script | ClassAds in a separate, INI-like submit description file |
| Wallclock Time | Must be specified in the job configuration | Not applicable |
| Sharing Physical Nodes | Requested via the interactive QoS | Shared by default |
| Exclusive Physical Node Allocation | Requested via the regular QoS | Not applicable |
| Utilizing Multiple Nodes | Available by default | Must specify the parallel universe in the submit description file |
| Priority | Different levels permitted with various charge factors and restrictions | Not applicable |
| Fair-Share System | A fixed amount of NERSC hours is allocated for use within an allocation year. A proposal is required to request an allocation and is renewed on a year-to-year basis. | More flexible, with no strict quota limit |
| MPI Support | Native | The parallel universe is not exclusively for MPI. We maintain custom wrappers to start MPI processes within the parallel universe. |
| Container Support | Officially supported | Only officially supported in the docker/container universes. A job cannot belong to both a container universe and the parallel universe. |
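
To make the difference in job configuration concrete, the two sketches below contrast a minimal SLURM batch script with a roughly equivalent HTCondor submit description file. The script name, resource requests, and output file names are hypothetical placeholders rather than site-specific defaults.

```bash
#!/bin/bash
# NERSC: SLURM directives are embedded in the batch script itself.
#SBATCH --qos=regular
#SBATCH --nodes=1
#SBATCH --time=01:00:00

srun ./my_pipeline.sh
```

```
# SO:UK Data Centre: ClassAds go in a separate, INI-like submit description
# file (say my_pipeline.sub), submitted with `condor_submit my_pipeline.sub`.
universe       = vanilla
executable     = my_pipeline.sh
request_cpus   = 4
request_memory = 8GB
output         = my_pipeline.out
error          = my_pipeline.err
log            = my_pipeline.log
queue
```

Note that no wallclock time appears in the HTCondor version, and an interactive session is requested by passing `-interactive` to `condor_submit` rather than by selecting a different QoS.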
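
Because the home directory is not mounted on the worker nodes and the node-local scratch space is discarded when the job finishes, input and output data generally need to be staged explicitly with HTCondor's file transfer mechanism. The file names below are hypothetical; the sketch only illustrates the relevant submit commands.

```
# Sketch: explicit file staging in a submit description file.
universe                = vanilla
executable              = my_pipeline.sh
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = config.toml, input_maps.fits
transfer_output_files   = output_maps.fits
queue
```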
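
Multi-node jobs must opt into the parallel universe, with `machine_count` setting the number of worker nodes requested. The wrapper script below is a hypothetical stand-in for the custom MPI wrappers mentioned above; treat it as a sketch of the submit-file structure rather than a ready-to-run recipe.

```
# Sketch: a parallel-universe job spanning two worker nodes.
universe      = parallel
executable    = mpi_wrapper.sh
arguments     = ./my_mpi_program
machine_count = 2
request_cpus  = 4
output        = mpi_job.$(Node).out
error         = mpi_job.$(Node).err
log           = mpi_job.log
queue
```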
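
Similarly, containerized jobs must select the docker (or container) universe; since universes are mutually exclusive, such a job cannot also use the parallel universe. The image name below is only an example.

```
# Sketch: a docker-universe job.
universe     = docker
docker_image = debian:stable
executable   = my_pipeline.sh
output       = my_pipeline.out
error        = my_pipeline.err
log          = my_pipeline.log
queue
```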