User Guide Contents
Overview
Beagle3 is the newest addition to RCC's high-end HPC systems. Beagle3 is primarily a GPU (Graphics Processing Unit) cluster. This HPC system enables a select user group from the University of Chicago community to benefit from advanced computing infrastructure that supports novel discoveries and innovations.
Currently, two login nodes are available for Beagle3. The key features of the Beagle3 hardware are listed as follows:
- 44 GPU compute nodes (a total of 176 GPUs)
- 22 nodes with 4x Nvidia A100 GPUs
- 22 nodes with 4x Nvidia A40 GPUs
- 4 big shared-memory nodes with 512GB of memory per node and no GPUs ("beagle3-bigmemX")
- All nodes have HDR InfiniBand (100 Gbps) network cards.
- 1 PB of usable high-capacity GPFS space
Specifications
The specifications of the compute node on the Beagle3 platform are as follows:
A40 and A100 Specifications
| Specification | A40 | A100 |
|---|---|---|
| CUDA Cores | 10,752 | 6,912 |
| Tensor Cores | 336 | 432 |
| RT Cores | 84 | - |
| GPU Memory | 48 GB | 40 GB |
| GPU Memory Bandwidth | 696 GB/s | 1,555 GB/s |
| Interconnect | 64 GB/s | 64 GB/s |
| Peak FP32 TFLOPS | 37.4 | 19.5 |
| Peak FP64 TFLOPS | - | 9.7 |
| Peak TF32 TFLOPS | 74.8 | 156 |
Choosing between A40 and A100
The A40 has more CUDA cores and higher single-precision floating point (FP32) performance. The A100 has more Tensor Cores and higher TensorFloat-32 (TF32) performance, making it more suitable for machine learning tasks. The A40 has no double-precision (FP64) hardware acceleration, so only the A100 should be used for applications that need double precision.
The CPU specifications of the compute nodes on the Beagle3 platform, as reported by lscpu, are as follows:
Architecture: x86_64
CPU(s): 32
Thread(s) per core: 1
Core(s) per socket: 16
Sockets: 2
Model name: Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz
CPU MHz: 3600.000
L1d cache: 48K
L1i cache: 32K
L2 cache: 1280K
L3 cache: 36864K
Access
Currently, connecting to the login nodes is possible via ssh from a terminal:
ssh [CNetID]@beagle3.rcc.uchicago.edu
Using ThinLinc to connect to the login nodes is possible as well.
For connecting via ThinLinc using a web browser, use the web address
beagle3.rcc.uchicago.edu
When using the ThinLinc desktop client, set the server to beagle3.rcc.uchicago.edu .
To start an interactive session on the Beagle3 compute nodes, use the following command:
sinteractive --partition=beagle3 --constraint=a100 --account=pi-<group>
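Additional Slurm resource flags can be combined with sinteractive in the same way as on Midway3 (a sketch; the specific resource values below are illustrative, and pi-<group> must be replaced with your own account):

```shell
# Request an interactive session with 1 A100 GPU, 8 CPU cores,
# 32 GB of memory, and a 2-hour time limit (values are illustrative)
sinteractive --partition=beagle3 --constraint=a100 \
    --gres=gpu:1 --ntasks=8 --mem=32G --time=02:00:00 \
    --account=pi-<group>
```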
Filesystems
A user of Beagle3 will also be able to seamlessly access their storage (scratch and capacity) on Midway3, and vice versa. Individual scratch space on Beagle3 is available at /scratch/beagle3, with a total of 200TB of storage. The soft quota per user is 400GB and the hard limit is 1TB. The user scratch folder is the intended target for your compute jobs' I/O. The scratch file system is configured with a larger block size of 16 MB, which means that continuous reading/writing of large chunks of data will perform better on this file system. The global parallel scratch space is accessible from all compute and login nodes.
The /home directory and the /beagle3 directory are both part of the capacity file system.
- Home ( /home/$USER ): This is the user's home directory.
- beagle3 ( /beagle3/pi-<group> ): This is the group-shared capacity storage file system space that is accessible to all members of the pi-<group> Unix group. Just as with running jobs on Midway3, /beagle3 should be treated as a location for users to store data they intend to keep.
NOTE
- Snapshots are available for the /home and /beagle3 file systems. 7 daily and 4 weekly snapshots are kept for each file system. On Beagle3 they are accessible at the path /gpfs3/cap/.snapshots/ for /home and /gpfs4/cap/.snapshots/ for /beagle3. There are NO snapshots available for the /scratch file system. There is also no tape backup.
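To recover an accidentally deleted or overwritten file, copy it back from one of the snapshot directories. A minimal sketch (the snapshot directory name is a placeholder, and myfile.txt is a hypothetical file; list the .snapshots directory to see which snapshots actually exist):

```shell
# List the available snapshots of the capacity file system holding /home
ls /gpfs3/cap/.snapshots/

# Copy a file back from a chosen snapshot into your home directory
# (<snapshot-name> and myfile.txt are placeholders)
cp /gpfs3/cap/.snapshots/<snapshot-name>/home/$USER/myfile.txt ~/myfile.txt
```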
The parallel storage is partitioned into two file systems that have different file block size configurations.
Storage Quotas
The following table lists the file systems and their default quotas.
Default storage quotas:
| File Set | Path | Soft Quota | Hard Quota |
|---|---|---|---|
| Home | /home/$USER | 30GB | 35GB |
| Scratch | /scratch/beagle3/$USER | 400GB | 1TB |
| Project | /project/pi- | 5TB | 6TB |
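Assuming the same storage tooling as Midway3 is available on Beagle3 (an assumption; check with RCC support if the command is missing), current usage against these quotas can be checked from a login node:

```shell
# Report your usage against the soft/hard quotas
# (assumes the Midway3-style `quota` utility is also installed on Beagle3)
quota

# Generic fallback: report overall capacity and usage of a mounted file system
df -h /scratch/beagle3
```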
Local Scratch Directory ( /scratch/local )
There is also a scratch space that resides on the local SSD of each node and can only be used for jobs that do not require distributed parallel I/O. The capacity of the local SSD is 960GB, but the actual amount of usable space will be less than this and may depend on the usage of other users on the same node if your job's resource request does not give you exclusive access to the node. There is presently no SLURM epilog script to clean up the local scratch, so please be mindful to clean up this space after any job.
It is recommended that users use the local scratch space for high-throughput I/O of many small files (size < 4 MB) in jobs that are not distributed across multiple nodes.
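A common pattern inside a job script is to stage inputs to a job-specific directory on the local SSD, compute there, copy results back to shared storage, and clean up. A sketch (the directory layout under /scratch/local, the input/output file names, and my_program are all hypothetical):

```shell
# Stage to node-local SSD, compute, copy back, clean up
LOCAL_DIR=/scratch/local/$SLURM_JOB_ID    # job-specific directory (hypothetical layout)
mkdir -p "$LOCAL_DIR"
cp input.dat "$LOCAL_DIR/"                # stage input (hypothetical file)
cd "$LOCAL_DIR"
./my_program input.dat > output.dat       # hypothetical program
cp output.dat "$SLURM_SUBMIT_DIR/"        # copy results back to shared storage
cd "$SLURM_SUBMIT_DIR"
rm -rf "$LOCAL_DIR"                       # no epilog cleanup exists, so remove it yourself
```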
Software
Both GNU and Intel compilers are available on Beagle3. Several math libraries such as fftw are also included. For a complete list, please use:
module avail
Software Modules
Beagle3 uses the same operating system, CentOS, as Midway3. Beagle3 also shares the same software module stack with Midway3.
Similar to the Midway3 platform, the Beagle3 platform uses Environment modules to manage software packages.
Compiling Software
The Midway3/Beagle3 software module system provides a number of compilers. For compiling C/C++ code, we recommend either the GNU compiler collection gcc, or the Intel C/C++ compiler icc. To use gcc, you can either run module load gcc/<version>, or use the default system-wide gcc installation. To use the Intel compilers, run module load icc. References for both of these compilers can be found in their documentation pages:
- https://gcc.gnu.org/onlinedocs/
- https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top.html
See module avail for a complete list of modules, including compiler modules.
If there is any compiler or other library missing in the Midway3 module environment that is required to build your software, please send a request to help@rcc.uchicago.edu with the details.
Partition
There is only one partition for the entire Beagle3 cluster, called beagle3. This partition includes two types of GPU nodes (A100 and A40) and 4 big shared-memory nodes (beagle3-bigmemX). The specifications of the three kinds of nodes are given in the table below.
Nodes on the Beagle3 platform:
| Node type | Num. of Nodes | Node Specifications |
|---|---|---|
| A100 | 22 | 32 cores, 256GB memory, 960GB SSD |
| A40 | 22 | 32 cores, 256GB memory, 960GB SSD |
| Big memory | 4 | 512GB memory, 960GB SSD, no GPUs |
Users can specify which kind of node they want by using the --constraint flag for the SLURM scheduler:
- --constraint=a40 : jobs will only run on an A40 node
- --constraint=a100 : jobs will only run on an A100 node
- --constraint=256g : jobs will run on either an A40 or an A100 node, but not a big-memory node ("beagle3-bigmemX")
- --constraint=512g : jobs will only run on a big-memory node
It may be noted that beagle3-[0001-0022] are all A100 nodes and beagle3-[0023-0044] are all A40 nodes. The command sinfo gives information about the current status of the nodes.
For example:
sinfo -p beagle3 -O 'partition,available,nodes,features,statecompact,nodelist'
PARTITION AVAIL NODES AVAIL_FEATURES STATE NODELIST
beagle3 up 1 gold-6346,256g,a100 down* beagle3-0010
beagle3 up 1 gold-6346,256g,a40 mix beagle3-0023
beagle3 up 21 gold-6346,256g,a100 idle beagle3-[0001-0009,0
beagle3 up 21 gold-6346,256g,a40 idle beagle3-[0024-0044]
beagle3 up 4 gold-6346,512g idle beagle3-bigmem[1-4]
QoS Policies
There are two quality-of-service (QoS) options available on the beagle3 partition. You can specify either one by using the --qos flag in your sbatch scripts or sinteractive commands.
- --qos=beagle3 : This QoS allows you to request up to 512 CPU cores and 64 GPUs, with a maximum wall time of 48 hours. It is the default QoS for the beagle3 partition.
- --qos=beagle3-long : This QoS allows you to request up to 128 CPU cores and 16 GPUs, with a maximum wall time of 96 hours.
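For example, a long-running job can be submitted under the beagle3-long QoS (job.sbatch is a hypothetical script name; replace pi-<group> with your own account):

```shell
# Submit with the non-default QoS and its longer 96-hour wall-time limit
sbatch --qos=beagle3-long --time=96:00:00 --account=pi-<group> job.sbatch
```

The --qos flag can equivalently be set inside the script as an #SBATCH directive.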
Jobs
Jobs can be run on the Beagle3 compute nodes via Slurm, the same way as on Midway3. Submitted jobs should explicitly specify --account=pi-<group> in the job submission script or interactive invocation.
An example sbatch job submission script for submitting a single-core job with one GPU to the beagle3 partition is given below:
#!/bin/bash
#SBATCH --job-name=mnist
#SBATCH --output=out.txt
#SBATCH --time=00:05:00
#SBATCH --partition=beagle3
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --gres=gpu:1
#SBATCH --constraint=a100
#SBATCH --account=pi-<group>
module load python/anaconda-2021.05
module load cudnn/11.2
python mnist_convnet.py
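Assuming the script above is saved as mnist.sbatch (a hypothetical file name), it can be submitted and monitored with the standard Slurm commands:

```shell
sbatch mnist.sbatch    # submit the job; prints the assigned job ID
squeue -u $USER        # check the status of your pending and running jobs
scancel <jobid>        # cancel a job if needed (replace <jobid> with the real ID)
```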
Help
Contact the RCC at help@rcc.uchicago.edu