BioCRF HPC Cluster
Hardware
1. Compute Nodes
Model: HPE ProLiant XL170r Gen10
- Number of nodes: 36
- 2 x Intel Xeon Gold 6230 (20-core/2.1 GHz/28MB cache) processors
- 12 x 32GB (384GB total) DDR4-2933 RAM
- 1 x 1.2TB 12Gb/s 10K rpm SAS HDD
- 1 x single-port/dual-port 10Gbps Ethernet network card
- 1/2 x 1Gb Ethernet adapter and 1 dedicated management port
2. Master Node
Model: HPE ProLiant DL380 Gen10
- Number of nodes: 1
- 2 x Intel Xeon Silver 4210 (10-core/2.2 GHz/14MB cache) processors
- 64GB DDR4-2400 RAM
- 2 x 1TB 12Gbps NL-SAS 7.2K rpm HDD in RAID 1 configuration as the system disk
- 1 x dual-port 10Gbps Ethernet network card
- 2 x 1Gb Ethernet adapter and 1 dedicated management port
3. Storage Servers
Model: HPE Apollo 4510 Gen10
- Number of nodes: 2
- 2 x Intel Xeon Silver 4214 (12-core/2.2 GHz/17MB cache) processors
- 128GB DDR4-2400 RAM
- 2 x 1TB 12Gbps NL-SAS 7.2K rpm HDD in RAID 1 configuration as the system disk
- 48 x 12TB 7.2K rpm SAS 12Gbps HDD in RAID 60 configuration as the data disks
- Each server provides around 437 TB of usable disk space
Software
1. System
- Operating system: Rocks Cluster 7 based on CentOS 7.7 64-bit
- Resource management and job scheduling system: Simple Linux Utility for Resource Management (SLURM) Version 19.05.4
2. Applications
- Bioconda based on Miniconda3 v4.8.2 and Miniconda2 v4.7.12
- Location: /opt/miniconda2 and /opt/miniconda3
- References:
https://bioconda.github.io/
https://docs.conda.io/projects/conda/en/latest/user-guide/index.html
To use the Python 3.x version:
source /usr/local/setup/miniconda3.sh
To use the Python 2.x version:
source /usr/local/setup/miniconda2.sh
In general, users can install software in their own home directory or group share directory.
Please note that users are responsible for the licenses and copyright of the software they install in the cluster.
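As an illustration, a minimal sketch of installing a Bioconda package into a personal conda environment under your home directory; the environment name "myenv" and the package "samtools" are placeholders, not part of the cluster setup.

# Load the cluster's Miniconda3 setup script described above
source /usr/local/setup/miniconda3.sh
# Create a personal environment and install a Bioconda package into it
# ("myenv" and "samtools" are placeholder names)
conda create -n myenv -c conda-forge -c bioconda samtools
# Activate the environment before using the installed tools
source activate myenv
samtools --version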
Access the Cluster
To access the cluster, users must log in to the master node, biocrfhpc.ust.hk, via Secure Shell (SSH) from the campus wired network or the campus WiFi (eduroam); a VPN connection is required when accessing the cluster from outside the campus.
The username and credentials are the same as your ITSC network account.
Data can be uploaded to the cluster using SFTP.
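For example, assuming your ITSC username is "itsc_user" and the file to upload is "data.fastq.gz" (both placeholders):

# Log in to the master node with your ITSC account credentials
ssh itsc_user@biocrfhpc.ust.hk
# Upload a file to your home directory on the cluster via SFTP
sftp itsc_user@biocrfhpc.ust.hk
# At the sftp> prompt, transfer the file and end the session
put data.fastq.gz
exit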
On the master node, users can compile programs/applications and submit them to run on most of the compute nodes with the batch queuing system SLURM (Simple Linux Utility for Resource Management).
Note: The master node is for compilation and simple tasks only and is shared among all users of the cluster. Please do NOT run your applications on the master node, as that might affect other users and the system reliability.
Job Execution
The Simple Linux Utility for Resource Management (SLURM) is the resource management and job scheduling system of the cluster. All jobs on the cluster must be run through SLURM: you submit your job or application to SLURM with a job script.
When there are not enough idle compute nodes/cores for a submitted job, the job will be placed in the pending state until enough resources are available.
Partitions (queues) are logical sets of compute nodes. The cluster partition and nodes are summarized as follows.
| PARTITION | NO. OF NODES | CPU PER NODE | MEMORY PER NODE |
|---|---|---|---|
| q1 | 36 | 2 x Intel Xeon Gold 6230 (20-core) | 384GB |
Useful commands in SLURM:
The following are the most common operations:
| Purpose | Command |
|---|---|
| To check what queues (partitions) are available | sinfo |
| To submit a job | sbatch <your_job_script> |
| To view the queue status | squeue |
| To view the queue status of your jobs | squeue -u $USER |
| To cancel a running or pending job | scancel <your_slurm_jobid> |
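A typical sequence using these commands might look as follows; the script name "sjob1.txt" refers to the sample in the next section, and <your_slurm_jobid> is the ID reported by sbatch.

# Check the available partitions and their node states
sinfo
# Submit the job script and note the job ID that sbatch prints
sbatch sjob1.txt
# Monitor your own jobs in the queue
squeue -u $USER
# Cancel the job if necessary
scancel <your_slurm_jobid>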
Job Script Sample:
The following sample SLURM script “sjob1.txt” is for your reference.
#!/bin/bash

# NOTE: Lines starting with "#SBATCH" are SLURM directives, while lines
# starting with "#" or "##SBATCH" are treated as comments. To turn a
# "##SBATCH" line into an active directive, remove one "#" so that the
# line starts with "#SBATCH".

#SBATCH -J slurm_job                   # SLURM job name

# Enable email notifications when the job begins and ends; uncomment if needed
##SBATCH --mail-user=user_name@ust.hk  # Update your email address
##SBATCH --mail-type=begin
##SBATCH --mail-type=end

# Use 1 node and 40 cores
#SBATCH -N 1 -n 40

# Set up the runtime environment if necessary
source /usr/local/setup/miniconda3.sh
source activate my_env
# or you can source ~/.bashrc or ~/.bash_profile

# Go to the job submission directory and run your application
cd $HOME/apps/slurm
./your_application
The standard output of the job will be saved as “slurm-<your_slurm_jobid>.out” in the job submission directory.
For other #SBATCH options, please consult the sbatch man page (man sbatch).
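As an illustration only, a few other commonly used #SBATCH directives that could be added to the script above; the values shown are placeholders, not site recommendations.

#SBATCH -p q1                # Submit to the q1 partition
#SBATCH -t 24:00:00          # Wall-time limit (example: 24 hours; see the policy below)
#SBATCH -o slurm_job_%j.out  # Custom file for standard output (%j expands to the job ID)
#SBATCH -e slurm_job_%j.err  # Separate file for standard error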
Disk Quota and Usage
1. Accounting
Cluster accounts are organized by the principal investigator (PI) or group leader of a research team. Each member of the team has an individual user account under the PI's group to access the cluster and run jobs on the partitions (queues) with SLURM. With this accounting scheme, the system can impose resource limits (usage quotas) on different partitions for different groups of users.
2. Resource Limits
Each compute node has processors, memory, swap and local disk as resources. Our cluster resource allocation is based on CPU cores only, and no core can run more than one job in a partition at a time. If a job needs one or more nodes exclusively, users can specify the exclusive option in the SLURM script, as sketched below. The resource limits on partitions are imposed on each PI group as a whole, which means that individual users in the same group share the quota limit.
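A minimal sketch of requesting whole nodes exclusively in a job script (the node count of 2 is a placeholder):

# Request 2 whole nodes and do not share them with other jobs
#SBATCH -N 2
#SBATCH --exclusive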
3. Job Submission Policy
Partitions (queues) are logical sets of compute nodes. There is only one partition, called q1, in the cluster. Some PI groups have higher priority to use the partition, which means that jobs from those PI groups have higher priority to run when there are not enough idle nodes.
The policy for job submission is as follows.
| PARTITION | NO. OF NODES | CPU CORES PER NODE | MEMORY PER NODE | GrpNodes | GrpJobs | GrpSubmit | MaxWallTime |
|---|---|---|---|---|---|---|---|
| q1 | 36 | 40 | 384GB | 12 (equivalent to 480 cores) | 12 | 12 | 7 days |
4. Disk Quota
The default disk quota for each PI group is 30TB, shared among all members of the group. Users can apply for more disk space if needed.
To check the disk usage of your own home directory:
du -sh $HOME
To check the disk usage and quota of your group:
quota -gs <your_group_name>
5. Group Share Directory
A share directory is assigned to each PI group. Users from the same group can access, create and modify files in the share directory.
To access the share directory:
cd $PI_HOME
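For example, to share job results with the rest of your group (the directory name "results" is a placeholder):

# Copy results into the group's share directory
cp -r $HOME/apps/slurm/results $PI_HOME/
# Make the copied files readable and writable by the whole group
chmod -R g+rw $PI_HOME/results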
Note that the group disk quota is also applied to the share directory.
6. Backup
There is NO backup service on the cluster; users are required to manage backups of their data themselves.
Cluster Status
We provide a visualization tool called Ganglia that reports the real-time resource usage of the cluster.
Support
If you have questions about SLURM or the system, please email hpcadmin@ust.hk with a subject line starting with “[biocrfhpc]” for assistance.