BioCRF HPC Cluster
Hardware
1. Compute Nodes
Model: HPE ProLiant XL170r Gen10
- Number of nodes: 36
- 2 x Intel Xeon Gold 6230 (20-core/2.1 GHz/28MB cache) processors
- 12 x 32GB (384GB total) DDR4-2933 RAM
- 1 x 1.2TB 12Gb/s 10K rpm SAS HDD
- 1 x single-port/dual-port 10Gbps Ethernet network card
- 1/2 x 1Gb Ethernet adapter and 1 dedicated management port
2. Master Node
Model: HPE ProLiant DL380 Gen10
- Number of nodes: 1
- 2 x Intel Xeon Silver 4210 (10-core/2.2 GHz/14MB cache) processors
- 64GB DDR4-2400 RAM
- 2 x 1TB 12Gbps NL-SAS 7.2K rpm HDD in RAID 1 configuration as the system disk
- 1 x dual-port 10Gbps Ethernet network card
- 2 x 1Gb Ethernet adapter and 1 dedicated management port
3. Storage Servers
Model: HPE Apollo 4510 Gen10
- Number of nodes: 2
- 2 x Intel Xeon Silver 4214 (12-core/2.2 GHz/17MB cache) processors
- 128GB DDR4-2400 RAM
- 2 x 1TB 12Gbps NL-SAS 7.2K rpm HDD in RAID 1 configuration as the system disk
- 48 x 12TB 7.2K rpm SAS 12Gbps HDD in RAID 60 configuration as the data disks
- Each server provides around 437 TB of usable disk space
Software
1. System
- Operating system: Rocks Cluster 7 based on CentOS 7.7 64-bit
- Resource management and job scheduling system: Simple Linux Utility for Resource Management (SLURM) Version 19.05.4
2. Applications
- Bioconda based on Miniconda3 v4.8.2 and Miniconda2 v4.7.12
- Location: /opt/miniconda2 and /opt/miniconda3
- References:
https://bioconda.github.io/
https://docs.conda.io/projects/conda/en/latest/user-guide/index.html
To use the Python 3.x version:
source /usr/local/setup/miniconda3.sh
To use the Python 2.x version:
source /usr/local/setup/miniconda2.sh
In general, users can install software in their own home directory or group share directory.
Please note that users are responsible for the licenses and copyright of the software they install in the cluster.
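As an illustration, a minimal sketch of installing a Bioconda package into a personal conda environment under your home directory; the environment name "myenv" and the package "samtools" are placeholders, not part of the cluster setup.

# Load the cluster's Miniconda3 setup script described above
source /usr/local/setup/miniconda3.sh
# Create a personal environment and install a Bioconda package into it
# ("myenv" and "samtools" are placeholder names)
conda create -n myenv -c conda-forge -c bioconda samtools
# Activate the environment before using the installed tools
source activate myenv
samtools --version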
Access the Cluster
To access the cluster, users must log in to the master node, biocrfhpc.ust.hk, via Secure Shell (SSH) from the campus wired network or the campus WiFi (eduroam); a VPN connection is required when accessing the cluster from outside the campus.
The username and credentials are the same as your ITSC network account.
Data can be uploaded to the cluster using SFTP.
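For example, assuming your ITSC username is "itsc_user" and the file to upload is "data.fastq.gz" (both placeholders):

# Log in to the master node with your ITSC account credentials
ssh itsc_user@biocrfhpc.ust.hk
# Upload a file to your home directory on the cluster via SFTP
sftp itsc_user@biocrfhpc.ust.hk
# At the sftp> prompt, transfer the file and end the session
put data.fastq.gz
exit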
On the master node, users can compile programs/applications and submit them to run on most of the compute nodes with the batch queuing system SLURM (Simple Linux Utility for Resource Management).
Note: The master node is for compilation and simple tasks only and is shared among all users of the cluster. Please do NOT run your applications on the master node, as that might affect other users and the system reliability.
Job Execution
The Simple Linux Utility for Resource Management (SLURM) is the resource management and job scheduling system of the cluster. All jobs on the cluster must be run through SLURM: you submit your job or application to SLURM with a job script.
When there are not enough idle compute nodes/cores for a submitted job, the job will be placed in the pending state until enough resources are available.
Partitions (queues) are logical sets of compute nodes. The cluster partition and nodes are summarized as follows.
| PARTITION | NO. OF NODES | CPU PER NODE | MEMORY PER NODE |
|---|---|---|---|
| q1 | 36 | 2 x Intel Xeon Gold 6230 (20-core) | 384GB |
Useful commands in SLURM:
The following are the most common operations:
| Purpose | Command |
|---|---|
| To check what queues (partitions) are available | sinfo |
| To submit a job | sbatch <your_job_script> |
| To view the queue status | squeue |
| To view the queue status of your jobs | squeue -u $USER |
| To cancel a running or pending job | scancel <your_slurm_jobid> |
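A typical sequence using these commands might look as follows; the script name "sjob1.txt" refers to the sample in the next section, and <your_slurm_jobid> is the ID reported by sbatch.

# Check the available partitions and their node states
sinfo
# Submit the job script and note the job ID that sbatch prints
sbatch sjob1.txt
# Monitor your own jobs in the queue
squeue -u $USER
# Cancel the job if necessary
scancel <your_slurm_jobid>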
Job Script Sample:
The following sample SLURM script “sjob1.txt” is for your reference.
#!/bin/bash

# NOTE: Lines starting with "#SBATCH" are SLURM directives, while lines
# starting with "#" or "##SBATCH" are treated as comments. To turn a
# "##SBATCH" line into an active directive, remove one "#" so that the
# line starts with "#SBATCH".

#SBATCH -J slurm_job                   # SLURM job name

# Enable email notifications when the job begins and ends; uncomment if needed
##SBATCH --mail-user=user_name@ust.hk  # Update your email address
##SBATCH --mail-type=begin
##SBATCH --mail-type=end

# Use 1 node and 40 cores
#SBATCH -N 1 -n 40

# Set up the runtime environment if necessary
source /usr/local/setup/miniconda3.sh
source activate my_env
# or you can source ~/.bashrc or ~/.bash_profile

# Go to the job submission directory and run your application
cd $HOME/apps/slurm
./your_application
The standard output of the job will be saved as “slurm-<your_slurm_jobid>.out” in the job submission directory.
For other #SBATCH options, please consult the sbatch man page (man sbatch).
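As an illustration only, a few other commonly used #SBATCH directives that could be added to the script above; the values shown are placeholders, not site recommendations.

#SBATCH -p q1                # Submit to the q1 partition
#SBATCH -t 24:00:00          # Wall-time limit (example: 24 hours; see the policy below)
#SBATCH -o slurm_job_%j.out  # Custom file for standard output (%j expands to the job ID)
#SBATCH -e slurm_job_%j.err  # Separate file for standard error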
Disk Quota and Usage
1. Accounting
Cluster accounts are organized by the principal investigator (PI) or group leader of a research team. Each member of the team has an individual user account under the PI's group to access the cluster and run jobs on the partitions (queues) with SLURM. With this accounting scheme, the system can impose resource limits (usage quotas) on different partitions for different groups of users.
2. Resource Limits
Each compute node has processors, memory, swap and local disk as resources. Our cluster resource allocation is based on CPU cores only, and no core can run more than one job in a partition at a time. If a job needs one or more nodes exclusively, users can specify the exclusive option in the SLURM script, as sketched below. The resource limits on partitions are imposed on each PI group as a whole, which means that individual users in the same group share the quota limit.
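A minimal sketch of requesting whole nodes exclusively in a job script (the node count of 2 is a placeholder):

# Request 2 whole nodes and do not share them with other jobs
#SBATCH -N 2
#SBATCH --exclusive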
3. Job Submission Policy
Partitions (queues) are logical sets of compute nodes. There is only one partition, called q1, in the cluster. Some PI groups have higher priority to use the partition, which means that jobs from those PI groups have higher priority to run when there are not enough idle nodes.
The policy for job submission is as follows.
| PARTITION | NO. OF NODES | CPU CORES PER NODE | MEMORY PER NODE | GrpNodes | GrpJobs | GrpSubmit | MaxWallTime |
|---|---|---|---|---|---|---|---|
| q1 | 36 | 40 | 384GB | 12 (equivalent to 480 cores) | 12 | 12 | 7 days |
4. Disk Quota
The default disk quota for each PI group is 30TB, shared among all members of the group. Users can apply for more disk space if needed.
To check the disk usage of your own home directory:
du -sh $HOME
To check the disk usage and quota of your group:
quota -gs <your_group_name>
5. Group Share Directory
A share directory is assigned to each PI group. Users from the same group can access, create and modify files in the share directory.
To access the share directory:
cd $PI_HOME
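For example, to share job results with the rest of your group (the directory name "results" is a placeholder):

# Copy results into the group's share directory
cp -r $HOME/apps/slurm/results $PI_HOME/
# Make the copied files readable and writable by the whole group
chmod -R g+rw $PI_HOME/results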
Note that the group disk quota is also applied to the share directory.
6. Backup
There is NO backup service on the cluster; users are required to manage backups of their data themselves.
Cluster Status
We provide a visualization tool called Ganglia that reports the real-time resource usage of the cluster.
Support
If you have questions about SLURM or the system, please email hpcadmin@ust.hk with a subject line starting with “[biocrfhpc]” for assistance.