
BioCRF HPC Cluster

Hardware

1. Compute Nodes
 

Model: HPE ProLiant XL170r Gen10

 

  • Number of nodes: 36

  • 2 x Intel Xeon Gold 6230 (20-core/2.1 GHz/28MB cache) processor

  • 12 x 32GB (384GB) DDR4-2933 RAM

  • 1 x 1.2TB 12Gb/s 10K rpm SAS HDD

  • 1 x single-port or dual-port 10Gbps Ethernet network card

  • 1 or 2 x 1Gb Ethernet adapters and 1 dedicated management port

 


2. Master Node
 

Model: HPE ProLiant DL380 Gen10

 

  • Number of nodes: 1

  • 2 x Intel Xeon Silver 4210 (10-core/2.2 GHz/14MB cache) processor

  • 64GB DDR4-2400 RAM

  • 2 x 1TB 12Gbps NL-SAS 7.2K rpm HDD in RAID 1 configuration as the system disk

  • 1 x dual-port 10Gbps Ethernet network card

  • 2 x 1Gb Ethernet adapters and 1 dedicated management port

 


3. Storage Servers


Model: HPE Apollo 4510 Gen10

 

  • Number of nodes: 2

  • 2 x Intel Xeon Silver 4214 (12-core/2.2 GHz/17MB cache) processor

  • 128GB DDR4-2400 RAM

  • 2 x 1TB 12Gbps NL-SAS 7.2K rpm HDD in RAID 1 configuration as the system disk

  • 48 x 12TB 7.2K rpm SAS 12Gbps in RAID 60 configuration as the data disk

  • Each server provides around 437 TB usable disk space
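The usable figure above can be cross-checked with a little arithmetic. The sketch below assumes the 48 data disks are split into 4 RAID 6 groups striped together (RAID 60), each group giving up 2 disks to parity; the group count is an assumption, not something stated in the specification. Under that assumption the usable capacity comes out close to 437 when expressed in binary units (TiB).

```shell
#!/bin/bash
# Capacity cross-check for one storage server.
DISKS=48          # data disks per server (from the specification)
DISK_TB=12        # capacity of each disk in TB (from the specification)
RAID6_GROUPS=4    # ASSUMPTION: number of RAID 6 groups inside the RAID 60 set

# Each RAID 6 group loses two disks' worth of capacity to parity.
DATA_DISKS=$(( DISKS - 2 * RAID6_GROUPS ))
RAW_TB=$(( DISKS * DISK_TB ))
USABLE_TB=$(( DATA_DISKS * DISK_TB ))

# Convert decimal terabytes (10^12 bytes) to binary tebibytes (2^40 bytes).
USABLE_TIB=$(awk -v tb="$USABLE_TB" 'BEGIN { printf "%.1f", tb * 1e12 / 2^40 }')

echo "raw: ${RAW_TB} TB, usable: ${USABLE_TB} TB (about ${USABLE_TIB} TiB)"
```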

Software

1. System


2. Application:


To use Python 3.x:
source /usr/local/setup/miniconda3.sh

 

To use Python 2.x:
source /usr/local/setup/miniconda2.sh


In general, users can install software in their own home directory or group share directory.


Please note that users are responsible for the licenses and copyright of the software they install in the cluster.
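As an illustration of installing software in your own space, the sketch below creates a personal conda environment after loading the Python 3 setup script mentioned above. The environment name my_env is a placeholder, and the assumption is that conda stores named environments under your home directory when the shared installation is not writable; check where the environment lands before relying on it.

```shell
#!/bin/bash
SETUP_SCRIPT="/usr/local/setup/miniconda3.sh"   # path given in this guide
ENV_NAME="my_env"                               # placeholder environment name

if [ -f "$SETUP_SCRIPT" ]; then
    # Load conda into the current shell.
    source "$SETUP_SCRIPT"
    # Create the environment once; -y skips the confirmation prompt.
    conda create -y -n "$ENV_NAME" python=3
    # Activate it and confirm which Python is in use.
    source activate "$ENV_NAME"
    python --version
else
    echo "$SETUP_SCRIPT not found: run this on the cluster"
fi
```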

Access the cluster

To access the cluster, log in to the master node, biocrfhpc.ust.hk, via Secure Shell (SSH) from the campus wired network or the campus WiFi (eduroam). A VPN connection is required when connecting from outside the campus.


The username and credentials are the same as for your ITSC network account.


Data can be uploaded to the cluster using sftp.
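The login and upload steps can be wrapped in two small helper functions, for example in the ~/.bashrc of your own workstation. This is only a sketch: the function names are made up for illustration, and your_itsc_username is a placeholder to replace with your own account name.

```shell
#!/bin/bash
ITSC_USER="your_itsc_username"      # placeholder: replace with your ITSC account
CLUSTER_HOST="biocrfhpc.ust.hk"     # master node named in this guide
TARGET="${ITSC_USER}@${CLUSTER_HOST}"

# Open an interactive shell on the master node.
cluster_login() {
    ssh "$TARGET"
}

# Open an interactive sftp session for uploading or downloading data.
cluster_sftp() {
    sftp "$TARGET"
}

echo "Cluster target: $TARGET"
```

Inside the sftp session, "put local_file" uploads a file to the cluster and "get remote_file" downloads one.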


On the master node, users can compile programs and applications and submit them to run on most of the compute nodes through the batch queuing system SLURM (Simple Linux Utility for Resource Management).
 

Note: The master node is for compilation and simple tasks and is shared among all users of the cluster. Please do NOT run your application on the master node, as that might affect other users and the reliability of the system.

Job Execution

The Simple Linux Utility for Resource Management (SLURM) is the resource management and job scheduling system of the cluster. All jobs on the cluster must be run through SLURM. To run a job or application, submit it to SLURM with a job script.


When there are not enough idle compute nodes or cores for a submitted job, the job is placed in the pending state until enough resources become available.


Partitions (queues) are logical sets of compute nodes. The cluster partition and nodes are summarized as follows.

Useful commands in SLURM:

Following are the most common operations:

Job Script Sample:

The following sample SLURM script “sjob1.txt” is for your reference.

 

#!/bin/bash

# NOTE: Lines starting with "#SBATCH" are SLURM directives, while other
# lines starting with "#", including "##SBATCH", are comments. To enable
# a "##SBATCH" line, remove one "#" so that the line starts with "#SBATCH".

#SBATCH -J slurm_job #Slurm job name

# Enable email notifications when the job begins and ends; uncomment if needed
##SBATCH --mail-user=user_name@ust.hk #Update your email address
##SBATCH --mail-type=begin
##SBATCH --mail-type=end

# Use 1 node and 40 cores
#SBATCH -N 1 -n 40

# Setup runtime environment if necessary
source /usr/local/setup/miniconda3.sh
source activate my_env
# or you can source ~/.bashrc or ~/.bash_profile

# Go to the job submission directory and run your application
cd $HOME/apps/slurm
./your_application

The standard output of the job will be saved as “slurm-<your_slurm_jobid>.out” at the job submission directory.
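A minimal sketch of submitting the sample script and checking on it, using the standard SLURM client commands sbatch, squeue and scancel. It assumes sjob1.txt is in the current directory; the guard only lets the script exit cleanly when it is run somewhere other than the cluster.

```shell
#!/bin/bash
JOB_SCRIPT="sjob1.txt"   # the sample script from this guide

if command -v sbatch >/dev/null 2>&1; then
    # --parsable makes sbatch print only the job ID (plus ";cluster" if any).
    JOB_ID=$(sbatch --parsable "$JOB_SCRIPT" | cut -d';' -f1)
    echo "Submitted job $JOB_ID; output will appear in slurm-${JOB_ID}.out"

    squeue -j "$JOB_ID"     # state PD = pending, R = running
    squeue -u "$USER"       # all of your own jobs

    # To cancel the job if it is no longer needed:
    # scancel "$JOB_ID"
else
    echo "sbatch not found: run this on the cluster master node"
fi
```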
 

For the #SBATCH options, please consult the sbatch man page (man sbatch).

Disk Quota and Usage

1. Accounting


Cluster accounts are organized around the principal investigator (PI) or group leader of a research team. Each member of the team has an individual user account under the PI's group to access the cluster and run jobs on the partitions (queues) with SLURM. With this accounting scheme, the system can impose resource limits (usage quotas) on different partitions for different groups of users.

2. Resource Limits


Each compute node has processors, memory, swap and local disk as resources. Resource allocation on the cluster is based on CPU cores only. No core can run more than one job in a partition at a time. If a job needs exclusive use of several nodes, specify the exclusive option in the SLURM script. Resource limits on partitions are imposed on each PI group as a whole, so individual users in the same group share the quota.
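As a sketch of the exclusive option, the job script below asks for whole nodes with the standard sbatch flag --exclusive. The job name and the node count of 2 are placeholders chosen for illustration; outside a SLURM job the script simply reports that no node list is available.

```shell
#!/bin/bash
#SBATCH -J exclusive_job     # placeholder job name
#SBATCH -N 2                 # placeholder: number of nodes to reserve
#SBATCH --exclusive          # do not share the allocated nodes with other jobs

# SLURM sets SLURM_JOB_NODELIST inside a running job.
NODE_LIST="${SLURM_JOB_NODELIST:-not inside a SLURM job}"
echo "Running on: $NODE_LIST"

# Start your application here, for example:
# ./your_application
```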

3. Job Submission Policy


Partitions (queues) are logical sets of compute nodes. There is only one partition, called q1, in the cluster. Some PI groups have higher priority on the partition, which means that jobs from those PI groups run first when there are not enough idle nodes.


Policy for job submission is as follows.

4. Disk Quota
 

The default disk quota for each PI group is 30TB, shared among all members of the group. Users can apply for more disk space if needed.


To check the disk usage of your own home directory:


du -sh $HOME


To check the disk usage and quota of your group:


quota -gs <your_group_name>
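When the group is close to its quota, it helps to see which directories take the most space. The helper below is a sketch built on GNU du and sort; the function name top_dirs is made up for illustration.

```shell
#!/bin/bash
# List the largest first-level subdirectories of a directory.
#   $1: directory to inspect (default: $HOME)
#   $2: number of lines to show (default: 10)
top_dirs() {
    local dir="${1:-$HOME}"
    local count="${2:-10}"
    # du prints one human-readable total per subdirectory;
    # sort -rh orders those sizes from largest to smallest.
    du -h --max-depth=1 "$dir" 2>/dev/null | sort -rh | head -n "$count"
}

# Example: show the five largest entries under your home directory.
# top_dirs "$HOME" 5
```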

5. Group Share Directory


A share directory is assigned to each PI group. Users from the same group can access, create and modify files in the share directory.


To access the share directory:


cd $PI_HOME

Note that the group disk quota also applies to the share directory.
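A sketch of placing a result directory in the share directory so that other group members can work with it. The function name share_with_group is made up for illustration, and the explicit chmod is an assumption: the share directory may already be set up so that new files are group-writable.

```shell
#!/bin/bash
# Copy a directory into the group share directory and open it to the group.
#   $1: directory to share
share_with_group() {
    local src="$1"
    if [ -z "${PI_HOME:-}" ]; then
        echo "PI_HOME is not set" >&2
        return 1
    fi
    local dest="$PI_HOME/$(basename "$src")"
    # g+rwX: group read/write on everything, execute only on directories
    # and on files that are already executable.
    cp -r "$src" "$PI_HOME/" && chmod -R g+rwX "$dest"
}

# Example:
# share_with_group "$HOME/results"
```

Because the copy counts against the same group quota, remove the original once the shared copy has been verified if space is tight.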

6. Backup


There is NO backup service on the cluster. Users are required to manage the backup of their own data.

 
 
 
 
 

Cluster Status

We provide a visualization tool called Ganglia that reports the real-time resource usage of the cluster:

https://biocrfhpc.ust.hk/ganglia/?c=biocrfhpc.ust.hk

Support

If you have questions about SLURM or the system, please email hpcadmin@ust.hk with a subject line starting with “[biocrfhpc]” for assistance.

 
 

Copyright © Biosciences Central Research Facility at Hong Kong University of Science and Technology. All rights reserved.