High Performance Computing

The College of Engineering deployed a new High Performance Computing Cluster (HPC2) in Fall 2020.  

HPC2

HPC2 technical details can be found at the HPC Core Facility website. If your research group is not part of the HPC2 Cluster and you would like to join, please send an email to coeithelp@ucdavis.edu so that we can discuss access.

Logging In and Getting Help

Access to HPC2 requires an account for the cluster, which can be requested through the HPC Core Facilities Account Request Form. The form will require login through the Central Authentication Service (CAS) with your campus credentials.

After login, the site may report that you do not have an account on the website. This message can be disregarded; you should still be able to reach the form after logging into CAS.

Login to HPC2 requires an SSH keypair. The HPC section of UCD's knowledge base has instructions for how to create an SSH keypair. The Account Request Form will ask for your public key. Keep your private key secret and safely stored. If it is lost you will have to create a new pair and redo the account creation process.
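
For example, on a system with OpenSSH installed, a keypair can be generated with the command below (a sketch only; the step-by-step instructions in the knowledge base are authoritative):

  ssh-keygen -t ed25519 -C "youraccount@ucdavis.edu"

This writes the private key to ~/.ssh/id_ed25519 and the public key to ~/.ssh/id_ed25519.pub; the contents of the .pub file are what the Account Request Form asks for. The string after -C is only a comment label, so substitute your own address.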

After your account is created, you will be able to log in to the HPC2 head node by using SSH to connect to hpc2.engr.ucdavis.edu.
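
For example, assuming a hypothetical campus username of jdoe and a private key stored at ~/.ssh/id_ed25519, the connection would look like:

  ssh -i ~/.ssh/id_ed25519 jdoe@hpc2.engr.ucdavis.edu

Once connected, you are on the head node, from which jobs are submitted to the compute nodes via Slurm (see below).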

HPC jobs are intended to be submitted via Slurm, the cluster's job scheduler. Slurm will automatically allocate resources for your job based on the resources you've requested and the resources dedicated to your research group. Information on using Slurm and HPC in general can be found in the HPC section of UCD's knowledge base and on the training materials and frequently asked questions pages of HPCCF's website.
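
As a minimal sketch (the resource requests below are arbitrary examples, and the limits that apply to your research group are cluster-specific; consult the knowledge base and HPCCF documentation for the correct values), a batch script might look like:

  #!/bin/bash
  #SBATCH --job-name=example
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=4
  #SBATCH --mem=8G
  #SBATCH --time=01:00:00
  srun ./my_program

Here example and my_program are placeholder names; the #SBATCH lines request one task with four CPU cores, 8GB of memory, and a one-hour time limit. Save the script (for example as job.sh), submit it with "sbatch job.sh", and check its status with "squeue -u $USER".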

The HPC Core Facility provides support for HPC2 software and hardware. If you experience technical issues or need particular software installed, please contact them at hpc-help@ucdavis.edu. Other inquiries can be directed to coeithelp@ucdavis.edu.

HPC2 Participating Research Groups

Biomedical Engineering
Jinyi Qi / Laura Marcu (NCIBT)

Chemical Engineering
Ambarish Kulkarni
Roland Faller
Shirley Ahn

Civil and Environmental Engineering
Geoff Schladow
Jonathan Herman
Fabian Bombardelli

Computer Science
Randall O’Reilly

Electrical and Computer Engineering
Avesta Sasan
Junshan Zhang
Anh-Vu Pham
Neville Luhmann
Soheil Ghiasi
Zhi Ding
Chen-Nee Chuah

Materials Science and Engineering
Jeremy Mason

Mechanical and Aerospace Engineering
JP Delplanque
Seongkyu Lee

Center for Neuroscience
Charan Ranganath

HPC1

The College of Engineering High Performance Computing Cluster (HPC1) contains 60 compute nodes and central storage, all connected by InfiniBand networking. Each node contains 64GB of RAM shared by two CPU sockets, each with an 8-core CPU running at 2.4GHz. Central storage is managed by redundant storage servers, with 200 TB of usable storage evenly allocated to researchers. The storage is intended for temporary computation and is not backed up or duplicated in any way, except that it is configured as RAID6 and can therefore withstand up to two simultaneous hard drive failures.

Jobs are managed by the Slurm queue manager. Access to the cluster can be granted only to the participating professors and their research groups. If you qualify, enter your access application information here, and your professor will be contacted to confirm your access.

Documentation on submitting jobs and other helpful links can be found here.

History

The cluster was built as a shared resource by participating College of Engineering professors with the understanding that the professors and their affiliated research groups have complete and immediate access to the cluster nodes they purchased. To illustrate, if a professor purchased five nodes of the cluster and wants to run a job on those five nodes right away, any jobs currently running on those nodes are stopped and returned to the input queue, and the professor’s job runs immediately. If the professor needs more resources than the original purchase (say, 10 nodes), they can start a job requesting those resources, but it may be bumped if the owners of the other nodes require them.

The compute node configuration is a 1U Dell PowerEdge R630 server with:

  • Two Intel E5-2630 v3 2.4GHz CPUs with eight cores (16 threads) each
  • 64GB of RDIMM RAM
  • Intel QDR InfiniBand network adapter (40 gigabit, low latency)
  • 1 gigabit Ethernet network adapter
  • One 1TB 7200RPM hard drive
  • 10 gigabit uplink to the campus network backbone

Central storage is allocated based on the number of nodes purchased by a PI/research group, at 4TB per node. For example, if 4 nodes are purchased, 16TB of storage will be allocated to the group. Storage can be expanded if additional nodes are purchased later.

Several compute nodes have the same internal configuration but are Dell PowerEdge R730 2U servers to accommodate the future use of two GPU cards.

HPC1 Participating Research Groups

Biomedical Engineering
Sharon Aviran
Craig Benham
Yong Duan
Jinyi Qi
Leonor Saiz
Cheemeng Tan

Chemical Engineering
Jennifer Sinclair Curtis
Roland Faller
Ambarish Kulkarni

Civil and Environmental Engineering
Yueyue Fan
Jonathan Herman
Bassam Younis

Computer Science
Computer Science Department
Dipak Ghosal
Yong Jae Lee

Materials Science and Engineering
Jeremy Mason

Mechanical and Aerospace Engineering
Roger Davis
JP Delplanque
Seongkyu Lee