Why use a Cluster?
- High Performance Computing (HPC) typically involves connecting to very large computing systems elsewhere in the world.
- These other systems can be used to do work that would either be impossible or much slower on smaller systems.
- HPC resources are shared by multiple users.
- The standard method of interacting with such systems is via a command line interface.
Connecting to a remote HPC system
- An HPC system is a set of networked machines.
- HPC systems typically provide login nodes, which users connect to over SSH (see the example below), and a set of worker nodes.
- The resources found on independent (worker) nodes can vary in volume and type (amount of RAM, processor architecture, availability of network mounted filesystems, etc.).
- Files saved on one node are available on all nodes.
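A minimal sketch of connecting from your own terminal; the username and hostname here are placeholders, not a real system:

```bash
# Log in to the cluster's login node (replace the username and hostname with your own)
ssh yourUsername@cluster.example.org

# Once connected, confirm which node you are on
hostname
```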
Scheduler Fundamentals
- The scheduler handles how compute resources are shared between users.
- A job is just a shell script (a minimal example is sketched below).
- Request slightly more resources than you will need.
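As a minimal sketch, assuming a Slurm scheduler (directive syntax differs for PBS, SGE and others), a job script is an ordinary shell script with resource requests in special comments at the top:

```bash
#!/bin/bash
#SBATCH --job-name=example-job   # name shown in the queue
#SBATCH --ntasks=1               # number of tasks (processes)
#SBATCH --mem=1G                 # memory request (a little more than you expect to need)
#SBATCH --time=00:10:00          # wall-clock time limit

echo "Running on $(hostname)"
```

Under Slurm this would be submitted with `sbatch` and monitored with `squeue`.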
Environment Variables
- Shell variables are by default treated as strings.
- Variables are assigned using `=` and recalled using the variable's name prefixed by `$`.
- Use `export` to make a variable available to other programs.
- The `PATH` variable defines the shell's search path (short examples are sketched below).
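A short sketch of these points at the shell prompt (the variable name is made up for illustration):

```bash
GREETING="hello"     # assignment: no spaces around "=", the value is a string
echo $GREETING       # recall the value by prefixing the name with "$"
export GREETING      # export it so that programs started from this shell can see it
echo $PATH           # PATH is a colon-separated list of directories searched for programs
```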
Accessing software via Modules
- Load software with `module load softwareName` (see the example below).
- Unload software with `module unload softwareName`.
- The module system handles software versioning and package conflicts for you automatically.
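For example, loading and unloading a compiler might look like the following; the module name and version are placeholders and will differ on your system:

```bash
module avail               # list software available through the module system
module load gcc/12.2.0     # load a specific version (example name only)
module list                # show currently loaded modules
module unload gcc/12.2.0   # unload it again
```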
Transferring files with remote computers
- `wget` and `curl -O` download a file from the internet.
- `scp` and `rsync` transfer files to and from your computer.
- You can use an SFTP client such as FileZilla to transfer files through a GUI (command-line examples are sketched below).
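A few examples of each tool; the URLs, hostnames and paths are placeholders:

```bash
# Download a file from the internet (either command works)
wget https://example.org/data.tar.gz
curl -O https://example.org/data.tar.gz

# Copy a file from your computer to the cluster (run this on your own machine)
scp data.tar.gz yourUsername@cluster.example.org:~/project/

# Synchronise a whole directory; -a preserves attributes, -v reports progress
rsync -av results/ yourUsername@cluster.example.org:~/project/results/
```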
Using the Research Data Warehouse
- `cp` and `rsync` transfer files between RDW and HPC.
- Try a dry run of `rsync` first to avoid accidental duplications or deletions (see the sketch below).
- Re-run large `rsync` commands to confirm that everything transferred successfully.
- Send the output to a log file to keep a record.
- Group permissions are pre-set on RDW and cannot be changed from Linux.
- RDW shares should have pre-set 'read' and 'modify' groups of campus users.
- Files on RDW are owned by the user who puts them there.
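A cautious `rsync` workflow along these lines might look as follows; the RDW mount path and directory names are hypothetical:

```bash
# Dry run first: -n lists what would be transferred without changing anything
rsync -avn ~/project/results/ /rdw/my-share/results/

# Real transfer, keeping a dated log of what was copied
rsync -av ~/project/results/ /rdw/my-share/results/ | tee rsync-$(date +%F).log

# Re-run the same command: an empty file list confirms everything arrived
rsync -av ~/project/results/ /rdw/my-share/results/
```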
Running a parallel job
- Parallel programming allows applications to take advantage of parallel hardware.
- The queuing system facilitates executing parallel tasks (a sample parallel job script is sketched below).
- Performance improvements from parallel execution do not scale linearly.
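As a sketch, assuming Slurm and an MPI program (`./my_mpi_program` is a placeholder name), a parallel job script might look like this:

```bash
#!/bin/bash
#SBATCH --job-name=parallel-example
#SBATCH --ntasks=4          # request 4 parallel tasks
#SBATCH --time=00:30:00

# srun launches one copy of the program per requested task
srun ./my_mpi_program
```

Because of communication overhead, doubling `--ntasks` rarely halves the run time.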
Parallelising with Job Arrays
- Job arrays let you run many similar tasks from a single job script, with an index variable distinguishing each task (see the sketch below).
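A minimal sketch, again assuming Slurm; the program and input file names are placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=array-example
#SBATCH --array=1-10        # run 10 copies of this script, indexed 1 to 10
#SBATCH --time=00:10:00

# each task in the array processes a different input file
./process_data input-${SLURM_ARRAY_TASK_ID}.txt
```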
Using resources effectively
- Accurate job scripts help the queuing system efficiently allocate shared resources; checking what finished jobs actually used (see below) helps you tune future requests.
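One way to tune your requests is to compare them with what a finished job actually used; assuming Slurm, `sacct` can report this (the job ID is a placeholder):

```bash
# Elapsed time and peak memory (MaxRSS) of a completed job
sacct -j 123456 --format=JobID,JobName,Elapsed,MaxRSS,State
```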
Using shared resources responsibly
- Be careful how you use the login node.
- Your data on the system is your responsibility.
- Plan and test large data transfers.
- It is often best to combine many files into a single archive file before transferring, as in the example below.
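For example, bundling a directory of many small files into one compressed archive before copying it (file and host names are placeholders):

```bash
# Create a single compressed archive from a directory of many files
tar -czf results.tar.gz results/

# Transfer the one archive instead of thousands of small files
scp results.tar.gz yourUsername@cluster.example.org:~/project/

# Unpack it on the other side
tar -xzf results.tar.gz
```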