Instructor Notes
Glossary
The following list captures terms that need to be added to this glossary. This is a great way to contribute.
Accelerator
Central Processing Unit
Cloud Computing
Cluster
Compute Node
Coupling, Loose vs. Tight
Compute Unified Device Architecture
Distributed Memory
Execution Node
Flynn’s Taxonomy
Graphics Processing Unit
Grid Computing
Grid Engine
High-Performance Computing
Hyper-Threading
InfiniBand
Interconnect
Massively Parallel
Message Passing Interface
Node
Open Multi-Processing
Parallel
Serial
Server
Shared Memory
Single Instruction, Multiple Data
Single Instruction, Multiple Threads
Simple Linux Utility for Resource Management
Symmetric Multiprocessing
Simultaneous Multithreading
Supercomputer
Worker Node
Workstation
Accelerator
An accelerator in high-performance computing (HPC) is a specialized hardware component that offloads compute-intensive, parallelizable tasks from the central processing unit, improving performance, throughput, and energy efficiency for highly parallel workloads.
More info: Hardware Acceleration
See also: Graphics Processing Unit
Central Processing Unit (CPU)
A central processing unit (CPU) or simply processor is the hardware component of a computer that executes the instructions provided by software.
Most systems use multi-core processors (e.g., dual-core, quad-core, and so on), where each core is an independent execution unit. Systems may also have multiple CPUs (sockets), each containing multiple cores.
More info: CPU
Cloud Computing
Cloud computing is the on-demand delivery of computing resources such as physical or virtual servers, data storage, networking, software, and analytics over the internet, typically using a pay-per-use pricing model, enabling scalable and elastic workloads.
More info: Cloud computing
Cluster
A cluster is a collection of computers (nodes) connected via a high-speed interconnect, working together as a unified system to execute parallel workloads.
More info: Cluster
Compute Node
A compute node is a server within a cluster that is dedicated to executing computational jobs. It provides processing power (CPU/GPU), memory, and other resources required to run user workloads, typically managed by a job scheduler (e.g., Slurm).
Coupling, Loose vs. Tight
- Coupling refers to the degree of interdependence between components (e.g., processes or nodes) in a computing system, particularly in how frequently they communicate and synchronize.
- Tightly coupled systems have components that frequently communicate and share data, often with low-latency interconnects and shared memory or fast message passing.
- Loosely coupled systems have components that operate more independently, communicating less frequently, typically through higher-latency networks or asynchronous exchanges.
More info: Loosely vs. Tightly Coupled Multiprocessor System
Compute Unified Device Architecture (CUDA)
CUDA is NVIDIA's proprietary parallel computing platform and application programming interface (API). It allows software to use NVIDIA graphics processing units (GPUs) for accelerated general-purpose processing, significantly broadening their utility in scientific and high-performance computing.
More info: CUDA
Distributed Memory
Distributed memory is a parallel computer architecture where each processor (node) has its own private, local memory, and nodes communicate (e.g., using MPI) by sending messages over a network interconnect.
More info: Distributed memory
Execution Node
An execution node is a node on which a job or task is actively running within a cluster environment. It is typically a compute node allocated by the scheduler to execute a specific workload.
Flynn’s Taxonomy
Flynn’s taxonomy classifies computing architectures into:
- Single instruction stream, single data stream (SISD)
- Single instruction stream, multiple data streams (SIMD)
- Multiple instruction streams, single data stream (MISD)
- Multiple instruction streams, multiple data streams (MIMD)
This taxonomy is a coarse model, as many parallel processors are hybrids of the SISD, SIMD, and MIMD classes.
More info: Flynn’s Taxonomy (SIMD, SIMT)
Graphics Processing Unit (GPU)
A graphics processing unit (GPU) is a specialized accelerator optimized for high-throughput parallel computation using many lightweight parallel cores.
More info: Graphics processing unit
Grid Computing
Grid computing is a form of distributed computing that connects geographically dispersed computers, often aggregating heterogeneous and possibly idle resources to act as a virtual supercomputer.
More info: Grid computing
Grid Engine
Grid Engine is a batch-queuing system typically used on a compute cluster or high-performance computing system. It is responsible for accepting, scheduling, dispatching, and managing the remote and distributed execution of large numbers of standalone, parallel, or interactive user jobs.
More info: Grid Engine
High-Performance Computing (HPC)
High-Performance Computing (HPC) uses clusters of interconnected compute nodes to solve complex, data-intensive problems far beyond the capacity of standard desktop computers, usually by means of parallel processing.
More info: High-Performance Computing
Hyper-Threading
Hyper-Threading technology is a form of simultaneous-multithreading technology introduced by Intel.
Architecturally, a processor with Hyper-Threading technology consists of two logical processors per core, each of which has its own processor architectural state.
More info: Hyper-Threading (SMT)
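The distinction between logical processors and cores is visible from software. The following is a minimal sketch using only the Python standard library; the function name `logical_cpu_summary` is made up for this illustration. `os.cpu_count()` reports logical CPUs, so on a machine with Hyper-Threading enabled it is typically twice the number of physical cores. The standard library does not report physical cores directly.

```python
import os


def logical_cpu_summary():
    """Report logical CPU counts as seen by this Python process.

    'logical' is the number of logical CPUs in the system.
    'usable' is the number this process may run on, which can be
    smaller on a cluster where the scheduler restricts a job's CPUs.
    """
    logical = os.cpu_count() or 1  # cpu_count() may return None
    if hasattr(os, "sched_getaffinity"):  # available on Linux only
        usable = len(os.sched_getaffinity(0))
    else:
        usable = logical
    return {"logical": logical, "usable": usable}


if __name__ == "__main__":
    print(logical_cpu_summary())
```

Inside a scheduled job, the "usable" value is usually the more relevant one when choosing how many threads or processes to start.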
InfiniBand
InfiniBand is a computer networking standard used in high-performance computing that features very high throughput and very low latency. It provides high-speed interconnect capabilities within and between computers (nodes).
More info: InfiniBand
Interconnect
Interconnects are specialized hardware and communication technologies (e.g., InfiniBand) that provide fast, low-latency, high-bandwidth communication between compute nodes, storage, and accelerators in a cluster, particularly in distributed memory systems.
More info: Interconnect
Massively Parallel
The term massively parallel describes the use of a large number of processors to perform a set of coordinated computations simultaneously.
More info: Massively Parallel
Message Passing Interface (MPI)
MPI is a standardized and portable message-passing interface used for parallel computing. It provides explicit communication, synchronization, and data exchange between processes, often relying on high-performance interconnect technologies.
MPI is most commonly used for communication between processes across nodes in distributed memory systems, but it can also be used within a single node.
More info: Message passing interface
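The message-passing pattern can be illustrated without an MPI installation. The sketch below is a toy model, not the MPI API: threads stand in for ranks, `queue.Queue` objects stand in for the network, and the names `allreduce_sum`, `rank_main`, and `mailboxes` are invented for this example. Each rank holds a private value and learns the global sum only through explicit send and receive operations, loosely mirroring what an all-reduce does in MPI.

```python
import queue
import threading


def allreduce_sum(values):
    """Toy all-reduce: every 'rank' ends up knowing the sum of all values."""
    size = len(values)
    mailboxes = [queue.Queue() for _ in range(size)]  # one inbox per rank
    seen = [None] * size  # only used to report results back to the caller

    def rank_main(rank, local):
        if rank == 0:
            total = local
            for _ in range(size - 1):
                total += mailboxes[0].get()   # blocking receive from any rank
            for dest in range(1, size):
                mailboxes[dest].put(total)    # send the result to each rank
        else:
            mailboxes[0].put(local)           # send private value to rank 0
            total = mailboxes[rank].get()     # wait for the global sum
        seen[rank] = total

    threads = [
        threading.Thread(target=rank_main, args=(rank, value))
        for rank, value in enumerate(values)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    print(allreduce_sum([1, 2, 3, 4]))
```

In a real MPI program the ranks would be separate processes, possibly on different nodes, and the exchange would go over the interconnect rather than through in-process queues.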
Open Multi-Processing (OpenMP)
OpenMP is an application programming interface (API) that provides a model for parallel programming on shared memory systems, i.e., within a single node. It is portable across architectures from different vendors.
More info: OpenMP
Parallel
Parallel computing or parallel programming is an approach in which large computational problems are broken down into smaller problems that multiple processors solve simultaneously.
More info: Parallel
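The decomposition idea can be sketched as follows. This is a minimal illustration under stated assumptions: the problem (a sum of squares) and the names `split`, `sum_of_squares`, and `parallel_sum_of_squares` are chosen for this example. It uses threads so that it runs anywhere; in CPython, threads do not speed up pure-Python arithmetic, so the point here is the pattern of splitting, solving the pieces independently, and combining the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split(n, parts):
    """Split range(n) into `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        chunks.append(range(start, stop))
        start = stop
    return chunks


def sum_of_squares(chunk):
    """The small problem: solved independently for each chunk."""
    return sum(i * i for i in chunk)


def parallel_sum_of_squares(n, workers=4):
    """Solve each chunk in its own worker, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, split(n, workers)))
    return sum(partials)


if __name__ == "__main__":
    n = 1_000_000
    # The serial and the decomposed computation must agree.
    print(sum_of_squares(range(n)) == parallel_sum_of_squares(n))
```

On a cluster, each chunk would typically be handled by a separate process, core, or node rather than by a thread.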
Serial
Serial computing refers to a computational model where tasks are executed sequentially, one after another, on a single processing unit.
More info: Serial
Server
A server is a computer that provides resources, services, or functionality to other computers (clients) over a network.
More info: Server
Shared Memory
Shared memory is a high-performance inter-process communication mechanism that allows multiple processes to access a common memory segment directly.
More info: Shared Memory
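As a small sketch of a common memory segment, the example below uses `multiprocessing.shared_memory` from the Python standard library (Python 3.8 or later). The function name `demo_shared_segment` is invented for this illustration, and for simplicity both handles live in one process; normally the second handle would be opened by a different process using the segment's name.

```python
from multiprocessing import shared_memory


def demo_shared_segment(payload: bytes) -> bytes:
    """Write through one handle, modify through another, read back."""
    n = len(payload)
    owner = shared_memory.SharedMemory(create=True, size=n)
    try:
        owner.buf[:n] = payload  # write directly into the segment
        # Attach a second handle to the same segment by name.
        peer = shared_memory.SharedMemory(name=owner.name)
        try:
            peer.buf[0] = ord("J")  # change the first byte via the peer
        finally:
            peer.close()
        # The owner sees the change without any message being sent.
        return bytes(owner.buf[:n])
    finally:
        owner.close()
        owner.unlink()  # release the segment once nobody needs it


if __name__ == "__main__":
    print(demo_shared_segment(b"hello"))
```

Because both handles refer to the same memory, no data is copied or sent between them; in real programs, concurrent writers additionally need synchronization such as locks.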
Single Instruction, Multiple Data (SIMD)
SIMD is a computer architecture technique that enhances performance by applying one instruction to multiple data points simultaneously using specialized vector registers.
Examples:
- x86_64 architectures support “SSE”, “AVX”, and “AVX-512” instructions.
- aarch64 architectures support “NEON” and “SVE” instructions.
More info: SIMD
Single Instruction, Multiple Threads (SIMT)
SIMT is a parallel execution model used by GPUs where a single instruction is applied to multiple threads. Threads are grouped (e.g., warps) and execute the same instruction in lockstep, with divergence handled through control flow masking.
More info: SIMT
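Lockstep execution with control flow masking can be modelled in a few lines. The sketch below is a toy model in plain Python, not GPU code; the function name `simt_collatz_step` and the choice of branch (halve even values, map odd values to 3x + 1) are assumptions made for illustration. Each list element plays the role of one thread in a warp, and each branch is issued once for the whole warp with a mask selecting the lanes that commit a result.

```python
def simt_collatz_step(warp):
    """Advance every lane of a toy 'warp' by one step.

    Returns the new lane values and the number of instruction passes issued.
    """
    # Per-lane predicate: True where the lane takes the "even" branch.
    even_mask = [x % 2 == 0 for x in warp]
    result = list(warp)
    passes = 0

    # Pass 1: the "even" branch is issued once for the whole warp;
    # lanes whose mask bit is False stay idle and keep their old value.
    if any(even_mask):
        result = [x // 2 if m else r
                  for x, m, r in zip(warp, even_mask, result)]
        passes += 1

    # Pass 2: the "odd" branch is issued with the mask inverted.
    if not all(even_mask):
        result = [3 * x + 1 if not m else r
                  for x, m, r in zip(warp, even_mask, result)]
        passes += 1

    return result, passes


if __name__ == "__main__":
    print(simt_collatz_step([2, 4, 6, 8]))  # uniform warp
    print(simt_collatz_step([1, 2, 3, 4]))  # divergent warp
```

A warp whose lanes all take the same branch needs one pass, while a divergent warp needs two, which is why divergence reduces throughput in this model.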
Simple Linux Utility for Resource Management (Slurm)
Slurm is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters.
More info: Slurm
Symmetric Multiprocessing (SMP)
Symmetric Multiprocessing (SMP) is a multiprocessor hardware and software architecture in which two or more identical processors are connected to a single shared main memory and have equal access to all memory and I/O devices. The system is controlled by a single operating system that treats all processors equally, with no single processor having privileged access.
More info: Symmetric Multiprocessing
Simultaneous Multithreading (SMT)
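Simultaneous Multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. It allows multiple independent threads of execution to share the resources of a single core, making better use of execution units that would otherwise sit idle.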
More info: Simultaneous Multithreading
Supercomputer
A supercomputer is a type of computer with a high level of performance as compared to general-purpose computers.
More info: Supercomputer
Worker Node
A worker node is a compute node in a cluster that executes assigned computational tasks as part of a distributed system, typically under the coordination of a scheduler or control node.
Workstation
A workstation is a computer designed for technical or scientific applications and intended to be used by a single user.
More info: Workstation
Why use a Cluster?
Connecting to a remote HPC system
Working on a remote HPC system
Scheduler Fundamentals
Environment Variables
Accessing software via Modules
Transferring files with remote computers
Using the Research Data Warehouse
Parallelising with Job Arrays
Instructor Note
The complete script can be downloaded from: https://raw.githubusercontent.com/NewcastleRSE-Training/hpc-intro-comet/refs/heads/main/episodes/files/job_single_word-freq.sh
Instructor Note
Download the data using https://raw.githubusercontent.com/NewcastleRSE-Training/hpc-intro-comet/refs/heads/main/episodes/files/make-data.sh
Instructor Note
You can download the script from https://raw.githubusercontent.com/NewcastleRSE-Training/hpc-intro-comet/refs/heads/main/episodes/files/job_array_word-freq.sh
Running a parallel job (alternative episode)
Instructor Note
Ideally the code for this episode should be pre-compiled and made available for students to download. We have found that expecting students to write or even compile code causes information overload and confusion.
The code and scripts to compile can be downloaded from https://github.com/NewcastleRSE-Training/HPC_Training_Example_Jobs. After compiling the two versions of the program, copy them to a place where students can copy or download them from.
The binaries of the two programs are very small, so the duplication that results when all the students copy them to their own working directories should not matter. Doing it this way also gives students the opportunity to submit a job where the program they are using is in their local directory (rather than loaded from a module).