CSE 443/543 High Performance Computing (3 credits)

Catalog description:

Introduction to the practical use of multi-processor workstations and supercomputing clusters. Developing and using parallel programs for solving computationally intensive problems. The course builds on basic concepts of programming and problem solving.

Prerequisite:

CSE 278 or permission of instructor.

Required topics (approximate weeks allocated):

  • Introduction to parallel programming and high performance distributed computing (HPDC) (1)
    • Motivation for HPDC
    • Review of parallel programs and platforms
    • Implicit parallelism and limitations of instruction level parallelism (ILP)
    • Survey of architecture of commonly used HPDC platforms
  • Concurrency and parallelism (1)
    • Introduction to concurrency & parallelism
    • Levels of parallelism
    • Instruction level parallelism
    • SIMD versus MIMD
  • Review of C programming language and the Linux environment (1.5)
    • Review of basic programming constructs
    • Applying Java/C++ syntax and semantics to the C language
    • Introduction to problem solving using the C language
    • Introduction to Linux
    • C programming using Linux
    • C structures
  • Exploring instruction level parallelism (1)
    • Review of instruction level parallelism and sources of hazards
    • Concepts of hazard elimination via code restructuring (dependency reduction, loop unrolling)
    • Timing and statistical comparison of performance of C programs
  • Introduction to parallel programming (2)
    • Principles of parallel algorithms
    • Effects of synchronization and communication latencies
    • Overview of physical and logical communication topologies
    • Using MPE for parallel graphical visualization (parallel libraries)
  • Introduction to message passing paradigm (0.5)
    • Principles of message-passing programming
    • The building blocks of message passing
  • Programming in MPI (3)
    • Introduction to MPI: The Message Passing Interface
    • MPI Fundamentals
    • Partitioning data versus partitioning control
    • Blocking communications and parallelism
    • MPI communication models
    • Blocking vs. non-blocking communication and their impact on parallelism
    • Developing MPI programs that exchange derived data types
    • Creating MPI programs that use structure derived data types
    • Review of portability and interoperability issues
  • Performance profiling (1)
    • Using software tools for performance profiling
    • Performance profiling of MPI programs
    • Speedup anomalies in parallel algorithms
  • Collective communications (2)
    • Introduction to collective communications
    • Distributed debugging
    • Introduction to MPI scatter/gather operations (a minimal sketch follows this topic list)
    • Exploring the complete collective communication operations in MPI
  • Scalability and performance (1)
    • Understanding notions of scalability and performance
    • Metrics of scalability and performance
    • Asymptotic analysis of scalability and performance
  • Exams/Reviews (1)
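
The collective-communication topic above lends itself to a short illustration. The following is a minimal sketch only, not course material or a required assignment: rank 0 scatters equal blocks of an array to all ranks, each rank sums its block, and MPI_Reduce combines the partial sums on rank 0. The block size and data values are made up for illustration.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    enum { BLOCK = 4 };   /* elements per rank; illustrative value */

    int main(int argc, char *argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Only the root rank owns the full array that gets scattered. */
        double *data = NULL;
        if (rank == 0) {
            data = malloc((size_t)BLOCK * size * sizeof(double));
            for (int i = 0; i < BLOCK * size; i++)
                data[i] = i;
        }

        /* Scatter one block to every rank, sum it locally, then reduce
           the partial sums back onto rank 0. */
        double local[BLOCK], partial = 0.0, total = 0.0;
        MPI_Scatter(data, BLOCK, MPI_DOUBLE, local, BLOCK, MPI_DOUBLE,
                    0, MPI_COMM_WORLD);
        for (int i = 0; i < BLOCK; i++)
            partial += local[i];
        MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            printf("sum = %g\n", total);
            free(data);
        }
        MPI_Finalize();
        return 0;
    }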

Learning Outcomes:

1. Identify various forms of parallelism, their application, advantages, and drawbacks.

1.1. Describe the spectrum of parallelism available for high performance computing.

1.2. Compare and contrast the different forms of parallelism.

1.3. Identify applications that can take advantage of a given type of parallelism and vice versa.

1.4. Identify suitable hardware platforms on which the various forms of parallelism can be effectively realized.

1.5. Describe the concept of semantic gap as it pertains to high level languages and HPDC platforms.

2. Effectively utilize instruction level parallelism.

2.1. Describe the concept of instruction level parallelism

2.2. Identify the sources of hazards that impact instruction level parallelism using a contemporary high level programming language.

2.3. Apply source-code level software transformations to minimize hazards and improve instruction level parallelism (a minimal sketch follows this outcome).

2.4. Compare performance effects of various source-code level software transformations using a performance profiler.
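
As a hedged illustration of outcomes 2.2 through 2.4, the sketch below shows one common source-level transformation: unrolling a summation loop and accumulating into independent partial sums, which shortens the chain of dependent additions that limits instruction level parallelism. The function names are illustrative, and floating-point results may differ slightly because the additions are reassociated.

    #include <stddef.h>

    /* Baseline: every iteration depends on the previous value of sum,
       so the additions form one long dependency chain. */
    double sum_simple(const double *a, size_t n) {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }

    /* Unrolled by four with independent partial sums, so the four
       additions in each iteration can proceed in parallel. */
    double sum_unrolled(const double *a, size_t n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        size_t i;
        for (i = 0; i + 4 <= n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < n; i++)        /* leftover elements */
            s0 += a[i];
        return (s0 + s1) + (s2 + s3);
    }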

3. Effectively utilize multi-core CPUs and multithreading.

3.1. Describe the concept of multi-core architectures

3.2. Describe the concepts of threads and distinguish between processes & threads

3.3. Demonstrate creating threads using OpenMP compiler directives

3.4. Demonstrate the process of converting a serial program to a data parallel application

3.5. Demonstrate the process of converting a serial program to a task parallel application

3.6. Describe race conditions and side effects

3.7. Demonstrate the process of resolving race conditions using OpenMP critical sections (a minimal sketch follows this outcome)

3.8. Describe the performance tradeoff of using critical sections.

3.9. Describe the process of identifying and using multiple independent critical sections

3.10. Measure performance gains of multithreading
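
As a minimal sketch of outcomes 3.4 through 3.8 (not a prescribed solution), the code below parallelizes a serial summation with OpenMP: each thread accumulates a private partial sum, and the shared total is updated inside a critical section so that the update is not a race condition. In practice an OpenMP reduction clause is usually preferred, since the critical section serializes the final updates.

    #include <stddef.h>

    /* Data-parallel sum: private partial sums per thread, with the
       shared total updated inside a critical section. */
    double parallel_sum(const double *a, size_t n) {
        double total = 0.0;
        #pragma omp parallel
        {
            double local = 0.0;              /* private to each thread */
            #pragma omp for
            for (long i = 0; i < (long)n; i++)
                local += a[i];
            #pragma omp critical             /* serialize the shared update */
            total += local;
        }
        return total;
    }

Compiled with an OpenMP-enabled flag (for example, gcc -fopenmp), the function runs multithreaded; without OpenMP the pragmas are ignored and the same code runs serially.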

4. Specify, trace, and implement parallel and distributed programs using the Message Passing Interface (MPI) that solve a stated problem in a clean, robust, efficient, and scalable manner.

4.1. Describe the SPMD programming model.

4.2. Trace, create, compile, and run an MPI parallel program on a contemporary supercomputing cluster using PBS

4.3. Describe, trace, and implement programs that use MPI’s point-to-point blocking communications (a minimal sketch follows this outcome)

4.4. Describe, trace, and implement programs that use MPI’s non-blocking communications

4.5. Describe, trace, and implement programs that use collective communications

4.6. Describe, trace, and implement programs that use derived data types including vector derived data type and structure derived data types.

4.7. Use third-party libraries compatible with MPI to develop programs.
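
To make outcomes 4.1 through 4.3 concrete, here is a minimal SPMD sketch (illustrative only): every non-zero rank sends one integer to rank 0 with a blocking MPI_Send, and rank 0 receives each message with a blocking MPI_Recv. The payload and message tag are arbitrary.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */

        if (rank == 0) {
            /* Rank 0 collects one integer from every other rank. */
            for (int src = 1; src < size; src++) {
                int value;
                MPI_Recv(&value, 1, MPI_INT, src, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                printf("rank 0 received %d from rank %d\n", value, src);
            }
        } else {
            int value = rank * rank;            /* arbitrary payload */
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

On a cluster such a program would typically be compiled with mpicc and launched with mpiexec (for example, mpiexec -n 4 ./a.out), with the launch line placed inside a PBS job script.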

5. Describe and empirically demonstrate concepts of parallel efficiency and scalability.

5.1. Describe the concepts of speedup, efficiency, and scalability

5.2. Describe the analytical metrics of speedup, efficiency, and scalability (see the definitions after this outcome)

5.3. Identify efficient and scalable parallel programs (or algorithms) using asymptotic time complexities.

5.4. Use a performance profiler to empirically measure and compare efficiency, scalability, and speedup metrics of parallel programs.

5.5. Use profile data to improve speedup, efficiency, and scalability of a parallel program.
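
For reference, the standard analytical definitions behind outcomes 5.1 through 5.3, where T(1) is the serial running time, T(p) is the running time on p processors, and f is the fraction of work that must remain serial (Amdahl's law):

    Speedup:       S(p) = T(1) / T(p)
    Efficiency:    E(p) = S(p) / p
    Amdahl bound:  S(p) <= 1 / (f + (1 - f) / p)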

Graduate students:

Students taking the course for graduate credit will be expected to apply course concepts to solve computationally demanding problems, analyze experimental results, and draw inferences to verify hypotheses.