Performance Tuning Techniques for Multi-core Architectures

Description:

This two-day course demonstrates several techniques for improving the performance of applications on multi-core systems, such as OSC's Glenn Opteron cluster. These techniques involve taking advantage of features common to most modern microprocessors, including multi-level caches and multiple pipelined functional units, as well as parallelism within and across nodes.

Topics covered in the course will include:

Day One: Single-Processor Performance

  • Single-processor performance measurement and analysis tools
    • Timing
    • Compiler reports
    • Profiling
    • Hardware performance counters
  • Processor and memory architecture
    • Processor architecture features
    • Hierarchical memory and caching
  • Single-processor performance tuning techniques
    • Inlining
    • Loop optimization
    • Memory optimization
    • Floating-point behavior
    • Optimized math libraries

Day Two: Multi-core and Parallel Performance

  • Parallel performance measurement and analysis tools
    • Timing
    • Profiling
  • Threaded performance
    • Threaded programming interfaces
    • Common threaded performance bottlenecks
  • Message passing performance
    • Message passing programming interfaces
    • Interconnect characteristics
    • Common message passing performance bottlenecks

Prerequisites:

Familiarity with UNIX and either Fortran 90 or C/C++ is preferred. Knowledge of a parallel programming method (e.g., MPI or OpenMP) is helpful but not required.

Target Audience:

Those interested in improving the performance of their applications on multi-core systems, including PCs and workstations, as well as supercomputers.

Method of Delivery:

Lecture with hands-on exercises and demonstrations.

Handouts:

May 2008 (PDF), by Troy Baer
Example Programs (zip)
Example Programs (tar.gz)