CSI2110 Parallel Programming Practice

This is the website of the 2011 course offering, in which the part on programming the Cell processor was replaced by an introduction to GPGPU programming with OpenCL.

News

[Oct 24, 2011] We're happy to announce the support of the Intel Academic Community through the micro grant "From Cells to Sandy Bridges".

Aim

The course introduces students to parallel programming early on, so that they do not remain 'serialized' until later in the CS curriculum. Practical foundations of concurrency are put under a magnifying glass. Individual programming assignments and a group term project provide ample hands-on experience. Parallel design patterns are introduced well into the course, to allow students to connect programming experience with software engineering principles. The course concludes with an outlook on the not-so-bleak future of parallel programming language technologies.

Synopsis

A brief outline of the history of computing motivates the recent shift to multicore architectures. Parallelism, execution indeterminism, thread-and-lock-based programming and hardware acceleration are introduced in a step-by-step approach, each accompanied by individual programming assignments. The impact of hardware architectures on programmability and performance is highlighted. Parallel programming design patterns are connected with programming examples on task, data and pipeline parallelism. A practical introduction to the stream-parallel programming language StreamIt opens the course outlook on parallel programming trends.

Lectures

Lecture | Title | Slides
1 | The Shift to Multicore Architectures | pdf (final version)
2 | Parallelism | pdf (final version)
3 | Programming with Pthreads | pdf (final version)
4 | Thread Synchronization | pdf (final version)
5 | Performance of Parallel Programs | pdf (final version)
6 | Two Scalable Algorithmic Techniques | pdf (final version)
7 | SIMD Vectorization | pdf (final version)
8 | Programming Accelerators with OpenCL | pdf (final version)
9 | Stream Parallelism with StreamIt | pdf (final version)

Legend: Preliminary version = preliminary (2010) version; Final version = final version.

Assignments

Assignment | Title | Description | Material
1 | Execution Indeterminism and Array Computations | Analysis of thread execution indeterminism on Linux; computation of the min/max/average of integer arrays using Pthreads; argument passing and scalability. | pdf
2 | Monte-Carlo Method, RGB to Grayscale Conversion, Thread Synchronization | Thread programming and synchronization using mutexes and semaphores. | pdf
3 | Counting 3's | Performance bottlenecks due to false sharing are examined. | pdf
4 | Doughnut World, Vectorization of Scalar Code | Find the exit in a labyrinth; vectorize scalar matrix-multiplication code using AVX instructions. | YSCEC
5 | From Pthreads to OpenCL Kernels | Simulation of OpenCL kernel execution with Pthreads. | YSCEC
6 | Matrix Multiplication | Compute powers of a matrix in OpenCL. | YSCEC
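
To give a flavor of the false-sharing topic behind Assignment 3, here is a minimal sketch in C with Pthreads. It is not the assignment material or its solution. The names `count_threes`, `count_worker`, `padded_count_t` and `task_t`, the thread count of 4 and the cache-line size of 64 bytes are all assumptions made for illustration. Each thread counts the 3's in its own slice of the array and stores its result in a counter padded to a full cache line, so that the counters of different threads cannot land on the same line.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4   /* assumed thread count, chosen for illustration */
#define CACHE_LINE  64  /* assumed cache-line size in bytes */

/* Per-thread counter padded to CACHE_LINE bytes: adjacent counters are
   64 bytes apart, so two of them never fall into the same 64-byte line. */
typedef struct {
    long count;
    char pad[CACHE_LINE - sizeof(long)];
} padded_count_t;

/* Work description for one thread: a half-open slice [begin, end). */
typedef struct {
    const int *data;
    size_t begin;
    size_t end;
    padded_count_t *result;
} task_t;

static void *count_worker(void *arg)
{
    task_t *t = arg;
    long local = 0;  /* accumulate locally, write the shared slot once */

    for (size_t i = t->begin; i < t->end; i++)
        if (t->data[i] == 3)
            local++;
    t->result->count = local;
    return NULL;
}

/* Returns the number of elements of data[0..n-1] that are equal to 3. */
long count_threes(const int *data, size_t n)
{
    pthread_t tid[NUM_THREADS];
    task_t task[NUM_THREADS];
    padded_count_t counts[NUM_THREADS];
    int started[NUM_THREADS];
    size_t chunk = n / NUM_THREADS;
    long total = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        task[t].data = data;
        task[t].begin = (size_t)t * chunk;
        /* the last thread also takes the remainder of the array */
        task[t].end = (t == NUM_THREADS - 1) ? n : task[t].begin + chunk;
        task[t].result = &counts[t];
        counts[t].count = 0;
        started[t] = (pthread_create(&tid[t], NULL, count_worker, &task[t]) == 0);
        if (!started[t])
            count_worker(&task[t]);  /* fallback: do the slice in the caller */
    }
    for (int t = 0; t < NUM_THREADS; t++) {
        if (started[t])
            pthread_join(tid[t], NULL);
        total += counts[t].count;
    }
    return total;
}
```

A call such as `count_threes(values, n)` returns the number of 3's in `values`; with GCC or Clang the code is typically built with the `-pthread` option. Removing the `pad` member and incrementing `t->result->count` directly inside the loop is one way to provoke the false-sharing slowdown for comparison, although an optimizing compiler may hide the effect.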


Textbooks

Principles of Parallel Programming, Calvin Lin, Lawrence Snyder, Addison Wesley, 1st edition, 2008.

Patterns for Parallel Programming, Timothy G. Mattson, Beverly A. Sanders, Berna L. Massingill, Addison Wesley, 3rd printing, 2007.

OpenCL Programming Guide, Aaftab Munshi, Benedict Gaster, Timothy G. Mattson, James Fung, Dan Ginsburg, Pearson Education, 2011.

History

Three previous course offerings at Yonsei University in 2008, 2009 and 2010 used the synergistic processing elements (SPEs) of the Cell BE architecture for hardware acceleration. The 2011 offering of this course replaced the material on the Cell processor with OpenCL. Course materials are jointly developed by Bernhard Scholz, The University of Sydney, and Bernd Burgstaller, Yonsei University. A GPGPU-centered version of this course is taught at The University of Sydney.

The Intel Academic Community kindly supports this course through the micro grant "From Cells to Sandy Bridges".