Pete Keleher

TreadMarks: distributed shared memory on standard workstations and operating systems http://portal.acm.org/citation.cfm?id=1267084 TreadMarks is a DSM system that exploits the Lazy Release Consistency model and lazy diff creation. Lazy Release Consistency postpones the propagation of modifications in a distributed system until the time

Yunheung Paek

Simplification of array access patterns for compiler optimizations http://portal.acm.org/citation.cfm?id=277650.277664 The authors present a novel technique named LMAD (Linear Memory Access Descriptor). An LMAD consists of stride/span pairs and a base offset. Analyzing complex array access patterns can be simpler with LMADs. Thus, some compiler
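To make the stride/span/base-offset description concrete, here is a minimal sketch that enumerates the memory offsets an LMAD covers. It assumes that each span is the distance between the first and last offset touched along its dimension and that strides are positive. The function name `lmad_offsets` and the list-of-tuples layout are hypothetical choices for illustration, not notation from the paper.

```python
from itertools import product


def lmad_offsets(base, dims):
    """Enumerate the offsets described by an LMAD-style descriptor.

    base: starting offset of the access.
    dims: list of (stride, span) pairs, one per dimension.
          Assumption: span is the distance from the first to the last
          offset in that dimension, and stride > 0.
    Returns the sorted list of distinct offsets touched.
    """
    # Each dimension contributes the displacements 0, stride, 2*stride, ... up to span.
    per_dim = [range(0, span + 1, stride) for stride, span in dims]
    # Every combination of per-dimension displacements is one accessed offset.
    return sorted({base + sum(combo) for combo in product(*per_dim)})


# A 3x3 tile of a row-major array with 10 elements per row, starting at offset 5:
# the inner dimension has stride 1 and span 2, the outer has stride 10 and span 20.
tile = lmad_offsets(5, [(1, 2), (10, 20)])
```

With this representation, a nested-loop access collapses into a few integers, which is the kind of compact form that makes later comparison of access patterns easier.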

Michael Wolfe

Compilers and More: A GPU and Accelerator Programming Model http://www.hpcwire.com/specialfeatures/sc08/features/Compilers_and_More_A_GPU_and_Accelerator_Programming_Model.html Accelerators support two levels of parallelism: an outer, fully parallel doall loop level and an inner, synchronous (SIMD or vector) loop level. The keys to tuning are minimizing and perhaps optimizing

Shane Ryoo

Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, PPoPP’08 http://portal.acm.org/citation.cfm?id=1345206.1345220 This is a very empirical paper. The authors introduce the characteristics of the GeForce 8800 and a set of optimization principles. Many threads enough to hide memory latency and memory bandwidth

Zheng Wang

Mapping Parallelism to Multi-cores: A Machine Learning Based Approach, PPoPP’09 http://portal.acm.org/citation.cfm?id=1504176.1504189 The authors extract static code features such as operations, control flow, memory accesses, and binary and bitwise operations using LLVM, and collect data features such as kernel loop counts, L1 dcache misses and