Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, PPoPP’08

http://portal.acm.org/citation.cfm?id=1345206.1345220

This is a very empirical paper. The authors describe the characteristics of the GeForce 8800 and derive optimization principles from them. The two main ones are to run enough threads to hide memory latency, and to reduce the demand on global memory bandwidth by using shared and/or constant memory. They ported many benchmarks to CUDA and, moreover, published them on their web site (http://impact.crhc.illinois.edu/parboil.php)! They show how to calculate the peak performance in GFLOPS of a given GPU, and how to work out the GFLOPS of my own programs. It’s very useful! Finally, they warn readers not to get stuck in a local optimum, but to try many approaches in order to get close to the peak performance.
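To make the GFLOPS arithmetic concrete, here is a minimal sketch in Python. It is my own illustration, not code from the paper. The function names are hypothetical, the hardware figures (16 multiprocessors, 8 scalar processors each, 1.35 GHz, one multiply-add counted as 2 FLOPs per cycle) are the commonly quoted GeForce 8800 GTX specifications and should be treated as assumptions, and the kernel time is a made-up number.

```python
def peak_gflops(num_sms, sps_per_sm, flops_per_sp_per_cycle, clock_ghz):
    """Theoretical peak: every scalar processor issues its maximum
    number of floating-point operations on every cycle."""
    return num_sms * sps_per_sm * flops_per_sp_per_cycle * clock_ghz


def achieved_gflops(flop_count, elapsed_seconds):
    """Achieved rate: floating-point operations actually executed,
    divided by the measured run time, in units of 1e9 FLOP/s."""
    return flop_count / elapsed_seconds / 1e9


# Assumed GeForce 8800 GTX figures (not taken from this post):
# 16 SMs x 8 SPs, multiply-add = 2 FLOPs per cycle, 1.35 GHz clock.
peak = peak_gflops(num_sms=16, sps_per_sm=8,
                   flops_per_sp_per_cycle=2, clock_ghz=1.35)

# Example workload: a dense N x N matrix multiply performs about
# 2 * N^3 floating-point operations (one multiply and one add per term).
n = 4096
flops = 2 * n ** 3

# Hypothetical measured kernel time in seconds, for illustration only.
elapsed = 1.5
achieved = achieved_gflops(flops, elapsed)

print(f"peak:     {peak:.1f} GFLOPS")
print(f"achieved: {achieved:.1f} GFLOPS")
print(f"fraction of peak: {achieved / peak:.1%}")
```

With the assumed specifications the peak comes out at about 345.6 GFLOPS. Comparing the achieved number against that limit shows how much room an optimization still has.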

Shane Ryoo