A compiler framework for optimization of affine loop nests for GPGPUs, ICS’08

http://portal.acm.org/citation.cfm?id=1375527.1375562

The authors characterize two CUDA performance factors: coalescing of global memory accesses and bank conflicts in shared memory accesses. They identify the access patterns that give the best performance, and use the polyhedral model to generate parallel code that follows those patterns.
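To make the two effects concrete, here is a minimal Python sketch of a simplified cost model. It is not taken from the paper: the function names, the 32-thread warp, the 128-byte memory segment, and the 32 shared memory banks are all assumptions chosen for illustration, and older hardware used different values. It counts how many memory segments one warp touches in global memory and how many ways a shared memory access is serialized.

```python
from collections import Counter

WARP = 32   # threads per warp (assumed)
WORD = 4    # bytes per 32-bit word


def global_transactions(byte_addresses, segment_bytes=128):
    """Count the distinct aligned segments touched by one warp (assumed model)."""
    return len({addr // segment_bytes for addr in byte_addresses})


def bank_conflict_degree(word_indices, num_banks=32):
    """Largest number of distinct words that fall into one bank (1 = no conflict)."""
    per_bank = Counter()
    seen = set()
    for w in word_indices:
        if w not in seen:  # same word read by several threads: modeled as a broadcast
            seen.add(w)
            per_bank[w % num_banks] += 1
    return max(per_bank.values())


# Global memory: thread t reads a[t] versus a[t * 32].
unit_stride = [t * WORD for t in range(WARP)]
large_stride = [t * 32 * WORD for t in range(WARP)]
print(global_transactions(unit_stride))   # 1 segment: coalesced
print(global_transactions(large_stride))  # 32 segments: uncoalesced

# Shared memory: thread t reads word t * stride.
for stride in (1, 2, 32, 33):
    degree = bank_conflict_degree([t * stride for t in range(WARP)])
    print(stride, degree)  # strides 1 and 33 give 1, stride 2 gives 2, stride 32 gives 32
```

In this model, a stride of 33 is conflict-free while a stride of 32 is fully serialized, which is why padding a shared memory array by one word is a common way to avoid bank conflicts.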

Muthu Manikandan