A compiler framework for optimization of affine loop nests for GPGPUs, ICS’08
http://portal.acm.org/citation.cfm?id=1375527.1375562
They characterize the CUDA memory behavior that matters most for performance: coalescing when accessing global memory, and bank conflicts when accessing shared memory. From these characteristics they derive the conditions for best performance, and they use the polyhedral model to generate efficient parallel code that satisfies those conditions.
Muthu Manikandan
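
The two memory effects named in the summary can be illustrated with a small counting model. This is only a rough sketch and is not taken from the paper. It assumes 32 threads per access group, 32 shared-memory banks, 4-byte words, and 128-byte global-memory segments. All of these are adjustable parameters, and the hardware of that period used different values. The function names are hypothetical.

```python
from collections import Counter


def global_transactions(stride, threads=32, word_bytes=4, segment_bytes=128):
    """Count the distinct aligned segments touched when thread t reads word t * stride.

    One segment means the access is fully coalesced under this simplified model.
    """
    segments = {(t * stride * word_bytes) // segment_bytes for t in range(threads)}
    return len(segments)


def bank_conflict_degree(stride, threads=32, banks=32):
    """Return the largest number of distinct words that map to a single bank.

    A result of 1 means conflict-free; a result of k means the access is
    serialized into k steps under this simplified model.
    """
    per_bank = Counter()
    seen_words = set()
    for t in range(threads):
        word = t * stride
        if word in seen_words:
            continue  # assumption: threads reading the same word are served by a broadcast
        seen_words.add(word)
        per_bank[word % banks] += 1
    return max(per_bank.values())


if __name__ == "__main__":
    for s in (1, 2, 32, 33):
        print(s, global_transactions(s), bank_conflict_degree(s))
```

In this model, unit stride gives a single global-memory transaction and no bank conflicts, while a stride equal to the number of banks sends every thread to the same bank. A stride of 33 is conflict-free again, which is why padding a shared-memory array by one element is a common remedy.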