A GPGPU compiler for memory optimization and parallelism management

http://portal.acm.org/citation.cfm?id=1806596.1806606

This paper presents an optimizing compiler for GPGPU programs. The compiler takes naive kernels as input and generates kernels optimized for memory coalescing, thread and thread-block merging, and removal of memory partition camping. The paper assumes that all necessary information, such as the kernel index space and the kernel parameters, is known at compile time. However, typical CUDA or OpenCL programs receive these parameters at runtime, not statically. Moreover, the benchmarks used in the paper are trivial: the longest kernel is only 27 lines.
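To make the memory coalescing point concrete, here is a toy model of my own (not code from the paper or its compiler). It counts how many memory segments one warp touches in a single load, for a strided access pattern typical of a naive kernel and for a consecutive, coalesced pattern. The warp size of 32, the 128-byte segment size, the 4-byte element size, the row width of 1024, and all function names are assumptions chosen for illustration.

```python
WARP_SIZE = 32        # threads per warp (assumed)
SEGMENT_BYTES = 128   # size of one memory transaction segment (assumed)
ELEM_BYTES = 4        # one float32 element


def segments_touched(indices):
    """Count the distinct memory segments hit by one warp-wide load."""
    return len({(i * ELEM_BYTES) // SEGMENT_BYTES for i in indices})


def naive_access(width, i):
    # Thread t reads element i of row t, so neighbouring threads
    # are `width` elements apart (strided access).
    return [t * width + i for t in range(WARP_SIZE)]


def coalesced_access(width, i):
    # Thread t reads element t of row i, so neighbouring threads
    # read neighbouring elements (consecutive access).
    return [i * width + t for t in range(WARP_SIZE)]


naive = segments_touched(naive_access(1024, 0))
coalesced = segments_touched(coalesced_access(1024, 0))
print(naive, coalesced)
```

Under these assumptions the strided pattern touches one segment per thread, while the consecutive pattern fits the whole warp into a single segment. Note that `width` is a fixed constant here, which mirrors the compile-time assumption criticized above: if it only arrives at runtime, this analysis cannot be done statically.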

Yi Yang
