Takehiko Inoue and Minoru Furuya are my favorite comic artists. VAGABOND by Inoue in particular is something more than a comic. It tells the story of a life. I hope Koreans and Japanese can get along well with each other.
Cedric Bastoul
Code Generation in the Polyhedral Model Is Easier Than You Think http://portal.acm.org/citation.cfm?id=1025992 blahblah
Isaac Gelado
CUBA: an architecture for efficient CPU/co-processor data communication http://portal.acm.org/citation.cfm?id=1375571 This paper presents a hardware-supported double-buffered mechanism that allows the CPU to transfer data for the next invocation while the coprocessor is executing the current call.
Comparison of Nvidia graphics processing units
http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units
List of device bandwidths
http://en.wikipedia.org/wiki/List_of_device_bandwidths
Michael Wolfe
Compilers and More: A GPU and Accelerator Programming Model http://www.hpcwire.com/specialfeatures/sc08/features/Compilers_and_More_A_GPU_and_Accelerator_Programming_Model.html Accelerators support two levels of parallelism: an outer fully-parallel doall loop level, and an inner synchronous (SIMD or vector) loop level. The keys to tuning are minimizing and perhaps optimizing
Byoung-Tak Zhang
Teaching an Agent by Playing a Multimodal Memory Game: Challenges for Machine Learners and Human Teachers, AAAI’09 Zhang developed a research platform that implements a cognitive game called the multimodal memory game (MMG) to study machine learning architectures and algorithms for
ACM SIG Proceedings Templates
http://www.acm.org/sigs/publications/proceedings-templates
Hubert Nguyen
GPU Gems 3 http://portal.acm.org/citation.cfm?id=1407436 Chapter 31 Lars Nyland – Fast N-Body Simulation with CUDA This article presents a parallel technique for an O(N*N) calculation. All calculations in this algorithm are parallelizable, so they can all be executed simultaneously. But it needs N*N
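As a rough illustration of the all-pairs idea mentioned in this excerpt, here is a minimal Python sketch. It is not the chapter's CUDA code: the function name, the softening constant `eps`, and the choice of G = 1 are assumptions made for illustration. Each body's acceleration depends only on read-only positions and masses, so every iteration of the outer loop is independent of the others.

```python
def all_pairs_accelerations(positions, masses, eps=1e-9):
    """Return the acceleration on each body from all other bodies (G = 1).

    positions: list of (x, y, z) tuples; masses: list of floats.
    eps is a hypothetical softening term that avoids division by zero.
    """
    n = len(positions)
    accels = []
    for i in range(n):              # iterations are independent of each other
        xi, yi, zi = positions[i]
        ax = ay = az = 0.0
        for j in range(n):          # N interactions per body, N*N in total
            if i == j:
                continue
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            dist_sq = dx * dx + dy * dy + dz * dz + eps
            inv_dist_cubed = dist_sq ** -1.5
            ax += masses[j] * dx * inv_dist_cubed
            ay += masses[j] * dy * inv_dist_cubed
            az += masses[j] * dz * inv_dist_cubed
        accels.append((ax, ay, az))
    return accels
```

On a GPU, one natural mapping would be to give each value of `i` its own thread, since no iteration writes data that another one reads.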
Vasily Volkov
Benchmarking GPUs to tune dense linear algebra, SC’08 http://portal.acm.org/citation.cfm?id=1413370.1413402 They characterized GPUs through empirical experiments and adapted older methods used for vector processors to GPUs. Frankly, I cannot understand this paper. 🙁 I will
John Nickolls
Scalable Parallel Programming with CUDA, Queue Vol.6 Issue 2, ’08 http://portal.acm.org/citation.cfm?id=1365500 Just a programming guide. Nothing more.
Shane Ryoo
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, PPoPP’08 http://portal.acm.org/citation.cfm?id=1345206.1345220 This is a very empirical paper. They introduced the characteristics of the GeForce 8800 and optimization principles. Many threads enough to hide memory latency and memory bandwidth
Compiler Loop Optimization
http://en.wikipedia.org/wiki/Compiler_optimization http://en.wikipedia.org/wiki/Loop_optimization
The Time Zone Converter
http://www.timezoneconverter.com/cgi-bin/tzc.tzc
Christian Lengauer
Loop Parallelization in the Polytope Model, CONCUR’93 http://portal.acm.org/citation.cfm?id=703499 Using a concrete example, he showed how to use a polyhedral model to generate parallel code from sequential code.
Muthu Manikandan
A compiler framework for optimization of affine loop nests for gpgpus, ICS’08 http://portal.acm.org/citation.cfm?id=1375527.1375562 They showed the characteristics of CUDA, such as coalescing when accessing global memory and bank conflicts when accessing shared memory. They derived the best
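To make the bank-conflict part of this excerpt concrete, here is a small Python sketch that counts how many threads land on the same shared-memory bank for a strided access. It is not taken from the paper. The function name is hypothetical, and the defaults of 16 threads and 16 banks are assumptions modeled on a half-warp of the GeForce 8-series generation; other hardware uses different numbers.

```python
from collections import Counter

def bank_conflict_degree(stride_words, num_threads=16, num_banks=16):
    """Worst-case number of threads that hit the same shared-memory bank.

    Thread t is assumed to read the 32-bit word at index t * stride_words,
    and the word at index w is assumed to live in bank w % num_banks.
    The broadcast case (all threads reading one word) is not modeled.
    """
    banks = Counter((t * stride_words) % num_banks for t in range(num_threads))
    return max(banks.values())
```

Under these assumptions, strides of 1 and 17 give a degree of 1 (no conflict), a stride of 2 gives 2, and a stride of 16 gives 16, meaning the accesses would be fully serialized. This is why padding an array row from 16 to 17 words is a common way to avoid conflicts.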
Zheng Wang
Mapping Parallelism to Multi-cores: A Machine Learning Based Approach, PPoPP’09 http://portal.acm.org/citation.cfm?id=1504176.1504189 They extracted static code features such as operations, control flow, memory accesses, and binary & bitwise operations using LLVM, and gathered data features such as loop counts of kernels, L1 dcache miss and
Iterative Compilation
http://www.lri.fr/~girbal/site_wrapit/iterative.html Evolutionary algorithms may also help in the search for optimal parameters or transformations.
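One way to picture an evolutionary search over an optimization parameter is the minimal (1+1) strategy sketched below in Python. This is not code from the linked page: the function names, the candidate tile sizes, and especially `toy_cost` are hypothetical. In real iterative compilation the cost function would compile a program variant and time it; here a synthetic function stands in for that step.

```python
import random

def evolve_parameter(cost, candidates, generations=50, seed=0):
    """(1+1) evolutionary search: mutate the current choice, keep it if not worse."""
    rng = random.Random(seed)
    best = rng.randrange(len(candidates))    # index into candidates
    best_cost = cost(candidates[best])
    for _ in range(generations):
        # Mutation: step to a neighboring candidate, clamped to the valid range.
        child = min(len(candidates) - 1, max(0, best + rng.choice((-1, 1))))
        child_cost = cost(candidates[child])
        if child_cost <= best_cost:          # selection: never accept a worse variant
            best, best_cost = child, child_cost
    return candidates[best], best_cost

def toy_cost(tile):
    """Hypothetical stand-in for 'compile and time a variant'; lowest at tile 32."""
    return abs(tile - 32) + 1.0
```

A call such as `evolve_parameter(toy_cost, [4, 8, 16, 32, 64, 128])` returns the chosen tile size together with its cost. Because a worse child is never accepted, the returned cost can only improve on the starting point, which is the property that makes this kind of search usable when each evaluation is an expensive compile-and-run cycle.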
Louis-Noel Pouchet
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time, CGO’07 http://portal.acm.org/citation.cfm?id=1252537 They built polyhedral models from a sequential program using a method that Paul Feautrier proposed in 1992. Many schedules can be derived from the polyhedral model, so they choose only the legal