Automatic data layout for distributed-memory machines http://portal.acm.org/citation.cfm?id=291891.291901
Pete Keleher
TreadMarks: distributed shared memory on standard workstations and operating systems http://portal.acm.org/citation.cfm?id=1267084 TreadMarks is a DSM system that exploits lazy release consistency and lazy diff creation. Lazy release consistency postpones the propagation of modifications in the distributed system until the time of the next acquire, and lazy diff creation likewise delays encoding a modified page's changes until another node actually requests them.
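To make the lazy-diff idea concrete, here is a toy, single-process C sketch of the twin-and-diff mechanism (make_twin and create_diff are my own names, not the TreadMarks API; the real system does this per virtual-memory page, triggered by protection faults):

#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

struct diff_entry { int offset; char value; };

static char page[PAGE_SIZE];  /* the shared page */
static char twin[PAGE_SIZE];  /* pristine copy made at the first write fault */

/* First write to the page: save a twin before modifying it. */
static void make_twin(void) { memcpy(twin, page, PAGE_SIZE); }

/* Lazy diff creation: the diff is computed only when some other
 * node actually requests our modifications, not at release time. */
static int create_diff(struct diff_entry *out, int max) {
    int n = 0;
    for (int i = 0; i < PAGE_SIZE && n < max; i++)
        if (page[i] != twin[i]) { out[n].offset = i; out[n].value = page[i]; n++; }
    return n;
}

int main(void) {
    struct diff_entry d[16];
    make_twin();                 /* normally triggered by a write fault */
    page[10] = 'x'; page[20] = 'y';
    int n = create_diff(d, 16);  /* normally triggered by a remote request */
    for (int i = 0; i < n; i++)
        printf("offset %d -> %c\n", d[i].offset, d[i].value);
    return 0;
}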
Yunheung Paek
Simplification of array access patterns for compiler optimizations http://portal.acm.org/citation.cfm?id=277650.277664 The authors present a novel representation called the LMAD (Linear Memory Access Descriptor). An LMAD consists of stride/span pairs plus a base offset, and analyzing complex array access patterns becomes simpler with it, so compiler optimizations that would otherwise give up on such patterns can still be applied.
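As I read the notation, an LMAD with a base offset and per-dimension stride/span pairs describes exactly the set of offsets enumerated by the toy C code below (the code and the example values are mine, not the paper's):

#include <stdio.h>

/* One dimension of an LMAD: accesses move by `stride` elements and
 * cover a total distance of `span` elements (span = stride * (trips - 1)). */
struct dim { long stride; long span; };

/* Enumerate every offset described by an LMAD with the given base
 * offset and dimensions, last dimension varying fastest. */
static void enumerate(long base, const struct dim *d, int ndims) {
    long trips[8], idx[8];
    for (int k = 0; k < ndims; k++) { trips[k] = d[k].span / d[k].stride + 1; idx[k] = 0; }
    for (;;) {
        long off = base;
        for (int k = 0; k < ndims; k++) off += idx[k] * d[k].stride;
        printf("%ld\n", off);
        int k = ndims - 1;                       /* odometer-style increment */
        while (k >= 0 && ++idx[k] == trips[k]) idx[k--] = 0;
        if (k < 0) break;
    }
}

int main(void) {
    /* e.g. A[i][j] over a 10-column array, i = 0..2, j = 0..3:
     * outer dimension stride 10, span 20; inner stride 1, span 3. */
    struct dim d[2] = { {10, 20}, {1, 3} };
    enumerate(0, d, 2);   /* prints 0..3, 10..13, 20..23 */
    return 0;
}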
How to delete all files in a CVS directory
ls -1 > tmp
cat tmp | xargs rm -f
cat tmp | xargs cvs delete
cvs commit -m ""
cvs update
How to remove all except the newest file
shopt -s extglob
rm -f !(`ls -t | head -1`)
How to install Grub from a live Ubuntu cd
$ sudo -i
# grub
grub> find /boot/grub/stage1
(hd0,5)
grub> root (hd0,5)
grub> setup (hd0)
Replace string in file with sed
sed -i 's/old/new/g' filename 😉
How to delete ^M character using vi
In vi, run :%s/^M//g. To type the ^M, hold the Control key and press V, then M (both while holding Control); the ^M will appear. Or just run the file through dos2unix.
Cedric Bastoul
Code Generation in the Polyhedral Model Is Easier Than You Think http://portal.acm.org/citation.cfm?id=1025992
Isaac Gelado
CUBA: an architecture for efficient CPU/co-processor data communication http://portal.acm.org/citation.cfm?id=1375571 This paper presents a hardware-supported double-buffering mechanism that allows the CPU to transfer data for the next invocation while the co-processor is executing the current call.
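CUBA does this in hardware, but the overlap it is after can be sketched in plain CUDA with two buffers and two streams (my illustration of generic double buffering, not the paper's mechanism; assumes `host` points to pinned memory allocated with cudaMallocHost, each chunk is n floats, and the device buffers are pre-allocated):

#include <cuda_runtime.h>

__global__ void process(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;   /* stand-in for real work */
}

void run(const float *host, float *dev_in[2], float *dev_out[2],
         int chunks, int n) {
    cudaStream_t s[2];
    cudaStreamCreate(&s[0]); cudaStreamCreate(&s[1]);
    for (int c = 0; c < chunks; c++) {
        int b = c & 1;   /* alternate between the two buffers */
        /* stage chunk c in one stream while the other stream computes;
         * stream ordering makes buffer reuse two chunks later safe */
        cudaMemcpyAsync(dev_in[b], host + (size_t)c * n, n * sizeof(float),
                        cudaMemcpyHostToDevice, s[b]);
        process<<<(n + 255) / 256, 256, 0, s[b]>>>(dev_in[b], dev_out[b], n);
    }
    cudaStreamSynchronize(s[0]); cudaStreamSynchronize(s[1]);
    cudaStreamDestroy(s[0]); cudaStreamDestroy(s[1]);
}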
Michael Wolfe
Compilers and More: A GPU and Accelerator Programming Model http://www.hpcwire.com/specialfeatures/sc08/features/Compilers_and_More_A_GPU_and_Accelerator_Programming_Model.html Accelerators support two levels of parallelism: an outer fully-parallel doall loop level, and an inner synchronous (SIMD or vector) loop level. The keys to tuning are minimizing and perhaps optimizing the data movement between the host and the accelerator.
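In CUDA terms (my mapping, consistent with the article's description): the outer doall level is the grid of independent thread blocks, and the inner synchronous level is the threads within a block.

#include <cuda_runtime.h>

/* Outer level: one block per row, no ordering between blocks (doall).
 * Inner level: the block's threads sweep a row in SIMD fashion and
 * could synchronize with __syncthreads() if they needed to. */
__global__ void scale_rows(float *a, int rows, int cols, float s) {
    int row = blockIdx.x;                                 /* fully parallel */
    for (int j = threadIdx.x; j < cols; j += blockDim.x)  /* vector-style */
        a[row * cols + j] *= s;
}

/* launch: scale_rows<<<rows, 256>>>(a, rows, cols, 2.0f); */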
Byoung-Tak Zhang
Teaching an Agent by Playing a Multimodal Memory Game: Challenges for Machine Learners and Human Teachers, AAAI’09 Zhang developed a research platform that implements a cognitive game called the multimodal memory game (MMG) to study machine learning architectures and algorithms for learning from multimodal data.
Hubert Nguyen
GPU Gems 3 http://portal.acm.org/citation.cfm?id=1407436 Chapter 31, Lars Nyland – Fast N-Body Simulation with CUDA. This article presents a parallel technique for an O(N*N) calculation. All calculations in this algorithm are independent of one another, so they can all be executed simultaneously, but it needs N*N force calculations in total.
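A condensed sketch of the chapter's tiled all-pairs kernel, simplified from memory rather than copied from the book (assumes n is a multiple of TILE, a grid of n/TILE blocks of TILE threads, and an arbitrary softening constant):

#include <cuda_runtime.h>

#define TILE 256

/* All-pairs N-body: each thread accumulates the force on one body.
 * Bodies are staged through shared memory one tile at a time so the
 * N*N interactions read each position from DRAM only N/TILE times. */
__global__ void nbody_forces(const float4 *pos, float3 *acc, int n) {
    __shared__ float4 tile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;   /* my body */
    float4 pi = pos[i];
    float3 a = {0.f, 0.f, 0.f};
    for (int base = 0; base < n; base += TILE) {
        tile[threadIdx.x] = pos[base + threadIdx.x]; /* cooperative load */
        __syncthreads();
        for (int j = 0; j < TILE; j++) {
            float dx = tile[j].x - pi.x, dy = tile[j].y - pi.y, dz = tile[j].z - pi.z;
            float r2 = dx*dx + dy*dy + dz*dz + 1e-9f;  /* softening */
            float inv = rsqrtf(r2);
            float f = tile[j].w * inv * inv * inv;     /* m_j / r^3 */
            a.x += f * dx; a.y += f * dy; a.z += f * dz;
        }
        __syncthreads();
    }
    acc[i] = a;
}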
Vasily Volkov
Benchmarking GPUs to tune dense linear algebra, SC’08 http://portal.acm.org/citation.cfm?id=1413370.1413402 They showed the characteristics of GPUs through empirical experiments and adapted old methods that were used for vector processors to GPUs. Frankly, I cannot understand this paper. 🙁 I will read it again.
John Nickolls
Scalable Parallel Programming with CUDA, Queue Vol.6 Issue 2, ’08 http://portal.acm.org/citation.cfm?id=1365500 It is just the programming guide, no more.
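For what it's worth, the model the article describes reduces to launching a kernel over a grid of thread blocks; a minimal complete example (using unified memory, which postdates the article, purely for brevity):

#include <cuda_runtime.h>
#include <stdio.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void) {
    int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.f; y[i] = 2.f; }
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.f, x, y);   /* grid of blocks */
    cudaDeviceSynchronize();
    printf("y[0] = %f\n", y[0]);                      /* 4.0 */
    cudaFree(x); cudaFree(y);
    return 0;
}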
Shane Ryoo
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, PPoPP’08 http://portal.acm.org/citation.cfm?id=1345206.1345220 This is a very empirical paper. They introduced the characteristics of the GeForce 8800 and optimization principles: run enough threads to hide memory latency and make efficient use of memory bandwidth.
Christian Lengauer
Loop Parallelization in the Polytope Model, CONCUR’93 http://portal.acm.org/citation.cfm?id=703499 He showed how to use the polytope model to generate parallel code from sequential code, walking through a concrete example.
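The flavor of the result, on a standard example of my choosing rather than his: skewing a doubly-dependent loop nest to wavefronts t = i + j turns the inner loop into a parallel doall, because both sources A[i-1][j] and A[i][j-1] lie on the previous wavefront.

#include <stdio.h>
#define N 8
#define M 8
static int max2(int a, int b) { return a > b ? a : b; }
static int min2(int a, int b) { return a < b ? a : b; }

int main(void) {
    static int A[N][M];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++) A[i][j] = 1;

    /* Original nest: A[i][j] = A[i-1][j] + A[i][j-1] carries
     * dependences on both loops.  After skewing, iterations on the
     * same wavefront t are independent: the i loop is a doall. */
    for (int t = 2; t <= N + M - 2; t++)                        /* sequential */
        for (int i = max2(1, t - M + 1); i <= min2(N - 1, t - 1); i++) {
            int j = t - i;                                      /* doall over i */
            A[i][j] = A[i-1][j] + A[i][j-1];
        }
    printf("%d\n", A[N-1][M-1]);   /* C(14,7) = 3432 */
    return 0;
}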
Muthu Manikandan
A compiler framework for optimization of affine loop nests for gpgpus, ICS’08 http://portal.acm.org/citation.cfm?id=1375527.1375562 They showed the characteristics of CUDA, such as coalescing when accessing global memory and bank conflicts when accessing shared memory. They derived the best-performing code versions through an empirical search over transformations.
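The global-memory distinction they optimize for, in two toy kernels (mine, not the paper's):

#include <cuda_runtime.h>

/* Coalesced: consecutive threads read consecutive addresses, so a
 * warp's loads merge into a few wide memory transactions. */
__global__ void copy_coalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

/* Uncoalesced: consecutive threads stride through memory, so each
 * load may become its own transaction and bandwidth collapses. */
__global__ void copy_strided(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[((long)i * stride) % n];
}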
Zheng Wang
Mapping Parallelism to Multi-cores: A Machine Learning Based Approach, PPoPP’09 http://portal.acm.org/citation.cfm?id=1504176.1504189 They extracted static code features like operations, control flow, memory accesses, and binary & bitwise operations using LLVM, and gathered dynamic data features like kernel loop counts and L1 d-cache misses from profiling runs.
Louis-Noel Pouchet
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time, CGO’07 http://portal.acm.org/citation.cfm?id=1252537 They build polyhedral models from a sequential program using the method Paul Feautrier proposed in 1992. Many schedules can be derived from such a model, so they keep only the legal ones, i.e., those that preserve every dependence, and search that space iteratively.
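Roughly, the object being searched in the one-dimensional case is an affine schedule per statement; in my notation (not the paper's exact formulation):

\theta_S(\vec{x}) = \vec{a}_S \cdot \vec{x} + b_S
\theta_T(\vec{y}) \ge \theta_S(\vec{x}) + 1 \quad \text{for every dependence } (\vec{x} \to \vec{y}) \text{ from } S \text{ to } T

The schedule assigns each statement instance a logical time, and legality just says every dependence source executes strictly before its target; the paper explores the space these constraints carve out.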