BLOG.JUNGWON.KIM

A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL

http://??

The authors propose a purely static approach based on predictive modeling and program features. They extract static code features from the OpenCL kernel source code, and at run time they pass run-time information such as the kernel arguments and the index space to the model. Using Support Vector Machines, they partition the workload of an OpenCL kernel and map it onto heterogeneous CPU-GPU systems. However, static features are less accurate than run-time features, which degrades the quality of the SVM model.

Alex Aletà

Removing communications in clustered microarchitectures through instruction replication

http://portal.acm.org/citation.cfm?id=1011529

The authors present a compiler technique that removes communications between clusters in a clustered microarchitecture. When a value is needed in more than one cluster, an alternative to generating a communication is to compute the value in each cluster where it is needed.

R. Bianchini

Hiding communication latency and coherence overhead in software DSMs

http://portal.acm.org/citation.cfm?id=237185

The authors present a PCI-based programmable protocol controller for hiding communication and coherence overheads. Diff processing in the protocol is assisted by hardware.

LaTeX equation

http://en.wikipedia.org/wiki/Help:Displaying_a_formula

warning: deprecated conversion from string constant to ‘char*’

-Wno-write-strings

SVN: Using Branches

$ svn mkdir http://svn.example.com/repos/calc/branches -m "make the branches directory to hold all the branches"

$ svn copy http://svn.example.com/repos/calc/trunk http://svn.example.com/repos/calc/branches/my-calc-branch -m "Creating a private branch of /calc/trunk."

$ svn checkout http://svn.example.com/repos/calc/branches/my-calc-branch

Can new versions of Acrobat create a PDF 1.4 File for FDA Submissions?

http://blogs.adobe.com/acrobatforlifesciences/?p=35

IEEE Author Digital Tool Box

http://www.ieee.org/publications_standards/publications/authors_journals.html

How To: Disable Firewall on RHEL / CentOS / RedHat Linux

http://www.cyberciti.biz/faq/disable-linux-firewall-under-centos-rhel-fedora/

Enter the following three commands to disable the firewall.
# service iptables save
# service iptables stop
# chkconfig iptables off

If you are using the IPv6 firewall, enter:
# service ip6tables save
# service ip6tables stop
# chkconfig ip6tables off

Crop pages for PDF

http://help.adobe.com/en_US/Acrobat/9.0/Professional/WS546948FF-6085-4b14-8640-D9EDE30AD8CB.w.html

Crop a page with the Crop tool
1. Choose Tools > Advanced Editing > Crop Tool.
2. Drag a rectangle on the page you want to crop. If necessary, drag the corner handles of the cropping rectangle until the page is the size you want.
3. Double-click inside the cropping rectangle, or press Shift+Return.
The Crop Pages dialog box opens, indicating the margin measurements of the cropping rectangle and the page to be cropped. You can override these settings or apply other options by making new selections in the dialog box before clicking OK.

unset export variable

unset name_of_the_variable

sshd slow connect with ‘UseDNS yes’

vi /etc/ssh/sshd_config

UseDNS no

Intel Technology Journal

ftp://download.intel.co.jp/technology/itj/

Massimiliano Fatica

Accelerating linpack with CUDA on heterogenous clusters

http://portal.acm.org/citation.cfm?id=1513895.1513901

The author measures the PCIe bandwidth and the peak GFLOPS of a CPU and a GPU, then estimates the execution time from these measurements and the input data size to obtain the optimal split fraction. The author does not overlap execution with data transfer, because on Intel systems using the Front Side Bus (FSB) the memory system cannot supply data to both the PCIe bus and the CPU at maximum speed. On newer Intel systems with QuickPath Interconnect (QPI), this may not be the case.
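To make the idea of a split fraction concrete, here is a minimal sketch in Python. It is not the formula from the paper. It assumes a simplified model in which the GPU share pays for its PCIe transfer before computing (no overlap), the CPU share only computes, and the best split is the one where both finish at the same time. The function name and all parameters are hypothetical.

```python
def optimal_gpu_fraction(flops, bytes_moved, cpu_gflops, gpu_gflops, pcie_gbs):
    """Return the fraction of the work to give to the GPU.

    Assumed model (not taken from the paper):
      t_cpu(f) = (1 - f) * flops / cpu_rate
      t_gpu(f) = f * (flops / gpu_rate + bytes_moved / pcie_rate)
    The split is balanced when t_cpu(f) == t_gpu(f).
    """
    cpu_time_all = flops / (cpu_gflops * 1e9)        # CPU does everything
    gpu_time_all = (flops / (gpu_gflops * 1e9)       # GPU does everything
                    + bytes_moved / (pcie_gbs * 1e9))  # plus the transfer
    return cpu_time_all / (cpu_time_all + gpu_time_all)


if __name__ == "__main__":
    # Hypothetical numbers: 1 TFLOP of work, 4 GB moved over a 4 GB/s link.
    f = optimal_gpu_fraction(1e12, 4e9, cpu_gflops=100, gpu_gflops=300, pcie_gbs=4)
    print(f"GPU fraction: {f:.3f}")
```

With no data to move, the result reduces to gpu_gflops / (cpu_gflops + gpu_gflops), so a slower PCIe link shifts work back toward the CPU.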

Amnon Barak

A Package for OpenCL Based Heterogeneous Computing on Clusters with Many GPU Devices

http://www.mosix.org/txt_pub.html

The authors provide a package named the Many GPUs Package (MGP). MGP runs OpenCL applications on a cluster consisting of many nodes, each with one or more GPUs, without modifying the OpenCL kernels. The user must port the OpenCL host program to one that uses the MGP API. The paper does not address how to distribute the workload across nodes.

Chi-Keung Luk

Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5375318&tag=1

The authors present a technique named Qilin that distributes the workload across CPUs and a GPU. Qilin maintains a database that provides execution-time projections for all the programs it has ever executed. Qilin projects the execution time on the CPU and the GPU empirically: it runs the program with a training input, which is divided into two chunks, one for the CPU and one for the GPU. Each chunk is further divided into sub-chunks, and Qilin runs each sub-chunk and measures its execution time. Qilin then uses curve fitting to construct two linear equations, one for the CPU and one for the GPU. With this database, Qilin can predict a well-balanced workload distribution.
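The fitting step can be sketched as follows. This is an illustration only, not Qilin's code. It assumes one linear model T(n) = a + b*n per device, fitted by ordinary least squares, and picks the CPU share beta so that the projected CPU time on beta*N equals the projected GPU time on (1 - beta)*N. The function names are hypothetical, and the sketch ignores details such as multiple CPU cores.

```python
def fit_line(sizes, times):
    """Least-squares fit of times ~ intercept + slope * sizes."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def balanced_cpu_fraction(cpu_fit, gpu_fit, n):
    """Solve a_c + b_c*beta*n == a_g + b_g*(1 - beta)*n for beta."""
    a_c, b_c = cpu_fit
    a_g, b_g = gpu_fit
    beta = (a_g - a_c + b_g * n) / ((b_c + b_g) * n)
    return min(1.0, max(0.0, beta))  # clamp to a valid fraction


if __name__ == "__main__":
    # Hypothetical sub-chunk sizes and measured times for each device.
    sizes = [100, 200, 300]
    cpu_fit = fit_line(sizes, [5.0, 9.0, 13.0])
    gpu_fit = fit_line(sizes, [3.0, 4.0, 5.0])
    print(balanced_cpu_fraction(cpu_fit, gpu_fit, 1000))
```

The fitted coefficients are what would be stored in the database, so a later run with a different input size only needs to re-solve for beta.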

Canqun Yang

Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing

http://www.computer.org/portal/web/csdl/doi/10.1109/CLUSTER.2010.12

The authors present an adaptive optimization for heterogeneous CPU/GPU systems. They distribute the workload between the CPU and the GPU using results from previous runs of a program. They measure the workload and execution time on the CPU and the GPU, then recalculate the fraction of the workload mapped to each. The runtime saves this information in a table and uses the fraction the next time the program runs.
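A rough sketch of such a feedback loop is shown below. It is not the authors' implementation. It assumes the new GPU fraction is simply the GPU's measured throughput divided by the combined throughput, and that the table is keyed by a program name. The class, its methods, and the key are hypothetical.

```python
class AdaptiveSplitter:
    """Remember, per program, the GPU fraction derived from the last run."""

    def __init__(self, initial_gpu_fraction=0.5):
        self.initial = initial_gpu_fraction
        self.table = {}  # program key -> GPU fraction

    def fraction_for(self, key):
        # Use the stored fraction if this program has run before.
        return self.table.get(key, self.initial)

    def record(self, key, cpu_work, cpu_time, gpu_work, gpu_time):
        # Measured throughput (work units per second) of each device.
        cpu_rate = cpu_work / cpu_time
        gpu_rate = gpu_work / gpu_time
        # Assumed rule: share the work in proportion to throughput.
        self.table[key] = gpu_rate / (cpu_rate + gpu_rate)
        return self.table[key]


if __name__ == "__main__":
    splitter = AdaptiveSplitter()
    print(splitter.fraction_for("dgemm"))          # first run: default split
    splitter.record("dgemm", 500, 10.0, 500, 2.0)  # hypothetical measurements
    print(splitter.fraction_for("dgemm"))          # next run: updated split
```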