CUBA: an architecture for efficient CPU/co-processor data communication
http://portal.acm.org/citation.cfm?id=1375571
This paper presents a hardware-supported double-buffering mechanism that lets the CPU transfer data for the next invocation while the coprocessor executes the current call.
Isaac Gelado
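
The overlap described in the summary can be illustrated with a small software analogy. This is only a sketch, not the paper's hardware design: a worker thread stands in for the coprocessor, two Python lists stand in for the two buffers, and the names run_double_buffered, kernel, free and ready are hypothetical, chosen for illustration.

```python
import queue
import threading


def run_double_buffered(batches, kernel):
    """Process each batch with `kernel`, overlapping transfer and execution.

    Software analogy only: the "coprocessor" is a thread, and the two
    buffers are slots in a Python list. All names here are assumptions.
    """
    buffers = [None, None]   # the two buffers
    free = queue.Queue()     # indices of buffers the CPU may fill
    ready = queue.Queue()    # indices of buffers the coprocessor may consume
    free.put(0)
    free.put(1)
    results = []

    def coprocessor():
        while True:
            idx = ready.get()
            if idx is None:          # sentinel: no more work
                return
            results.append(kernel(buffers[idx]))
            free.put(idx)            # hand the buffer back to the CPU

    worker = threading.Thread(target=coprocessor)
    worker.start()

    for batch in batches:
        idx = free.get()             # blocks only when both buffers are in use
        buffers[idx] = list(batch)   # "transfer" data for the next invocation
        ready.put(idx)               # coprocessor may still be busy with the other buffer

    ready.put(None)
    worker.join()
    return results


if __name__ == "__main__":
    print(run_double_buffered([[1, 2], [3, 4], [5]], sum))
```

With two buffers, the CPU side stalls only when both are occupied, so filling the buffer for call n+1 can proceed while call n is still running. A single buffer would force the two phases to alternate.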