SUDS: Primitive Mechanisms for Memory Dependence Speculation
Abstract
As VLSI chip sizes and densities increase, it becomes possible to place many processing elements on a single chip and connect them with a low-latency communication network. In this paper we propose a software system, SUDS (Software Un-Do System), that leverages these resources, using speculation to exploit parallelism in integer programs with many data dependences. We demonstrate that in order to achieve parallel speedups a speculation system must deliver memory request latencies below about 30 cycles. We give a cost breakdown for our current working implementation of SUDS, whose memory request latency nearly meets this goal. In addition, we identify the three primitive runtime operations that are necessary to efficiently parallelize these programs. These subsystems are (1) a fast communication path for true dependences within the program, (2) a method for renaming variables that carry anti- and output dependences, and (3) a memory dependence speculation mechanism that guarantees parallel accesses to global data structures do not violate sequential program semantics. We find that these three subsystems do not interact, so they can be implemented separately. Each subsystem is then simple enough that it can be built in software using only minimal hardware support. In this paper we focus on the memory dependence speculation subsystem and demonstrate that it can be implemented using a simple but effective low-cost protocol.