Compiler-directed design of memory hierarchy for embedded systems
- Publisher: Hamad bin Khalifa University Press (HBKU Press)
- Source: Qatar Foundation Annual Research Forum Proceedings, Qatar Foundation Annual Research Forum Volume 2013 Issue 1, Nov 2013, Volume 2013, ICTP-055
Abstract
In embedded real-time communication and multimedia processing applications, the manipulation of large amounts of data has a major effect on both the power consumption and the performance of the system. Due to the significant amount of data transfers between the processing units and the large, energy-consuming off-chip memories, these applications are often called data-dominated or data-intensive. Providing sufficient bandwidth to sustain fast and energy-efficient program execution is a challenge for system designers: due to the growing speed gap between processors and memories, the performance of the whole VLSI system will mainly depend on the memory subsystem whenever the memory is unable to provide data and instructions at the pace required by the processor. This effect is sometimes referred to in the literature as the memory wall problem.

At system level, the power cost can be reduced by introducing an optimized custom memory hierarchy that exploits temporal data locality. Hierarchical memory organizations reduce energy consumption by exploiting the non-uniformity of memory accesses: the reduction is achieved by assigning the frequently accessed data to the low hierarchy levels, the key problem being how to optimally assign the data to the memory layers. This hierarchical assignment diminishes the dynamic energy consumption of the memory subsystem, which grows with the number of memory accesses. It diminishes the static energy consumption as well, since static energy increases monotonically with memory size. Moreover, within a given memory hierarchy level, power can be reduced by memory banking, whose principle is to divide the address space into several smaller blocks and to map these blocks to physical memory banks that can be independently enabled and disabled. Memory partitioning is also a performance-oriented optimization strategy, because of the reduced latency of accessing smaller memory blocks.
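The hierarchical assignment idea above can be made concrete with a small sketch. The following is an illustration only: the scratch-pad capacity, the per-access energy figures, and the greedy access-density heuristic are all assumptions for this example, not the algorithm of the presented framework. It shows how placing the most frequently accessed data blocks in a small, low-energy on-chip scratch-pad, with the remainder spilling to off-chip DRAM, reduces dynamic energy.

```python
# Illustrative sketch (assumed numbers, not the paper's algorithm): greedily
# assign data blocks to a two-level hierarchy -- a small on-chip scratch-pad
# memory (SPM) and a large off-chip DRAM -- to reduce dynamic energy.

SPM_CAPACITY = 4096          # bytes of on-chip scratch-pad (assumed)
E_SPM, E_DRAM = 0.1, 1.0     # relative energy per access (assumed ratio)

def assign_to_layers(blocks):
    """blocks: list of (name, size_bytes, access_count).
    Place the data with the highest access density (accesses per byte)
    in the scratch-pad until it is full; the rest goes to DRAM."""
    spm, dram, used = [], [], 0
    for name, size, accesses in sorted(blocks,
                                       key=lambda b: b[2] / b[1],
                                       reverse=True):
        if used + size <= SPM_CAPACITY:
            spm.append(name)
            used += size
        else:
            dram.append(name)
    return spm, dram

def dynamic_energy(blocks, spm_names):
    """Total dynamic energy: each access costs E_SPM or E_DRAM
    depending on where the block was assigned."""
    return sum(acc * (E_SPM if name in spm_names else E_DRAM)
               for name, size, acc in blocks)

# Three hypothetical signals with (size in bytes, number of accesses).
blocks = [("A", 2048, 50000), ("B", 4096, 8000), ("C", 1024, 30000)]
spm, dram = assign_to_layers(blocks)
# The two access-dense blocks A and C fit in the SPM; B stays off-chip,
# cutting dynamic energy well below the all-DRAM figure of 88000.
```

A real assignment must also account for the lifetimes of the signals (two signals with disjoint lifetimes can share scratch-pad space), which is where the formal model described below comes in.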
Arbitrarily fine partitioning is prevented, though, since an excessively large number of small banks is area-inefficient and imposes a severe wiring overhead, which increases communication power and degrades performance. This presentation will introduce an electronic design automation (EDA) methodology for the design of hierarchical memory architectures in embedded data-intensive applications, mainly in the area of multidimensional signal processing. The input of this memory management framework is the behavioral specification of the application, which is assumed to be procedural and affine. Figure 1 shows an illustrative example of a behavioral specification with 6 nested loops. The framework employs a formal model operating with integral polyhedra, using techniques specific to the data-dependence analysis employed in modern compilers. Unlike previous works, three optimization problems - the assignment of data to the memory layers (on-chip scratch-pad memory and off-chip DRAM), the mapping of multidimensional signals to the physical memories, and the banking of the on-chip memory (see Figure 2) - are addressed in a consistent way, based on the same formal model. The main design target is the reduction of the static and dynamic energy consumption in the memory subsystem, but the same formal model and algorithmic principles can be applied to reduce the overall memory access time, or combinations of these design goals.
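To give a flavor of what "procedural and affine" buys the analysis: the iteration domain of an affine loop nest is an integral polyhedron, and each array reference is an affine function of the loop indices, so per-element access counts can be derived exactly. The toy example below (the loop nest and access function are invented for this sketch; real polyhedral frameworks reason symbolically over the polyhedra rather than enumerating their points) counts how often each element of a signal is accessed across a triangular iteration domain.

```python
# Toy illustration of polyhedral access counting for an assumed affine nest:
#     for i in 0..7:
#         for j in 0..i:        # triangular domain: 0 <= j <= i <= 7
#             ... A[i - j] ...
# Enumerate the integer points of the iteration polyhedron and count how
# often each element of signal A is touched by the affine access A[i - j].

from collections import Counter

N = 8
access_counts = Counter()
for i in range(N):                  # {(i, j) : 0 <= j <= i < N}
    for j in range(i + 1):
        access_counts[i - j] += 1   # affine access function A[i - j]

# In this nest, element A[k] is accessed exactly N - k times: A[0] on every
# diagonal iteration (i == j), down to A[N-1] exactly once. Such counts are
# precisely the non-uniformity a hierarchical assignment exploits.
```

A symbolic tool would obtain the same counts as closed-form quasi-polynomials in N, without iterating, which is what makes the approach scale to large multidimensional signals.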