Publisher: Springer, 2006, 262 pp.
The influence of embedded systems is constantly growing. Increasingly powerful and versatile devices are being developed and brought to market at a fast pace. The number of features is increasing, and so are the constraints on these systems concerning size, performance, energy dissipation and timing predictability. Since most systems today use a processor to execute an application program rather than dedicated hardware, these requirements cannot be fulfilled by hardware architects alone: hardware and software have to work together to meet the tight constraints placed on modern devices. This work presents approaches that target the software generation process using an energy- and memory-architecture-aware C compiler. Taking energy dissipation and the memory architecture into account opens up a large optimization potential concerning performance and energy dissipation.
This work first presents an overview of the timing, energy and simulation models used for one processor architecture and for different memory architectures such as caches, scratchpad memories and main memories in SRAM, DRAM and Flash technology. Following an introduction to the compilation framework, the compiler-based exploitation of partitioned scratchpad memories is presented. A simple formalized Base model describes the consequences of statically allocating instructions and data to several small scratchpad partitions. It is followed by a number of extensions that treat memory objects and their dependencies at a finer granularity. A method for allocating objects to separate scratchpad memories for instructions and data, as found in the most recent ARM designs, is also presented. Finally, a model that also considers the leakage power of memories is introduced. Results show that savings of up to 80% of the total energy can be achieved with the presented scratchpad allocation algorithms. The flexibility and extensibility of the presented approaches are a further benefit.
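The static allocation problem described above can be illustrated with a minimal sketch. It assumes a single scratchpad of fixed capacity and treats the choice of objects as a 0/1 knapsack solved by dynamic programming. This is a simplification chosen for illustration and not the book's Base model, which covers several partitions. The object names, sizes and energy savings are hypothetical.

```python
from typing import List, Tuple

# Hypothetical memory object: (name, size in bytes,
# energy saved if the object lives in the scratchpad instead of main memory).
MemoryObject = Tuple[str, int, float]


def allocate_scratchpad(
    objects: List[MemoryObject], capacity: int
) -> Tuple[float, Tuple[str, ...]]:
    """Pick the objects that maximize saved energy within `capacity` bytes."""
    # best[c] holds (saved energy, chosen names) for a scratchpad of c bytes.
    best: List[Tuple[float, Tuple[str, ...]]] = [(0.0, ())] * (capacity + 1)
    for name, size, saving in objects:
        # Walk capacities downwards so each object is placed at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + saving
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + (name,))
    return best[capacity]


# Assumed example values, not measurements from the book.
example_objects: List[MemoryObject] = [
    ("main_loop", 64, 120.0),
    ("fir_coeffs", 32, 50.0),
    ("lookup_table", 96, 130.0),
    ("init", 48, 10.0),
]
```

With these assumed values and a 128-byte scratchpad, `allocate_scratchpad(example_objects, 128)` selects `fir_coeffs` and `lookup_table` for a total saving of 180.0, which beats the locally attractive choice of `main_loop` plus `fir_coeffs` (170.0).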
Many embedded systems have to respect timing constraints, so timing predictability is of increasing importance. Whenever guarantees concerning reaction times have to be given, worst case execution time (WCET) analysis techniques are used during system design to provide a guaranteed upper bound on the WCET. This work contributes a study of the influence of scratchpad memories on timing predictability. It is shown that scratchpad memories, allocated using the algorithms mentioned above, are inherently predictable, since the positions of all objects in the different memories are fixed at compile time and no dynamic decisions have to be taken at runtime. The results show that the determined WCET values for systems with a scratchpad memory scale with the performance benefit observed during average case simulation, indicating that scratchpad memories improve both the average case and the worst case. In particular, when compared to caches, the WCET analysis for scratchpad-based systems is simpler, yet allows the generation of tighter bounds. The effects of allocating instructions and data to the scratchpad using a dynamic allocation algorithm are shown in this work for the first time. This allocation technique both outperforms the cache and leads to better timing predictability, making scratchpad memories a natural choice for timing-constrained embedded systems.
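Why fixed object positions yield tighter bounds can be shown with a small sketch. It assumes a flat list of memory accesses on the worst-case path and uniform latencies. With a scratchpad, the latency of every access is known at compile time. With a cache, a safe bound may count an access as a hit only if static analysis has proven it always hits. All names and cycle counts below are hypothetical, and the sketch is not the analysis tool used in the book.

```python
from typing import Iterable, Set


def bound_with_scratchpad(
    accesses: Iterable[str], in_scratchpad: Set[str],
    spm_cycles: int, main_cycles: int,
) -> int:
    """Upper bound when every object's location is fixed at compile time."""
    return sum(
        spm_cycles if obj in in_scratchpad else main_cycles
        for obj in accesses
    )


def bound_with_cache(
    accesses: Iterable[str], proven_always_hit: Set[str],
    hit_cycles: int, miss_cycles: int,
) -> int:
    """Safe upper bound: accesses not proven to hit are counted as misses."""
    return sum(
        hit_cycles if obj in proven_always_hit else miss_cycles
        for obj in accesses
    )


# Assumed access sequence on the worst-case path (hypothetical object names).
path = ["loop", "loop", "table", "loop", "table"]
```

With the assumed latencies of 1 cycle (scratchpad or cache hit) and 4 cycles (main memory or cache miss), `bound_with_scratchpad(path, {"loop"}, 1, 4)` gives 11 cycles, while `bound_with_cache(path, set(), 1, 4)` gives 20 cycles when the analysis cannot classify any access as a guaranteed hit.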
Advances in main memory technology include the availability of memory chips with integrated power management. The first optimization targeting main memories exploits these features by allocating memory objects to a scratchpad partition, so that the main memory can be put into power-down mode whenever instructions and data are accessed from the scratchpad memory. The allocation problem takes the standby energy of the SDRAM main memory into account and allocates objects to the scratchpad memory so as to maximize the power-down periods of the main memory. Total energy savings of up to 80% were achieved. In the second main memory optimization, suitable Flash memories are used as instruction memories via eXecute-In-Place (XIP). By considering the tradeoff between the overhead required to copy instructions to the faster SDRAM and the benefit of faster execution, the compiler determines an optimal allocation of instructions to Flash and SDRAM memories. The main benefit of this approach is a significant reduction in the required amount of SDRAM instruction memory, one of the main cost factors for embedded systems.
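The XIP tradeoff described above can be sketched as a per-function break-even test: copying a function to SDRAM pays off only if the cycles saved on instruction fetches exceed the cycles spent copying it. This is a minimal illustration that ignores any SDRAM capacity limit and is not the optimal formulation used in the book. The `Function` type, the function names and all cycle counts are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Function:
    name: str
    size_bytes: int
    fetches: int  # instruction fetches over the whole program run


def place_functions(
    functions: List[Function],
    flash_cycles: int,          # cycles per fetch when executing in place
    sdram_cycles: int,          # cycles per fetch from SDRAM
    copy_cycles_per_byte: int,  # one-time cost of copying Flash -> SDRAM
) -> Tuple[List[str], List[str], int]:
    """Return (functions kept in Flash, functions copied to SDRAM, SDRAM bytes)."""
    in_flash: List[str] = []
    in_sdram: List[str] = []
    sdram_bytes = 0
    for f in functions:
        gain = f.fetches * (flash_cycles - sdram_cycles)
        cost = f.size_bytes * copy_cycles_per_byte
        if gain > cost:
            in_sdram.append(f.name)
            sdram_bytes += f.size_bytes
        else:
            in_flash.append(f.name)  # execute in place, no SDRAM needed
    return in_flash, in_sdram, sdram_bytes


# Assumed workload, not data from the book.
workload = [
    Function("isr", 200, 50_000),
    Function("init", 4_000, 1_000),
    Function("filter", 1_000, 200_000),
]
```

For the assumed workload with 4 cycles per Flash fetch, 2 cycles per SDRAM fetch and 3 copy cycles per byte, only `isr` and `filter` are copied, so 1200 bytes of SDRAM suffice instead of 5200. The rarely executed `init` stays in Flash.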
Finally, the influence of the size of the register file on the quality of the generated code is studied. It is shown that a register file that is too small causes considerable code overhead, because register values have to be spilled to memory. Besides presenting results for the spill code overhead, performance and energy dissipation of the generated code, this work presents a compiler-guided method for choosing an adequate register file size for a given application.
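The idea of choosing a register file size from spill overhead can be sketched as follows. The sketch assumes that a register pressure profile (the number of simultaneously live values at each program point) is available, and it uses the excess over the register count as a crude proxy for spills. The book's method relies on the actual compiler output, so the proxy, the profile and the threshold here are purely illustrative assumptions.

```python
from typing import Iterable, List, Optional


def estimated_spills(pressure: Iterable[int], num_regs: int) -> int:
    """Crude proxy: live values beyond `num_regs` must be spilled to memory."""
    return sum(max(0, live - num_regs) for live in pressure)


def smallest_adequate_size(
    pressure: List[int], candidates: Iterable[int], max_spills: int
) -> Optional[int]:
    """Smallest candidate size whose estimated spill count stays acceptable."""
    for num_regs in sorted(candidates):
        if estimated_spills(pressure, num_regs) <= max_spills:
            return num_regs
    return None  # no candidate keeps the spill overhead within the limit


# Assumed pressure profile of a hypothetical application.
profile = [3, 5, 9, 12, 7, 4]
```

With the assumed profile, candidate sizes of 4, 8 and 16 registers and a limit of 5 spills, `smallest_adequate_size(profile, [4, 8, 16], 5)` returns 8: four registers would cause 17 estimated spills, eight cause exactly 5.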
Introduction
Models and Tools
Scratchpad Memory Optimizations
Main Memory Optimizations
Register File Optimization
Summary
Future Work