img_56ad31a4f39af
TECH

Co-synthesis techniques for embedded systems

Share on Facebook0Tweet about this on TwitterShare on LinkedIn0Email this to someone

Kelvin Yuk
EEC282
Akella

June 4, 2002

I. Introduction

What is the problem?

The problem is dealing with a high-level description of a system and finding an efficient way of co-synthesizing that system into hardware and software such that performance and component cost is optimized.

Why is important?

The complexity of systems is growing and performance requirements are increasing. Often the solution to implementing a complex system requires the use of multiple components, processors, memory elements, communication channels, etc. When the solution calls for a distributed embedded systems solution, the implementation and communication between these devices become integral in meeting cost and performance requirements. Good hardware / software co-design and co-synthesis is needed to strike a balance between performance and flexibility for these systems. However, being able to design and partition a system into an optimal implementation is a difficult task since the design space is so broad and combinations of hardware / software configurations explode. Co-synthesis and partitioning also addresses communication between processes and components or different types and architectures. One major aspect that makes co-synthesis difficult is the lack of generality design tools since the target architecture of the co-design needs to be known in order to attain a truly optimal implementation.

II. Solutions

An efficient way of avoiding the design space explosion problem is to use heuristic methods to generate an optimal HW / SW balance. In “An Architectural Co-Synthesis Algorithm for Distributed, Embedded Computing Systems”, Wayne H. Wolf develops a heuristic algorithm that simultaneously synthesizes the hardware and software architectures of a distributed system to meet a performance goal and minimize cost [1]. On the other hand, Francisco Assis Moreira do Nascimento and Wolfgang Rosenstiel describe their use of a Partial Order Model (POM) as a way of representing functional units and a modular partitioning/repartitioning algorithm as well as a hardware/software conversion to reach an optimal implementation [2]. Although the algorithms presented are both heuristic, the are vastly different in their modeling and design approach.

Architectural Co-Synthesis Algorithm

Wolf takes an architectural approach, focusing on the hardware and software available to the designer rather than on the actual component implementation. His algorithm is hierarchical in terms of importance: meeting the specs, reducing the cost of Processing Elements (PEs, i.e. processors, asics, etc…), reducing communication costs, allocating communication channels and allocating devices . His algorithm is performed in five major steps and works as follows:

  1. Generate an initial solution; allocate processes to PE’s such that all tasks are placed on PE’s fast enough to ensure that all deadlines are met; schedule the processes to determine process exclutivity and communication rates.
  2. Reallocate processes to PE’s to minimize PE cost.
  3. Reallocate processes again to minimize inter-PE communication
  4. Allocate communication channels
  5. Allocate devices, either internally to PE’s or externally to communication channels.

Each of the five steps are illustrated in Fig. 1.

Fig. 1. Architectural Co-synthesis algorithm example [1]

First, the algorithm synthesizes the process graph into a realizable system without regard to processor cost. One PE is used to execute one process. There may be a combination of PE’s that are not of the same type. Process scheduling is important in this step as it sets the framework for reducing the number of PEs. Here, as a first order form of cost reduction, Wolf delivers the least computationally intensive process to a cheaper processor(s). Secondly, the algorithm seeks to reduce the PE cost. Since PE’s are usually large computational units, minimizing the number of them will drastically reduce cost. This is done by scheduling all the processes and observing which do not overlap in time. Processes that do not overlap can be performed by the same PE. Thirdly, an analysis of inter communication is performed by observing the communication between processes in different Pes. If the communication requirements are high, separate processes that are performed on different PEs may be move to a single PE to reduce communication costs. In the figure, inter-communication reduction is done by swapping processes between PEs. Step four is to implement the communication channels between PEs once the requirements have been set. Step five is to allocate the system with external devices.

Repartitioning and HW/SW Repartitioning Algorithm

In “A Repartitioning and HW/SW Partitioning Algorithm to the Automatic Design Space Exploration In the Co-Synthesis of Embedded Systems”, Francisco Assis M. do Nascimento and Wolfgang Rosenstial present a design space exploration strategy based on a Partial Order based Model (POM) which is used in modeling the components and processes of the system. The key idea behind their work is to start with a partitioning scheme for an initial set of Hierarchical POMs and then partition and repartition them until the design specifications are met.

In the co-synthesis approach, the specification for the system is represented as Hierarchical POMs (HPOM). Each HPOM represents a process or module in the system and is analyzed in terms of performance and cost as either a hardware implementation or a software implementation. The communication overhead between the HPOMs is also estimated and included in current implementation set of HPOMs.

The optimization of the system takes place in two stages. First, partitioning is explored through different implementations and configurations of HPOMs. This generates a Partitioning Tree (PT), which is a map of different repartitioned sets of the initial HPOMs that are all able to perform the task of the system but with differing costs associated with them (Fig. 2). These different levels of abstraction for the HPOMs are generated through a repartitioning algorithm that combines two HPOMs into a single one based on reducing intercommunication between them. The resulting repartitioned HPOM will have a better performance due to less communication overhead.

The PT is generated by first considering all the individual HPOMs and then partitioning them iteratively until you have a single partitioned HPOM. Consideration is first taken for the single partition where all the initial HPOMs are collapsed into one. This single partition is analyzed for feasibility and if it does not meet the design contraints, the analysis reverts to the previous less-partitioned state. We can see this in Fig. 2 where we first analyze the L1, then L2 if L1 is not feasible, then L3 if L2 is not feasible and so forth.

After the PT is generated, an analysis in terms of HW/SW partitioning within each HPOMs partitioning scheme is performed. The analysis begins with a software implementation of all the HPOMs. Each level of the tree is tested to see if it meets the design specs. For a level that does not meet the requirements, the HPOM with the least communication cost is converted from software to hardware. The implementation is reanalyzed for performance. This process continues until a HW/SW partitioning for the current level that meets the constraints is found or else that level is all implemented in hardware.

Fig. 2 Partitioning tree for robot controller example [2]

III. Analysis and Comparison of Solutions

Architectural co-synthesis works at the top level of the design and does not produce a complete implementation of the hardware and software components of the system but a combined hardware-software architecture for the complete system [1]. This can be seen as both an advantage and a disadvantage. The architectural perspective on the system limits the detailed implementation of the components and thus the optimization is limited also. However, this technique is robust in that it can be applied to any system and maintains itself as a general heuristic. Nascimento’s work, on the other hand, requires that the system be modeled in his particular Hierarchical Partial Order Model (HPOM) form, which may not be of interest to all designers. Nascimento’s algorithm is more detailed in that it starts will basic functional units of the system and is analyzed at every stage of repartitioning. The potential hardware and software components that result from the algorithm are pliable and tentative until after the best partitioning is found. Wolf claims that his algorithm is efficient as compared to other solutions [1]. He estimates the computational complexity of his software-implemented algorithm and demonstrates its speed. Another drawback may be that it considers communication channels only late in the algorithm and prioritizes PE cost, while communication between units is vital in Nascimento’s algorithm.

Nascimento and Rosenstial themselves realize that their work is not the most robust solution to all design problems. In their own example of a Robot Controller, they manually find two communication channels that could be manually optimized into one[2]. They are successful in developing an algorithm that quickly explores the design space in an efficient manner. However, they do not tackle the computational complexity of their system. The estimation and reestimation of each HPOM of each partition of the PT will be costly in terms of design time. Each HPOM needs to be estimated in terms of performance in both hardware and software in order to measure the performance gains.

IV. Conclusion

In conclusion we explore different approaches to a similar problem of efficient co-synthesis of embedded systems. Wolf begins with knowledge about the architecture used during a co-design and improves the cost and performance of the system by process rescheduling. Nascimento and Rosenstial use a more abstract representation of functional modules, which after generation and optimization of these units determine the kind of hardware needed to implement an efficient co-design.

References

[1] W.H. Wolf, “An architectural co-synthesis algorithm for distributed, embedded computing systems,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.5, (no.2), IEEE, June 1997. p.218-29. 36 references.

[2] F.A.M. do Nascimento, W. Rosenstiel, “A repartitioning and HW/SW partitioning algorithm to the automatic design space exploration in the co-synthesis of embedded systems,” Symposium on Integrated Circuits and Systems Design, (Symposium on Integrated Circuits and Systems Design, 14th Symposium on Integrated Circuits and Systems Design, Pirenopolis, Brazil, 10-15 Sept. 2001.) Los Alamitos, CA, USA: IEEE Comput. Soc, 2001. p.85-90. xii+237 pp. 6 references.

Kelvin Yuk
Kelvin Yuk obtained his PhD in Electrical Engineering in 2012.
See Comments

Leave a Reply