Asynchronous and clocked control structures for VLSI based interconnection networks
ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
Highly parallel VLSI computing structures consist of many processing elements operating simultaneously. In order for such processing elements to communicate among themselves, some provision must be made for synchronizing data transfers. The simplest means of synchronization is a global clock. Unfortunately, large clocked systems can be difficult to implement because of the inevitable problem of clock skews and delays, which can be especially acute in VLSI systems as feature sizes shrink. For the near term, good engineering and technology improvements can be expected to maintain the feasibility of clocking in such systems; however, clock distribution problems arise in any technology as systems grow. An alternative means of enforcing the necessary synchronization is the use of self-timed, asynchronous schemes, at the cost of increased design complexity and hardware. Recognizing that different circumstances call for different synchronization methods, this paper provides a spectrum of synchronization models; based on the assumptions made for each model, theoretical lower bounds on clock skew are derived, and appropriate or best-possible synchronization schemes for large processor arrays are proposed. One set of models is based on assumptions that allow the use of a pipelined clocking scheme, in which more than one clock event is propagated at a time. In this case, it is shown that, even assuming that physical variations along clock lines can produce skews between wires of the same length, any one-dimensional processor array can be correctly synchronized by a global pipelined clock while enjoying desirable properties such as modularity, expandability, and robustness. This result cannot be extended to two-dimensional arrays, however: the paper shows that, under this assumption, it is impossible to run a clock such that the maximum clock skew between two communicating cells remains bounded by a constant as systems grow.
For such cases, or where pipelined clocking is otherwise unworkable, a synchronization scheme incorporating both clocked and “asynchronous” elements is proposed.
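The one-dimensional result can be made concrete with a small simulation, not taken from the paper: if a clock edge is relayed cell to cell, the arrival time at cell i is the sum of the first i segment delays, so the skew between any two *communicating* (adjacent) cells is just one segment delay, bounded independently of array size. The function names and the delay bounds below are illustrative assumptions.

```python
import random


def pipelined_clock_arrival_times(n, d_min=0.9, d_max=1.1, seed=0):
    """Arrival time of one pipelined clock edge at each cell of a 1-D array.

    The edge is relayed from cell to cell; each segment contributes an
    independent delay drawn from [d_min, d_max] (hypothetical bounds standing
    in for physical variation along the clock line).
    """
    rng = random.Random(seed)
    times = [0.0]  # cell 0 receives the edge at time 0
    for _ in range(n - 1):
        times.append(times[-1] + rng.uniform(d_min, d_max))
    return times


def max_adjacent_skew(times):
    """Largest skew between any pair of adjacent (communicating) cells."""
    return max(abs(b - a) for a, b in zip(times, times[1:]))


# Adjacent-cell skew never exceeds one segment delay, however long the array:
for n in (10, 100, 1000):
    t = pipelined_clock_arrival_times(n)
    assert max_adjacent_skew(t) <= 1.1
```

Note that the *total* skew between the first and last cell does grow with n; the point of pipelined clocking is that only skew between communicating neighbors must be bounded, and in one dimension it is.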