10.1 History of Wormhole Routing 165
Note that if we allow the flits of the worms to be routed independently,
then several store-and-forward routing algorithms are known that can route
q worms of length L in each input to randomly chosen outputs in in
O(q. L +
log N) steps, w.h.p. (see [Le92]). Hence, the store-and-forward algorithms are
faster. However, as the bandwidth of the links increases, the following results
show that this difference gets smaller.
For B > 1, a non-linear dependence on B has been observed. The first
result is due to Koch [Ko88], who showed that if each input in an N-input
butterfly sends a worm to a randomly chosen output, then the expected
number of worms that reach their destination without ever being delayed is
~(N/(log
N) 1/B)
for any constant B > 1. Cypher et al. [CMSV96] found an
algorithm for routing q worms form each input of an N-input butterfly to
randomly chosen outputs in
O((q. L
log 1/s N + (log N + L) log
N)/B)
steps,
w.h.p. They also proved a lower bound of
~2(q.L
log 1/B N. (log log
N)-2/B/B)
steps. Cole, Maggs, and Sitaraman [CMS96] independently discovered a ran-
domized algorithm that routes any q-relation from the inputs to the outputs
of an N-input butterfly in
O(L(q + logN)logl/BN 9 loglog(qN)/B)
steps.
They also proved an/2(L, q(min{L, log
N})I/B/log
log N) lower bound that
holds for a broad class of algorithms.
Some results are also known for wormhole routing on meshes and tori.
Felperin
et al.
[FRU92] analyze a simple algorithm for the n • n mesh. They
show that it can route n 2 worms of length L, one per node, to random des-
tinations in time
O(L.
nlogn +
n2/logn),
w.h.p., without buffering. This
result was improved by Bar-Noy
et al.
[BRST93]. They present a randomized
wormhole routing protocol for the n • n mesh that routes any permutation
in
O(L. n)
steps, w.h.p., without buffering. This time bound is optimal since
there exist permutations that need ~(L 9 n) steps. Recently, Bock [Bo96]
showed, that for any d and L such that L 9 d 2 = O(lo--~n), there is an algo-
rithm for routing
n d
worms of length L in an (n, d)-torus, one per node, to
random destinations in optimal time
O(L(d + n)),
w.h.p., using buffers that
can store one flit.
10.1.2 Universal Routing
In the area of universal wormhole routing, Greenberg and Oh [GO93] created
a randomized algorithm for arbitrary simple path collections of size n with
link bandwidth 1, congestion C, and dilation D that requires time
O(C.
D 9 g + C 9 L 9 flogn), w.h.p., where f = min{D,L}. This algorithm does
require the ability to buffer blocked worms. Their result was improved by
Cypher
et al.
[CMSV96]. In this paper a randomized algorithm is presented
that routes worms of length L along an arbitrary simple path collection of
size n with bandwidth B, congestion C, and dilation D in time
O((L 9 C 9
D 1/B + (D + L) log
n)/B),
w.h.p., without buffering. Furthermore they present
a randomized algorithm that routes worms of length L along an arbitrary