
190 Martin Dietzfelbinger
Protocol Text Comparison by Fingerprinting
Alice has the sequence T
A
=(a
1
,...,a
n
) of numbers between 0 and d − 1.
Bob has the sequence T
B
=(b
1
,...,b
n
) of numbers between 0 and d − 1.
1. Alice tells Bob what n is. If n = n
, Bob says “different,” and STOP.
2. Alice and Bob agree on a number k of repetitions.
3. Alice finds some prime number m a little larger than d and 10n.
She chooses k numbers r
1
,...,r
k
between 1 and m − 1atrandom,
and tells Bob m and r
1
,...,r
k
.
4. Alice calculates FP
m
(T
A
,r
1
),...,FP
m
(T
A
,r
k
).
(For this, she modifies algorithm FP in such a way that
the text T
A
is only run through once to calculate all k results.)
5. Bob calculates FP
m
(T
B
,r
1
),...,FP
m
(T
B
,r
k
) in the same way.
6. Alice tells her k results to Bob.
7. Bob compares with his k values.
If there are differences, he says “different,” and STOP.
If all values are equal, he says “can’t see a difference,” and STOP.
One can say the following about the result of the protocol.
• If Alice and Bob have the same text, then the sequences of k fingerprints
they calculate are the same. Hence the result always is “can’t see a differ-
ence.”
• If Alice and Bob have different texts (of the same length), then by the Fin-
gerprinting Theorem, among the m −1 numbers Alice chooses from, there
are at most n−1 many values for r that make FP
m
(T
A
,r)andFP
m
(T
B
,r)
coincide. For r randomly chosen, the probability that FP
m
(T
A
,r)and
FP
m
(T
B
,r)areequalisatmost(n − 1)/(m − 1). The probability that
Alice chooses such “bad” r’s in all k trials, making Bob say “can’t see a
difference” erroneously, is no larger than
(n − 1)
k
(m − 1)
k
=
n − 1
m − 1
k
<
n
m
k
.
Since we have assumed that m ≥ 10n, the bound is smaller than 1/10
k
,
and by choosing k large enough Alice and Bob can adjust the error probability
bound to as tiny a value as they wish.
If m ≈ 10n,andn has exactly l decimal digits, and Alice and Bob want the
error probability to be at most 10
−k
, it is sufficient that Alice communicate
(l +1)· (2 + 2k) digits. It is astonishing that this figure changes only very
slowly when the length of the text is increased: When comparing texts that
are ten times as long, i.e., n increases by a factor of ten, the number of digits
that have to be communicated increases only by 2k.