
10 PageRank – What Is Really Relevant in the World-Wide Web? 93
Experiment (10 or more subjects, e.g., your class at school)
Every person starts from any of the pages in the above network and starts
moving through the network by following links at random. If necessary, or
out of the blue, any page can be chosen. After one minute everyone stops
upon a signal and remembers the last page visited. For each page, the number
of subjects that have stopped there is recorded.
A less contrived and somewhat larger example of links between entities is
the present book itself. If chapters are viewed as reading locations, referrals
can be interpreted as links between them.
In the network diagram depicted on the next page every chapter in this
book is represented by a rectangle, and their width and height is determined
by the number of links to and from other chapters. A slim, tall rectangle thus
represents a chapter that refers to few others, but is referred to often. Colors
are computed from the equation system and therefore indicate the PageRank
in the referral network: Turquoise rectangles represent chapters with lowest
PageRank, orange rectangles those with highest PageRank, and appropriately
mixed colors are used for values in between.
The chapters on variants of sorting are indeed the ones that a reader ends
up being referred to most often when reading in random fashion, following
links as if on the Web. This matches their crucial role in algorithmics.
Solutions
The model described above cannot be used directly to compare the relevance
of hits returned for a query, since the network that is to be evaluated by an
Internet search engine yields a system with billions of equations. Even the
fastest computers cannot solve such systems using the methods we learn in
school.
Fortunately, the systems resulting from our approach exhibit some special
properties that can be exploited in the search for a solution. Moreover, we do
not really need an exact solution, so that a very simple and efficient algorithm
can instead be used to quickly approximate a solution. There is one equation
for every variable, and if we know the values of all other variables, we would
simply substitute them and thus determine the value of the final variable. The
algorithm therefore starts with an arbitrary initial assignment (e.g., the same
non-zero value for all variables) and solves for every variable of the associated
equation relative to the values currently assigned for all other variables. With
these newly obtained assignments the process is repeated, then repeated, and
repeated, and so on.