Related" algorithm in Netscape is based on technology developed by Alexa, Inc., and computes its
answers based on connectivity information, content information, and usage information [11].
To compare the performance of our two algorithms and Netscape's algorithm, we performed a
user study on 59 URLs chosen by 18 volunteers. Our study results show that the precision at 10
computed over all 59 URLs of our two algorithms are 73% better and 51% better than Netscape's.
Not all algorithms gave answers for all URLs in our study. If we restrict the comparison to only
the 37 URLs for which all three algorithms returned answers, then the precision at 10 of our
two algorithms are 40% and 22% better than Netscape's algorithm. This is surprising since our
algorithms are based only on connectivity information.
Netscape's algorithm gives answers for about 17 million URLs [11], while our algorithms can
give answers for a much larger set of URLs (we have connectivity information on 180 million URLs).
This is important because it means that we can give related URL information for more URLs. Our
algorithms are also fast: in our environment, both average less than 200 msec of computation per
input URL.
The example shown in Table 1 is for a URL with a very high level of connectivity (www.nytimes.com
contains 47,751 inlinks in our Connectivity Server), and all three algorithms generally perform quite
well for well-connected URLs. Our algorithms can also work well when there is much less connectivity, as shown by the example in Table 2. This table shows the answers for the Companion and
Netscape algorithms for linas.org/linux/corba.html, one of the input URLs chosen by a user
as part of our user study. Alongside each answer is the user's rating for each answer, with a '1'
meaning that the user considered the page related, '0' meaning that the user considered the page
unrelated, and '{' meaning that the user was unable to access the page at all. The original page
was about CORBA implementations for Linux, and there were 123 pages pointing to this page in
our Connectivity Server. Nine of the ten answers given by the Companion algorithm were deemed
related by our user, while only one page from Netscape's set of answers was deemed related. Most
of Netscape's answers were about the much broader topic of Linux, rather than specically about
CORBA implementations on Linux.
Section 2 presents our algorithms in detail and describes Netscape's service, while Section 3
discusses the implementation of our algorithms. Section 4 describes the user study we performed
and presents its results, and also provides a brief performance evaluation of our algorithms. Finally,
Section 6 presents our conclusions.
2 Related Page Algorithms
In this section we describe our two algorithms (the algorithm and the algorithm), as well as Netscape's algorithm. Unlike Netscape's algorithm, both of our algorithms
exploit only the hyperlink-structure (i.e. graph connectivity) of the web and do not examine the
information about the content or usage of pages. Netscape's algorithm uses all three kinds of
information to arrive at its results.
In the sections below, we use the terms and . If there is a hyperlink from page w to
page v, we say that w is a of v and that v is a of w.
The Companion algorithm takes as input a starting URL u and consists of four steps:
1. Build a for u.
3