Input: www.nytimes.com

www.usatoday.com          USA Today newspaper
www.washingtonpost.com    Washington Post newspaper
www.cnn.com               Cable News Network
www.latimes.com           Los Angeles Times newspaper
www.wsj.com               Wall Street Journal newspaper
www.msnbc.com             MSNBC cable news station
www.sjmercury.com         San Jose Mercury News newspaper
www.chicago.tribune.com   Chicago Tribune newspaper
www.nando.net             Nando Times on-line news service
www.the-times.co.uk       The Times newspaper

Table 1: Example results produced by the Companion algorithm
Input: linas.org/linux/corba.html

Companion                                  Netscape
1  www.cs.wustl.edu/~schmidt/TAO.html      0  labserver.kuntrynet.com/~linux
1  dsg.harvard.edu/public/arachne          0  web.singnet.com.sg/~siuyin
1  jorba.castle.net.au/                    -  www.clark.net/pub/srokicki/linux
1  www-b0.fnal.gov:8000/ROBIN              -  www.earth.demon.co.uk/linux/uk...
1  www.paragon-software.com/products/oak   0  www.emry.net/webwatcher
1  www.tatanka.com/orb1.htm                0  www.interl.net/~jasoneng/NLL/lwr
1  www.oot.co.uk/dome-index.html           0  www.jnpcs.com/mkb/linux
1  www.linuxresources.com/                 0  www.intellisoft.com/~mark
1  www.dstc.edu.au/AU/sta/mart...          0  www.liszt.com/
1  www.inf.fu-berlin.de/~brose/jacorb      0  www.local.net/~jgo/linuxhelp.html

Table 2: Comparison of results for the Companion and Netscape algorithms. A "1" means that the
page was valuable, a "0" means that the page was not valuable, and a "-" means that the page could
not be accessed.
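To make the labels in Table 2 concrete, the following Python sketch turns each column into a precision-at-10 figure. The labels are transcribed from Table 2 in rank order; the names `precision_at_k`, `COMPANION`, and `NETSCAPE` are hypothetical, and whether an inaccessible page ("-") should count as a non-valuable result is an assumption, so the function supports both treatments. This is an illustration only, not necessarily how the user study aggregates its judgments.

```python
# Labels transcribed from Table 2, in rank order (first entry = top result).
# "1" = valuable, "0" = not valuable, "-" = page could not be accessed.
COMPANION = ["1"] * 10
NETSCAPE = ["0", "0", "-", "-", "0", "0", "0", "0", "0", "0"]


def precision_at_k(labels, k=10, count_inaccessible=True):
    """Fraction of the top-k results judged valuable.

    Assumption: if count_inaccessible is True, "-" entries count as
    non-valuable results; otherwise they are dropped from the denominator.
    """
    top = labels[:k]
    if not count_inaccessible:
        top = [label for label in top if label != "-"]
    if not top:
        return 0.0
    return sum(1 for label in top if label == "1") / len(top)


for name, labels in [("Companion", COMPANION), ("Netscape", NETSCAPE)]:
    print(name,
          precision_at_k(labels),
          precision_at_k(labels, count_inaccessible=False))
```

With either treatment, the labels in Table 2 give a precision of 1.0 for Companion and 0.0 for Netscape on this input.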
input (in this case, the results for the Cocitation algorithm are identical and the results for Netscape
are very similar, although this is not always true).
One of our goals was to design algorithms with high precision that are very fast and that do not
require a large number of different kinds of input data. Since we have a tool that gives us access to
the hyperlink structure of the web [4], we focused on algorithms that use only connectivity
information to identify related pages. Our algorithms use only information about the links that
appear on each page and the order in which those links appear. They examine neither the content
of pages nor the patterns of how users tend to navigate among pages.
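Because the algorithms rely only on connectivity, their input can be modeled as a map from each page to its out-links in the order they appear on the page, with parent (in-link) sets derived by inverting that map. The Python sketch below assumes a toy in-memory graph; the page names and the helpers `build_in_links` and `links_near` are hypothetical and merely stand in for the tool in [4], whose interface is not shown here. `links_near` illustrates one possible way to use link order (favoring links that sit close to a given link); how the Companion algorithm actually uses link order is described in Section 2.1.1.

```python
from collections import defaultdict

# Hypothetical toy web: each page maps to its out-links in page order.
# This ordered adjacency list is the only input the algorithms need.
OUT_LINKS = {
    "p1": ["u", "a", "b"],
    "p2": ["c", "u", "a"],
    "p3": ["a", "b", "c", "d", "u"],
}


def build_in_links(out_links):
    """Invert the ordered out-link lists into in-link (parent) sets."""
    in_links = defaultdict(set)
    for page, targets in out_links.items():
        for target in targets:
            in_links[target].add(page)
    return in_links


def links_near(out_links, page, target, k):
    """Return up to k links on each side of the first link to `target`.

    Illustrative assumption: links placed near the link to `target`
    on the same page are more likely to be related to it.
    """
    links = out_links.get(page, [])
    if target not in links:
        return []
    i = links.index(target)
    return links[max(0, i - k):i] + links[i + 1:i + 1 + k]


in_links = build_in_links(OUT_LINKS)
print(sorted(in_links["u"]))                # pages that point to u
print(links_near(OUT_LINKS, "p3", "u", 2))  # links just before u on p3
```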
Our Companion algorithm is derived from the HITS algorithm proposed by Kleinberg for ranking the
results of search engine queries [17]. Kleinberg suggested that the HITS algorithm could also be used
for finding related pages, and provided anecdotal evidence that it might work well. In this paper, we
extend the algorithm to exploit not only links but also their order on a page (see Section 2.1.1), and
we present the results of a user study showing that the resulting algorithm works very well.
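For reference, the core of HITS is a mutually reinforcing iteration: a page's authority score is the sum of the hub scores of the pages that point to it, and its hub score is the sum of the authority scores of the pages it points to, with both score vectors normalized after each round. The Python sketch below implements only this basic iteration on an arbitrary small graph. It is not the Companion algorithm: it omits building the graph around the input URL, the use of link order, and the other refinements described later in this paper. The graph, the function name `hits`, and the fixed iteration count are illustrative choices.

```python
import math


def hits(graph, iterations=50):
    """Basic HITS on a directed graph given as {node: [out-links]}.

    Returns (hub, authority) score dicts, each scaled to unit L2 norm.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of pages linking to the node.
        auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub: sum of the authority scores of the pages the node links to.
        hub = {n: sum(auth[d] for d in graph.get(n, [])) for n in nodes}
        # Normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


graph = {
    "p1": ["a", "b"],
    "p2": ["a", "b"],
    "p3": ["a"],
}
hub, auth = hits(graph)
ranked = sorted(auth, key=auth.get, reverse=True)
print(ranked[:2])  # ['a', 'b']
```

In this example, "a" and "b" are the only pages with in-links, so they are the only ones with nonzero authority, and "a" ranks first because every hub points to it.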
The Cocitation algorithm finds pages that are frequently cocited with the input URL u (that is, it
finds pages that are pointed to by many of the pages that also point to u).
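A simplified version of cocitation counting can be sketched in Python: collect the parents of u, then count, for every other page, how many of those parents also link to it. The function name `cocitation`, the tie-breaking rule, and the `top_n` cutoff are assumptions for illustration; the precise Cocitation algorithm (for example, any limits on how many parents or siblings are examined) is described later in the paper.

```python
from collections import Counter


def cocitation(out_links, u, top_n=10):
    """Rank pages by how many parents of u also point to them.

    out_links: {page: [out-links in page order]}.
    Returns (page, cocitation_count) pairs, highest count first.
    """
    # Parents of u: pages containing a link to u. A linear scan is fine for
    # a toy graph; a real system would look up u's in-links in an index.
    parents = [p for p, targets in out_links.items() if u in targets]
    counts = Counter()
    for p in parents:
        # Count each sibling at most once per parent, and never u itself.
        for sibling in set(out_links[p]) - {u}:
            counts[sibling] += 1
    # Sort by descending count, breaking ties by page name for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


web = {
    "p1": ["u", "a", "b"],
    "p2": ["c", "u", "a"],
    "p3": ["a", "b", "c", "d", "u"],
    "p4": ["d", "e"],
}
print(cocitation(web, "u", top_n=3))  # [('a', 3), ('b', 2), ('c', 2)]
```

Page "a" is cocited with u three times because all three parents of u link to it, while "e" never appears because its only parent, p4, does not link to u.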
Netscape Communicator Version 4.06 introduced a related pages service that is built into the
browser [12] (see Section 2.3 for a more detailed discussion). On the browser screen, there is a
"What's Related" button, which in some cases presents a menu of related web pages. The "What's