Input: www.nytimes.com

www.usatoday.com          USA Today newspaper
www.washingtonpost.com    Washington Post newspaper
www.cnn.com               Cable News Network
www.latimes.com           Los Angeles Times newspaper
www.wsj.com               Wall Street Journal newspaper
www.msnbc.com             MSNBC cable news station
www.sjmercury.com         San Jose Mercury News newspaper
www.chicago.tribune.com   Chicago Tribune newspaper
www.nando.net             Nando Times on-line news service
www.the-times.co.uk       The Times newspaper

Table 1: Example results produced by the Companion algorithm

Input: linas.org/linux/corba.html

Companion                                   Netscape
1  www.cs.wustl.edu/~schmidt/TAO.html       0  labserver.kuntrynet.com/~linux
1  dsg.harvard.edu/public/arachne           0  web.singnet.com.sg/~siuyin
1  jorba.castle.net.au/                     -  www.clark.net/pub/srokicki/linux
1  www-b0.fnal.gov:8000/ROBIN               -  www.earth.demon.co.uk/linux/uk...
1  www.paragon-software.com/products/oak    0  www.emry.net/webwatcher
1  www.tatanka.com/orb1.htm                 0  www.interl.net/~jasoneng/NLL/lwr
1  www.oot.co.uk/dome-index.html            0  www.jnpcs.com/mkb/linux
0  www.intellisoft.com/~mark                1  www.linuxresources.com/
1  www.dstc.edu.au/AU/staff/mart...         0  www.liszt.com/
1  www.inf.fu-berlin.de/~brose/jacorb       0  www.local.net/~jgo/linuxhelp.html

Table 2: Comparison of results for the Companion and Netscape algorithms. A "1" means that the page was valuable, a "0" means that the page was not valuable, and a "-" means that the page could not be accessed.

input (in this case, the results for the Cocitation algorithm are identical and the results for Netscape are very similar, although this is not always true).

One of our goals was to design algorithms with high precision that are very fast and that do not require a large number of different kinds of input data. Since we have a tool that gives us access to the hyperlink structure of the web (see [4]), we focused on algorithms that use only connectivity information to identify related pages. Our algorithms use only the information about the links that appear on each page and the order in which the links appear. They examine neither the content of pages nor the patterns of how users tend to navigate among pages.

Our Companion algorithm is derived from the HITS algorithm proposed by Kleinberg for ranking search engine queries [17]. Kleinberg suggested that the HITS algorithm could be used for finding related pages as well, and provided anecdotal evidence that it might work well. In this paper, we extend the algorithm to exploit not only links but also their order on a page (see Section 2.1.1) and present the results of a user study showing that the resulting algorithm works very well.
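For orientation, the basic hub/authority iteration of HITS can be sketched as below. This is a minimal illustration of the standard HITS update only, not of the Companion algorithm: it ignores link order and any other extension described later in the paper. The names `hits`, `normalize`, `edges`, and `iterations` are assumptions made for this sketch.

```python
import math


def normalize(scores):
    """Scale scores so that their squares sum to 1 (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(edges, iterations=20):
    """Run the basic HITS iteration on a list of (source, target) links.

    Returns two dicts, (hub, authority), keyed by node name.
    """
    nodes = {node for edge in edges for node in edge}
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # A page's authority score is the sum of the hub scores pointing to it.
        new_auth = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # A page's hub score is the sum of the authority scores it points to.
        new_hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical toy graph: h1 links to a and b, h2 links to a only.
hub, auth = hits([("h1", "a"), ("h2", "a"), ("h1", "b")])
```

In this toy graph, page `a` receives links from both hubs and therefore ends up with a higher authority score than `b`, while `h1` points to more authorities and ends up with a higher hub score than `h2`.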

The Cocitation algorithm finds pages that are frequently cocited with the input URL u (that is, it finds other pages that are pointed to by many other pages that all also point to u).
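The cocitation idea in the preceding paragraph can be sketched as a simple count. This is a minimal illustration under stated assumptions, not the paper's full Cocitation algorithm: the link graph is assumed to be available as an in-memory dictionary `links` mapping each page to the pages it points to, and the names `cocited_pages` and `top_k` are hypothetical.

```python
from collections import Counter


def cocited_pages(links, u, top_k=10):
    """Rank pages by how many pages link to both them and the input URL u.

    links: dict mapping a page URL to the list of URLs it points to.
    Returns a list of (url, cocitation_count) pairs, highest count first.
    """
    counts = Counter()
    for parent, children in links.items():
        targets = set(children)  # count each parent at most once per target
        if u not in targets:
            continue
        for sibling in targets:
            if sibling != u:
                counts[sibling] += 1
    # Sort by descending count, breaking ties by URL for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical toy graph: pages a and b both point to u.
toy_links = {
    "a": ["u", "x", "y"],
    "b": ["u", "x"],
    "c": ["x", "z"],
}
related = cocited_pages(toy_links, "u")
```

Here `x` is cocited with `u` by two pages (`a` and `b`) and `y` by one, so `x` ranks first; `z` is never cocited with `u` and does not appear.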

Netscape Communicator Version 4.06 introduced a related pages service that is built into the browser [12] (see Section 2.3 for a more detailed discussion). On the browser screen, there is a "What's Related" button, which presents a menu of related web pages in some cases. The "What's
