dened by the combined link structure of a large number of starting nodes on the topic. Kleinberg
also proposed that the HITS algorithm could be used to nd related pages when the topic was
dened by just a single node. The Companion algorithm used HITS algorithm as a starting point
and extended and modied it in four main ways:
1. Kleinberg suggested using the following graph to nd related pages: Take a xed number (say
200) of parents of the given URL and call the set consisting of the URL and these parents the
. Now build the graph consisting of all nodes pointing to a node in the start set or
pointed to by a node in the start set. This means that \grandparents" of u are included in
the graph, while nodes that share a child with u are not included in the graph. We believe
that the latter nodes are more likely to be related to u than are the \grandparent" nodes.
Therefore our vicinity graph is structured to exclude grandparent nodes but to include nodes
that share a child with u.
2. We exploit the order of the links on a page to determine which \siblings" of u to include.
When we added this feature, the precision of our algorithm improved noticably.
3. The original HITS algorithm did not have edge weights. We use edge weights to reduce the
in uence of pages that all reside on one host, since Bharat and Henzinger have shown that
edge weights improve the precision .
4. We also merge nodes in our vicinity graph that have a large number of duplicate links. Duplicate nodes are not such a serious problem when using the HITS or algorithms to rank
query results, since the start set consists of a large number of URLs. However, when forming
the vicinity graph starting with just a single URL, the in uence of duplicate nodes is increased
because duplicate nodes with a large number of out links will quickly dominate the hub and
Kleinberg also showed that HITS computes the principal eigenvector of the matrix AA
A is the adjacency matrix of the above graph, and suggested using non-principal eigenvectors for
nding related pages. Finally, he gave anecdotal evidence that HITS might work well.
Consecutively, a sequence of papers [10, 8] presented improvements on HITS and used it to
populate a given hierarchy of categories. These improvements are not directly relevant to the task
of nding related pages.
We have presented two dierent algorithms for nding related pages in the WWW. They significantly outperform Netscape's algorithm for nding related pages. The algorithms can be implemented eciently and are suitable for use within a large scale web service providing a related pages
Our two algorithms can be extended to handle more than one input URL. In this case, the
algorithms would compute pages that are related to all input URLs. We are currently exploring
This work has beneted greatly from discussions we have had with Krishna Bharat, Andrei Broder,
Puneet Kumar, and Hannes Marais. We are also indebted to Puneet for his work on the Connectivity Server. As some of the earliest users of the server, Puneet answered our many questions and
implemented many improvements to the server in response to our suggestions. We are also grateful