- ... synthesis.
- An index page is a page
consisting of links to a set of pages that cover a particular topic
(e.g., electric guitars).
- ...[5]
- http://www.cs.cmu.edu/webwatcher/
- ...[7]
- http://zeus.gmd.de/projects/avanti.html
- ... Firefly
- http://www.firefly.com
- ... site,
- http://machines.hyperreal.org
- ... server.
- A web site is restricted to a collection of HTML
documents residing on a single server; we are not yet able to handle
dynamically generated pages or multiple servers.
- ... visitor.
- In fact,
this is not necessarily the case. Many Internet service providers
channel their users' HTTP requests through a small number of gateway
machines, and two users might simultaneously
visit the site from the same machine. Fortunately, such coincidences are too
uncommon to affect the data significantly; if necessary, however, more
accurate logs can be generated using cookies or
visitor-tracking software such as WebThreads.
- ... visitor,
- We consider
an entire day's page views to be one visit, even if a user made, for
example, one morning visit and one evening visit. This simplification does
not greatly affect the data; if necessary, however, the series of
page views could be divided at significant time gaps.
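The visit-construction rules in the two footnotes above (each host counts as one visitor, and a day's page views from that host form one visit, optionally divided at large time gaps) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `split_visits`, the `(host, timestamp, url)` record layout, and any concrete gap threshold are hypothetical. By default the sketch keeps the whole-day rule; passing a `gap` turns on the optional split.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_visits(page_views, gap=None):
    """Group (host, timestamp, url) page views into visits.

    Hypothetical helper: each host is treated as one visitor, and all of a
    host's page views on the same calendar day form one visit. If `gap` (a
    timedelta) is given, a visit is additionally split wherever two
    consecutive requests are more than `gap` apart.
    """
    # Bucket requests by (visitor, day).
    by_host_day = defaultdict(list)
    for host, ts, url in page_views:
        by_host_day[(host, ts.date())].append((ts, url))

    visits = []
    for key in sorted(by_host_day):
        views = sorted(by_host_day[key])  # chronological order within the day
        current = [views[0][1]]
        for (prev_ts, _), (ts, url) in zip(views, views[1:]):
            # Start a new visit only when an explicit gap threshold is exceeded.
            if gap is not None and ts - prev_ts > gap:
                visits.append(current)
                current = []
            current.append(url)
        visits.append(current)
    return visits


# Hypothetical log: one host visits in the morning and again in the evening.
log = [
    ("a.example.com", datetime(1998, 1, 5, 9, 0), "/synths/"),
    ("a.example.com", datetime(1998, 1, 5, 9, 5), "/synths/moog.html"),
    ("a.example.com", datetime(1998, 1, 5, 20, 0), "/samplers/"),
    ("b.example.com", datetime(1998, 1, 5, 9, 1), "/synths/moog.html"),
]
print(split_visits(log))                                # whole day = one visit
print(split_visits(log, gap=timedelta(minutes=30)))     # split at gaps > 30 min
```

With no gap, host `a.example.com` yields a single visit of three pages; with a 30-minute threshold (an arbitrary choice for illustration), its evening request becomes a separate visit.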
- ... cliques.
- Note that we place a maximum
size on discovered
clusters not only in the interest of performance but also because large
clusters are not useful output: we cannot, practically speaking,
create a new web page containing hundreds of links.
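The footnote does not say how cliques are enumerated or where the cap is applied. One way to enforce a maximum size during the search itself is a Bron–Kerbosch enumeration (a standard clique-listing algorithm, swapped in here purely for illustration) that stops extending a clique once it reaches `max_size`. The function name `capped_cliques`, the adjacency-set graph representation, and the small example graph are all hypothetical.

```python
def capped_cliques(graph, max_size):
    """List cliques of `graph` that are maximal or have exactly `max_size` nodes.

    Hypothetical sketch: `graph` maps each node to the set of its neighbours
    (undirected, no self-loops). Bron-Kerbosch without pivoting; the search
    never grows a clique beyond `max_size`, which bounds running time and the
    number of links any one cluster could contribute to a page.
    """
    results = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already-explored nodes.
        if len(r) == max_size or (not p and not x):
            if r:
                results.append(frozenset(r))
            return
        for v in sorted(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}   # v's cliques are done; exclude it from later branches
            x = x | {v}

    expand(set(), set(graph), set())
    return results


# Hypothetical co-occurrence graph: triangle A-B-C plus an edge C-D.
g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(capped_cliques(g, 3))
print(capped_cliques(g, 2))
```

Because the search stops at `max_size`, a clique larger than the cap is reported as several overlapping capped cliques (with a cap of 2, the triangle appears as its three edges); choosing which of them to turn into pages is left open here.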
- ... site,
- http://machines.hyperreal.org
- ... period.
- Data sets are publicly available from the authors.