Our survey of other work in this area has led us to formulate five desiderata for an adaptive web site.
We took the above desiderata as constraints on our approach to creating adaptive web sites. We use transformation rather than customization, both to avoid confronting visitors with questionnaires and to facilitate the sharing of site improvements for a wide range of visitors. We focus on an access-based approach, as automatic understanding of free text is difficult. We do not assume any annotations on Web pages beyond HTML. For safety, we limit ourselves to nondestructive transformations: changes to the site that leave existing structure intact. We may add links but not remove them, create pages but not destroy them, add new structures but not scramble existing ones. Finally, we restrict ourselves to generating candidate adaptations and presenting them to the human webmaster -- any non-trivial changes to the web site are under webmaster control.
    24hrlab-214.sfsu.edu - - [21/Nov/1996:00:01:05 -0800] "GET /home/jones/collectors.html HTTP/1.0" 200 13119
    24hrlab-214.sfsu.edu - - [21/Nov/1996:00:01:06 -0800] "GET /home/jones/madewithmac.gif HTTP/1.0" 200 855
    cs106-14.u.washington.edu - - [21/Nov/1996:00:01:06 -0800] "GET /home/chinn/ HTTP/1.0" 200 1896
    24hrlab-214.sfsu.edu - - [21/Nov/1996:00:01:06 -0800] "GET /home/jones/gustop2.gif HTTP/1.0" 200 25460
    x67-122.ejack.umn.edu - - [21/Nov/1996:00:01:08 -0800] "GET /home/rich/aircrafts.html HTTP/1.0" 404 617
    x67-122.ejack.umn.edu - - [21/Nov/1996:00:01:08 -0800] "GET /general/info.gif HTTP/1.0" 200 331
    203.147.0.10 - - [21/Nov/1996:00:01:09 -0800] "GET /home/smith/kitty.html HTTP/1.0" 200 5160
    24hrlab-214.sfsu.edu - - [21/Nov/1996:00:01:10 -0800] "GET /home/jones/thumbnails/awing-bo.gif HTTP/1.0" 200 5117
The main source of information we rely on is the site's web server log, which records the pages visited by a user at the site. Our underlying intuition is what we call the visit-coherence assumption: the pages a user visits during one interaction with the site tend to be conceptually related. We do not assume that all pages in a single visit are related. After all, the information we glean from individual visits is noisy; for example, a visitor may pursue multiple distinct tasks in a single visit. However, if a large number of visitors continue to visit and re-visit the same set of pages, that provides strong evidence that the pages in the set are related. Thus, we accumulate statistics over many visits by numerous visitors and search for overall trends.
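As an illustration only, the following Python sketch shows one way such statistics could be accumulated from a log in the format shown above. It is not the implementation used in this work. The 30-minute idle gap used to delimit a visit, the use of the requesting host as a stand-in for a visitor, the rule that only successful requests for `.html` pages or directories count as page views, and the names `parse_line`, `split_into_visits`, and `cooccurrence_counts` are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from itertools import combinations

# Common Log Format: host ident user [timestamp] "METHOD path PROTO" status bytes
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Assumption: an idle gap longer than this starts a new visit.
VISIT_GAP = timedelta(minutes=30)


def parse_line(line):
    """Return (host, time, path, status), or None if the line does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("host"), when, m.group("path"), int(m.group("status"))


def split_into_visits(lines):
    """Group page views by host and split each host's views on long idle gaps."""
    by_host = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        host, when, path, status = rec
        # Assumption: images and failed requests are not page views.
        if status == 200 and (path.endswith(".html") or path.endswith("/")):
            by_host[host].append((when, path))

    visits = []
    for requests in by_host.values():
        requests.sort()
        current = [requests[0][1]]
        for (prev, _), (when, path) in zip(requests, requests[1:]):
            if when - prev > VISIT_GAP:
                visits.append(current)
                current = []
            current.append(path)
        visits.append(current)
    return visits


def cooccurrence_counts(visits):
    """Count, for each unordered pair of pages, the visits containing both."""
    counts = Counter()
    for visit in visits:
        for a, b in combinations(sorted(set(visit)), 2):
            counts[(a, b)] += 1
    return counts
```

Pairs with high counts across many visits are the kind of overall trend described above; a pair seen together in only one or two visits would be treated as noise.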
It is not difficult to devise a number of simple, nondestructive transformations that could improve a site; we describe several in [14]. Examples include highlighting popular links, promoting popular links to the top of a page or to the site's front page, and linking together pages that seem to be related. We have implemented one such transformation, shortcutting, in which we attempt to provide links on each page to visitors' eventual goals, thus skipping the in-between pages. As reported in [13], we found that a significant number of visitors used these automatic shortcuts.
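To make the idea of shortcutting concrete, here is a minimal Python sketch of how candidate shortcut links might be proposed from recorded visits. It is not the algorithm reported in [13]. It assumes that the last page of a visit stands in for the visitor's goal, that the site's existing links are available as a dictionary, and that only the most frequent few goals per page are worth proposing; the function name `shortcut_candidates` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict


def shortcut_candidates(visits, existing_links, top_n=3):
    """Suggest page -> goal links for goals that visitors reach indirectly.

    visits: list of visits, each an ordered list of page paths.
    existing_links: dict mapping a page to the set of pages it links to.
    Assumption: the last page of a visit approximates the visitor's goal.
    """
    counts = defaultdict(Counter)
    for visit in visits:
        if len(visit) < 3:
            continue  # no in-between page that a shortcut could skip
        goal = visit[-1]
        # Pages at least two steps before the goal have an in-between page.
        for page in set(visit[:-2]):
            if page != goal and goal not in existing_links.get(page, set()):
                counts[page][goal] += 1
    # Only additions are proposed, so the transformation stays nondestructive.
    return {
        page: [goal for goal, _ in goals.most_common(top_n)]
        for page, goals in counts.items()
    }
```

The result is a list of candidate links per page, which could be presented to the webmaster for approval rather than applied automatically.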
However, our long-term goal is to demonstrate that more fundamental
adaptations are feasible. An example of this is change in view,
where a site could offer an alternative organization of its contents
based on user access patterns. Consider, for example, the Music
Machines web site, which has been our primary testbed, as it is
maintained by one of the authors, and we have full access to all
documents and access logs. Music
Machines is devoted to information about various kinds of electronic
musical instruments. Most of the data at the site is organized by the
manufacturer of the instrument and the particular model number. That
is, there is a page for the manufacturer Roland and, on that
page, links to pages for each instrument Roland produces. However,
imagine a visitor to the site who is interested in a comprehensive
overview of all the keyboards available from various manufacturers.
She would first have to visit the Roland page and look at each of the
Roland keyboards, then visit the page of every other keyboard
manufacturer to see its offerings as well. Now, imagine that the site
repeatedly observed this kind of behavior and automatically created a
new web page containing links to all the keyboards. Our visitor would
then need to visit only this new page rather than search for all the
keyboards. This page represents a
change in view, from the former ``manufacturer-centric'' organization
to one based on type of instrument. If we can discover these user
access patterns and create new web pages to facilitate them, we
should in theory be able to create new views of the site.