** Next:** Experimental Validation
** Up:** A Case Study: Index
** Previous:** Time Complexity

##

Work Related to PageGather

PageGather addresses the problem of finding sets of items based
on their access patterns. Previous work in both clustering and
data mining can also be applied to this problem.
As discussed above, cluster mining is a variation on traditional
clustering. PageGather uses a graph-based clustering component
which is specialized to cluster mining, but it is possible to
adapt traditional clustering algorithms to this problem.
In [15] and follow-up work, we compared
PageGather's clustering component (both *PG*_{CLIQUE} and *PG*_{CC} variants)
to two standard algorithms: K-means clustering [18]
and hierarchical agglomerative clustering (HAC)[22].
There are literally hundreds of clustering algorithms and variations thereof.
We chose K-means as it is particularly fast, and HAC as it is widely used.
We found that PageGather's clustering component was faster
than both HAC and K-means and found higher-quality clusters.

Frequent set algorithms are designed to find sets of similar items
in large collections (see, for example,
[1,2,19,21]).
We therefore compare PageGather to the standard *Apriori*
algorithm for finding frequent sets (see [3]).
In a traditional
frequent set problem, the data is a collection of *market basket*
information. Each basket contains a set of items, and the goal is to find
sets of items that appear together in many baskets.
In our problem domain, a user visit corresponds to a market basket, and
the set of pages visited by the user corresponds to the set of items in
the basket.
In section 3, we compare the performance of
PageGather to that of Apriori on our test data.

** Next:** Experimental Validation
** Up:** A Case Study: Index
** Previous:** Time Complexity
*Mike Perkowitz*

*1999-03-02*