next up previous
Next: Work Related to PageGather Up: A Case Study: Index Previous: The PageGather Algorithm

Time Complexity

What is the running time of the PageGather algorithm? We summarize here; a more complete analysis is found in [15]. Let L be the number of page views in the log and N the number of pages at the site. In step (1), we must group the page views by their originating machine. We do this by sorting page views by origin and time, which requires O(L log L) time. In step (2), we must process the log and create a matrix of size O(N2), which requires O(L + N2) time. Note that we implement our algorithm so that we can create the matrix from a collection of logs and use it repeatedly to generate candidate clusters. As we bound the size of discovered clusters, step (3) is polynomial in N.

Mike Perkowitz