Cut-and-Pick Transactions for Proxy Log Mining

  • Authors:
  • Wenwu Lou;Guimei Liu;Hongjun Lu;Qiang Yang

  • Venue:
  • EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology
  • Year:
  • 2002

Abstract

Web logs collected by proxy servers, referred to as proxy logs or proxy traces, record Web document accesses by many users against many Web sites. This "many-to-many" characteristic challenges Web log mining techniques because individual access transactions are difficult to identify: in a proxy log, user transactions are not clearly bounded and are sometimes interleaved with each other as well as with noise. Most previous work has determined transaction boundaries with simplistic measures such as a fixed time interval, and has not addressed the problem of interleaved and noisy transactions. In this paper, we show that this simplistic view can lead to poor performance in building models to predict future access patterns. We present a more advanced cut-and-pick method for determining access transactions from proxy logs, which chooses more reasonable transaction boundaries and removes noisy accesses. Our method exploits the observation that, in most transactions, a user typically visits multiple related Web sites that form clusters. Our algorithm discovers these clusters from the connectivity among Web sites. Using real-world proxy logs, we show experimentally that the cut-and-pick method produces more accurate transactions, which in turn yield Web-access prediction models with higher accuracy.
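The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. The record layout `(user, timestamp, site)`, the 1800-second gap, the function names, and the use of connected components as "clusters" are all assumptions made for illustration. The first function is the fixed-time-interval baseline; the other two split each resulting session by site cluster and drop accesses to unclustered sites as noise.

```python
from collections import defaultdict

def cut_by_timeout(records, gap=1800):
    """Baseline: per user, start a new session when the idle time exceeds `gap` seconds.

    `records` is an iterable of hypothetical (user, timestamp_seconds, site) tuples.
    """
    by_user = defaultdict(list)
    for user, ts, site in sorted(records, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, site))
    sessions = []
    for user, accesses in by_user.items():
        current = [accesses[0]]
        for prev, cur in zip(accesses, accesses[1:]):
            if cur[0] - prev[0] > gap:
                sessions.append((user, current))
                current = []
            current.append(cur)
        sessions.append((user, current))
    return sessions

def site_clusters(site_links):
    """Assumed stand-in for clustering: connected components of an undirected
    site graph, computed with union-find. Returns {site: cluster_id}."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in site_links:
        parent[find(a)] = find(b)
    return {s: find(s) for s in list(parent)}

def pick_by_cluster(sessions, cluster_of):
    """Split each session into one transaction per site cluster, which separates
    interleaved accesses; sites with no cluster are treated as noise and dropped."""
    transactions = []
    for user, accesses in sessions:
        groups = defaultdict(list)
        for _ts, site in accesses:
            if site in cluster_of:
                groups[cluster_of[site]].append(site)
        transactions.extend((user, sites) for sites in groups.values())
    return transactions
```

In this sketch the timeout alone would keep an unrelated access inside a session, whereas the cluster step removes it. How the paper actually measures connectivity and chooses boundaries may differ substantially.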