webSPADE: A Parallel Sequence Mining Algorithm to Analyze Web Log Data

  • Authors:
  • Ayhan Demiriz

  • Venue:
  • ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
  • Year:
  • 2002

Abstract

Enterprise-class web sites receive a large amount of traffic from both registered and anonymous users. Data warehouses are built to store the click streams within this traffic and to help analyze them, giving companies valuable insights into the behavior of their customers. This article proposes a parallel sequence mining algorithm, webSPADE, to analyze the click streams found in web site logs. In this process, raw web logs are first cleaned and inserted into a data warehouse. The click streams are then mined by webSPADE. An innovative web-based front-end is used to visualize and query the sequence mining results. The webSPADE algorithm is currently used by Verizon to analyze the daily traffic of the Verizon.com web site.
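The abstract does not spell out the mining step itself. Going by its name, webSPADE builds on Zaki's SPADE algorithm. SPADE stores each item as a vertical id-list of (sequence id, event id) pairs and finds longer sequences by temporally joining id-lists. The Python sketch below illustrates that idea on toy click streams. It assumes each session is an ordered list of page names and that support counts distinct sessions. It is sequential and simplified: it extends prefixes with frequent single pages rather than performing SPADE's full equivalence-class joins, and it does not model webSPADE's parallelization or its data-warehouse input. All function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical helpers illustrating SPADE-style vertical mining on click streams.
# This is a simplified, single-machine sketch, not webSPADE's actual implementation.


def vertical_idlists(sessions):
    """Map each page (as a 1-sequence) to its id-list of (session_id, event_id) pairs."""
    idlists = defaultdict(list)
    for sid, pages in enumerate(sessions):
        for eid, page in enumerate(pages):
            idlists[(page,)].append((sid, eid))
    return dict(idlists)


def support(idlist):
    """Support = number of distinct sessions that contain the sequence."""
    return len({sid for sid, _ in idlist})


def temporal_join(prefix_list, item_list):
    """Id-list of (prefix -> item): occurrences of item strictly after some
    occurrence of the prefix in the same session. Event ids mark the last page."""
    earliest_end = {}
    for sid, eid in prefix_list:
        if sid not in earliest_end or eid < earliest_end[sid]:
            earliest_end[sid] = eid
    return [(sid, eid) for sid, eid in item_list
            if sid in earliest_end and eid > earliest_end[sid]]


def mine_sequences(sessions, min_support):
    """Depth-first enumeration of all frequent page sequences."""
    atoms = {seq: lst for seq, lst in vertical_idlists(sessions).items()
             if support(lst) >= min_support}
    results = {}

    def extend(seq, idlist):
        results[seq] = support(idlist)
        for (page,), page_list in atoms.items():
            joined = temporal_join(idlist, page_list)
            # Anti-monotonicity: only frequent extensions are explored further.
            if support(joined) >= min_support:
                extend(seq + (page,), joined)

    for seq, idlist in atoms.items():
        extend(seq, idlist)
    return results


if __name__ == "__main__":
    sessions = [
        ["home", "products", "cart"],
        ["home", "cart"],
        ["products", "cart"],
    ]
    for seq, sup in mine_sequences(sessions, min_support=2).items():
        print(" -> ".join(seq), sup)
```

On these three toy sessions with a minimum support of 2, the sketch reports the single pages home, products and cart, plus the two-page sequences home -> cart and products -> cart. The sequence home -> products occurs in only one session and is pruned. For the join, keeping only the earliest prefix end per session is enough: any later page view that follows some occurrence of the prefix also follows the earliest one.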