An effective system for mining web log

  • Authors:
  • Zhenglu Yang;Yitong Wang;Masaru Kitsuregawa

  • Affiliations:
  • Institute of Industrial Science, The University of Tokyo, Tokyo, Japan;Institute of Industrial Science, The University of Tokyo, Tokyo, Japan;Institute of Industrial Science, The University of Tokyo, Tokyo, Japan

  • Venue:
  • APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development
  • Year:
  • 2006

Abstract

The WWW provides a simple yet effective medium for users to search, browse, and retrieve information on the Web. Web log mining is a promising tool for studying user behavior, which can in turn help web-site designers improve site organization and services. Although many existing systems can analyze the traversal paths of web-site visitors, their performance is still far from satisfactory. In this paper, we propose an effective Web log mining system consisting of data preprocessing, sequential pattern mining, and visualization. In particular, we propose an efficient sequential pattern mining algorithm, LAPIN_WEB (LAst Position INduction for WEB log), an extension of the previous LAPIN algorithm, to extract user access patterns from traversal paths in Web logs. Our experimental results and performance studies demonstrate that LAPIN_WEB is very efficient and outperforms the well-known PrefixSpan by up to an order of magnitude on real Web log datasets. Moreover, we also implement a visualization tool to help interpret mining results as well as predict users’ future requests.
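
To make the sequential pattern mining step concrete, the following is a minimal sketch of mining frequent access patterns from user sessions. It is not LAPIN_WEB; it is a plain prefix-projection miner in the style of PrefixSpan, and it assumes preprocessing has already turned the log into one ordered list of page URLs per session. The function name `mine_sequential_patterns`, the `min_support` parameter, and the sample URLs are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mine_sequential_patterns(sessions, min_support):
    """Return {pattern: support} for every page subsequence that occurs
    in at least `min_support` sessions (illustrative sketch only)."""
    results = {}

    def grow(prefix, projected):
        # `projected` holds (session index, offset) pairs: the part of each
        # supporting session that follows the leftmost match of `prefix`.
        supporters = defaultdict(set)
        for sid, start in projected:
            for page in sessions[sid][start:]:
                supporters[page].add(sid)

        for page, sids in sorted(supporters.items()):
            if len(sids) < min_support:
                continue
            pattern = prefix + (page,)
            results[pattern] = len(sids)

            # Project each session past the first occurrence of `page`.
            next_projected = []
            for sid, start in projected:
                try:
                    pos = sessions[sid].index(page, start)
                except ValueError:
                    continue  # this session does not contain `page`
                next_projected.append((sid, pos + 1))
            grow(pattern, next_projected)

    grow((), [(i, 0) for i in range(len(sessions))])
    return results


# Hypothetical sessions, as they might look after preprocessing.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/cart"],
    ["/products", "/cart"],
]
patterns = mine_sequential_patterns(sessions, min_support=2)
for pattern, support in sorted(patterns.items()):
    print(" -> ".join(pattern), support)
```

With a support threshold of 2, the pattern `/home -> /cart` is reported because two of the three sessions visit `/cart` at some point after `/home`, while `/home -> /products` is dropped because only one session contains it.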