Integrating web content clustering into web log association rule mining

  • Authors:
  • Jiayun Guo;Vlado Kešelj;Qigang Gao

  • Affiliations:
  • Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada;Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada;Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada

  • Venue:
  • AI'05 Proceedings of the 18th Canadian Society conference on Advances in Artificial Intelligence
  • Year:
  • 2005

Abstract

One of the effects of the general Internet growth is an immense number of user accesses to WWW resources. These accesses are recorded in the web server log files, which are a rich data resource for finding useful patterns and rules of user browsing behavior, and they caused the rise of technologies for Web usage mining. Current Web usage mining applications rely exclusively on the web server log files. The main hypothesis discussed in this paper is that Web content analysis can be used to improve Web usage mining results. We propose a system that integrates Web page clustering into log file association mining and uses the cluster labels as Web page content indicators. It is demonstrated that novel and interesting association rules can be mined from the combined data source. The rules can be used further in various applications, including Web user profiling and Web site construction. We experiment with several approaches to content clustering, relying on keyword and character n-gram based clustering with different distance measures and parameter settings. Evaluation shows that character n-gram based clustering performs better than word-based clustering in terms of an internal quality measure (about 3 times better). On the other hand, word-based cluster profiles are easier to manually summarize. Furthermore, it is demonstrated that high-quality rules are extracted from the combined dataset.
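
The idea of using cluster labels as content indicators in association mining can be illustrated with a minimal Python sketch. This is not the authors' system: the sessions, page names, cluster labels, thresholds, and the helper functions `augment` and `mine_pair_rules` are all hypothetical, and the miner only considers item pairs rather than running a full Apriori-style search. It is meant only to show how adding a cluster label to each logged session lets a rule relate a page to a content category.

```python
from collections import Counter
from itertools import combinations

# Hypothetical user sessions reconstructed from a web server log.
sessions = [
    ["/index.html", "/courses.html", "/grad.html"],
    ["/index.html", "/courses.html"],
    ["/research.html", "/grad.html"],
    ["/index.html", "/grad.html"],
]

# Hypothetical output of a content clustering step: page -> cluster label.
page_cluster = {
    "/index.html": "cluster:general",
    "/courses.html": "cluster:teaching",
    "/grad.html": "cluster:teaching",
    "/research.html": "cluster:research",
}

def augment(session, page_cluster):
    """Return the session as an item set that also holds its pages' cluster labels."""
    items = set(session)
    items.update(page_cluster[p] for p in session if p in page_cluster)
    return frozenset(items)

def mine_pair_rules(transactions, min_support, min_confidence):
    """Find single-item rules lhs -> rhs meeting the support and confidence thresholds."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        item_counts.update(t)
        pair_counts.update(combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

transactions = [augment(s, page_cluster) for s in sessions]
rules = mine_pair_rules(transactions, min_support=0.5, min_confidence=0.9)
for lhs, rhs, support, confidence in sorted(rules):
    print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

In this toy data, one resulting rule links `/index.html` to `cluster:teaching`, a relation that the raw log alone would not express. A rule that maps a page to its own cluster is trivially true and would need to be filtered out in any practical use of this sketch.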