Integrating web content clustering into web log association rule mining

Authors:
Jiayun Guo;Vlado Kešelj;Qigang Gao
Affiliations:
Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada;Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada;Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
Venue:
AI'05 Proceedings of the 18th Canadian Society conference on Advances in Artificial Intelligence
Year:
2005


Abstract

One of the effects of the general Internet growth is an immense number of user accesses to WWW resources. These accesses are recorded in the web server log files, which are a rich data resource for finding useful patterns and rules of user browsing behavior, and they caused the rise of technologies for Web usage mining. Current Web usage mining applications rely exclusively on the web server log files. The main hypothesis discussed in this paper is that Web content analysis can be used to improve Web usage mining results. We propose a system that integrates Web page clustering into log file association mining and uses the cluster labels as Web page content indicators. It is demonstrated that novel and interesting association rules can be mined from the combined data source. The rules can be used further in various applications, including Web user profiling and Web site construction. We experiment with several approaches to content clustering, relying on keyword and character n-gram based clustering with different distance measures and parameter settings. Evaluation shows that character n-gram based clustering performs better than word-based clustering in terms of an internal quality measure (about 3 times better). On the other hand, word-based cluster profiles are easier to manually summarize. Furthermore, it is demonstrated that high-quality rules are extracted from the combined dataset.
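To make the idea of using cluster labels as content indicators more concrete, the following is a minimal sketch and not the system described in the paper. It assumes that a clustering step has already assigned each page a label, and that sessions have already been reconstructed from the server log. The page paths, label names, thresholds, and the helper functions `to_label_sets` and `pairwise_rules` are all hypothetical, and only single-item rules of the form "A implies B" are mined.

```python
from collections import Counter
from itertools import permutations

# Hypothetical output of a content clustering step: page -> cluster label.
page_cluster = {
    "/courses/ai.html": "teaching",
    "/courses/db.html": "teaching",
    "/research/nlp.html": "research",
    "/people/faculty.html": "people",
}

# Hypothetical user sessions reconstructed from a web server log.
sessions = [
    ["/courses/ai.html", "/research/nlp.html"],
    ["/courses/db.html", "/research/nlp.html", "/people/faculty.html"],
    ["/courses/ai.html", "/people/faculty.html"],
    ["/research/nlp.html", "/people/faculty.html"],
]


def to_label_sets(sessions, page_cluster):
    """Replace each visited page by its cluster label; one label set per session."""
    return [
        frozenset(page_cluster[page] for page in session if page in page_cluster)
        for session in sessions
    ]


def pairwise_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for rules A -> B."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for labels in transactions:
        item_counts.update(labels)
        # Count both directions, since A -> B and B -> A are distinct rules.
        pair_counts.update(permutations(sorted(labels), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                 # fraction of sessions containing A and B
        confidence = count / item_counts[a]  # fraction of A-sessions that also contain B
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    transactions = to_label_sets(sessions, page_cluster)
    for a, b, support, confidence in pairwise_rules(transactions, 0.5, 0.6):
        print(f"{a} -> {b}  support={support:.2f}  confidence={confidence:.2f}")
```

Mapping pages to labels before mining means a rule can describe co-visited kinds of content rather than specific URLs. In this sketch two different course pages both count toward rules involving the assumed "teaching" label.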