One effect of the general growth of the Internet is an immense number of user accesses to WWW resources. These accesses are recorded in web server log files, which are a rich data source for finding useful patterns and rules of user browsing behavior and which gave rise to technologies for Web usage mining. Current Web usage mining applications rely exclusively on the web server log files. The main hypothesis discussed in this paper is that Web content analysis can be used to improve Web usage mining results. We propose a system that integrates Web page clustering into log file association mining and uses the cluster labels as Web page content indicators. It is demonstrated that novel and interesting association rules can be mined from the combined data source. The rules can be used further in various applications, including Web user profiling and Web site construction. We experiment with several approaches to content clustering, relying on keyword and character n-gram based clustering with different distance measures and parameter settings. Evaluation shows that character n-gram based clustering performs better than word-based clustering in terms of an internal quality measure (about 3 times better). On the other hand, word-based cluster profiles are easier to summarize manually. Furthermore, it is demonstrated that high-quality rules are extracted from the combined dataset.
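The idea of using cluster labels as content indicators inside log-file association mining can be illustrated with a minimal sketch. It is not the system described in the paper: the session data, the page-to-cluster mapping, the label names, and the helper functions `augment` and `pair_rules` are all hypothetical, and the rule miner is a simple pairwise support/confidence count rather than any particular published algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: the set of pages each visitor requested,
# as might be reconstructed from a web server log.
sessions = [
    {"/a.html", "/b.html"},
    {"/a.html", "/c.html"},
    {"/b.html", "/c.html"},
    {"/a.html", "/b.html", "/c.html"},
]

# Hypothetical result of a content-clustering step: page -> cluster label.
page_cluster = {
    "/a.html": "cluster:courses",
    "/b.html": "cluster:courses",
    "/c.html": "cluster:research",
}


def augment(session, page_cluster):
    """Add the content-cluster label of every visited page to the session."""
    return set(session) | {page_cluster[p] for p in session if p in page_cluster}


def pair_rules(transactions, min_support, min_confidence):
    """Return single-item rules as (lhs, rhs, support, confidence) tuples."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        item_count.update(t)
        pair_count.update(combinations(sorted(t), 2))
    rules = []
    for (x, y), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Test the rule in both directions: x -> y and y -> x.
        for lhs, rhs in ((x, y), (y, x)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


transactions = [augment(s, page_cluster) for s in sessions]
rules = pair_rules(transactions, min_support=0.5, min_confidence=0.9)
for lhs, rhs, support, confidence in sorted(rules):
    print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

On this toy data, the rule from `/c.html` to `cluster:courses` appears only because the cluster labels were merged into the sessions; no rule between individual pages reaches the confidence threshold. Rules that link a page to its own cluster are trivial and would normally be filtered out.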