Generalization of web sessions is an effective approach to two major challenges in web usage mining: quality and scalability. Given a concept hierarchy, such as the page hierarchy of a website, generalization replaces actual page clicks with more general concepts, i.e., nodes at higher levels of the hierarchy. Currently known methods do this by choosing a level in the hierarchy and generalizing all nodes below it to nodes at that level. The problem with this approach is that significant items may be coalesced while insignificant ones are retained. We present a usage-driven generalization algorithm that coalesces less significant pages into more general ones, independent of their level in the hierarchy. Item significance is estimated from the actual set of user sessions, approximately but quickly, using a small stratified sample of the large dataset. While providing scalability, the proposed generalization technique improves the efficiency and quality of the discovered usage model, as demonstrated through numerous experiments.
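The idea of usage-driven generalization can be illustrated with a minimal Python sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes that significance is measured as support (the fraction of sampled sessions touching a node or its subtree), that an insignificant page is lifted to its nearest ancestor meeting a threshold, and that adjacent repeats are collapsed. The names `estimate_support`, `generalize`, `min_support`, and the toy hierarchy are all hypothetical, and the sample is taken as given rather than drawn by stratified sampling.

```python
from collections import Counter


def ancestors(node, parent):
    """Yield node, then each of its ancestors up to the root."""
    while node is not None:
        yield node
        node = parent.get(node)


def estimate_support(sample, parent):
    """Fraction of sampled sessions that touch each node or its subtree."""
    counts = Counter()
    for session in sample:
        touched = set()
        for page in session:
            touched.update(ancestors(page, parent))
        counts.update(touched)
    return {node: c / len(sample) for node, c in counts.items()}


def generalize(session, parent, support, min_support):
    """Lift each insignificant page to its nearest significant ancestor."""
    out = []
    for page in session:
        node = page
        # Climb while the node is below the threshold and is not the root.
        while support.get(node, 0.0) < min_support and parent.get(node) is not None:
            node = parent[node]
        if not out or out[-1] != node:  # assumption: collapse adjacent repeats
            out.append(node)
    return out


# Hypothetical toy hierarchy: child -> parent (the root maps to None).
parent = {
    "/": None,
    "/news": "/", "/shop": "/",
    "/news/a": "/news", "/news/b": "/news",
    "/shop/x": "/shop", "/shop/y": "/shop",
}
# Hypothetical sample of sessions, assumed to be drawn from the full log.
sample = [
    ["/news/a", "/shop/x"],
    ["/news/a", "/shop/y"],
    ["/news/a", "/news/b"],
    ["/news/a"],
]
support = estimate_support(sample, parent)
generalized = [generalize(s, parent, support, min_support=0.5) for s in sample]
```

In this toy data the frequently visited leaf `/news/a` is kept as is, while the rarely visited leaves `/shop/x` and `/shop/y`, which sit at the same depth, are lifted to `/shop`. A fixed-level cut could not make that distinction, since it treats all nodes at a given depth alike.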