Efficient identification of user access sessions from very large web logs is an unavoidable data preparation task for higher-level web log mining, yet the problem has received little algorithmic study. In this paper we consider two types of user access sessions: interval sessions and gap sessions. We design two efficient algorithms, one for each session type, built on data structures that we propose. We present a theoretical analysis of the algorithms and prove that both have optimal time complexity as well as certain error-tolerant properties. We evaluate the algorithms empirically on web logs ranging from 100 megabytes to 500 megabytes. The results show that the algorithms take only a few seconds more than the baseline time, i.e., the time needed to read the web log once sequentially from disk into RAM, test whether each user access record is valid, and write each valid record back to disk. The results also show that our algorithms are substantially faster than sorting-based session-finding algorithms. Finally, we present optimal algorithms for finding user access sessions in distributed web logs.
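The abstract does not define the two session types or describe the proposed data structures, so the following is only an illustrative sketch and not the paper's algorithms. It assumes the definitions commonly used in web log mining: a gap session ends when the time between two consecutive accesses by the same user exceeds a threshold, and an interval session ends when the time since the session's first access exceeds a threshold. It also assumes that the log is already in non-decreasing time order and that each record reduces to a `(user, timestamp)` pair. The function names and parameters (`gap_sessions`, `interval_sessions`, `max_gap`, `max_length`) are hypothetical.

```python
from collections import defaultdict


def gap_sessions(records, max_gap):
    """Group accesses into gap sessions with one pass over the log.

    Assumed definition: a new session starts when the time since the
    user's previous access exceeds max_gap.
    records: iterable of (user, timestamp), in non-decreasing time order.
    """
    sessions = defaultdict(list)  # user -> list of sessions (lists of timestamps)
    for user, ts in records:
        user_sessions = sessions[user]
        # Compare against the LAST access of the user's current session.
        if user_sessions and ts - user_sessions[-1][-1] <= max_gap:
            user_sessions[-1].append(ts)
        else:
            user_sessions.append([ts])
    return dict(sessions)


def interval_sessions(records, max_length):
    """Group accesses into interval sessions with one pass over the log.

    Assumed definition: a new session starts when the time since the
    FIRST access of the user's current session exceeds max_length.
    """
    sessions = defaultdict(list)
    for user, ts in records:
        user_sessions = sessions[user]
        # Compare against the FIRST access of the user's current session.
        if user_sessions and ts - user_sessions[-1][0] <= max_length:
            user_sessions[-1].append(ts)
        else:
            user_sessions.append([ts])
    return dict(sessions)


if __name__ == "__main__":
    log = [("a", 0), ("b", 5), ("a", 10), ("a", 25), ("a", 100), ("b", 200)]
    print(gap_sessions(log, max_gap=20))
    print(interval_sessions(log, max_length=20))
```

Because each record is handled once with a hash-table lookup, this sketch avoids sorting the log by user, which is the kind of sorting-based approach the abstract compares against. It keeps every session in memory, however, so it says nothing about the memory behavior or error tolerance claimed for the paper's algorithms.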