Optimal Algorithms for Finding User Access Sessions from Very Large Web Logs

Authors:
Zhixiang Chen; Ada Wai-Chee Fu; Frank Chi-Hung Tong
Affiliations:
Department of Computer Science, University of Texas-Pan American, USA (chen@cs.panam.edu); Department of Computer Science, Chinese University of Hong Kong, Hong Kong (adafu@cse.cuhk.edu.hk); Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong (ftong@eti.hku.hk)
Venue:
World Wide Web
Year:
2003

Abstract

Efficient identification of user access sessions from very large web logs is an unavoidable data preparation task for higher-level web log mining, yet the problem has received little algorithmic study. In this paper we consider two types of user access sessions: interval sessions and gap sessions. We design two efficient algorithms, one for each session type, with the help of some proposed data structures. We present a theoretical analysis of the algorithms and prove that both have optimal time complexity as well as certain error-tolerant properties. We conduct an empirical performance analysis with web logs ranging from 100 to 500 megabytes. It shows that the algorithms take only several seconds more than the baseline time, i.e., the time needed to read the web log once sequentially from disk into RAM, test whether each user access record is valid, and write each valid record back to disk. It also shows that our algorithms are substantially faster than sorting-based session-finding algorithms. Finally, we present optimal algorithms for finding user access sessions from distributed web logs.
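
To make the two session types concrete, the following is a minimal Python sketch of one-pass session finding. It is not the authors' algorithm and does not use the data structures proposed in the paper. It rests on several assumptions that the abstract does not state: a gap session is closed when two consecutive accesses by the same user are more than `limit` seconds apart, an interval session is closed when an access falls more than `limit` seconds after the session's first access, and records arrive already ordered by time. The function name `find_sessions`, the record layout `(user, time, url)`, and the sample data are hypothetical.

```python
def find_sessions(records, limit, kind):
    """Group (user, time, url) records into sessions in a single pass.

    Assumptions (not taken from the paper):
      - records are ordered by nondecreasing time;
      - kind == "gap": a new session starts when the time since the
        user's previous access exceeds `limit`;
      - kind == "interval": a new session starts when the time since
        the first access of the user's current session exceeds `limit`.
    """
    if kind not in ("gap", "interval"):
        raise ValueError("kind must be 'gap' or 'interval'")

    open_sessions = {}  # user -> list of (time, url) for the session still open
    finished = []       # closed sessions as (user, list of (time, url))

    for user, t, url in records:
        current = open_sessions.get(user)
        if current is not None:
            # Compare against the last access (gap) or the first access (interval).
            reference = current[-1][0] if kind == "gap" else current[0][0]
            if t - reference > limit:
                finished.append((user, current))
                current = None
        if current is None:
            current = []
            open_sessions[user] = current
        current.append((t, url))

    # Sessions still open at the end of the log are closed here.
    finished.extend(open_sessions.items())
    return finished


# Hypothetical sample log: (user, time in seconds, requested URL).
log = [
    ("u1", 0, "/a"),
    ("u2", 5, "/x"),
    ("u1", 10, "/b"),
    ("u1", 100, "/c"),
    ("u2", 200, "/y"),
]

for user, session in find_sessions(log, 30, "gap"):
    print(user, [t for t, _ in session])
```

Because each record is touched once and the per-user lookup is a hash-table access, this sketch avoids sorting the log by user, which is the kind of saving the abstract contrasts with sorting-based approaches. It does not attempt the error tolerance or the distributed setting that the abstract mentions.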