Web logs collected by proxy servers, referred to as proxy logs or proxy traces, record Web document accesses by many users against many Web sites. This "many-to-many" characteristic poses a challenge to Web log mining techniques because individual access transactions are difficult to identify: in a proxy log, user transactions are not clearly bounded and are sometimes interleaved with each other as well as with noise. Most previous work has relied on simplistic measures, such as a fixed time interval, to determine transaction boundaries, and has not addressed the problem of interleaved and noisy transactions. In this paper, we show that this simplistic view can lead to poor performance in building models to predict future access patterns. We present a more advanced cut-and-pick method for determining access transactions from proxy logs, which decides on more reasonable transaction boundaries and removes noisy accesses. Our method exploits the observation that, in most transactions, a user typically visits multiple related Web sites that form clusters. Our algorithm discovers these clusters based on the connectivity among Web sites. Using real-world proxy logs, we show experimentally that the cut-and-pick method produces more accurate transactions, which in turn yield Web-access prediction models with higher accuracy.
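The contrast between a fixed-time-interval split and a cluster-aware split can be illustrated with a minimal sketch. The abstract does not specify the algorithm, so this is not the paper's actual method: the log records, the site names, the link edges, the `max_gap` parameter, and all three function names are hypothetical. Clusters are approximated here as connected components of an assumed site link graph, and any access to a site outside every cluster is treated as noise.

```python
# Hypothetical single-user proxy-log records: (timestamp in seconds, site).
LOG = [
    (0, "news.example"),
    (3, "shop.example"),
    (5, "sports.example"),
    (8, "ads.example"),      # assumed noise: linked to no other site
    (10, "pay.example"),
    (12, "news.example"),
]
# Hypothetical connectivity between sites (undirected links).
EDGES = [("news.example", "sports.example"), ("shop.example", "pay.example")]


def split_by_gap(log, max_gap):
    """Baseline: start a new transaction when the idle gap exceeds max_gap."""
    transactions, current, last_t = [], [], None
    for t, site in log:
        if last_t is not None and t - last_t > max_gap:
            transactions.append(current)
            current = []
        current.append(site)
        last_t = t
    if current:
        transactions.append(current)
    return transactions


def site_clusters(edges):
    """Map each site to a cluster id (connected component, via union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return {site: find(site) for site in list(parent)}


def cut_and_pick(log, cluster_of, max_gap):
    """Illustrative variant: keep one open transaction per site cluster."""
    open_tx = {}  # cluster id -> (last timestamp, list of sites)
    done = []
    for t, site in log:
        cluster = cluster_of.get(site)
        if cluster is None:
            continue  # "pick": drop accesses to sites outside every cluster
        if cluster in open_tx and t - open_tx[cluster][0] > max_gap:
            done.append(open_tx.pop(cluster)[1])  # "cut": close a stale one
        sites = open_tx[cluster][1] if cluster in open_tx else []
        sites.append(site)
        open_tx[cluster] = (t, sites)
    done.extend(sites for _, sites in open_tx.values())
    return done


if __name__ == "__main__":
    print(split_by_gap(LOG, 30))
    print(cut_and_pick(LOG, site_clusters(EDGES), 30))
```

On this toy log, the baseline with a 30-second gap returns all six accesses as one transaction, while the cluster-aware variant separates the interleaved news/sports accesses from the shop/pay accesses and discards the unlinked ads access.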