Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
Although user access patterns on the live web are well understood, there has been no corresponding study of how users, both humans and robots, access web archives. Based on samples from the Internet Archive's public Wayback Machine, we propose a set of basic usage patterns: Dip (a single access), Slide (the same page at different archive times), Dive (different pages at approximately the same archive time), and Skim (lists of what pages are archived, i.e., TimeMaps). Robots are limited almost exclusively to Dips and Skims, whereas human accesses vary more across all four types. Robots outnumber humans 10:1 in terms of sessions, 5:4 in terms of raw HTTP accesses, and 4:1 in terms of megabytes transferred. Robots almost always access TimeMaps (95% of accesses), but humans predominantly access the archived web pages themselves (82% of accesses). In terms of unique archived web pages, there is no overall preference for a particular time, but the recent past (within the last year) shows significant repeat accesses.
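
As a rough illustration of the four patterns defined above, the following Python sketch assigns a single label to one session. It is not the authors' classifier. The `Access` record, the `classify` function, the one-day `tolerance` for "approximately the same archive time", the rule that a TimeMap-only session counts as a Skim even when it has one request, and the `Mixed` fallback are all assumptions made for this example.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional, Sequence


class Access(NamedTuple):
    """One request in a session (hypothetical record layout)."""
    uri: str                          # original URI of the requested page
    archived_at: Optional[datetime]   # archive time; None marks a TimeMap request


def classify(session: Sequence[Access],
             tolerance: timedelta = timedelta(days=1)) -> str:
    """Label a session as Dip, Slide, Dive, Skim, or Mixed.

    The tolerance and the order of the checks are assumptions,
    not values taken from the abstract.
    """
    if not session:
        raise ValueError("session must contain at least one access")

    # Skim: the session only lists what is archived (TimeMaps).
    if all(a.archived_at is None for a in session):
        return "Skim"

    # Dip: a single access to an archived page.
    if len(session) == 1:
        return "Dip"

    pages = [a for a in session if a.archived_at is not None]
    uris = {a.uri for a in pages}
    times = [a.archived_at for a in pages]

    # Slide: the same page at different archive times.
    if len(uris) == 1 and len(set(times)) > 1:
        return "Slide"

    # Dive: different pages at approximately the same archive time.
    if len(uris) > 1 and max(times) - min(times) <= tolerance:
        return "Dive"

    # Anything else combines patterns; this sketch does not split it further.
    return "Mixed"


if __name__ == "__main__":
    home = "http://example.com/"
    about = "http://example.com/about"
    print(classify([Access(home, datetime(2010, 1, 1))]))
    print(classify([Access(home, datetime(2010, 1, 1)),
                    Access(home, datetime(2012, 6, 1))]))
    print(classify([Access(home, datetime(2010, 1, 1, 8)),
                    Access(about, datetime(2010, 1, 1, 9))]))
    print(classify([Access(home, None)]))
```

A real analysis would first have to reconstruct sessions from server logs and separate robots from humans, both of which are outside the scope of this sketch.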