Understanding the availability of site metadata on the Web is a foundation for any system or application that wants to work with the pages published by Web sites and to understand a Web site's structure. Little is known about how much information Web sites actually publish about themselves, and this paper presents data addressing that question. Based on our analysis of available Web site metadata, Web-oriented applications can ground their handling of that metadata in statistical evidence rather than assumptions. Our study of robots.txt files and sitemaps can serve as a starting point for applications that wish to work with Web site metadata.
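To make that starting point concrete, the following is a minimal sketch, assuming Python's standard urllib.robotparser module, of how an application might retrieve a site's robots.txt and the sitemap URLs it advertises; example.com is a placeholder host, not a site from the study, and site_maps() requires Python 3.8 or later.

```python
from urllib import robotparser

# Hypothetical example host; any site publishing /robots.txt works.
parser = robotparser.RobotFileParser("https://example.com/robots.txt")
parser.read()  # fetch and parse the robots.txt file over HTTP

# Check whether a generic crawler ("*") may fetch a given path,
# according to the site's published access rules.
allowed = parser.can_fetch("*", "https://example.com/some/page")
print("may fetch:", allowed)

# Sitemap URLs advertised in robots.txt, a common way for sites to
# expose their structure; returns None if none are listed.
print("sitemaps:", parser.site_maps())
```

A real crawler would add error handling for missing or malformed robots.txt files, which, as the paper's data suggests, applications should expect to encounter in practice.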