Understanding the availability of site metadata on the Web is a foundation for any system or application that wants to work with the pages published by Web sites and to understand a Web site's structure. Little is known about how much information Web sites actually publish about themselves, and this paper presents data addressing that question. Based on our analysis of available Web site metadata, Web-oriented applications can ground their handling of that metadata in statistical evidence rather than assumptions. Our study of robots.txt files and sitemaps can serve as a starting point for applications that wish to work with Web site metadata.
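To make that starting point concrete, the following is a minimal sketch, assuming Python's standard urllib.robotparser module, of how an application might retrieve a site's robots.txt and the sitemap URLs it advertises; example.com is a placeholder host, not a site from the study, and site_maps() requires Python 3.8 or later.

```python
from urllib import robotparser

# Hypothetical example host; any site publishing /robots.txt works.
parser = robotparser.RobotFileParser("https://example.com/robots.txt")
parser.read()  # fetch and parse the robots.txt file over HTTP

# Check whether a generic crawler ("*") may fetch a given path,
# according to the site's published access rules.
allowed = parser.can_fetch("*", "https://example.com/some/page")
print("may fetch:", allowed)

# Sitemap URLs advertised in robots.txt, a common way for sites to
# expose their structure; returns None if none are listed.
print("sitemaps:", parser.site_maps())
```

A real crawler would add error handling for missing or malformed robots.txt files, which, as the paper's data suggests, applications should expect to encounter in practice.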