We collect evidence to answer the following question: is the quality of the XML documents found on the web sufficient to apply XML technologies such as XQuery, XPath, and XSLT? XML collections from the web have previously been studied statistically, but no detailed information about the quality of XML documents on the web has been available to date. We address this shortcoming in this study. We gathered 180K XML documents from the web. Their quality is surprisingly good: 85.4% are well-formed, and 99.5% of all specified encodings are correct. Validity needs serious attention, however. Only 25% of all files contain a reference to a DTD or XSD, and just one third of those are actually valid. Errors are studied in detail, and automatic error repair seems promising. Our study is well documented and easily repeatable, which paves the way for a periodic quality assessment of the XML web.
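To make the measured properties concrete, the following is a minimal sketch of how a single document might be classified as well-formed and as referencing a DTD or XSD. It is not the tooling used in the study: the function name `check_document`, the result keys, and the regex-based DOCTYPE detection are assumptions made for illustration, and the sketch does not validate a document against its schema.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of the xsi:schemaLocation / xsi:noNamespaceSchemaLocation attributes.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def check_document(data: bytes) -> dict:
    """Hypothetical helper: report well-formedness and schema references."""
    result = {"well_formed": False, "has_dtd": False, "has_xsd": False}
    try:
        # Parsing raw bytes lets the parser honour the declared encoding.
        root = ET.fromstring(data)
    except (ET.ParseError, LookupError, ValueError):
        # Not well-formed, or the declared encoding could not be handled.
        return result
    result["well_formed"] = True
    # ElementTree discards the DOCTYPE, so look for it in the raw bytes.
    # Assumption: this crude check ignores DOCTYPE text inside comments or CDATA.
    result["has_dtd"] = re.search(rb"<!DOCTYPE\s", data) is not None
    # An XSD reference is assumed to appear on the root element only.
    result["has_xsd"] = any(
        root.get(f"{{{XSI_NS}}}{name}") is not None
        for name in ("schemaLocation", "noNamespaceSchemaLocation")
    )
    return result
```

Applying such a check to every file in a collection and averaging the boolean fields would yield percentages of the kind reported above. Checking validity itself requires a validating parser, which the Python standard library does not provide.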