We collect evidence to answer the following question: is the quality of the XML documents found on the Web sufficient to apply XML technologies such as XQuery, XPath, and XSLT? XML collections from the Web have been studied statistically before, but no detailed information about the quality of XML documents on the Web has been available to date. This study addresses that shortcoming. We gathered 180K XML documents from the Web. Their quality is surprisingly good: 85.4% are well-formed, and 99.5% of all specified encodings are correct. Validity, however, needs serious attention: only 25% of all files contain a reference to a DTD or XSD, and just one-third of those are actually valid. We study well-formedness errors and validity errors in detail. Our study is well documented and easily repeatable, and all data is publicly available [21], (Grijzenhout, 2010) [52]. This paves the way for a periodic quality assessment of the XML Web.
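The kind of per-document check behind such figures can be sketched as follows. This is a minimal illustration using only Python's standard library, not the tooling used in the study: the helper names `classify` and `summarize` are hypothetical, the DOCTYPE test is a crude byte-level heuristic, and actual validation against a DTD or XSD is left out because it needs a validating parser that the standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Namespace of the xsi:schemaLocation / xsi:noNamespaceSchemaLocation attributes.
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def classify(doc: bytes) -> dict:
    """Return coarse quality flags for one document (hypothetical helper)."""
    try:
        root = ET.fromstring(doc)  # raises ParseError if not well-formed
    except ET.ParseError:
        return {"well_formed": False, "dtd_ref": False, "xsd_ref": False}
    # Heuristic: a DOCTYPE declaration counts as a DTD reference.
    # This misses non-ASCII-compatible encodings such as UTF-16.
    dtd_ref = b"<!DOCTYPE" in doc
    # An XSD reference is assumed to be a schema-location hint on the root.
    xsd_ref = any(
        root.get(f"{{{XSI}}}{name}") is not None
        for name in ("schemaLocation", "noNamespaceSchemaLocation")
    )
    return {"well_formed": True, "dtd_ref": dtd_ref, "xsd_ref": xsd_ref}


def summarize(docs: list) -> dict:
    """Fractions of all documents that are well-formed / reference a schema."""
    flags = [classify(d) for d in docs]
    total = len(flags)
    if total == 0:
        return {"well_formed": 0.0, "schema_ref": 0.0}
    well_formed = sum(f["well_formed"] for f in flags)
    schema_ref = sum(f["dtd_ref"] or f["xsd_ref"] for f in flags)
    return {"well_formed": well_formed / total, "schema_ref": schema_ref / total}


if __name__ == "__main__":
    sample = [
        b"<a><b/></a>",
        b"<a><b></a>",  # mismatched tag, so not well-formed
        b'<!DOCTYPE a SYSTEM "a.dtd"><a/>',
    ]
    print(summarize(sample))
```

Documents that fail to parse are counted as having no schema reference here; a study that inspects broken files for DOCTYPE declarations anyway would need to separate the two checks.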