XML has emerged as the language for exchanging data on the web and has attracted considerable interest in both industry and academia. Nevertheless, to date, little is known about the XML documents published on the web. This paper presents a comprehensive analysis of a sample of about 200,000 XML documents on the web, and is the first study of its kind. We study the distribution of XML documents across the web in several ways; moreover, we provide a detailed characterization of the structure of real XML documents. Our results offer valuable input to the design of algorithms, tools and systems that use XML in one form or another.