Although originally designed for large-scale electronic publishing, XML plays an increasingly important role in the exchange of data on the Web. In fact, it is expected that XML will become the lingua franca of the Web, eventually replacing HTML. Not surprisingly, there has been a great deal of interest in XML both in industry and in academia. Nevertheless, to date there has been no comprehensive study of the XML Web (i.e., the subset of the Web made up of XML documents only) or of its contents. This paper is the first attempt at describing the XML Web and the documents it contains. Our results are drawn from a sample of a repository of the publicly available XML documents on the Web, consisting of about 200,000 documents. They show that, despite its short history, XML already permeates the Web, both across generic domains and geographically. Our results on the contents of the XML Web also provide valuable input for the design of algorithms, tools, and systems that use XML in one form or another.