XML is rapidly emerging as the new standard for data representation and exchange on the Web. An XML document can be accompanied by a Document Type Descriptor (DTD) which plays the role of a schema for an XML data collection. DTDs contain valuable information on the structure of documents and thus have a crucial role in the efficient storage of XML data, as well as the effective formulation and optimization of XML queries. In this paper, we propose XTRACT, a novel system for inferring a DTD schema for a database of XML documents. Since the DTD syntax incorporates the full expressive power of regular expressions, naive approaches typically fail to produce concise and intuitive DTDs. Instead, the XTRACT inference algorithms employ a sequence of sophisticated steps that involve: (1) finding patterns in the input sequences and replacing them with regular expressions to generate “general” candidate DTDs, (2) factoring candidate DTDs using adaptations of algorithms from the logic optimization literature, and (3) applying the Minimum Description Length (MDL) principle to find the best DTD among the candidates. The results of our experiments with real-life and synthetic DTDs demonstrate the effectiveness of XTRACT's approach in inferring concise and semantically meaningful DTD schemas for XML databases.
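To make step (3) concrete, the following toy sketch shows how an MDL-style score can pick one candidate out of several. It is an illustration under stated assumptions, not XTRACT's actual encoding scheme: element names are reduced to single characters, candidates are written in Python `re` syntax, and the cost functions are simplifications introduced here. The function names `mdl_cost` and `best_candidate` are hypothetical. The model cost charges a fixed number of bits per pattern character, and the data cost charges each sequence the bits needed to index it among all strings of bounded length that the candidate accepts.

```python
import itertools
import math
import re


def mdl_cost(pattern, sequences, alphabet, max_len):
    """Two-part cost in bits: describe the pattern, then each sequence given it.

    Assumption: this is a simplified stand-in for an MDL encoding, chosen only
    to show the trade-off between conciseness and precision.
    """
    regex = re.compile(pattern)
    # A candidate is only admissible if it accepts every input sequence.
    if not all(regex.fullmatch(s) for s in sequences):
        return math.inf
    # Model cost: each pattern character is drawn from the alphabet plus
    # six metacharacters, ( ) | * ? +
    model_bits = len(pattern) * math.log2(len(alphabet) + 6)
    # Data cost: count the strings of length <= max_len in the candidate's
    # language; a sequence is then identified by its index among them.
    language_size = sum(
        1
        for n in range(max_len + 1)
        for chars in itertools.product(alphabet, repeat=n)
        if regex.fullmatch("".join(chars))
    )
    data_bits = len(sequences) * math.log2(language_size)
    return model_bits + data_bits


def best_candidate(candidates, sequences, alphabet):
    """Return the candidate pattern with the smallest two-part cost."""
    max_len = max(len(s) for s in sequences)
    return min(candidates, key=lambda p: mdl_cost(p, sequences, alphabet, max_len))


if __name__ == "__main__":
    # Three observed child-element sequences over the element names a, b, c.
    sequences = ["ab", "abab", "ababab"]
    candidates = [
        "ab|abab|ababab",  # exact enumeration: precise but long
        "(ab)*",           # concise and still fairly precise
        "(a|b)*",          # concise but accepts far too much
    ]
    print(best_candidate(candidates, sequences, "abc"))
```

In this example the exact enumeration pays a high model cost, the overly general `(a|b)*` pays a high data cost, and `(ab)*` minimizes the sum, which mirrors the preference for concise yet meaningful DTDs described above. The brute-force enumeration of the language is only practical for tiny alphabets and short sequences.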