World Wide Web Journal - Special issue on XML: principles, tools, and techniques
XTRACT: a system for extracting document type descriptors from XML documents
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
ICDT '97 Proceedings of the 6th International Conference on Database Theory
The VLDB Journal — The International Journal on Very Large Data Bases
Information Systems - Special issue on web data integration
Introduction to Automata Theory, Languages, and Computation (3rd Edition)
Report on the XML Mining Track at INEX 2005 and INEX 2006
Comparative Evaluation of XML Information Retrieval Systems
Regular expression learning for information extraction
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Fast approximate matching between XML documents and schemata
APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development
Learning regular expressions from noisy sequences
SARA'05 Proceedings of the 6th international conference on Abstraction, Reformulation and Approximation
XML is becoming the standard format for data exchange on the Internet. In this paper, we present a system that effectively extracts schema information from a large collection of XML documents. Building on Xtract, we propose using the cost of an NFA simulation to compute the Minimum Description Length (MDL). We also study using the frequencies of the sample inputs to improve the effectiveness of schema extraction. Experimental studies on synthesized XML data sets suggest that our approach is an efficient and effective solution for schema inference.
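The abstract does not spell out the cost function, so the following is only a minimal sketch of how an NFA simulation could yield an MDL-style data cost. It assumes, as an illustration rather than as the paper's definition, that each input symbol costs log2 of the number of choices open to a decoder that tracks the same set of active states (the enabled symbols, plus an "end of sequence" option when an accepting state is active). The function name `nfa_encoding_cost`, the dictionary encoding of the NFA, and the two toy content models are all hypothetical.

```python
import math

def nfa_encoding_cost(nfa, start, accepting, sequence):
    """Bits needed to encode `sequence` given the NFA, or None if it is rejected.

    Assumed cost model (not taken from the paper): at every step the decoder
    knows the active state set, so a symbol costs log2(#choices) bits.
    """
    active = {start}
    bits = 0.0
    for symbol in sequence:
        # Symbols that can be read from at least one active state.
        enabled = {s for q in active for s in nfa.get(q, {})}
        if symbol not in enabled:
            return None  # the NFA cannot consume this symbol
        # Stopping here is one more choice if an accepting state is active.
        choices = len(enabled) + (1 if active & accepting else 0)
        bits += math.log2(choices)
        # Standard subset-style NFA simulation step.
        active = {t for q in active for t in nfa.get(q, {}).get(symbol, ())}
    if not (active & accepting):
        return None  # input ended outside an accepting state
    # Pay for the final "end of sequence" choice.
    enabled = {s for q in active for s in nfa.get(q, {})}
    bits += math.log2(len(enabled) + 1)
    return bits

# Hypothetical content models for a <book> element.
# TIGHT accepts: title author*      LOOSE accepts: (title | author)*
TIGHT = {0: {"title": {1}}, 1: {"author": {1}}}
LOOSE = {0: {"title": {0}, "author": {0}}}

sample = ["title", "author", "author"]
tight_bits = nfa_encoding_cost(TIGHT, 0, {1}, sample)
loose_bits = nfa_encoding_cost(LOOSE, 0, {0}, sample)
```

Under this assumed cost model the tighter expression encodes the sample in fewer bits than the permissive one, which is the trade-off an MDL criterion weighs against the size of the expression itself. Frequencies of the sample inputs could be folded in by multiplying each distinct sequence's cost by its count, again as an assumption rather than the authors' stated method.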