Information retrieval
A softbot-based interface to the Internet
Communications of the ACM
A query language and optimization techniques for unstructured data
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Extracting schema from semistructured data
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
SAC '97 Proceedings of the 1997 ACM symposium on Applied computing
XTRACT: a system for extracting document type descriptors from XML documents
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Relevance Ranking Tuning for Similarity Queries on XML Data
Proceedings of the VLDB 2002 Workshop EEXTT and CAiSE 2002 Workshop DTWeb on Efficiency and Effectiveness of XML Tools and Techniques and Data Integration over the Web-Revised Papers
View inference for heterogeneous XML information integration
Journal of Intelligent Information Systems - Special issue on web intelligence
XML schema clustering with semantic and hierarchical similarity measures
Knowledge-Based Systems
A framework for integrating XML transformations
ER'06 Proceedings of the 25th international conference on Conceptual Modeling
Hi-index | 0.00 |
This paper proposes a novel approach to integrating heterogeneous XML DTDs. With this approach, an information agent can be easily extended to integrate heterogeneous XML-based contents and perform federated search. Based on a tree grammar inference technique, this approach derives an integrated view of XML DTDs in an information integration framework. The derivation takes advantages of naming and structural similarities among DTDs in similar domains. The complete approach consists of three main steps. (1) DTD clustering clusters DTDs in similar domains into classes. (2) Schema learning applies a tree grammar inference technique to generate a set of tree grammar rules from the DTDs in a class from the previous step. (3) Minimization optimizes the rules generated in the previous step and transforms them into an integrated view. We have implemented the proposed approach into a system called DEEP and tested the system on artificial and real domains. The experimental results reveal that this system can effectively and efficiently integrate radically different DTDs.