Induction of integrated view for XML data with heterogeneous DTDs
Proceedings of the tenth international conference on Information and knowledge management
This paper proposes a novel approach to integrating heterogeneous XML DTDs. With this approach, an information agent can be easily extended to integrate heterogeneous XML-based content and perform federated search. Based on a tree grammar inference technique, the approach derives an integrated view of XML DTDs within an information integration framework. The derivation takes advantage of naming and structural similarities among DTDs in similar domains. The approach consists of three main steps: (1) DTD clustering groups DTDs from similar domains into classes; (2) the schema learner applies a tree grammar inference technique to generate a set of tree grammar rules from the DTDs in each class; (3) the minimizer optimizes these rules, transforms them into an integrated view, and generates source descriptions. We have implemented the approach in a system called DEEP and tested it on several domains. Experimental results show that the system can effectively and efficiently integrate radically different DTDs.
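
The first two steps can be illustrated with a minimal sketch. This is not DEEP's algorithm: the toy DTDs, the Jaccard measure over element names, the 0.4 threshold, and the union-based merge are all assumptions chosen for illustration. The merge in particular is a crude stand-in for tree grammar inference, and no minimization is attempted.

```python
from itertools import combinations

# Hypothetical toy DTDs: element name -> set of child element names.
DTDS = {
    "books_a": {"book": {"title", "author", "price"}, "author": {"name"}},
    "books_b": {"book": {"title", "writer", "price", "isbn"}, "writer": {"name"}},
    "cars": {"car": {"make", "model", "year"}},
}

def names(dtd):
    """All element names that appear in a DTD, as parents or children."""
    result = set(dtd)
    for children in dtd.values():
        result |= children
    return result

def jaccard(a, b):
    """Naming similarity: shared names divided by all names (assumed measure)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(dtds, threshold=0.4):
    """Single-link grouping: DTDs whose name sets overlap enough share a class."""
    classes = {key: {key} for key in dtds}
    for x, y in combinations(dtds, 2):
        if jaccard(names(dtds[x]), names(dtds[y])) >= threshold:
            merged = classes[x] | classes[y]
            for key in merged:
                classes[key] = merged
    return {frozenset(members) for members in classes.values()}

def merge_rules(dtds, members):
    """Union the child sets of same-named elements across one class.

    A simplification standing in for grammar inference, not the paper's method.
    """
    rules = {}
    for member in sorted(members):
        for element, children in dtds[member].items():
            rules.setdefault(element, set()).update(children)
    return rules

if __name__ == "__main__":
    for members in cluster(DTDS):
        print(sorted(members), merge_rules(DTDS, members))
```

In this sketch the two book DTDs land in one class because they share four of seven element names, while the car DTD stays alone. Note that `author` and `writer` remain separate rules: reconciling such synonyms would need the structural similarity the abstract mentions, which this sketch does not model.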