The degree to which Web-based information systems pre-process their information sources varies greatly. Search engines such as AltaVista do little pre-processing, while “knowledge integration” systems use complex site-specific “wrappers” to integrate different information sources into a common database representation. In this paper we describe an intermediate point between these two models. In our system, information sources are converted into a highly structured collection of small fragments of text. Database-like queries to this structured collection of text fragments are approximated using a novel logic called WHIRL, which combines inference in the style of deductive databases with ranked retrieval methods from information retrieval (IR). WHIRL allows queries that integrate information from multiple Web sites without requiring the extraction and normalization of object identifiers that can be used as keys; instead, operations that in conventional databases require equality tests on keys are approximated using IR similarity metrics for text. This reduces the amount of human engineering required to field a knowledge integration system. Experimental evidence is given showing that many information sources can be easily modeled with WHIRL, and that inferences in the logic are both accurate and efficient.
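
The idea of replacing an equality test on keys with a text similarity score can be illustrated with a small sketch. The code below is not WHIRL itself and makes several assumptions that the abstract does not state: it uses TF-IDF weighting with cosine similarity as the IR metric, a naive all-pairs comparison rather than an efficient inference procedure, and hypothetical movie listing and review strings as the two sources. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `similarity_join`) are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: (1 + math.log(c)) * math.log(1 + n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Dot product of two unit-length sparse vectors, i.e. their cosine."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def similarity_join(left, right, k=3):
    """Rank (score, left_text, right_text) pairs by similarity; keep the best k.

    This stands in for a join on a shared key: instead of requiring equal
    identifiers, every pair of fragments is scored and the result is ranked.
    """
    vecs = tfidf_vectors(left + right)
    lv, rv = vecs[:len(left)], vecs[len(left):]
    pairs = [(cosine(a, b), l, r) for l, a in zip(left, lv) for r, b in zip(right, rv)]
    pairs = [p for p in pairs if p[0] > 0]  # drop pairs with no shared terms
    return sorted(pairs, key=lambda p: p[0], reverse=True)[:k]


# Hypothetical text fragments from two sites that share no normalized key.
listings = ["Men in Black (1997)", "The Lost World: Jurassic Park", "Face/Off"]
reviews = [
    "Review: Men in Black",
    "Jurassic Park - The Lost World review",
    "Review of Face Off, starring Travolta",
]

matches = similarity_join(listings, reviews)
for score, listing, review in matches:
    print(f"{score:.2f}  {listing!r}  ~  {review!r}")
```

Because the answer is a ranked list rather than an exact set, differences in word order, punctuation, or extra words lower a pair's score instead of causing the join to miss it, which is the behavior the abstract attributes to approximating key equality with similarity.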