Providing database-like access to the Web using queries based on textual similarity

Authors: William W. Cohen
Affiliations: AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ
Venue: SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Year: 1998

Abstract

Most databases contain “name constants” like course numbers, personal names, and place names that correspond to entities in the real world. Previous work in integration of heterogeneous databases has assumed that local name constants can be mapped into an appropriate global domain by normalization. Here we assume instead that the names are given in natural language text. We then propose a logic for database integration called WHIRL which reasons explicitly about the similarity of local names, as measured using the vector-space model commonly adopted in statistical information retrieval. An implemented data integration system based on WHIRL has been used to successfully integrate information from several dozen Web sites in two domains.
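
To make the similarity measure concrete, here is a minimal sketch of how two name constants might be compared under a vector-space model. It is not taken from the WHIRL system: the tokenizer, the TF-IDF weighting formula (including the smoothed IDF), and the names `tokenize`, `tfidf_vectors`, and `cosine` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(name):
    # Assumption: lowercase the name and split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", name.lower())


def tfidf_vectors(names):
    """Return one unit-length TF-IDF vector (a dict of term -> weight) per name."""
    docs = [Counter(tokenize(n)) for n in names]
    n_docs = len(docs)
    # Document frequency: in how many names each term appears.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        # Assumed weighting: (1 + log tf) * log(1 + N / df). The "1 +" inside
        # the IDF keeps weights positive even for terms found in every name.
        vec = {t: (1 + math.log(tf)) * math.log(1 + n_docs / df[t])
               for t, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    # Both vectors are already unit length, so the dot product is the cosine.
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


# Hypothetical name constants from two sources that were never normalized.
names = ["AT&T Labs Research", "AT and T Labs - Research", "Bell Laboratories"]
vecs = tfidf_vectors(names)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]!r} vs {names[j]!r}: {cosine(vecs[i], vecs[j]):.3f}")
```

Under these assumptions, names that share rare terms score close to 1, while names with no terms in common score 0, so a pair of names can be matched by degree instead of by exact equality after normalization.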