Cyc: toward programs with common sense
Communications of the ACM
Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems
Journal of the ACM (JACM)
Modern Information Retrieval
A survey of approaches to automatic schema matching
The VLDB Journal — The International Journal on Very Large Data Bases
Statistical schema matching across web query interfaces
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Similarity Flooding: A Versatile Graph Matching Algorithm and Its Application to Schema Matching
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Automatic retrieval and clustering of similar words
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Automatic acquisition of hyponyms from large text corpora
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
An interactive clustering-based approach to integrating source query interfaces on the deep Web
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Finding parts in very large corpora
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Automatic construction of a hypernym-labeled noun hierarchy from text
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
ACM SIGKDD Explorations Newsletter
Light-weight domain-based form assistant: querying web databases on the fly
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Merging Source Query Interfaces onWeb Databases
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
A web-based kernel function for measuring the similarity of short text snippets
Proceedings of the 15th international conference on World Wide Web
Meaningful labeling of integrated query interfaces
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Combining classifiers to identify online databases
Proceedings of the 16th international conference on World Wide Web
Measuring semantic similarity between words using web search engines
Proceedings of the 16th international conference on World Wide Web
Using Google distance to weight approximate ontology matches
Proceedings of the 16th international conference on World Wide Web
The Google Similarity Distance
IEEE Transactions on Knowledge and Data Engineering
COMA: a system for flexible combination of schema matching approaches
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Instance-based schema matching for web databases by domain-specific query probing
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
A hierarchical approach to model web query interfaces for web source integration
Proceedings of the VLDB Endowment
Schema label normalization for improving schema matching
Data & Knowledge Engineering
Deep web integration with VisQI
Proceedings of the VLDB Endowment
Web Query Interface Parsing for Building Web-Based Metasearch Systems
WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03
Athena: text mining based discovery of scientific workflows in disperse repositories
RED'10 Proceedings of the Third international conference on Resource Discovery
Automatic classification of web databases using domain-dictionaries
MLDM'13 Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition
Hi-index | 0.01 |
The goal of recent research projects on integrating Web databases has been to enable uniform access to the large amount of data behind query interfaces. Among the tasks addressed are: source discovery, query interface extraction, schema matching, etc. There are also a number of tasks that are commonly ignored or assumed to be apriori solved either manually or by some oracle. These tasks include (1) finding the set of stop words and (2) handling occurrences of "semantic enrichment words" within labels. These two subproblems have a direct impact on determining the synonymy and hyponymy relationships between labels. In (1), a word like "from" is a stop word in general but it is a content word in domains such as Airline and Real Estate. We formulate the stop word problem, prove its complexity and provide an approximation algorithm. In (2), we study the impact of words like AND and OR on establishing semantic relationships between labels (e.g. "departure date and time" is a hypernym of "departure date"). In addition, we develop a theoretical framework to differentiate synonymy relationship from hyponymy relationship among labels involving multiple words. We scrutinize its strength and limitations both analytically and experimentally. We use real data from the Web in our experiments. We analyze over 2300 labels of 220 user interfaces in 9 distinct domains.