Do we mean the same?: disambiguation of extracted keyword queries for database search

Authors:
Elena Demidova;Irina Oelze;Peter Fankhauser
Affiliations:
L3S Research Center, Hannover, Germany;L3S Research Center, Hannover, Germany;L3S Research Center, Hannover, Germany
Venue:
Proceedings of the First International Workshop on Keyword Search on Structured Data
Year:
2009

Citing 18
Cited 2

Highlights: language- and domain-independent automatic indexing terms for abstracting

Journal of the American Society for Information Science
Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Semantic integration in text: from ambiguous names to identifiable entities

AI Magazine - Special issue on semantic integration
Semantic-integration research in the database community

AI Magazine - Special issue on semantic integration
Integrating Unstructured Data into Relational Databases

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Effective keyword search in relational databases

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Avatar semantic search: a database approach to information retrieval

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Efficiently linking text documents with relevant structured information

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Spark: top-k keyword query in relational databases

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
BLINKS: ranked keyword searches on graphs

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Towards keyword-driven analytical processing

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Discover: keyword search in relational databases

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Efficient IR-style keyword search over relational databases

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
SQAK: doing more with keywords

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Introduction to Information Retrieval

Introduction to Information Retrieval
Information Extraction

Foundations and Trends in Databases
Ontology-based interpretation of keywords for semantic search

ISWC'07/ASWC'07 Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference
SPARK: adapting keyword query to semantic search

ISWC'07/ASWC'07 Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference

Multi-dimensional keyword-based image annotation and search

Proceedings of the 2nd International Workshop on Keyword Search on Structured Data
SODA: generating SQL for business users

Proceedings of the VLDB Endowment

Quantified Score

Hi-index	0.00

Visualization

Abstract

Users often try to accumulate information on a topic of interest from multiple information sources. In this case a user's informational need might be expressed in terms of an available relevant document, e.g. a web-page or an e-mail attachment, rather than a query. Database search engines are mostly adapted to the queries manually created by the users. In case a user's informational need is expressed in terms of a document, we need algorithms that map keyword queries automatically extracted from this document to the database content. In this paper we analyze the impact of selected document and database statistics on the effectiveness of keyword disambiguation for manually created as well as automatically extracted keyword queries. Our evaluation is performed using a set of user queries from the AOL query log and a set of queries automatically extracted from Wikipedia articles both executed against the Internet Movie Database (IMDB). Our experimental results show that (1) knowledge of the document context is crucial in order to extract meaningful keyword queries; (2) statistics which enable effective disambiguation of user queries are not sufficient to achieve the same quality for the automatically extracted requests.