Expressive and flexible access to web-extracted data: a keyword-based structured query language

Authors:
Jeffrey Pound; Ihab F. Ilyas; Grant Weddell
Affiliations:
University of Waterloo, Waterloo, ON, Canada; University of Waterloo, Waterloo, ON, Canada; University of Waterloo, Waterloo, ON, Canada
Venue:
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Year:
2010

Abstract

Automated extraction of structured data from Web sources often leads to large heterogeneous knowledge bases (KBs), with data and schema items numbering in the hundreds of thousands or millions. Formulating information needs with conventional structured query languages is difficult due to the sheer size of the schema information available to the user. We address this challenge by proposing a new query language that blends keyword search with structured query processing over large information graphs with rich semantics. Our formalism for keyword-based structured queries combines the flexibility of keyword search with the expressiveness of structured queries. We propose a solution to the disambiguation problem that arises when keywords are introduced as primitives in a structured query language. We show how expressions in our proposed language can be rewritten using the vocabulary of the web-extracted KB, and how the different possible rewritings can be ranked based on their syntactic relationship to the keywords in the query as well as their semantic coherence in the underlying KB. An extensive experimental study demonstrates the efficiency and effectiveness of our approach. Additionally, we show how our query language fits into QUICK, an end-to-end information system that integrates web-extracted data graphs with full-text search. In this system, the rewritten query describes an arbitrary topic of interest for which corresponding entities, and documents relevant to the entities, are efficiently retrieved.
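
The ranking idea in the abstract, scoring each candidate rewriting by syntactic closeness to the keywords and by semantic coherence in the KB, can be illustrated with a small sketch. The abstract gives neither the scoring functions nor the query syntax, so everything below is an assumption made for illustration: the toy vocabulary `KB_LABELS`, the `RELATED` pairs, the use of Jaccard token overlap as the syntactic measure, the fraction of related item pairs as the coherence measure, and the mixing weight `alpha`. It is not the paper's method.

```python
from itertools import product

# Hypothetical KB vocabulary: item name -> label tokens (not from the paper).
KB_LABELS = {
    "GermanScientist": {"german", "scientist"},
    "GermanShepherd": {"german", "shepherd"},
    "hasWonPrize": {"has", "won", "prize"},
    "NobelPrize": {"nobel", "prize"},
    "NobelPeaceCenter": {"nobel", "peace", "center"},
}

# Hypothetical coherence evidence: pairs of items that co-occur in KB facts.
RELATED = {
    frozenset({"GermanScientist", "hasWonPrize"}),
    frozenset({"hasWonPrize", "NobelPrize"}),
    frozenset({"GermanScientist", "NobelPrize"}),
}


def syntactic_sim(keyword, item):
    """Jaccard overlap between the keyword's tokens and the item's label tokens."""
    tokens = set(keyword.lower().split())
    label = KB_LABELS[item]
    return len(tokens & label) / len(tokens | label)


def coherence(items):
    """Fraction of item pairs in a rewriting that are related in the toy KB."""
    pairs = [(a, b) for i, a in enumerate(items) for b in items[i + 1:]]
    if not pairs:
        return 1.0
    return sum(frozenset(p) in RELATED for p in pairs) / len(pairs)


def rank_rewritings(keywords, alpha=0.5):
    """Enumerate one KB item per keyword and sort rewritings by combined score."""
    # Candidate items for a keyword are those sharing at least one token with it.
    candidates = [
        [item for item in KB_LABELS if syntactic_sim(k, item) > 0]
        for k in keywords
    ]
    scored = []
    for combo in product(*candidates):
        syn = sum(syntactic_sim(k, it) for k, it in zip(keywords, combo)) / len(keywords)
        score = alpha * syn + (1 - alpha) * coherence(combo)
        scored.append((score, combo))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for score, combo in rank_rewritings(["german", "won", "nobel"]):
        print(f"{score:.3f}", combo)
```

In this toy setting, "german" matches both `GermanScientist` and `GermanShepherd` equally well on syntax alone; the coherence term is what separates them, because only the scientist class is related to the prize items. Exhaustive enumeration is used here only for brevity and would not scale to a KB with millions of items.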