Keyword query cleaning

Authors: Ken Q. Pu; Xiaohui Yu
Affiliations: University of Ontario Institute of Technology; York University
Venue: Proceedings of the VLDB Endowment
Year: 2008

Abstract

Unlike traditional database queries, keyword queries do not adhere to a predefined syntax and are often dirty, containing irrelevant words from natural language. This makes accurate and efficient keyword query processing over databases very challenging. In this paper, we introduce the problem of query cleaning for keyword search queries in a database context and propose a set of effective and efficient solutions. Query cleaning involves semantic linkage and spelling correction of database-relevant query words, followed by segmentation of nearby query words such that each segment corresponds to a high-quality data term. We define a quality metric for a keyword query and propose a number of algorithms for cleaning keyword queries optimally. We show that the basic optimal query cleaning problem can be solved with a dynamic programming algorithm. We further extend the basic algorithm to address incremental query cleaning and top-k optimal query cleaning. Incremental query cleaning is efficient and memory-bounded, and is therefore well suited to scenarios in which the keywords are streamed. The top-k query cleaning algorithm is guaranteed to return the best k cleaned keyword queries in ranked order. Extensive experiments on three real-life data sets confirm the effectiveness and efficiency of the proposed solutions.
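
The segmentation step described in the abstract can be illustrated with a small dynamic program. The following is only a minimal sketch under stated assumptions, not the paper's algorithm: the abstract does not define the quality metric, so the score table `TERM_SCORES`, the functions `toy_score` and `segment_query`, and the `max_len` limit are all hypothetical and chosen purely for illustration. The sketch assumes that the quality of a cleaned query is the sum of per-segment scores and omits spelling correction and semantic linkage.

```python
from typing import Callable, List, Tuple


def segment_query(
    tokens: List[str],
    score: Callable[[str], float],
    max_len: int = 3,
) -> Tuple[List[str], float]:
    """Split tokens into contiguous segments that maximize the summed score.

    best[i] holds the best total score for the first i tokens, and
    back[i] records where the last segment of that best solution starts.
    """
    n = len(tokens)
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        # Try every segment of at most max_len tokens that ends at `end`.
        for start in range(max(0, end - max_len), end):
            segment = " ".join(tokens[start:end])
            candidate = best[start] + score(segment)
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start
    # Walk the back-pointers to recover the chosen segments.
    segments: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(" ".join(tokens[start:end]))
        end = start
    segments.reverse()
    return segments, best[n]


# Hypothetical scores standing in for "how well does this segment match a
# data term"; a real system would derive them from the database.
TERM_SCORES = {
    "new york": 3.0,
    "new": 0.5,
    "york": 0.5,
    "hotel": 1.0,
    "york hotel": 0.8,
}


def toy_score(segment: str) -> float:
    # Segments that match no known term receive a penalty (an assumption).
    return TERM_SCORES.get(segment, -1.0)


if __name__ == "__main__":
    print(segment_query(["cheap", "new", "york", "hotel"], toy_score))
```

With these toy scores, the query `cheap new york hotel` is segmented into `cheap`, `new york`, and `hotel`, because grouping `new york` into one segment scores higher than any alternative split. Bounding segment length by `max_len` keeps the work per token constant, which is one plausible way to approach the streamed setting the abstract mentions, though the paper's own incremental algorithm may differ.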