Keyword query cleaning using hidden Markov models

Authors: Ken Q. Pu
Affiliations: University of Ontario Institute of Technology, Oshawa, Ontario
Venue: Proceedings of the First International Workshop on Keyword Search on Structured Data
Year: 2009



Abstract

In this paper, we take a probabilistic approach to the problem of keyword query cleaning for structured databases. Keyword query cleaning consists of rewriting the user query, segmenting the keywords, matching each segment to database items, and finally tagging the segments with their metadata. We present an efficient and robust solution using hidden Markov models (HMMs). By modeling user keyword queries with a generative, HMM-based probabilistic model, we construct an HMM from the user-specified keyword query and the database instance. The optimal statistical cleaning of the query is computed as the most likely path through the constructed HMM. Furthermore, we demonstrate how the optimal HMM-based cleaning algorithm can be generalized to compute a stream of clean queries ranked from most likely to least likely. Finally, we present an implementation of the proposed system and its preliminary performance.
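
The idea of reading a cleaned query off the most likely path of an HMM can be illustrated with a minimal Viterbi decoding sketch. This is not the paper's model: the states (`title`, `author`, `year`), the probability tables, the smoothing floor, and the example query are all hypothetical values chosen only to show how each keyword could be tagged with a metadata label.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return (log-probability, state path) of the most likely HMM path.

    `floor` replaces zero probabilities so logs stay finite; it is an
    assumption of this sketch, not part of the paper's method.
    """
    def lg(p):
        return math.log(max(p, floor))

    # best[t][s] = (log-prob of the best path ending in state s at step t,
    #               predecessor state on that path)
    first = observations[0]
    best = [{s: (lg(start_p[s]) + lg(emit_p[s].get(first, 0.0)), None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            score, back = max((prev[r][0] + lg(trans_p[r][s]), r) for r in states)
            column[s] = (score + lg(emit_p[s].get(obs, 0.0)), back)
        best.append(column)

    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return best[-1][last][0], path

# Hypothetical toy model: each state is a metadata tag for a keyword.
states = ["title", "author", "year"]
start_p = {"title": 0.5, "author": 0.4, "year": 0.1}
trans_p = {
    "title":  {"title": 0.6, "author": 0.3, "year": 0.1},
    "author": {"title": 0.2, "author": 0.5, "year": 0.3},
    "year":   {"title": 0.4, "author": 0.3, "year": 0.3},
}
emit_p = {
    "title":  {"markov": 0.4, "models": 0.4, "query": 0.2},
    "author": {"pu": 0.7, "markov": 0.3},
    "year":   {"2009": 1.0},
}

query = ["markov", "models", "pu", "2009"]
log_p, path = viterbi(query, states, start_p, trans_p, emit_p)
for keyword, tag in zip(query, path):
    print(f"{keyword} -> {tag}")
```

Working in log space avoids numerical underflow on longer queries. Producing a ranked stream of clean queries, as the abstract describes, would require keeping more than one candidate per state rather than only the single best one; that extension is not shown here.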