Information retrieval
Information retrieval
A maximum entropy approach to natural language processing
Computational Linguistics
Modern Information Retrieval
Accurate methods for the statistics of surprise and coincidence
Computational Linguistics - Special issue on using large corpora: I
Computational Linguistics
An improved error model for noisy channel spelling correction
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Generating query substitutions
Proceedings of the 15th international conference on World Wide Web
Exploring distributional similarity based models for query spelling correction
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
A hybrid back-transliteration system for Japanese
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Automatic construction of Japanese KATAKANA variant list from large corpus
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Learning a spelling error model from search query logs
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Acquiring ontological knowledge from query logs
Proceedings of the 16th international conference on World Wide Web
Random walks on the click graph
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Learning query intent from regularized click graphs
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
A discriminative candidate generator for string transformations
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Learning phrase-based spelling error models from clickthrough data
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Managing misspelled queries in IR applications
Information Processing and Management: an International Journal
Learning hash functions for cross-view similarity search
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Two
Hi-index | 0.00 |
In this paper we address the problem of identifying a broad range of term variations in Japanese web search queries, where these variations pose a particularly thorny problem due to the multiple character types employed in its writing system. Our method extends the techniques proposed for English spelling correction of web queries to handle a wider range of term variants including spelling mistakes, valid alternative spellings using multiple character types, transliterations and abbreviations. The core of our method is a statistical model built on the MART algorithm (Friedman, 2001). We show that both string and semantic similarity features contribute to identifying term variation in web search queries; specifically, the semantic similarity features used in our system are learned by mining user session and click-through logs, and are useful not only as model features but also in generating term variation candidates efficiently. The proposed method achieves 70% precision on the term variation identification task with the recall slightly higher than 60%, reducing the error rate of a naïve baseline by 38%.