Supervised learning approach to optimize ranking function for Chinese FAQ-finder

Authors:
Guoping Hu; Dan Liu; Qingfeng Liu; Ren-Hua Wang
Affiliations:
iFly Speech Lab, University of Science and Technology of China, Hefei, China; Research of iFlyTEK Co., Ltd., Hefei, China; Research of iFlyTEK Co., Ltd., Hefei, China; iFly Speech Lab, University of Science and Technology of China, Hefei, China
Venue:
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2007

Citing (3):

Retrieving answers from frequently asked questions pages on the web. Proceedings of the 14th ACM international conference on Information and knowledge management
Finding similar questions in large question and answer archives. Proceedings of the 14th ACM international conference on Information and knowledge management
A supervised learning approach to entity search. AIRS'06 Proceedings of the Third Asia conference on Information Retrieval Technology

Cited by (4):

FAQtory: A framework to provide high-quality FAQ retrieval systems. Expert Systems with Applications: An International Journal
A high-performance FAQ retrieval method using minimal differentiator expressions. Knowledge-Based Systems
A cloud of FAQ: A highly-precise FAQ retrieval system for the Web 2.0. Knowledge-Based Systems
Learning regular expressions to template-based FAQ retrieval systems. Knowledge-Based Systems

Abstract

In this paper, we address the optimization problem for a Chinese FAQ-Finder system built on a huge collection of Question-Answer (QA) pairs. Unlike most published research, which tends to address the word mismatch problem among questions, we focus on a more fundamental problem: the ranking function, which has usually been borrowed directly and arbitrarily from traditional document retrieval. A unified ranking function with four embedded parameters is proposed, and the characteristics of three different fields of a QA pair and the effects of two different Chinese word segmentation settings are investigated. Experiments on 1,000 question queries and 3.8 million QA pairs show that the unified ranking function achieves a 6.67% improvement over the TFIDF baseline. A supervised learning approach is also proposed to optimize the ranking function by employing 264 features, including part-of-speech and bigram co-occurrence features. Experiments show that a further 7.06% improvement can be achieved.
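
For orientation, the kind of TFIDF baseline the abstract compares against can be sketched as follows. This is a minimal illustration only, not the paper's unified ranking function: the abstract does not specify the four parameters, the three fields, or the exact TFIDF variant. The function names (`build_idf`, `tfidf_vector`, `cosine`, `rank`), the smoothed IDF formula, the cosine scoring, and the toy data are all assumptions, and the questions are assumed to be already segmented into words.

```python
import math
from collections import Counter


def build_idf(docs):
    """Smoothed inverse document frequency per term (an assumed variant)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {t: math.log((n_docs + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the collection are ignored."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_tokens, docs, idf):
    """Return document indices sorted by descending similarity to the query."""
    q = tfidf_vector(query_tokens, idf)
    scored = [(cosine(q, tfidf_vector(d, idf)), i) for i, d in enumerate(docs)]
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Toy, pre-segmented FAQ questions (hypothetical data).
faq_questions = [
    ["如何", "重置", "密码"],
    ["如何", "申请", "退款"],
    ["密码", "忘记", "怎么办"],
]
idf = build_idf(faq_questions)
ranking = rank(["重置", "密码"], faq_questions, idf)
```

Here the stored question sharing both query words ranks first, the one sharing a single word ranks second, and the unrelated one ranks last. A field-aware ranking function of the kind the abstract describes would presumably score the fields of a QA pair separately and combine them with tunable weights, but those details are not given in this record.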