Supervised learning approach to optimize ranking function for Chinese FAQ-finder

  • Authors:
  • Guoping Hu; Dan Liu; Qingfeng Liu; Ren-Hua Wang

  • Affiliations:
  • iFly Speech Lab, University of Science and Technology of China, Hefei, China; Research of iFlyTEK Co., Ltd., Hefei, China; Research of iFlyTEK Co., Ltd., Hefei, China; iFly Speech Lab, University of Science and Technology of China, Hefei, China

  • Venue:
  • PAKDD'07: Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2007


Abstract

In this paper, we address the problem of optimizing Chinese FAQ-Finder systems built on very large collections of Question-Answer (QA) pairs. Unlike most published work, which has concentrated on the word-mismatch problem between questions, we focus on a more fundamental problem: the ranking function, which has routinely been borrowed directly, and somewhat arbitrarily, from traditional document retrieval. We propose a unified ranking function with four embedded parameters, and we investigate the characteristics of the three fields of a QA pair and the effects of two different Chinese word segmentation settings. Experiments on 1,000 question queries over 3.8 million QA pairs show that the unified ranking function achieves a 6.67% improvement over the TFIDF baseline. We also propose a supervised learning approach that optimizes the ranking function using 264 features, including part-of-speech and bigram co-occurrence features. Experiments show that this approach achieves a further 7.06% improvement.
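The abstract does not give the form of the unified ranking function or the learning algorithm, so the following is only a rough illustration of the two ideas it names: a TFIDF baseline score computed per field of a QA pair, and a supervised learner that combines several such scores. Everything here is an assumption for illustration: the field names (`question`, `answer`, `category`), the toy pre-segmented data, the helper names, and the choice of a pairwise perceptron (one common learning-to-rank technique) in place of the authors' four-parameter function and 264-feature learner.

```python
import math
from collections import Counter

# Hypothetical field names; the abstract mentions three fields of a QA pair
# but does not name them.
FIELDS = ("question", "answer", "category")


def idf_table(docs, field):
    """Return idf(t) = log(N / df(t)) for every term seen in one field."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc[field]))  # count each term at most once per QA pair
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_score(query, doc, field, idf):
    """TFIDF baseline: sum of tf(t) * idf(t) over query terms in one field."""
    tf = Counter(doc[field])
    return sum(tf[t] * idf.get(t, 0.0) for t in query)


def features(query, doc, idfs):
    """One TFIDF score per field; a stand-in for a richer feature set."""
    return [tfidf_score(query, doc, f, idfs[f]) for f in FIELDS]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_perceptron(pairs, dim, epochs=10, lr=1.0):
    """Learn w so that w . x_relevant > w . x_irrelevant for each training pair."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x_pos, x_neg in pairs:
            diff = [p - q for p, q in zip(x_pos, x_neg)]
            if dot(w, diff) <= 0:  # pair misordered or tied: move w toward diff
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


# Toy QA pairs (hypothetical), assumed to be already segmented into Chinese words.
docs = [
    {"question": ["如何", "修改", "密码"], "answer": ["进入", "设置", "修改", "密码"], "category": ["账户"]},
    {"question": ["如何", "注册", "账号"], "answer": ["点击", "注册"], "category": ["账户"]},
    {"question": ["网络", "连接", "失败"], "answer": ["检查", "网络"], "category": ["网络"]},
]
idfs = {f: idf_table(docs, f) for f in FIELDS}

# Training pairs: (features of a relevant QA pair, features of an irrelevant one).
pairs = [
    (features(["修改", "密码"], docs[0], idfs), features(["修改", "密码"], docs[1], idfs)),
    (features(["网络"], docs[2], idfs), features(["网络"], docs[0], idfs)),
]
w = train_pairwise_perceptron(pairs, dim=len(FIELDS))

# Rank all QA pairs for a new query with the learned linear scoring function.
query = ["修改", "密码"]
ranked = sorted(range(len(docs)), key=lambda i: dot(w, features(query, docs[i], idfs)), reverse=True)
print(ranked)  # -> [0, 1, 2]
```

Adding further features, for example part-of-speech or bigram co-occurrence indicators, only lengthens each feature vector; the pairwise formulation needs nothing more than judgments of which of two candidate QA pairs should rank higher for a query.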