A reliable FAQ retrieval system using a query log classification technique based on latent semantic analysis

  • Authors:
  • Harksoo Kim; Hyunjung Lee; Jungyun Seo

  • Affiliations:
  • Program of Computer and Communications Engineering, College of Information Technology, Kangwon National University, Hyoja, Republic of Korea; Natural Language Processing Laboratory, Department of Computer Science, Sogang University, Seoul, Republic of Korea; Department of Computer Science and Interdisciplinary Program of Integrated Biotechnology, Sogang University, Seoul, Republic of Korea

  • Venue:
  • Information Processing and Management: an International Journal - Special issue: AIRS2005: Information retrieval research in Asia
  • Year:
  • 2007

Abstract

To achieve high performance, previous work on FAQ retrieval has relied on high-level knowledge bases or handcrafted rules. However, constructing these knowledge bases and rules is time-consuming and labor-intensive, and the work must be repeated whenever the application domain changes. To overcome this problem, we propose a high-performance FAQ retrieval system that uses only users' query logs as its knowledge source. At indexing time, the proposed system efficiently clusters users' query logs using classification techniques based on latent semantic analysis. At retrieval time, the proposed system smooths FAQs using the query log clusters. In our experiments, the proposed system outperformed conventional information retrieval systems on FAQ retrieval. Based on various experiments, we found that the proposed system can alleviate the critical lexical disagreement problems that arise in short document retrieval. In addition, we believe that the proposed system is more practical and reliable than previous FAQ retrieval systems because it uses only data-driven methods and requires no high-level knowledge sources.
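
The retrieval-time smoothing idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the smoothing formula, so the linear interpolation, the weight `lam`, the toy FAQs, and all function names here are assumptions made for illustration. The sketch also assumes the query logs have already been grouped per FAQ; the latent-semantic-analysis classification step that would produce those clusters is not reproduced.

```python
import math
from collections import Counter

def tf(text):
    """Relative term frequencies of a lowercased, whitespace-tokenized text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def smooth(faq_text, cluster_queries, lam=0.5):
    """Interpolate the FAQ's term weights with those of its query log cluster.

    lam=1.0 means no smoothing (FAQ text only). The interpolation form
    is an assumption, not taken from the paper.
    """
    faq_vec = tf(faq_text)
    cluster_vec = tf(" ".join(cluster_queries))
    terms = set(faq_vec) | set(cluster_vec)
    return {
        t: lam * faq_vec.get(t, 0.0) + (1 - lam) * cluster_vec.get(t, 0.0)
        for t in terms
    }

# Hypothetical toy data: two FAQs and the logged queries assumed to have
# been assigned to each of them at indexing time.
FAQS = {
    "faq_password": "how do i reset my password",
    "faq_shipping": "when will my order ship",
}
CLUSTERS = {
    "faq_password": ["forgot login credentials", "cannot sign in to account"],
    "faq_shipping": ["delivery date for package", "track package delivery"],
}

def rank(query, lam=0.5):
    """Rank FAQs by cosine similarity between the query and each smoothed FAQ."""
    q = tf(query)
    scores = {
        faq_id: cosine(q, smooth(text, CLUSTERS.get(faq_id, []), lam))
        for faq_id, text in FAQS.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    query = "forgot login credentials"
    print("no smoothing:", rank(query, lam=1.0))
    print("smoothed:    ", rank(query, lam=0.5))
```

In this toy setup the query "forgot login credentials" shares no words with either FAQ, so without smoothing both FAQs score zero. Once each FAQ is smoothed with its query log cluster, the password FAQ picks up the query's vocabulary and ranks first, which is the kind of lexical disagreement the abstract refers to.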