Multilingual novelty detection

  • Authors:
  • Flora S. Tsai;Yi Zhang;Agus T. Kwee;Wenyin Tang

  • Affiliations:
  • School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore (all authors)

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Quantified Score

Hi-index 12.05

Abstract

Novelty detection aims to reduce redundant information in a chronologically ordered list of documents or sentences. Previous studies of novelty detection have been conducted on the English language, but few papers have addressed the problem of multilingual novelty detection. Likewise, research in multilingual information retrieval has rarely been applied to novelty detection. This paper attempts to bridge the two disciplines by first describing the preprocessing steps for English, Malay and Chinese, then applying document-level and sentence-level novelty detection for the three languages on APWSJ and TREC 2004 Novelty Track data. Experiments on sentence-level novelty detection show similar results for all three languages, which indicates that our algorithm is suitable for multilingual novelty detection at the sentence level. However, results for document-level novelty detection show a disparity across the different languages, with English and Malay outperforming Chinese. After applying sentence-level novelty detection to detect novel documents, we observe substantial improvements on all three languages. This demonstrates that segmenting documents into sentences improves document-level novelty detection in multiple languages, and has practical benefits for a real-time multilingual novelty detection system.
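
The abstract does not state which similarity measure, thresholds, or decision rule the authors use, so the following is only a minimal sketch of the general idea of detecting novel documents through their sentences. It assumes a bag-of-words cosine similarity, a hypothetical similarity threshold of 0.6, and a hypothetical rule that a document is novel when at least half of its sentences are novel. The function names and the simple regex tokeniser are illustrative, not taken from the paper; Chinese text in particular would need a proper word segmenter in place of `bag_of_words`.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    # Naive tokeniser (assumption): lowercase runs of word characters.
    # Chinese would need a real word segmenter here instead.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def novel_sentences(sentences, history, threshold=0.6):
    # A sentence is novel if its highest similarity to any previously
    # seen sentence is below the threshold. `history` is updated in place
    # so that later sentences and documents are compared against it.
    novel = []
    for sentence in sentences:
        vec = bag_of_words(sentence)
        max_sim = max((cosine(vec, h) for h in history), default=0.0)
        if max_sim < threshold:
            novel.append(sentence)
        history.append(vec)
    return novel


def is_novel_document(document, history, threshold=0.6, min_novel_fraction=0.5):
    # Segment the document into sentences (crude punctuation split,
    # including common Chinese sentence terminators), then call the
    # document novel if enough of its sentences are novel (assumed rule).
    sentences = [s.strip() for s in re.split(r"[.!?。！？]+", document) if s.strip()]
    if not sentences:
        return False
    novel = novel_sentences(sentences, history, threshold)
    return len(novel) / len(sentences) >= min_novel_fraction
```

Documents are processed in chronological order against a shared `history` list, so a repeated report would be flagged as redundant while an unrelated one would pass. The threshold and the novel-fraction cutoff are tuning parameters that would need to be set per language and per data set.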