Multilingual sentence categorization and novelty mining

  • Authors: Yi Zhang; Flora S. Tsai; Agus Trisnajaya Kwee

  • Affiliations: Nanyang Technological University, School of Electrical and Electronic Engineering, Block S2, Nanyang Avenue, Singapore 639798, Singapore (all three authors)

  • Venue: Information Processing and Management: an International Journal
  • Year: 2011

Abstract

A challenge for sentence categorization and novelty mining is to detect not only when text is relevant to the user's information need, but also when it contains new information that the user has not seen before. This involves two tasks: identifying relevant sentences (categorization), and identifying new information among those relevant sentences (novelty mining). Many previous studies of relevant sentence retrieval and novelty mining have been conducted on English, but few papers have addressed multilingual sentence categorization and novelty mining. This is an important issue in global business environments, where mining knowledge from text in a single language is not sufficient. In this paper, we perform the first task by categorizing Malay and Chinese sentences and comparing the results with those for English. We then conduct novelty mining to identify the sentences that contain new information. Experimental results on TREC 2004 Novelty Track data show similar categorization performance on Malay and English sentences, both of which greatly outperform Chinese. In the second task, we achieve similar novelty mining results for all three languages, which indicates that our algorithm is suitable for novelty mining of multilingual sentences. In addition, benchmarking against novelty mining without categorization shows that categorization is necessary for successful novelty mining.
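
The two-task pipeline described in the abstract (categorize first, then mine novelty among the relevant sentences) can be illustrated with a minimal sketch. The abstract does not specify the features, similarity measure, or thresholds used in the paper, so everything below is an assumption made for illustration: bag-of-words vectors, cosine similarity, the function names `categorize` and `mine_novelty`, and both threshold values are hypothetical. Whitespace tokenization is also an assumption that suits English and Malay; Chinese text would first need word segmentation.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Term-frequency vector from lowercased whitespace tokens (assumed features)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(sentences, topic, relevance_threshold=0.2):
    """Task 1 (assumed form): keep sentences similar enough to the topic."""
    topic_vec = bag_of_words(topic)
    return [
        s for s in sentences
        if cosine(bag_of_words(s), topic_vec) >= relevance_threshold
    ]


def mine_novelty(relevant, novelty_threshold=0.6):
    """Task 2 (assumed form): a sentence is novel if its highest similarity
    to any previously seen relevant sentence is below the threshold."""
    history = []  # vectors of all relevant sentences seen so far
    novel = []
    for s in relevant:
        vec = bag_of_words(s)
        max_sim = max((cosine(vec, h) for h in history), default=0.0)
        if max_sim < novelty_threshold:
            novel.append(s)
        history.append(vec)
    return novel
```

Calling `mine_novelty(categorize(sentences, topic))` runs both tasks in order. Passing the unfiltered sentence list straight to `mine_novelty` corresponds to the "novelty mining without categorization" baseline mentioned in the abstract, where off-topic sentences can be reported as novel simply because they differ from everything seen before.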