Category Classification and Topic Discovery of Japanese and English News Articles

  • Authors:
  • David B. Bracewell; Jiajun Yan; Fuji Ren; Shingo Kuroiwa

  • Affiliations:
  • Department of Information Science and Intelligent Systems, The University of Tokushima, Tokushima, Japan; Department of Information Science and Intelligent Systems, The University of Tokushima, Tokushima, Japan; Department of Information Science and Intelligent Systems, The University of Tokushima, Tokushima, Japan and School of Information Engineering, Beijing University of Posts and Telecommunications, ...; Department of Information Science and Intelligent Systems, The University of Tokushima, Tokushima, Japan

  • Venue:
  • Electronic Notes in Theoretical Computer Science (ENTCS)
  • Year:
  • 2009

Abstract

This paper presents algorithms for topic analysis of news articles. Topic analysis here comprises two tasks: category classification, and topic discovery and classification. Dealing with news imposes special requirements that standard classification approaches typically cannot handle. The proposed algorithms support online training for both category and topic classification and can discover new topics as they arise. Both algorithms build on a keyword extraction algorithm that is applicable to any language with basic morphological analysis tools, so they can easily be applied to multiple languages. Experiments on English and Japanese show that the algorithms achieve high precision and recall.