Profile based algorithm to topic spotting in Reuter21578

  • Authors: Taeho Jo

  • Affiliations: School of Computer and Information Engineering, Inha University, Namgu, Incheon, South Korea

  • Venue: ICIC'09 Proceedings of the Intelligent computing 5th international conference on Emerging intelligent computing technology and applications
  • Year: 2009

Abstract

This research proposes an alternative to machine learning based approaches for categorizing online news articles in Reuter21578. To apply machine learning based approaches to any text mining or information retrieval task, documents must be encoded into numerical vectors, and this encoding causes two problems: huge dimensionality and sparse distribution. Although text mining covers various tasks, such as text categorization, text clustering, and text summarization, the scope of this research is restricted to text categorization. The idea is to avoid the two problems by encoding a document or documents into a table instead of numerical vectors. The goal of this research is therefore to improve the performance of text categorization by avoiding the two problems.
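
The contrast between the two encodings can be illustrated with a minimal sketch. The abstract does not specify what the table contains or how tables are compared, so everything here is an assumption made for illustration: the whitespace tokenizer, the three toy documents, the function names `encode_as_vector` and `encode_as_table`, and the choice of a word-to-frequency table. It is not the paper's algorithm, only a picture of why a vector over the whole vocabulary becomes large and mostly zero, while a per-document table stores only the words that occur.

```python
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def encode_as_vector(text, vocabulary):
    # Numerical vector: one slot per vocabulary word, so its length grows
    # with the corpus and most slots are zero for any single document.
    counts = Counter(tokenize(text))
    return [counts.get(word, 0) for word in vocabulary]


def encode_as_table(text):
    # Hypothetical table encoding: only the words present in the document,
    # each paired with its frequency. The paper's actual table layout may differ.
    return dict(Counter(tokenize(text)))


# Toy documents (invented for this sketch, not taken from Reuter21578).
docs = [
    "oil prices rise as crude supply falls",
    "wheat and corn exports rise",
    "central bank raises interest rates",
]

vocabulary = sorted({word for doc in docs for word in tokenize(doc)})
vector = encode_as_vector(docs[1], vocabulary)
table = encode_as_table(docs[1])

print("vector length:", len(vector))
print("zero entries in vector:", vector.count(0))
print("table entries:", len(table))
```

Even with three short documents, most of the vector's slots are zero, and the gap widens as the vocabulary grows; the table's size depends only on the document itself.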