Document clustering based on similarity of subjects using integrated subject graph

  • Authors:
  • Masao Nakada; Yuko Osana

  • Affiliations:
  • Graduate School of Bionics, Computer and Media Science, School of Computer Science, Tokyo University of Technology, Tokyo, Japan; School of Computer Science, Tokyo University of Technology, Tokyo, Japan

  • Venue:
  • AIA'06: Proceedings of the 24th IASTED International Conference on Artificial Intelligence and Applications
  • Year:
  • 2006

Abstract

In this research, we propose an integrated subject graph that expresses the subjects of a document. The integrated subject graph is based on a graph-based text representation model called the "subject graph". In a subject graph, a node represents a term in the text, and an edge denotes a relation between the linked terms. Graph-based text representation models such as the subject graph and KeyGraph have been proposed, but most of them assume that a document has only one subject. In practice, however, a document often has multiple subjects. In this research, we assume that each unit of a document, such as a paragraph, has one subject, and we convert each unit into a subject graph. These subject graphs are then merged into one integrated subject graph. We apply the integrated subject graph to document clustering, so that documents are clustered based on the similarity of their subjects. We carried out a series of computer experiments and confirmed the effectiveness of the proposed integrated subject graph.
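The abstract does not say how edges are defined, how the per-paragraph graphs are merged, or which similarity measure and clustering algorithm are used. The following is a minimal sketch of the overall pipeline (paragraph → subject graph → integrated graph → clustering) that fills those gaps with explicit assumptions. It is not the authors' method. Specifically, it assumes that terms are linked when they co-occur in a sentence, that the integrated graph weights each edge by the number of paragraphs containing it, and that documents are compared with weighted Jaccard similarity and grouped by threshold-based single linkage. The names `subject_graph`, `integrated_subject_graph`, `similarity`, `cluster`, and the `threshold` value are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; the paper's preprocessing is not described.
STOPWORDS = {"a", "an", "the", "of", "and", "in", "is", "are", "to", "for", "on", "with"}


def extract_terms(sentence):
    """Lowercase word tokens with stopwords removed (assumed preprocessing)."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}


def subject_graph(paragraph):
    """Edge set of a subject graph for one unit (paragraph).

    Assumption: two terms are linked when they co-occur in the same sentence.
    """
    edges = set()
    for sentence in re.split(r"[.!?]+", paragraph):
        for u, v in combinations(sorted(extract_terms(sentence)), 2):
            edges.add((u, v))
    return edges


def integrated_subject_graph(document):
    """Merge per-paragraph subject graphs into one weighted graph.

    Assumption: an edge's weight is the number of paragraphs (subjects) containing it.
    Paragraphs are assumed to be separated by blank lines.
    """
    weights = Counter()
    for paragraph in document.split("\n\n"):
        if paragraph.strip():
            weights.update(subject_graph(paragraph))
    return weights


def similarity(g1, g2):
    """Weighted Jaccard similarity of two integrated graphs (an assumed measure)."""
    keys = set(g1) | set(g2)
    if not keys:
        return 0.0
    inter = sum(min(g1[k], g2[k]) for k in keys)
    union = sum(max(g1[k], g2[k]) for k in keys)
    return inter / union


def cluster(documents, threshold=0.2):
    """Single-linkage clustering via union-find: documents whose similarity is
    at or above `threshold` (a hypothetical value) share a cluster."""
    graphs = [integrated_subject_graph(d) for d in documents]
    parent = list(range(len(documents)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(documents)), 2):
        if similarity(graphs[i], graphs[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(documents)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


docs = [
    "Neural networks learn weights. Training uses gradient descent.\n\n"
    "Gradient descent updates weights.",
    "Neural networks learn weights with gradient descent.",
    "Stock markets react to interest rates. Central banks set interest rates.",
]
print(cluster(docs))  # → [[0, 1], [2]]
```

Weighting edges by paragraph count lets an edge shared by several subjects count more than one that appears in a single paragraph. Comparing whole per-paragraph graphs pairwise, instead of merging them, would be an equally plausible reading of "similarity of the subjects".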