Incremental clustering of newsgroup articles

  • Authors:
  • Sascha Hennig; Michael Wurst

  • Affiliations:
  • Department of Computer Science, University of Dortmund, Dortmund, Germany; Department of Computer Science, University of Dortmund, Dortmund, Germany

  • Venue:
  • IEA/AIE'06 Proceedings of the 19th International Conference on Advances in Applied Artificial Intelligence: Industrial, Engineering and Other Applications of Applied Intelligent Systems
  • Year:
  • 2006

Abstract

Clustering text documents is a basic enabling technique in a wide variety of information and knowledge management applications. This paper presents an incremental clustering system for organizing and managing newsgroup articles. It helps administrators and readers of a newsgroup archive important postings and obtain a structured overview of current developments and topics. To be practically applicable, such a system must fulfill two conditions. First, it must be able to process rapidly changing text streams, modifying the cluster structure dynamically by adding, deleting and restructuring clusters. Second, it must take the user into account during the incremental process. Severe changes in the organization structure are unacceptable for most users, even if they are optimal with respect to an abstract clustering criterion. We propose an approach that explicitly models the cost to the user of adapting to changes in the cluster structure. Users may then constrain which changes are acceptable to them.
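
The idea of limiting structural changes by an explicit cost can be illustrated with a minimal sketch. This is not the system described in the paper: the class `BudgetedClusterer`, the constant `COST_NEW_CLUSTER`, the Jaccard similarity on word sets, and the single `change_budget` parameter are all assumptions chosen for illustration. The sketch only covers adding clusters; deleting and restructuring are left out.

```python
from dataclasses import dataclass, field

# Hypothetical cost charged to the user each time a new cluster appears.
COST_NEW_CLUSTER = 1.0


def jaccard(a: set, b: set) -> float:
    """Word-set overlap in [0, 1]; a stand-in for a real text similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Cluster:
    words: set = field(default_factory=set)
    docs: list = field(default_factory=list)


class BudgetedClusterer:
    """Assigns each article to a cluster, spending a user-set change budget."""

    def __init__(self, threshold: float, change_budget: float):
        self.threshold = threshold          # minimum similarity to join a cluster
        self.change_budget = change_budget  # total change cost the user accepts
        self.spent = 0.0
        self.clusters = []

    def add_document(self, doc_id, text: str) -> int:
        tokens = set(text.lower().split())
        scores = [jaccard(tokens, c.words) for c in self.clusters]
        # Index of the most similar cluster, or None if there are no clusters yet.
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        similar = best is not None and scores[best] >= self.threshold
        can_afford = self.spent + COST_NEW_CLUSTER <= self.change_budget
        # Open a new cluster only if the article fits nowhere and the budget
        # allows it. The very first cluster is always created.
        if not similar and (can_afford or best is None):
            self.clusters.append(Cluster())
            self.spent += COST_NEW_CLUSTER
            best = len(self.clusters) - 1
        self.clusters[best].docs.append(doc_id)
        self.clusters[best].words |= tokens
        return best


clusterer = BudgetedClusterer(threshold=0.3, change_budget=2.0)
first = clusterer.add_document(1, "linux kernel patch release")
second = clusterer.add_document(2, "linux kernel bug patch")
third = clusterer.add_document(3, "python list comprehension")
fourth = clusterer.add_document(4, "cooking pasta recipe")
```

With a budget of 2.0, the fourth article does not get its own cluster even though it matches nothing well; it is placed in an existing cluster so that the structure the user already knows stays stable. A larger budget would let the structure follow the data more closely at the price of more visible change.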