A new approach on search for similar documents with multiple categories using fuzzy clustering

  • Authors:
  • Rıdvan Saraçoğlu;Kemal Tütüncü;Novruz Allahverdi

  • Affiliations:
  • Department of Electronic and Computer Education, Selçuk University, Konya, Turkey;Department of Electronic and Computer Education, Selçuk University, Konya, Turkey;Department of Electronic and Computer Education, Selçuk University, Konya, Turkey

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2008

Abstract

Searching for similar documents plays an important role in text mining and document management. Similar-document search, like other text mining applications, generally focuses on document classification, that is, on determining the class or category to which each document belongs. The present study investigates the case in which documents belong to more than one category. The system used is a similar-document search system based on fuzzy clustering, which accommodates documents that belong to more than one category. The proposed approach solves the multi-category problem in two stages. The first stage identifies the documents that belong to more than one category. The second stage determines the categories to which these documents belong. For these two stages, the α-threshold Fuzzy Similarity Classification Method (α-FSCM) and the Multiple Categories Vector Method (MCVM) are proposed, respectively. Experimental results show that the proposed system can efficiently distinguish the documents that belong to more than one category. In determining which documents belong to which classes, the proposed system performs better than the traditional approach.
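
The abstract does not spell out how either proposed method is defined, so the following is only a rough illustration of the general two-stage idea, not the authors' actual methods. It assumes that fuzzy clustering has already produced a membership degree in [0, 1] for every document and category, and it applies a plain threshold cut: a document is treated as multi-category when more than one membership degree reaches the threshold. The function names, the document and category labels, the membership values, and the threshold of 0.3 are all hypothetical.

```python
from typing import Dict, List, Tuple

Memberships = Dict[str, float]  # category name -> fuzzy membership degree


def categories_above_threshold(memberships: Memberships, alpha: float) -> List[str]:
    """Return, in sorted order, the categories whose membership is at least alpha."""
    return sorted(c for c, mu in memberships.items() if mu >= alpha)


def split_by_category_count(
    docs: Dict[str, Memberships], alpha: float
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Separate single-category documents from multi-category ones.

    Stage 1 (assumed): a document is multi-category if more than one
    membership degree reaches alpha.
    Stage 2 (assumed): its categories are exactly those that reach alpha.
    """
    single: Dict[str, List[str]] = {}
    multi: Dict[str, List[str]] = {}
    for doc_id, memberships in docs.items():
        cats = categories_above_threshold(memberships, alpha)
        if len(cats) > 1:
            multi[doc_id] = cats
        elif cats:
            single[doc_id] = cats
        else:
            # No category reaches alpha: fall back to the strongest one.
            single[doc_id] = [max(memberships, key=memberships.get)]
    return single, multi


# Hypothetical membership degrees, e.g. as produced by fuzzy c-means.
docs = {
    "d1": {"sports": 0.85, "politics": 0.10, "economy": 0.05},
    "d2": {"sports": 0.45, "politics": 0.40, "economy": 0.15},
    "d3": {"sports": 0.20, "politics": 0.25, "economy": 0.55},
}

single, multi = split_by_category_count(docs, alpha=0.3)
print("single-category:", single)
print("multi-category:", multi)
```

With these made-up values only `d2` is flagged as multi-category, because two of its membership degrees reach the threshold. Lowering the threshold flags more documents as multi-category, while raising it pushes documents back toward a single label.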