A new approach on search for similar documents with multiple categories using fuzzy clustering

Authors:
Rıdvan Saraçoğlu;Kemal Tütüncü;Novruz Allahverdi
Affiliations:
Department of Electronic and Computer Education, Selçuk University, Konya, Turkey;Department of Electronic and Computer Education, Selçuk University, Konya, Turkey;Department of Electronic and Computer Education, Selçuk University, Konya, Turkey
Venue:
Expert Systems with Applications: An International Journal
Year:
2008

Cited by 6

One-against-one fuzzy support vector machine classifier: An approach to text categorization. Expert Systems with Applications: An International Journal
Cross-lingual document representation and semantic similarity measure: a fuzzy set and rough set based approach. IEEE Transactions on Fuzzy Systems
Research of fast SOM clustering for text information. Expert Systems with Applications: An International Journal
FSKNN: Multi-label text categorization based on fuzzy similarity and k nearest neighbors. Expert Systems with Applications: An International Journal
Fast growing self organizing map for text clustering. ICONIP'11 Proceedings of the 18th international conference on Neural Information Processing - Volume Part II
Generating queries from user-selected text. Proceedings of the 4th Information Interaction in Context Symposium

Abstract

Searching for similar documents plays an important role in text mining and document management. In similar-document search, as in other text mining applications, the focus is generally on document classification, that is, on determining the class or category to which each document belongs. The present study investigates the case of documents that belong to more than one category. The system used is a similar-document search system based on fuzzy clustering, and it accommodates documents that belong to multiple categories. The proposed approach solves the multi-category problem in two stages. The first stage identifies the documents that belong to more than one category; the second stage determines the categories to which each of these documents belongs. The α-threshold Fuzzy Similarity Classification Method (α-FSCM) is proposed for the first stage and the Multiple Categories Vector Method (MCVM) for the second. Experimental results showed that the proposed system can efficiently distinguish the documents that belong to more than one category. In determining which categories these documents belong to, the proposed system performed better than the traditional approach.
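The abstract does not give the formulas behind α-FSCM or MCVM, so the following is only a generic illustration of the idea of flagging multi-category documents with a threshold α, not the authors' method. It assumes that each category is represented by a centroid term-weight vector (the hypothetical `centroids` dictionary), that a document's fuzzy membership in a category is its cosine similarity to that centroid normalized over all categories, and that a document counts as multi-category when more than one membership reaches α (an α-cut). Both stages are collapsed into that single α-cut here; MCVM is not reproduced. All function names and data are made up for illustration.

```python
import math

# Assumption: documents and category centroids are non-negative term-weight
# vectors (e.g. TF-IDF) of equal length. This is not taken from the paper.
def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuzzy_memberships(doc, centroids):
    """Hypothetical membership: similarity to each centroid, normalized to sum to 1."""
    sims = {cat: max(0.0, cosine(doc, vec)) for cat, vec in centroids.items()}
    total = sum(sims.values())
    if total == 0:
        return {cat: 0.0 for cat in sims}
    return {cat: s / total for cat, s in sims.items()}


def alpha_cut(memberships, alpha):
    """Assumed alpha-cut: sorted categories whose membership reaches alpha."""
    return sorted(cat for cat, m in memberships.items() if m >= alpha)


def classify(doc, centroids, alpha):
    """Flag a multi-category document and list its categories (illustrative only)."""
    mu = fuzzy_memberships(doc, centroids)
    cats = alpha_cut(mu, alpha)
    if not cats and any(mu.values()):
        # Nothing reaches alpha: fall back to the single strongest category.
        cats = [max(mu, key=mu.get)]
    return {"multi": len(cats) > 1, "categories": cats}


if __name__ == "__main__":
    # Toy centroids over a 3-term vocabulary (made-up data).
    centroids = {
        "sports": [1.0, 0.0, 0.0],
        "economy": [0.0, 1.0, 0.0],
        "health": [0.0, 0.0, 1.0],
    }
    print(classify([1.0, 1.0, 0.0], centroids, alpha=0.3))
    print(classify([3.0, 1.0, 0.0], centroids, alpha=0.3))
```

With these toy centroids, the first document splits its membership evenly between "economy" and "sports" and is printed as `{'multi': True, 'categories': ['economy', 'sports']}`, while the second leans mostly toward "sports" (membership 0.75 vs. 0.25) and is printed as `{'multi': False, 'categories': ['sports']}`. The choice of α controls the trade-off: a lower α labels more documents as multi-category.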