A Similarity-Based Soft Clustering Algorithm for Documents

Authors:
K. Lin;Ravikumar Kondadadi
Affiliations:
-;-
Venue:
DASFAA '01 Proceedings of the 7th International Conference on Database Systems for Advanced Applications
Year:
2001

Citing 0
Cited 9

Navigating massive data sets via local clustering
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Content-based retrieval in hybrid peer-to-peer networks
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management

Concept-matching IR systems versus word-matching information retrieval systems: Considering fuzzy interrelations for indexing Web pages: Special Topic Section on Soft Approaches to Information Retrieval and Information Access on the Web
Journal of the American Society for Information Science and Technology

Vagueness and uncertainty in information retrieval: how can fuzzy sets help?
Proceedings of the 2006 international workshop on Research issues in digital libraries

Clustered organized conceptual queries in the internet using fuzzy interrelations
AWIC'03 Proceedings of the 1st international Atlantic web intelligence conference on Advances in web intelligence

Web news summarization via soft clustering algorithm
FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 7

An improved web information summarization based on SSSC
CAR'10 Proceedings of the 2nd international Asia conference on Informatics in control, automation and robotics - Volume 3

A suite of testbeds for the realistic evaluation of peer-to-peer information retrieval systems
ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research

Improving the quality of predictions using textual information in online user reviews
Information Systems

Abstract

Document clustering is an important tool for applications such as Web search engines. Clustering documents gives users a good overall view of the information contained in their document collection. However, existing algorithms have shortcomings: hard clustering algorithms (where each document belongs to exactly one cluster) cannot detect the multiple themes of a document, while soft clustering algorithms (where each document can belong to multiple clusters) are usually inefficient. We propose SISC (SImilarity-based Soft Clustering), an efficient soft clustering algorithm that requires only a given similarity measure and uses randomization to keep the clustering efficient. Comparison with existing hard clustering algorithms such as K-means and its variants shows that SISC is both effective and efficient.
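
The difference between hard and soft assignment described in the abstract can be illustrated with a small sketch. This is not the SISC algorithm, whose details the abstract does not give; it only assumes that documents and cluster centers are sparse term-weight dictionaries compared by cosine similarity. The function names, the example data, and the `threshold` value are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hard_assign(doc, centers):
    """Hard clustering: index of the single most similar center."""
    sims = [cosine(doc, c) for c in centers]
    return max(range(len(centers)), key=sims.__getitem__)

def soft_assign(doc, centers, threshold=0.3):
    """Soft clustering: indices of every center at or above the threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [i for i, c in enumerate(centers) if cosine(doc, c) >= threshold]

# Hypothetical toy data: two cluster centers and one document with two themes.
centers = [
    {"database": 1.0, "query": 1.0},
    {"cluster": 1.0, "document": 1.0},
]
doc = {"database": 1.0, "document": 1.0, "query": 1.0}

print(hard_assign(doc, centers))  # only the best-matching cluster
print(soft_assign(doc, centers))  # every cluster the document is similar to
```

In this toy example the document shares terms with both centers, so hard assignment keeps only the closer one while soft assignment reports both, which is the multiple-theme case the abstract refers to.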