Abstract: Document clustering is an important tool for applications such as Web search engines. Clustering documents gives users a good overall view of the information contained in their document collection. However, existing algorithms have notable shortcomings: hard clustering algorithms (where each document belongs to exactly one cluster) cannot detect the multiple themes of a document, while soft clustering algorithms (where each document can belong to multiple clusters) are usually inefficient. We propose SISC (SImilarity-based Soft Clustering), an efficient soft clustering algorithm that requires only a given similarity measure and uses randomization to keep the clustering efficient. Comparison with existing hard clustering algorithms such as K-means and its variants shows that SISC is both effective and efficient.
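
To make the hard versus soft distinction concrete, the following is a minimal Python sketch of soft clustering driven only by a similarity measure. It is not the SISC algorithm, whose details the abstract does not give. The function names (`cosine_similarity`, `soft_cluster`), the bag-of-words representation, the randomly sampled seed documents, and the fixed `threshold` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def soft_cluster(docs, k, threshold, similarity=cosine_similarity, seed=0):
    """Illustrative soft clustering (hypothetical, not the paper's method).

    Picks k documents at random as cluster seeds, then adds each document
    to every cluster whose seed it resembles at or above `threshold`.
    A document may therefore land in several clusters, or in none.
    Returns a dict mapping seed index -> list of member document indices.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    vectors = [Counter(d.lower().split()) for d in docs]
    seeds = rng.sample(range(len(vectors)), k)
    clusters = {s: [] for s in seeds}
    for i, vec in enumerate(vectors):
        for s in seeds:
            if similarity(vec, vectors[s]) >= threshold:
                clusters[s].append(i)
    return clusters


if __name__ == "__main__":
    docs = [
        "soccer match goal",
        "election vote parliament",
        "soccer election",  # touches both themes
    ]
    for seed_doc, members in soft_cluster(docs, k=3, threshold=0.3).items():
        print(seed_doc, members)
```

In this toy setup the third document shares a term with each of the other two, so it joins both of their clusters, which a hard clustering algorithm would not allow. The similarity function is passed in as a parameter, mirroring the abstract's point that only a similarity measure is required.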