Clustering objects from multiple collections

Authors:
Vera Hollink;Maarten Van Someren;Viktor De Boer
Affiliations:
Centre for Mathematics and Computer Science and University of Amsterdam;University of Amsterdam;University of Amsterdam
Venue:
KI'09 Proceedings of the 32nd annual German conference on Advances in artificial intelligence
Year:
2009

Abstract

Clustering methods group objects on the basis of a similarity measure between the objects. In clustering tasks where the objects come from more than one collection, part of the similarity often results from features that are related to the collections rather than from features that are relevant for the clustering task. For example, when clustering pages from various web sites by topic, pages from the same web site often contain similar terms. This collection-related part of the similarity hinders clustering because it causes the creation of clusters that correspond to collections instead of topics. In this paper we present two methods that restrict clustering to the part of the similarity that is not associated with membership of a collection. Both methods can be used on top of standard clustering methods. Experiments on data sets with objects from multiple collections show that our methods result in better clusters than methods that do not take collection information into account.
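
The abstract does not describe the two methods themselves, so the following is only an illustrative sketch of the problem and of one simple, assumed remedy, not the authors' approach. It assumes pages are represented as term-weight dictionaries and removes each collection's mean term vector before computing cosine similarity, so that terms shared by a whole web site no longer dominate. The function names (`cosine`, `remove_collection_mean`) and the toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def remove_collection_mean(docs, collections):
    """Subtract each collection's mean term vector from its documents.

    Assumption for illustration only: the collection-related part of the
    similarity is approximated by the per-collection mean vector.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for doc, coll in zip(docs, collections):
        counts[coll] += 1
        for term, weight in doc.items():
            sums[coll][term] += weight

    residuals = []
    for doc, coll in zip(docs, collections):
        n = counts[coll]
        residuals.append(
            {term: doc.get(term, 0.0) - total / n
             for term, total in sums[coll].items()}
        )
    return residuals


# Toy data (made up): two web sites, each with a sports and a cooking page.
# Every page carries site-specific terms that are unrelated to its topic.
pages = [
    {"acme": 5, "news": 5, "football": 2, "goal": 1},  # site A, sports
    {"acme": 5, "news": 5, "recipe": 2, "oven": 1},    # site A, cooking
    {"bolt": 5, "blog": 5, "football": 2, "goal": 1},  # site B, sports
    {"bolt": 5, "blog": 5, "recipe": 2, "oven": 1},    # site B, cooking
]
sites = ["A", "A", "B", "B"]

residuals = remove_collection_mean(pages, sites)

# Similarity of the site A sports page to its site mate and to its topic mate,
# before and after removing the collection mean.
raw_same_site = cosine(pages[0], pages[1])
raw_same_topic = cosine(pages[0], pages[2])
adj_same_site = cosine(residuals[0], residuals[1])
adj_same_topic = cosine(residuals[0], residuals[2])
```

On this toy data the raw similarity ranks the same-site page above the same-topic page, while the adjusted similarity reverses that order. The adjusted pairwise similarities could then be passed to any standard clustering method that accepts a similarity or distance matrix.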