Multi-instance clustering with applications to multi-instance prediction

  • Authors:
  • Min-Ling Zhang;Zhi-Hua Zhou

  • Affiliations:
  • National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 210093 and College of Computer and Information Engineering, Hohai University, Nanjing, China 210098;National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 210093

  • Venue:
  • Applied Intelligence
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

In the setting of multi-instance learning, each object is represented by a bag composed of multiple instances instead of by a single instance in a traditional learning setting. Previous works in this area only concern multi-instance prediction problems where each bag is associated with a binary (classification) or real-valued (regression) label. However, unsupervised multi-instance learning where bags are without labels has not been studied. In this paper, the problem of unsupervised multi-instance learning is addressed where a multi-instance clustering algorithm named Bamic is proposed. Briefly, by regarding bags as atomic data items and using some form of distance metric to measure distances between bags, Bamic adapts the popular k -Medoids algorithm to partition the unlabeled training bags into k disjoint groups of bags. Furthermore, based on the clustering results, a novel multi-instance prediction algorithm named Bartmip is developed. Firstly, each bag is re-represented by a k-dimensional feature vector, where the value of the i-th feature is set to be the distance between the bag and the medoid of the i-th group. After that, bags are transformed into feature vectors so that common supervised learners are used to learn from the transformed feature vectors each associated with the original bag's label. Extensive experiments show that Bamic could effectively discover the underlying structure of the data set and Bartmip works quite well on various kinds of multi-instance prediction problems.