Learning of semantic sibling group hierarchies - K-means vs. bi-secting-K-means

  • Authors:
  • Marko Brunzel

  • Affiliations:
  • DFKI GmbH - German Research Center for Artificial Intelligence and Otto-von-Guericke Universität Magdeburg, Germany

  • Venue:
  • DaWaK'07 Proceedings of the 9th international conference on Data Warehousing and Knowledge Discovery
  • Year:
  • 2007

Abstract

The discovery of semantically associated groups of terms is important for many applications of text understanding, including document vectorization for text mining, semi-automated ontology extension from documents, and ontology engineering with the help of domain-specific texts. In [3], we proposed a method for the discovery of such terms and showed that its performance is superior to other methods for the same task. However, we observed that (a) the approach is sensitive to the term clustering method and (b) the performance improves with the size of the result list, thus incurring higher human overhead in the postprocessing phase. In this study, we address these issues by proposing the delivery of a hierarchically organized output, computed with Bisecting K-Means. Using two ontologies as gold standards, we compared the results of the new algorithm with those delivered by the original method, which used K-Means.
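
The abstract names Bisecting K-Means as the source of the hierarchical output but gives no algorithmic detail. The following is a minimal sketch of the generic technique, not the author's implementation. It assumes terms are already represented as numeric vectors, uses Euclidean distance, seeds each 2-means split deterministically, and always splits the leaf with the largest sum of squared errors. All function names and the toy 2-D data are hypothetical.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def sse(points):
    """Sum of squared Euclidean distances to the cluster centroid."""
    c = centroid(points)
    return sum(dist(p, c) ** 2 for p in points)


def two_means(points, iters=20):
    """Split points into two groups with plain 2-means (Lloyd iterations).

    Seeding is an assumption of this sketch: the first point and the
    point farthest from it, which keeps the result deterministic.
    """
    a = points[0]
    b = max(points, key=lambda p: dist(p, a))
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points if dist(p, a) <= dist(p, b)]
        right = [p for p in points if dist(p, a) > dist(p, b)]
        if not left or not right:
            break  # degenerate split, e.g. all points identical
        new_a, new_b = centroid(left), centroid(right)
        if (new_a, new_b) == (a, b):
            break  # assignments have converged
        a, b = new_a, new_b
    return left, right


def bisecting_kmeans(points, k):
    """Return (root, leaves): a binary tree of clusters and its k leaves."""
    root = {"members": list(points), "children": []}
    leaves = [root]
    while len(leaves) < k:
        splittable = [n for n in leaves if len(n["members"]) > 1]
        if not splittable:
            break
        # Selection rule (assumed here): split the leaf with the largest SSE.
        target = max(splittable, key=lambda n: sse(n["members"]))
        left, right = two_means(target["members"])
        if not left or not right:
            break
        target["children"] = [
            {"members": left, "children": []},
            {"members": right, "children": []},
        ]
        leaves = [n for n in leaves if n is not target] + target["children"]
    return root, leaves


# Toy usage: three well-separated groups of 2-D points.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (21, 0)]
toy_root, toy_leaves = bisecting_kmeans(toy_points, 3)
```

The returned tree records every split, so a reviewer could inspect coarse groups near the root before descending to finer ones, whereas flat K-Means yields only a single partition.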