Combining statistics and semantics via ensemble model for document clustering

  • Authors:
  • Samah Jamal Fodeh;William F Punch;Pang-Ning Tan

  • Affiliations:
  • Michigan State University, East Lansing, MI;Michigan State University, East Lansing, MI;Michigan State University, East Lansing, MI

  • Venue:
  • Proceedings of the 2009 ACM symposium on Applied Computing
  • Year:
  • 2009

Abstract

Incorporating background knowledge into data mining algorithms is an important but challenging problem. Current approaches in semi-supervised learning require explicit knowledge, provided by domain experts, that is specific to the particular data set. In this study, we propose an ensemble model that couples two sources of information: statistical information derived from the data set, and sense information retrieved from WordNet, which is used to build a semantic binary model. We evaluated the efficacy of our combined ensemble model on the Reuters-21578 and 20 Newsgroups data sets.
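The abstract does not specify how the two models are combined. The sketch below is only an illustration and is not the authors' method. It shows one standard way to merge clusterings in an ensemble, a co-association (evidence accumulation) matrix. It assumes that each information source has already produced a clustering of the same documents. The inputs `labels_stat` (for example, from term-frequency statistics) and `labels_sem` (for example, from WordNet-sense binary vectors) are hypothetical. The helper names `co_association` and `consensus_clusters` and the `threshold` parameter are also hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Return an n x n matrix: fraction of base clusterings that put docs i and j together."""
    n = len(labelings[0])
    m = len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        assert len(labels) == n, "all clusterings must cover the same documents"
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Divide once at the end so agreement fractions are computed from exact integer counts.
    return [[c / m for c in row] for row in counts]


def consensus_clusters(labelings, threshold=0.5):
    """Hypothetical consensus step: link docs whose co-association >= threshold,
    then return the connected components as cluster labels 0, 1, 2, ..."""
    co = co_association(labelings)
    n = len(co)
    parent = list(range(n))  # union-find forest over documents

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] >= threshold:
            parent[find(i)] = find(j)

    # Relabel component roots in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical base clusterings of five documents from the two sources.
labels_stat = [0, 0, 1, 1, 2]
labels_sem = [0, 0, 0, 1, 1]
print(consensus_clusters([labels_stat, labels_sem], threshold=1.0))  # [0, 0, 1, 2, 3]
print(consensus_clusters([labels_stat, labels_sem], threshold=0.5))  # [0, 0, 0, 0, 0]
```

With two base clusterings, `threshold=1.0` keeps only the pairs that both sources agree on, so the result is the intersection of the two partitions. `threshold=0.5` links any pair that either source groups together, and single links then chain all five documents into one cluster. The threshold therefore controls how much the semantic model can override the statistical one.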