Boosting over Groups and Its Application to Acronym-Expansion Extraction

  • Authors:
  • Weijian Ni; Yalou Huang; Dong Li; Yang Wang

  • Affiliations:
  • College of Information Technical Science, Nankai University, Tianjin, China (all four authors)

  • Venue:
  • ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
  • Year:
  • 2008

Abstract

In many real-world classification applications, instances are generated from different "groups". Take webpage classification as an example: the webpages used for training and testing can be naturally grouped by network domain, and domains often differ considerably in size or webpage template. Because of these differences, the distribution of instances also varies from group to group. It is therefore not reasonable to treat all instances equally as independent elements during training and testing, as conventional classification algorithms do. This paper addresses the classification problem in which all instances can be naturally grouped. Specifically, we formulate this kind of problem and propose a simple but effective boosting approach called AdaBoost.Group. The approach is demonstrated on the task of recognizing acronyms and their expansions in text, where instances are grouped by sentence. The experimental results show that our approach is better suited to this kind of problem than conventional classification approaches.
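The abstract does not spell out how AdaBoost.Group works, so the following is only a minimal sketch of one way boosting could be run over groups instead of individual instances. It is not the authors' algorithm. It assumes that one weight is kept per group, that a group's loss is the fraction of its instances a weak learner misclassifies, and that the weak learner is a threshold stump on a single numeric feature. All function names and the toy data are hypothetical.

```python
import math

def predict_stump(stump, x):
    """A stump is (threshold, sign): predict sign if x > threshold, else -sign."""
    thr, sign = stump
    return sign if x > thr else -sign

def group_loss(stump, group):
    """Assumed group-level loss: fraction of misclassified instances in the group."""
    wrong = sum(1 for x, y in group if predict_stump(stump, x) != y)
    return wrong / len(group)

def fit_stump(groups, weights):
    """Pick the stump with the lowest group-weighted loss."""
    thresholds = sorted({x for g in groups for x, _ in g})
    best, best_err = None, float("inf")
    for thr in thresholds:
        for sign in (1, -1):
            stump = (thr, sign)
            err = sum(w * group_loss(stump, g) for w, g in zip(weights, groups))
            if err < best_err:
                best, best_err = stump, err
    return best, best_err

def boost_over_groups(groups, rounds=5):
    """AdaBoost-style loop in which the weight distribution is over groups."""
    weights = [1.0 / len(groups)] * len(groups)
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(groups, weights)
        if err >= 0.5:  # no better than chance on the weighted groups
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Groups with high loss gain weight; groups with low loss lose weight.
        weights = [w * math.exp(alpha * (2 * group_loss(stump, g) - 1))
                   for w, g in zip(weights, groups)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * predict_stump(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy data: each inner list is one group (e.g. one sentence) of (feature, label) pairs.
groups = [
    [(0.1, -1), (0.9, 1)],
    [(0.2, -1), (0.8, 1), (0.7, 1)],
    [(0.3, -1), (0.6, 1)],
]
ensemble = boost_over_groups(groups)
```

Because the weights are normalized over groups, a sentence with many candidate instances does not dominate training merely because of its size, which is the kind of imbalance between groups that the abstract describes.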