Boosting over Groups and Its Application to Acronym-Expansion Extraction

Authors:
Weijian Ni;Yalou Huang;Dong Li;Yang Wang
Affiliations:
College of Information Technical Science, Nankai University, Tianjin, China;College of Information Technical Science, Nankai University, Tianjin, China;College of Information Technical Science, Nankai University, Tianjin, China;College of Information Technical Science, Nankai University, Tianjin, China
Venue:
ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
Year:
2008

Abstract

In many real-world classification applications, instances are generated from different "groups". Take webpage classification as an example: the webpages used for training and testing can be naturally grouped by network domain, and domains often differ considerably in size and webpage template. As a result, the distribution of instances also varies from group to group. It is therefore not reasonable to treat all instances equally as independent elements during training and testing, as conventional classification algorithms do. This paper addresses the classification problem in which all instances can be naturally grouped. Specifically, we give a formulation of this kind of problem and propose a simple but effective boosting approach called AdaBoost.Group. The problem is demonstrated on the task of recognizing acronyms and their expansions in text, where all instances are grouped by sentence. The experimental results show that our approach is better suited to this kind of problem than conventional classification approaches.
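The abstract does not spell out the AdaBoost.Group algorithm, so the following is only an illustrative sketch of the general idea of boosting over groups, not the authors' method. It assumes that boosting weights are kept per group rather than per instance, that a group's loss is the fraction of its instances a weak learner misclassifies, and that the weak learners are threshold stumps on a single numeric feature. All function names (`group_loss`, `best_stump`, `boost_over_groups`, `predict`) and the toy data layout are hypothetical.

```python
import math

def group_loss(h, group):
    """Fraction of instances in one group that h misclassifies (in [0, 1])."""
    return sum(1 for x, y in group if h(x) != y) / len(group)

def make_stump(threshold, polarity):
    """Weak learner: predicts `polarity` above the threshold, else its negation."""
    return lambda x: polarity if x > threshold else -polarity

def best_stump(groups, weights):
    """Exhaustively pick the 1-D stump with the lowest weighted group loss."""
    xs = sorted({x for g in groups for x, _ in g})
    thresholds = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for t in thresholds:
        for p in (1, -1):
            h = make_stump(t, p)
            eps = sum(w * group_loss(h, g) for w, g in zip(weights, groups))
            if best is None or eps < best[0]:
                best = (eps, h)
    return best

def boost_over_groups(groups, rounds=5):
    """Assumed scheme: AdaBoost-style rounds with one weight per group."""
    weights = [1.0 / len(groups)] * len(groups)
    ensemble = []
    for _ in range(rounds):
        eps, h = best_stump(groups, weights)
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, h))
        # Groups the stump handles badly (loss > 0.5) gain weight; others lose it.
        weights = [w * math.exp(alpha * (2 * group_loss(h, g) - 1))
                   for w, g in zip(weights, groups)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1

# Toy data: each group is a list of (feature, label) pairs with labels in {-1, +1}.
toy_groups = [
    [(0.1, -1), (0.9, 1)],
    [(0.2, -1), (0.3, -1), (0.8, 1)],
    [(0.7, 1)],
]
toy_model = boost_over_groups(toy_groups, rounds=3)
```

Because each group contributes a loss between 0 and 1 regardless of its size, a large group cannot dominate the reweighting simply by containing more instances. That is the property the abstract motivates, although the paper's actual loss function and update rule may differ from this sketch.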