Exploratory class-imbalanced and non-identical data distribution in automatic keyphrase extraction

Authors:
Weijian Ni;Tong Liu;Qingtian Zeng
Affiliations:
Shandong University of Science and Technology, Qingdao, Shandong Province, P.R. China;Shandong University of Science and Technology, Qingdao, Shandong Province, P.R. China;Shandong University of Science and Technology, Qingdao, Shandong Province, P.R. China
Venue:
ISNN'12 Proceedings of the 9th international conference on Advances in Neural Networks - Volume Part II
Year:
2012


Abstract

Supervised learning algorithms hold much promise for automatic keyphrase extraction, but most of them assume that training samples are evenly distributed among classes and drawn from an identical distribution. Neither assumption necessarily holds in the real-world task of extracting keyphrases from documents. In this paper, we propose a novel supervised keyphrase extraction approach that addresses both class-imbalanced and non-identical data distributions. The approach is essentially a stacking method: meta-models are trained on balanced partitions of a given training set and then combined using meta-features that describe the particular keyphrase patterns embedded in each document. Experimental results verify the effectiveness of our approach.
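
The idea of training several models on balanced partitions of an imbalanced training set can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the two toy features, the nearest-centroid base learner, and above all the combiner, which simply averages the model scores instead of using the document-level meta-features described in the abstract. It also assumes that keyphrases (label 1) are the minority class.

```python
import random
from statistics import mean


def balanced_partitions(samples, labels, seed=0):
    """Split the majority class (label 0) into chunks of roughly the size of
    the minority class (label 1) and pair each chunk with all minority samples."""
    pos = [x for x, y in zip(samples, labels) if y == 1]
    neg = [x for x, y in zip(samples, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    rng.shuffle(neg)
    n_parts = max(1, len(neg) // len(pos))
    parts = []
    for i in range(n_parts):
        chunk = neg[i::n_parts]  # every n_parts-th negative, starting at i
        parts.append((pos + chunk, [1] * len(pos) + [0] * len(chunk)))
    return parts


def train_centroid_model(X, y):
    """Hypothetical base learner: a nearest-centroid scorer.
    A positive score means the sample is closer to the keyphrase centroid."""
    def centroid(rows):
        return [mean(col) for col in zip(*rows)]

    c_pos = centroid([x for x, t in zip(X, y) if t == 1])
    c_neg = centroid([x for x, t in zip(X, y) if t == 0])

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return d_neg - d_pos

    return score


def train_ensemble(samples, labels, seed=0):
    """Train one model per balanced partition."""
    return [train_centroid_model(X, y)
            for X, y in balanced_partitions(samples, labels, seed)]


def predict(models, x):
    """Simplified combiner (an assumption): average the scores of all models."""
    return 1 if mean(m(x) for m in models) > 0 else 0


# Toy data: two made-up features per candidate phrase, 3 keyphrases vs 12 others.
positives = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]]
negatives = [[0.1 + 0.01 * i, 0.9 - 0.01 * i] for i in range(12)]
samples = positives + negatives
labels = [1] * len(positives) + [0] * len(negatives)

models = train_ensemble(samples, labels)
print(len(models), predict(models, [0.9, 0.1]), predict(models, [0.1, 0.9]))
```

With 3 positive and 12 negative candidates, the sketch builds four partitions of 3 positives and 3 negatives each, so no single model sees the 1:4 imbalance of the full set. A fuller version would replace the averaging step with a second-level learner that also receives per-document meta-features.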