Keyword extraction using support vector machine

Authors:
Kuo Zhang;Hui Xu;Jie Tang;Juanzi Li
Affiliations:
Department of Computer Science and Technology, Tsinghua University, Beijing, P.R.China;Department of Computer Science and Technology, Tsinghua University, Beijing, P.R.China;Department of Computer Science and Technology, Tsinghua University, Beijing, P.R.China;Department of Computer Science and Technology, Tsinghua University, Beijing, P.R.China
Venue:
WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
Year:
2006

Abstract

This paper is concerned with keyword extraction. By keyword extraction, we mean extracting a subset of words or phrases from a document that can describe the 'meaning' of the document. Keywords benefit many text mining applications. However, a large number of documents do not have keywords, so keywords must be assigned before these applications can benefit from them. Several research efforts have addressed keyword extraction. These methods rely on 'global context information', which restricts extraction performance. A thorough and systematic investigation of the issue is thus needed. In this paper, we propose to make use of not only 'global context information' but also 'local context information' for extracting keywords from documents. As far as we know, utilizing both 'global context information' and 'local context information' in keyword extraction has not been sufficiently investigated previously. We also propose methods for performing the task on the basis of Support Vector Machines (SVMs) and define the features used in the model. Experimental results indicate that the proposed SVM-based method can significantly outperform the baseline methods for keyword extraction. The proposed method has also been applied to document classification, a typical text mining task. Experimental results show that the accuracy of document classification can be significantly improved by using the keyword extraction method.
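
The abstract does not spell out the features or the training procedure, so the following is only a minimal sketch of the general idea: describe each candidate word with a few 'global' and 'local' context features, then train a linear SVM to separate keywords from non-keywords. The specific features (relative term frequency, earliness of first occurrence, share of adjacent stopwords), the stopword list, the function names, the toy labels, and the plain subgradient-descent trainer are all assumptions made for illustration, not the authors' method.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def candidate_features(text):
    """Map each non-stopword to [term_freq, earliness, stopword_neighbor_ratio]."""
    tokens = re.findall(r"[a-z]+", text.lower())
    n = len(tokens)
    counts = Counter(tokens)
    first_pos = {}
    stop_neighbors = Counter()
    for i, tok in enumerate(tokens):
        first_pos.setdefault(tok, i)
        # Immediate left and right neighbours (fewer at the document edges).
        neighbors = tokens[max(0, i - 1):i] + tokens[i + 1:i + 2]
        stop_neighbors[tok] += sum(nb in STOPWORDS for nb in neighbors)
    feats = {}
    for tok, c in counts.items():
        if tok in STOPWORDS:
            continue
        feats[tok] = [
            c / n,                          # global (assumed): relative term frequency
            1.0 - first_pos[tok] / n,       # global (assumed): earlier first use scores higher
            stop_neighbors[tok] / (2 * c),  # local (assumed): share of adjacent slots holding stopwords
        ]
    return feats


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss; labels are +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The regulariser shrinks the weights on every step (bias is not regularised).
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1:
                # The hinge loss is active: move towards classifying xi with margin.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 (keyword) or -1 (non-keyword) for one feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy usage with hypothetical gold labels for a single short document.
doc = "Support vector machines extract keywords. The keywords describe the document."
feats = candidate_features(doc)
gold = {"support", "vector", "machines", "keywords"}
words = sorted(feats)
X = [feats[word] for word in words]
y = [1 if word in gold else -1 for word in words]
weights, bias = train_linear_svm(X, y)
print([word for word in words if predict(weights, bias, feats[word]) == 1])
```

In practice one would train on many labelled documents and would likely use an established SVM implementation rather than this hand-rolled trainer. The sketch is only meant to show how global and local features can sit side by side in one feature vector per candidate word.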