Combining Statistical Machine Learning Models to Extract Keywords from Chinese Documents

Authors:
Chengzhi Zhang
Affiliations:
Department of Information Management, Nanjing University of Science & Technology, Nanjing 210093 and Institute of Scientific & Technical Information of China, Beijing 100038
Venue:
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Year:
2009

Abstract

Keywords are a subset of the words or phrases in a document that describe its meaning, and many text mining applications can take advantage of them. Unfortunately, a large portion of documents still have no keywords assigned, and manual assignment of high-quality keywords is time-consuming and error-prone. Many algorithms and systems have therefore been proposed to help people extract keywords automatically. However, most automatic keyword extraction methods cannot use the features of documents effectively. This paper proposes a method that integrates several statistical machine learning models. The method extracts keywords from Chinese documents through voting among multiple keyword extraction models. Experimental results show that the proposed method, which is based on ensemble learning, outperforms other methods in terms of the F1 measure. Moreover, the ensemble keyword extraction model with weighted voting outperforms the model without weighted voting.
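
The voting idea described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not name the base models, the weighting scheme, or the decision threshold, so the model names (`model_a`, `model_b`, `model_c`), the integer weights, the 0.5 threshold, and the function names `vote_keywords` and `f1_score` are all assumptions made for illustration. Each base model is assumed to have already produced a set of candidate keywords for one document.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


def vote_keywords(
    predictions: Dict[str, Set[str]],
    weights: Optional[Dict[str, float]] = None,
    threshold: float = 0.5,
) -> List[str]:
    """Keep candidates whose share of the total vote weight exceeds `threshold`.

    `predictions` maps a (hypothetical) model name to the keywords it proposed.
    With `weights=None` every model gets one equal vote (unweighted voting).
    """
    if weights is None:
        weights = {name: 1.0 for name in predictions}
    total = sum(weights[name] for name in predictions)
    scores: Dict[str, float] = defaultdict(float)
    for name, keywords in predictions.items():
        for keyword in keywords:
            scores[keyword] += weights[name]
    return sorted(kw for kw, score in scores.items() if score / total > threshold)


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Harmonic mean of precision and recall over keyword sets."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only: three hypothetical extractors, one document.
predictions = {
    "model_a": {"keyword", "extraction", "voting"},
    "model_b": {"keyword", "voting", "document"},
    "model_c": {"keyword", "document", "chinese"},
}
# Assumed weights; in practice they might come from each model's validation F1.
weights = {"model_a": 3, "model_b": 1, "model_c": 1}

unweighted = vote_keywords(predictions)
weighted = vote_keywords(predictions, weights)
print(unweighted)
print(weighted)
```

With equal votes, a candidate survives only when a majority of models propose it. With weights, a single strong model can carry a candidate that the others missed, which is one way weighted voting can differ from unweighted voting.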