Keywords are a subset of the words or phrases in a document that describe its meaning. Many text mining applications can take advantage of them. Unfortunately, a large portion of documents still have no keywords assigned, and manually assigning high-quality keywords is time-consuming and error-prone. Many algorithms and systems have therefore been proposed to help people extract keywords automatically. However, most automatic keyword extraction methods cannot use the features of documents effectively. This paper proposes a method that integrates multiple statistical machine learning models: it extracts keywords from Chinese documents by voting among multiple keyword extraction models. Experimental results show that the proposed method, based on ensemble learning, outperforms other methods according to the F1 measure. Moreover, the ensemble keyword extraction model with weighted voting outperforms the model without weighted voting.
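The voting step and the F1 measure mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the base models, how the weights are obtained, or the acceptance rule, so the model names, the weights, the `threshold` parameter, and the helper names `vote_keywords` and `f1_score` are all assumptions made for illustration. English placeholder keywords are used for readability; the voting logic itself does not depend on the language.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set


def vote_keywords(
    predictions: Dict[str, Set[str]],
    weights: Optional[Dict[str, float]] = None,
    threshold: float = 0.5,
) -> List[str]:
    """Combine the keyword sets proposed by several models.

    A candidate is kept when the summed weight of the models that proposed
    it, divided by the total weight, reaches `threshold` (an assumed rule).
    With weights=None every model counts equally (unweighted voting).
    """
    if weights is None:
        weights = {name: 1.0 for name in predictions}
    total = sum(weights[name] for name in predictions)

    scores: Dict[str, float] = defaultdict(float)
    for name, keywords in predictions.items():
        for kw in keywords:
            scores[kw] += weights[name]

    selected = [kw for kw, s in scores.items() if s / total >= threshold]
    # Highest vote first; ties are broken alphabetically for determinism.
    return sorted(selected, key=lambda kw: (-scores[kw], kw))


def f1_score(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Harmonic mean of precision and recall over keyword sets."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of three base extractors for one document.
predictions = {
    "model_a": {"keyword", "extraction"},
    "model_b": {"keyword", "ensemble"},
    "model_c": {"ensemble", "text"},
}
# Hypothetical weights, e.g. derived from each model's validation quality.
weights = {"model_a": 2.0, "model_b": 1.0, "model_c": 1.0}

unweighted = vote_keywords(predictions)
weighted = vote_keywords(predictions, weights)
```

In this toy input, giving `model_a` a larger weight lets its candidate "extraction" pass the threshold even though only one model proposed it, which is the kind of effect weighted voting is meant to capture. Whether that helps on real data depends on how well the weights reflect each model's reliability.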