Automatic keyphrase extraction from Chinese news documents

Authors:
Houfeng Wang;Sujian Li;Shiwen Yu
Affiliations:
Department of Computer Science and Technology, School of Electronic Engineering and Computer Science, Peking University, Beijing, China;Department of Computer Science and Technology, School of Electronic Engineering and Computer Science, Peking University, Beijing, China;Department of Computer Science and Technology, School of Electronic Engineering and Computer Science, Peking University, Beijing, China
Venue:
FSKD'05 Proceedings of the Second international conference on Fuzzy Systems and Knowledge Discovery - Volume Part II
Year:
2005

Abstract

This paper presents a framework for automatically supplying keyphrases for a Chinese news document. The framework works in three steps. First, it extracts Chinese character strings from the source article as an initial set of keyphrase candidates, based on the frequency and length of the strings. Second, it filters unimportant candidates out of the initial set using elimination rules, and transforms vague candidates into their canonical forms according to a controlled list of synonymous terms and a list of abbreviations. Finally, it selects the best items from the remaining candidates using a score measure. The approach is tested on the People's Daily corpus, and the experimental results are satisfactory.
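
The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no thresholds, elimination rules, or scoring formula, so every function name, the length and frequency limits, the single elimination rule (drop strings that begin or end with a function character), and the frequency-times-length score are assumptions made for illustration only.

```python
from collections import Counter


def extract_candidates(text, min_len=2, max_len=6, min_freq=2):
    """Step 1: collect character strings by length and frequency.

    The length bounds and frequency threshold are assumed values.
    """
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            # Skip strings that span punctuation or whitespace.
            if all(ch.isalnum() for ch in gram):
                counts[gram] += 1
    return {s: c for s, c in counts.items() if c >= min_freq}


def eliminate(candidates, stop_chars):
    """Step 2a: one hypothetical elimination rule.

    Drops candidates that start or end with a character in stop_chars.
    """
    return {
        s: c for s, c in candidates.items()
        if s[0] not in stop_chars and s[-1] not in stop_chars
    }


def canonicalize(candidates, canonical_forms):
    """Step 2b: map synonyms and abbreviations to a canonical form.

    canonical_forms stands in for the controlled synonym and
    abbreviation lists; counts of merged variants are summed.
    """
    merged = Counter()
    for s, c in candidates.items():
        merged[canonical_forms.get(s, s)] += c
    return dict(merged)


def select_keyphrases(candidates, top_k=3):
    """Step 3: rank by an assumed score (frequency times length)."""
    ranked = sorted(
        candidates.items(),
        key=lambda kv: (-kv[1] * len(kv[0]), kv[0]),
    )
    return [s for s, _ in ranked[:top_k]]


def keyphrases(text, stop_chars, canonical_forms, top_k=3):
    """Run the three steps in order."""
    candidates = extract_candidates(text)
    candidates = eliminate(candidates, stop_chars)
    candidates = canonicalize(candidates, canonical_forms)
    return select_keyphrases(candidates, top_k)
```

A call such as `keyphrases(article, {"的", "了"}, {"北大": "北京大学"})` would chain the steps; the stop characters and the abbreviation mapping shown here are illustrative inputs, not resources taken from the paper.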