Word segmentation and POS tagging for Chinese keyphrase extraction

Authors:
Xiaochun Huang;Jian Chen;Puliu Yan;Xin Luo
Affiliations:
School of Electronic Information, Wuhan University, Wuhan, Hubei, China;School of Electronic Information, Wuhan University, Wuhan, Hubei, China;School of Electronic Information, Wuhan University, Wuhan, Hubei, China;School of Electronic Information, Wuhan University, Wuhan, Hubei, China
Venue:
ADMA'05 Proceedings of the First International Conference on Advanced Data Mining and Applications
Year:
2005

Abstract

Keyphrases are essential for many text mining applications. This paper proposes a system for automatically extracting keyphrases from Chinese text. To address a particular problem of Chinese information processing, a lexicon-based word segmentation approach is presented. For this purpose, a verb lexicon, a functional word lexicon and a stop word lexicon are constructed. A predefined keyphrase lexicon is applied to improve extraction performance. The approach uses a small Part-Of-Speech (POS) tagset to index phrases simply according to these lexicons. It is especially effective for identifying phrases in the form of combinations of nouns, adjectives and verbs. Keyphrases are sifted by their weighted TF-IDF (Term occurrence Frequency-Inverse Document Frequency) values. New keyphrases are added to the keyphrase lexicon.
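
The two ideas named in the abstract, lexicon-based segmentation and TF-IDF scoring, can be illustrated with a minimal sketch. This is not the authors' system: the abstract does not say how the lexicons are matched or how the TF-IDF values are weighted, so the sketch assumes forward maximum matching for segmentation and plain, unweighted TF-IDF. The function names `segment` and `tf_idf`, the `max_len` parameter, and the toy lexicon are all hypothetical.

```python
import math
from collections import Counter


def segment(text, lexicon, max_len=4):
    """Forward maximum matching (an assumed strategy, not from the paper):
    at each position take the longest substring found in the lexicon."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no lexicon entry matches.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def tf_idf(docs_tokens, stop_words):
    """Plain TF-IDF per document (no extra weighting); stop words are
    dropped before scoring."""
    filtered = [[t for t in doc if t not in stop_words] for doc in docs_tokens]
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in filtered for t in set(doc))
    n = len(filtered)
    scores = []
    for doc in filtered:
        tf = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return scores


# Toy data, invented for illustration only.
lexicon = {"中文", "文本", "关键词", "提取", "分词"}
stop_words = {"的"}
docs = ["中文文本的关键词提取", "中文分词"]

tokens = [segment(d, lexicon, max_len=3) for d in docs]
scores = tf_idf(tokens, stop_words)
```

In this toy run, a term that occurs in every document ("中文") receives a score of zero, while terms unique to one document score higher, which is the property that lets TF-IDF separate candidate keyphrases from common vocabulary.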