Keyword Extraction Based on Lexical Chains and Word Co-occurrence for Chinese News Web Pages

Authors:
Xinghua Li;Xindong Wu;Xuegang Hu;Fei Xie;Zhaozhong Jiang
Venue:
ICDMW '08 Proceedings of the 2008 IEEE International Conference on Data Mining Workshops
Year:
2008

Abstract

This paper presents a new keyword extraction algorithm for Chinese news web pages that uses lexical chains and word co-occurrence, combined with frequency, cohesion, and correlation features. A lexical chain is the outward expression of the cohesion among semantically related words in a text, and it represents the semantic content of a portion of the text. Word co-occurrence distribution is an important statistical model, widely used in natural language processing, that reflects the correlation between words. Our proposed algorithm, KELCC, combines lexical chains and word co-occurrence to extract keywords from Chinese news web pages. The algorithm is not domain-specific and can be applied to a single web page without a corpus. Experiments on randomly selected web pages demonstrate the quality of the keywords extracted by the proposed algorithm.
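To make the word co-occurrence idea concrete, here is a minimal sketch of counting co-occurrences within a single document and combining them with term frequency. It is not the KELCC algorithm: the abstract does not give its scoring formula, so the additive score, the function names (`cooccurrence_counts`, `score_words`), and the sample sentences are all assumptions made for illustration. Lexical chains are omitted because they would need a semantic resource that the abstract does not name.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how often each unordered word pair appears in the same sentence."""
    pair_counts = Counter()
    for tokens in sentences:
        # Use distinct words so a pair is counted at most once per sentence.
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def score_words(sentences):
    """Toy score (an assumption, not the paper's formula):
    term frequency plus the word's total co-occurrence count."""
    freq = Counter(w for tokens in sentences for w in tokens)
    cooc = Counter()
    for (a, b), n in cooccurrence_counts(sentences).items():
        cooc[a] += n
        cooc[b] += n
    return {w: freq[w] + cooc[w] for w in freq}


# Hypothetical, already-segmented sentences from one page.
# Real Chinese text would first need a word segmenter.
sentences = [
    ["market", "stock", "price"],
    ["stock", "price", "rise"],
    ["weather", "rain"],
]

pairs = cooccurrence_counts(sentences)
scores = score_words(sentences)
# Rank by descending score, breaking ties alphabetically.
top = sorted(scores, key=lambda w: (-scores[w], w))[:2]
print(top)
```

Because the counts come only from the page being processed, a sketch like this needs no background corpus, which matches the single-page setting the abstract describes.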