News-oriented automatic Chinese keyword indexing

Authors:
Li Sujian;Wang Houfeng;Yu Shiwen;Xin Chengsheng
Affiliations:
Peking University;Peking University;Peking University;The Information Center of People's Daily
Venue:
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Year:
2003

Citing (5):
PAT-tree-based keyword extraction for Chinese information retrieval
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval

KEA: practical automatic keyphrase extraction
Proceedings of the fourth ACM conference on Digital libraries

Chinese keyword extraction based on max-duplicated strings of the documents
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval

Meaningful term extraction and discriminative term selection in text categorization via unknown-word methodology
ACM Transactions on Asian Language Information Processing (TALIP)

Domain-Specific Keyphrase Extraction
IJCAI '99 Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence

Cited by (1):
Concepts discrimination research
ISP'06 Proceedings of the 5th WSEAS International Conference on Information Security and Privacy

Abstract

In the information era, keywords are very useful for information retrieval, text clustering, and similar tasks. News is a domain that consistently attracts a large amount of attention. However, the majority of news articles come without keywords, and indexing them manually is costly. Targeting the characteristics of news articles and the resources available, this paper introduces a simple procedure for indexing keywords based on a scoring system. During indexing, we use relatively mature linguistic techniques and tools to filter out meaningless candidate items. Furthermore, by exploiting the hierarchical relations of content words, keywords are not restricted to those extracted from the text. These methods have substantially improved our system. Finally, experimental results are given and analyzed, showing that the quality of the extracted keywords is satisfactory.
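
The abstract does not give the scoring formula, the filtering rules, or the word hierarchy, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the article is already word-segmented; the stop list `STOP_WORDS`, the parent-concept table `HYPERNYMS`, the function `index_keywords`, and the `title_bonus` weight are all hypothetical names and values introduced for illustration.

```python
from collections import Counter

# Hypothetical resources (assumptions, not from the paper):
# a tiny stop list and a toy child -> parent concept table.
STOP_WORDS = {"的", "了", "在", "是"}
HYPERNYMS = {"足球": "体育", "篮球": "体育"}


def is_candidate(token):
    """Filter out stop words and single-character tokens."""
    return token not in STOP_WORDS and len(token) > 1


def index_keywords(title_tokens, body_tokens, top_n=3, title_bonus=2.0):
    """Score pre-segmented tokens and return the top_n keywords."""
    scores = Counter()
    # Each occurrence in the body contributes one point.
    for token in body_tokens:
        if is_candidate(token):
            scores[token] += 1.0
    # Each occurrence in the title contributes an extra bonus.
    for token in title_tokens:
        if is_candidate(token):
            scores[token] += title_bonus
    # A parent concept accumulates the scores of its children,
    # so a keyword may be chosen even if it never appears in the text.
    for token, score in list(scores.items()):
        parent = HYPERNYMS.get(token)
        if parent is not None:
            scores[parent] += score
    return [token for token, _ in scores.most_common(top_n)]


title = ["足球", "比赛"]
body = ["足球", "篮球", "比赛", "的", "足球"]
print(index_keywords(title, body))
```

In this toy input the parent concept 体育 never occurs in the article, yet it can still be returned because it inherits the scores of 足球 and 篮球. A real system would replace the toy tables with a proper segmenter, a part-of-speech filter, and a lexical hierarchy.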