Corpus analysis methods for Chinese keyword extraction treat the corpus as a single sample of a stochastic language process. However, the distribution of keywords over the whole corpus differs greatly from their distribution within each document. Extraction based only on global statistical information can therefore find only the keywords that are significant across the whole corpus. Max-duplicated strings, by contrast, contain the locally significant keywords of each document. In this paper, we design an efficient algorithm that extracts the max-duplicated strings by building a PAT-tree for each document, so that keywords can then be selected from the max-duplicated strings according to their SIG values in the corpus.
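To make the idea of a max-duplicated string concrete, the following is a minimal sketch, not the PAT-tree algorithm of the paper. It assumes that a max-duplicated string is a substring that occurs at least twice in a document and cannot be extended by one character to the left or right without lowering its occurrence count; the abstract does not give a formal definition, so this reading is an assumption. The function name `max_duplicated_strings` and the parameters `max_len` and `min_count` are hypothetical, and brute-force substring counting is swapped in for the PAT-tree purely for illustration.

```python
from collections import Counter


def max_duplicated_strings(text, max_len=8, min_count=2):
    """Brute-force stand-in for PAT-tree extraction (illustrative only).

    Assumption: a max-duplicated string is a substring that occurs at least
    `min_count` times and whose count drops if it is extended by one
    character on either side.
    """
    # Count every substring of length 1..max_len+1; the extra length lets us
    # test one-character extensions of the longest candidates.
    counts = Counter(
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, min(i + max_len + 1, len(text)) + 1)
    )
    alphabet = set(text)
    result = {}
    for s, c in counts.items():
        if c < min_count or len(s) > max_len:
            continue
        # s is not maximal if some one-character extension is just as frequent.
        extendable = any(
            counts.get(ch + s, 0) == c or counts.get(s + ch, 0) == c
            for ch in alphabet
        )
        if not extendable:
            result[s] = c
    return result


if __name__ == "__main__":
    doc = "信息检索很重要，信息检索系统"
    print(max_duplicated_strings(doc))
```

This sketch costs roughly O(n · max_len) substrings per document, which is acceptable only for short texts; a PAT-tree (or a suffix array) finds the same repeats without enumerating every substring. Ranking the resulting candidates by their corpus-level SIG values is a separate step and is not shown here.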