This paper introduces a novel PM indexing scheme for Chinese text retrieval. Unlike Western languages, Chinese text has no delimiters between words, so indexing is based either on characters or on segmented words. In word-based indexing, out-of-vocabulary words, such as proper nouns or domain terminology, are often mis-segmented because segmentation dictionaries have limited vocabulary coverage, and this impairs query precision. Several indexing and ranking methods, including the novel PM-based ranking, are tested to compare how well they handle new words in Chinese text retrieval. The experiments show that the query precision of the PM + word method is 10% higher than that of word indexing.
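
To make the contrast between character-based and word-based indexing concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes a tiny hypothetical dictionary (VOCAB) and uses forward maximum matching as a stand-in segmenter; all function and variable names are illustrative. The PM scheme itself is not shown, because the abstract does not define it.

```python
# Illustrative sketch only: the dictionary, the documents, and the
# forward-maximum-matching segmenter are assumptions, not the paper's setup.

VOCAB = {"我们", "喜欢", "北京"}   # hypothetical segmentation dictionary
DOCS = {1: "我们喜欢奥巴马"}        # "奥巴马" (a proper noun) is out of vocabulary


def char_tokens(text):
    """Character-based indexing: every character is an index term."""
    return list(text)


def fmm_tokens(text, vocab, max_len=4):
    """Word-based indexing via forward maximum matching.

    At each position, take the longest dictionary word (up to max_len
    characters); fall back to a single character if nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def build_index(docs, tokenize):
    """Build a simple inverted index: term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index


word_index = build_index(DOCS, lambda t: fmm_tokens(t, VOCAB))
char_index = build_index(DOCS, char_tokens)

# The out-of-vocabulary name is split into single characters, so the
# word index has no entry for it as a whole term.
print(fmm_tokens(DOCS[1], VOCAB))
print("奥巴马" in word_index)
# The character index still contains every character of the name.
print(all(ch in char_index for ch in "奥巴马"))
```

In this sketch the word index loses the new word as a unit, while the character index retains its characters but no longer records word boundaries. This is the trade-off between the two indexing strategies that the abstract describes.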