In modern information retrieval systems, removing stop words is a common way to make indexing more effective. To date, many stop word lists have been developed for English. However, no standard stop word list has yet been constructed for Chinese. With the rapid development of Chinese information retrieval, building Chinese stop word lists has become critical. In this paper, to save time and relieve the burden of manual stop word selection, we propose an automatic aggregated methodology, based on statistical and information models, for extracting a Chinese stop word list. Analysis of the results shows that our stop list is comparable to a general English stop word list and is much more general than other Chinese stop lists. Our stop word extraction algorithm is a promising technique: it saves the time needed for manual generation and yields a standard list. It could be applied to other languages in the future.
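The abstract does not spell out the statistical and information models or how they are aggregated, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's algorithm. It assumes the text is already segmented into tokens, uses mean relative frequency across documents as the statistical score, uses the entropy of a word's occurrences over documents as the information score, and aggregates the two by summing their ranks. The function name `stop_word_candidates` and the parameter `k` are hypothetical.

```python
import math
from collections import Counter


def stop_word_candidates(docs, k=10):
    """Return up to k stop word candidates from pre-segmented documents.

    docs: list of token lists. Scoring and aggregation are illustrative
    assumptions, not the method described in the paper.
    """
    docs = [d for d in docs if d]          # ignore empty documents
    counts = [Counter(d) for d in docs]
    vocab = set().union(*counts)           # all distinct tokens
    n_docs = len(docs)

    mean_prob = {}
    entropy = {}
    for w in vocab:
        # Statistical score (assumed): mean relative frequency over documents.
        mean_prob[w] = sum(c[w] / len(d) for c, d in zip(counts, docs)) / n_docs
        # Information score (assumed): entropy of the word's occurrences
        # across documents; evenly spread words score high.
        total = sum(c[w] for c in counts)
        entropy[w] = -sum(
            (c[w] / total) * math.log(c[w] / total) for c in counts if c[w]
        )

    def ranks(score):
        # Rank 0 is the highest score; ties are broken by the token itself.
        ordered = sorted(vocab, key=lambda w: (-score[w], w))
        return {w: i for i, w in enumerate(ordered)}

    by_stat, by_info = ranks(mean_prob), ranks(entropy)
    # Aggregation (assumed): a smaller rank sum means a stronger candidate.
    combined = sorted(vocab, key=lambda w: (by_stat[w] + by_info[w], w))
    return combined[:k]


if __name__ == "__main__":
    sample = [
        ["的", "猫", "在", "的"],
        ["的", "狗", "在", "跑"],
        ["的", "鱼", "在", "的"],
    ]
    print(stop_word_candidates(sample, k=2))
```

Rank aggregation is used here because it avoids having to put the two scores on a common scale; a weighted sum of normalized scores would be an equally plausible choice.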