A Heuristic Approach for Segmentation Granularity Problem in Chinese Information Retrieval

Authors:
Ding Fan;Wang Bin;Wang Sili
Venue:
ALPIT '07 Proceedings of the Sixth International Conference on Advanced Language Processing and Web Information Technology (ALPIT 2007)
Year:
2007

Abstract

In Chinese information retrieval, documents are usually segmented into words and then indexed by those words. However, the segmentation granularity problem (SDP) must be considered, because small granularity may lead to low precision and efficiency, while large granularity may cause low recall. To address this problem, this paper proposes an intuitive, heuristic approach. A two-level index is built for the segmentation dictionary, through which the original query word can be expanded with its weighted overlaid words. This method not only preserves the precision advantage of large granularity but also overcomes its disadvantage in recall. The experimental results show that the approach slightly but consistently outperforms the baseline.
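
The abstract does not spell out how the two-level index or the weighting is implemented, so the following is only a minimal sketch under stated assumptions: an "overlaid word" is taken to be any shorter dictionary word that occurs as a substring of a longer dictionary word, and every overlaid word receives one fixed, hypothetical weight (`sub_weight`). The function names `build_two_level_index` and `expand_query` are illustrative and do not come from the paper.

```python
def build_two_level_index(dictionary):
    """Map each dictionary word (level 1) to its overlaid words (level 2).

    Assumption: an overlaid word is a shorter dictionary word that appears
    as a contiguous substring of the longer word.
    """
    vocab = set(dictionary)
    index = {}
    for word in vocab:
        subs = []
        n = len(word)
        # Enumerate every proper substring, left to right, shortest first.
        for i in range(n):
            for j in range(i + 1, n + 1):
                sub = word[i:j]
                if sub != word and sub in vocab and sub not in subs:
                    subs.append(sub)
        index[word] = subs
    return index


def expand_query(word, index, sub_weight=0.5):
    """Return the query word plus its overlaid words as (term, weight) pairs.

    The original word keeps weight 1.0; sub_weight is a hypothetical
    constant, not a value taken from the paper.
    """
    expanded = [(word, 1.0)]
    for sub in index.get(word, []):
        expanded.append((sub, sub_weight))
    return expanded


# Hypothetical toy dictionary for illustration only.
dictionary = ["中华人民共和国", "中华", "人民", "共和国", "共和"]
index = build_two_level_index(dictionary)
for term, weight in expand_query("中华人民共和国", index):
    print(term, weight)
```

In this sketch the long word stays in the query at full weight, which keeps the precision of large granularity, while the lower-weighted overlaid words let documents indexed at a smaller granularity still match, which is the recall benefit the abstract describes.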