Processing Chinese documents and queries for information retrieval (IR) requires identifying the units to use as indexes. Several previous studies have used words and n-grams as indexes and showed that both kinds lead to comparable IR performance. In this study, we carry out further experiments on different ways to segment documents and queries and to combine words with n-grams. Our experiments show that combining the longest-matching algorithm with single characters is the best choice.
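As a rough illustration of the combination named above, the following minimal sketch segments text by forward longest matching against a lexicon and then adds the single characters of each multi-character word as extra index terms. The abstract does not specify the matching direction, the lexicon, the maximum word length, or how the two kinds of units are merged, so the function names, the toy lexicon, and the `max_len` parameter here are all assumptions made for the example.

```python
def longest_match_segment(text, lexicon, max_len=4):
    """Forward longest matching (an assumed variant): at each position,
    take the longest lexicon entry, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: one character
        # Try candidate lengths from longest to shortest (down to 2).
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


def index_terms(text, lexicon, max_len=4):
    """Combine segmented words with the single characters they contain."""
    words = longest_match_segment(text, lexicon, max_len)
    # Characters of multi-character words are added as additional terms;
    # single-character tokens are already present in `words`.
    chars = [ch for word in words if len(word) > 1 for ch in word]
    return words + chars


# Hypothetical toy lexicon, for illustration only.
LEXICON = {"信息", "检索", "信息检索", "系统"}
print(index_terms("信息检索系统", LEXICON))
```

Indexing both the words and their characters lets a query match a document even when the two are segmented differently, since the shared characters still overlap.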