The Cross-Language Evaluation Forum has encouraged research in text retrieval methods for numerous European languages and has developed durable test suites that allow language-specific techniques to be investigated and compared. The labor associated with crafting a retrieval system that takes advantage of sophisticated linguistic methods is daunting. We examine whether language-neutral methods can achieve accuracy comparable to language-specific methods with less concomitant software complexity. Using the CLEF 2002 test set we demonstrate empirically how overlapping character n-gram tokenization can provide retrieval accuracy that rivals the best current language-specific approaches for European languages. We show that n = 4 is a good choice for those languages, and document the increased storage and time requirements of the technique. We report on the benefits of and challenges posed by n-grams, and explain peculiarities attendant to bilingual retrieval. Our findings demonstrate clearly that accuracy using n-gram indexing rivals or exceeds accuracy using unnormalized words, for both monolingual and bilingual retrieval.
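As a rough illustration of overlapping character n-gram tokenization with n = 4, the following minimal sketch produces the index terms for a short string. The function name `char_ngrams` is hypothetical, and the normalization choices are assumptions made for the example only: lowercasing, collapsing whitespace to single spaces, and letting n-grams span word boundaries. The abstract does not specify these details.

```python
def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Return the overlapping character n-grams of `text`, in order."""
    # Assumption (not stated in the abstract): lowercase the text and
    # collapse runs of whitespace to a single space before tokenizing.
    normalized = " ".join(text.lower().split())
    if not normalized:
        return []
    # Assumption: text shorter than n is kept as a single short token.
    if len(normalized) < n:
        return [normalized]
    # Slide a window of width n one character at a time.
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


if __name__ == "__main__":
    print(char_ngrams("Retrieval"))
    # ['retr', 'etri', 'trie', 'riev', 'ieva', 'eval']
    print(len(char_ngrams("text retrieval")))
    # 11
```

The two-word string "text retrieval" yields 11 index terms under these assumptions, compared with 2 word tokens. This is one way to see why n-gram indexing increases storage and time requirements relative to word indexing.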