We investigate the use of Web search engine statistics for the task of case restoration. Because most engines are case-insensitive, an approach based on search hit counts, as employed in previous work on natural language ambiguity resolution, is not applicable to this task: a query for "apple" and a query for "Apple" return the same count. Consequently, we study the use of statistics computed from the snippets generated by a Web search engine, and we show that such statistics can achieve performance similar to corpus-based approaches. We also note that the top few results returned by a search engine may not be the most representative for modeling phenomena in a language.
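The abstract does not say exactly how the snippet statistics are computed. What follows is a minimal Python sketch of one plausible version: a majority vote over the case variants of a word observed in snippet text. It assumes the snippets have already been retrieved as plain strings, so no real search API is called. It also assumes a simple regex tokenizer and skips sentence-initial tokens, where capitalization is forced by position. The names `case_variant_counts` and `restore_case` are hypothetical.

```python
import re
from collections import Counter

# Simplifying assumption: a token is a run of letters, digits, or apostrophes.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+")
# A prefix ending in sentence-final punctuation marks the next token as sentence-initial.
SENT_END_RE = re.compile(r"[.!?]\s*$")


def case_variant_counts(word, snippets):
    """Count the surface forms of `word` (matched case-insensitively) in the snippets.

    Tokens at the start of a snippet or right after ., ! or ? are skipped,
    because their capitalization says nothing about the word itself.
    """
    target = word.lower()
    counts = Counter()
    for snippet in snippets:
        for match in TOKEN_RE.finditer(snippet):
            token = match.group(0)
            if token.lower() != target:
                continue
            prefix = snippet[:match.start()]
            if not prefix.strip() or SENT_END_RE.search(prefix):
                continue  # sentence-initial: uninformative for case
            counts[token] += 1
    return counts


def restore_case(word, snippets):
    """Return the most frequent non-sentence-initial form of `word`.

    Falls back to lowercase when the word is never observed; on a tie,
    the all-lowercase form is preferred.
    """
    counts = case_variant_counts(word, snippets)
    if not counts:
        return word.lower()
    best_form, _ = max(counts.items(), key=lambda kv: (kv[1], kv[0] == kv[0].lower()))
    return best_form


if __name__ == "__main__":
    # Hypothetical snippets standing in for search engine results.
    snippets = [
        "Shares of Apple rose after the launch.",
        "Apple released a new phone.",  # sentence-initial, ignored
        "I ate an apple for lunch.",
        "Analysts expect Apple to report earnings.",
    ]
    print(case_variant_counts("apple", snippets))  # Counter({'Apple': 2, 'apple': 1})
    print(restore_case("apple", snippets))  # Apple
```

Skipping sentence-initial tokens matters because snippets often begin mid-sentence or at a sentence boundary. Without the filter, lowercase words would be over-counted as capitalized. The abstract's closing caveat still applies: if only the top few snippets are used, the vote reflects whatever those results happen to emphasize, not the language as a whole.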