A case study of using web search statistics: case restoration

  • Authors:
  • Silviu Cucerzan

  • Affiliations:
  • Microsoft Research, Redmond, WA

  • Venue:
  • CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

We investigate the use of Web search engine statistics for the task of case restoration. Because most engines are case insensitive, an approach based on search hit counts, as employed in previous work in natural language ambiguity resolution, is not applicable for this task. Consequently, we study the use of statistics computed from the snippets generated by a Web search engine, and we show that such statistics can achieve performance similar to corpus-based approaches. We also note that the top few results returned by a search engine may not the most representative for modeling phenomena in a language.