This paper introduces Netspeak, a Web service that assists writers in finding appropriate expressions. To provide statistically relevant suggestions, the service indexes more than 1.8 billion n-grams, n ≤ 5, along with their occurrence frequencies on the Web. When in doubt about a wording, a user can pose a query with wildcards inserted at the positions where she feels uncertain. Queries define patterns for which a ranked list of matching n-grams, along with usage examples, is retrieved. The ranking reflects the occurrence frequencies of the n-grams and informs about both absolute and relative usage. Given this choice of customary wordings, one can easily select the most appropriate; second-language speakers in particular can learn about style conventions and language usage. To guarantee response times within milliseconds, we have developed an index that takes occurrence probabilities into account, allowing for biased sampling during retrieval. Our analysis shows that the substantial speedup obtained with this strategy (a factor of 68) comes without significant loss in retrieval quality.
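The core idea of wildcard queries over a frequency-ranked n-gram table can be sketched as follows. This is a minimal, hypothetical illustration, not Netspeak's actual implementation: the toy table, its frequencies, and the restriction of the wildcard `?` to exactly one word are all assumptions for the sake of the example, whereas the real service indexes ~1.8 billion n-grams and supports a richer query syntax.

```python
import re

# Hypothetical toy n-gram table (n-gram -> assumed Web occurrence frequency).
# The real index holds more than 1.8 billion n-grams, n <= 5.
NGRAMS = {
    "waiting for you": 45200,
    "waiting on you": 8100,
    "waiting at you": 40,
}

def query(pattern, table=NGRAMS):
    """Match a wildcard pattern ('?' stands for exactly one word) against the
    n-gram table and return hits ranked by occurrence frequency, together with
    their relative usage among the matches (absolute and relative usage)."""
    # Turn the pattern into a regex: escape it, then let each '?' match one word.
    regex = re.compile("^" + re.escape(pattern).replace(r"\?", r"\S+") + "$")
    hits = [(ngram, freq) for ngram, freq in table.items() if regex.match(ngram)]
    total = sum(freq for _, freq in hits) or 1
    ranked = sorted(hits, key=lambda h: -h[1])
    return [(ngram, freq, freq / total) for ngram, freq in ranked]
```

For the pattern `"waiting ? you"`, all three toy entries match; the ranking puts `"waiting for you"` first with roughly 85% relative usage, which is the kind of signal a writer uses to pick the customary wording.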