Engkoo: mining the web for language learning

  • Authors:
  • Matthew R. Scott; Xiaohua Liu; Ming Zhou; Microsoft Engkoo Team

  • Affiliations:
  • Microsoft Research Asia, Haidian District, Beijing, China; Microsoft Research Asia, Haidian District, Beijing, China; Microsoft Research Asia, Haidian District, Beijing, China; Microsoft Research Asia, Haidian District, Beijing, China

  • Venue:
  • HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Systems Demonstrations
  • Year:
  • 2011

Abstract

This paper presents Engkoo, a system for exploring and learning language. It is built primarily by mining translation knowledge from billions of web pages, using the Internet to catch language in motion. Engkoo currently serves Chinese users who are learning English; however, the technology itself is language-independent and can be extended to other languages in the future. At the system level, Engkoo is an application platform that supports a multitude of NLP technologies, such as cross-language retrieval, alignment, sentence classification, and statistical machine translation. The data set underlying the system is built primarily by mining a massive set of bilingual terms and sentences from across the web. Specifically, web pages that contain both Chinese and English are discovered and analyzed for parallelism, and the extracted content is formulated into clear term definitions and sample sentences. This approach allows us to build perhaps the world's largest lexicon linking Chinese and English, while also covering the most up-to-date terms as captured by the web.
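The abstract does not spell out how bilingual pages are found or how term pairs are extracted. To make the idea concrete, here is a minimal sketch of that first mining step. It is not Engkoo's actual pipeline. It makes two assumptions. First, a page counts as "bilingual" when both Han characters and Latin letters each make up at least a fixed share of its letters. Second, candidate term pairs follow the common "Chinese term (English term)" parenthetical pattern. The names `is_bilingual`, `extract_term_pairs`, and `min_ratio` are hypothetical, introduced only for illustration.

```python
import re
from typing import List, Tuple

# Assumption: only the basic CJK Unified Ideographs block (U+4E00..U+9FFF)
# counts as Chinese; extension blocks are ignored for simplicity.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")
LATIN_RE = re.compile(r"[A-Za-z]")

# Hypothetical extraction pattern: a run of Chinese characters immediately
# followed by an English phrase in ASCII or full-width parentheses,
# e.g. "机器翻译（machine translation）".
PAIR_RE = re.compile(
    r"([\u4e00-\u9fff]+)\s*[（(]\s*([A-Za-z][A-Za-z \-]*[A-Za-z])\s*[)）]"
)


def is_bilingual(text: str, min_ratio: float = 0.1) -> bool:
    """Heuristic page filter: both scripts must each make up at least
    `min_ratio` of the Han + Latin characters in `text`."""
    cjk = len(CJK_RE.findall(text))
    latin = len(LATIN_RE.findall(text))
    total = cjk + latin
    if total == 0:
        return False
    return cjk / total >= min_ratio and latin / total >= min_ratio


def extract_term_pairs(text: str) -> List[Tuple[str, str]]:
    """Return (Chinese, English) candidate pairs matching PAIR_RE.

    The Chinese side is the whole contiguous Han run before the parenthesis,
    so a real system would still need term-boundary detection and scoring
    (for example, corpus-wide co-occurrence statistics) on top of this.
    """
    return [(zh, en) for zh, en in PAIR_RE.findall(text)]


if __name__ == "__main__":
    page = "研究方向：机器翻译（machine translation）、信息检索（information retrieval）。"
    if is_bilingual(page):
        for zh, en in extract_term_pairs(page):
            print(zh, "->", en)
```

In the example, the full-width punctuation (`：`, `、`) falls outside the Han range, so it conveniently delimits the Chinese terms. In running prose without such delimiters, the captured Chinese span would be too long. Parallelism analysis and sentence alignment, as described in the abstract, would be separate later stages that this sketch does not attempt.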