A DOM tree alignment model for mining parallel data from the web
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
INTERACT '09 Proceedings of the 12th IFIP TC 13 International Conference on Human-Computer Interaction: Part I
Named entity translation with web mining and transliteration
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Collaborative decoding: partial hypothesis re-ranking using translation consensus between decoders
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Mining bilingual data from the web with adaptively learnt patterns
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
SRL-based verb selection for ESL
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Evaluating the Quality of Web-Mined Bilingual Sentences Using Multiple Linguistic Features
IALP '10 Proceedings of the 2010 International Conference on Asian Language Processing
An overview of web mining in education
Proceedings of the 17th Panhellenic Conference on Informatics
Hi-index | 0.00 |
This paper presents Engkoo, a system for exploring and learning language. It is built primarily by mining translation knowledge from billions of web pages - using the Internet to catch language in motion. Currently Engkoo is built for Chinese users who are learning English; however the technology itself is language independent and can be extended in the future. At a system level, Engkoo is an application platform that supports a multitude of NLP technologies such as cross language retrieval, alignment, sentence classification, and statistical machine translation. The data set that supports this system is primarily built from mining a massive set of bilingual terms and sentences from across the web. Specifically, web pages that contain both Chinese and English are discovered and analyzed for parallelism, extracted and formulated into clear term definitions and sample sentences. This approach allows us to build perhaps the world's largest lexicon linking both Chinese and English together - at the same time covering the most up-to-date terms as captured by the net.