This paper proposes a new algorithm that simultaneously identifies the coding system and the language of a code string fetched from the Internet, especially from the World Wide Web. The algorithm uses statistical language models both to select the correctly decoded string and to determine its language. It covers 9 languages and 11 coding systems used in East Asia and Western Europe. In experiments on 640 online documents, the algorithm achieved an accuracy of over 95%.
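The abstract describes the approach only at a high level: decode the fetched bytes under each candidate coding system, score every successfully decoded string with per-language statistical models, and keep the best-scoring (coding system, language) pair. The Python sketch below illustrates that idea under stated assumptions. The interpolated character-bigram model, the weight `lam`, the eight codecs in `CANDIDATE_ENCODINGS`, the names `CharBigramModel` and `identify`, and the tiny training strings are all hypothetical stand-ins, not the authors' models, coding-system list, or data.

```python
import math
from collections import Counter

# Hypothetical candidate coding systems (Python codec names). This is an
# illustrative subset, not the 11 coding systems covered by the paper.
CANDIDATE_ENCODINGS = [
    "utf-8", "iso-2022-jp", "shift_jis", "euc_jp",
    "gb2312", "big5", "euc_kr", "latin-1",
]

# Size of the Unicode code space; used so unseen characters keep a small
# nonzero probability.
UNICODE_SIZE = 0x110000


class CharBigramModel:
    """Assumed stand-in for a statistical language model: a character bigram
    model interpolated with an add-one-smoothed unigram model."""

    def __init__(self, training_text: str, lam: float = 0.7) -> None:
        self.lam = lam
        self.total = len(training_text)
        self.unigrams = Counter(training_text)
        self.contexts = Counter(training_text[:-1])
        self.bigrams = Counter(zip(training_text, training_text[1:]))

    def _p_unigram(self, ch: str) -> float:
        # Add-one smoothing over the whole Unicode code space.
        return (self.unigrams[ch] + 1) / (self.total + UNICODE_SIZE)

    def _p_bigram(self, prev: str, ch: str) -> float:
        ctx = self.contexts[prev]
        if ctx == 0:
            # Context never seen in training: fall back to the unigram estimate.
            return self._p_unigram(ch)
        ml = self.bigrams[(prev, ch)] / ctx
        return self.lam * ml + (1 - self.lam) * self._p_unigram(ch)

    def avg_log_prob(self, text: str) -> float:
        """Mean log-probability per character of the decoded text."""
        if not text:
            return float("-inf")
        total = math.log(self._p_unigram(text[0]))
        for prev, ch in zip(text, text[1:]):
            total += math.log(self._p_bigram(prev, ch))
        return total / len(text)


def identify(data: bytes, models: dict[str, CharBigramModel]) -> tuple[str, str, float]:
    """Return the (encoding, language, score) triple with the highest score."""
    best = ("", "", float("-inf"))
    for enc in CANDIDATE_ENCODINGS:
        try:
            decoded = data.decode(enc)
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this coding system
        for lang, model in models.items():
            score = model.avg_log_prob(decoded)
            if score > best[2]:  # strict ">" keeps the first of tied candidates
                best = (enc, lang, score)
    return best


# Toy training data (hypothetical; real models would need large corpora).
models = {
    "ja": CharBigramModel("これは日本語の文章です。日本語の文字コードを判定します。"),
    "zh": CharBigramModel("这是中文句子。我们判断语言和编码。"),
    "en": CharBigramModel("this is an english sentence. we detect the language and the encoding."),
}

enc, lang, _ = identify("日本語の文章です。".encode("shift_jis"), models)
print(enc, lang)  # shift_jis ja
```

Scoring by mean log-probability per character, rather than by the total, keeps a decoding such as latin-1 (which turns each multibyte character into several characters) from being penalized merely because it is longer. Pure-ASCII input decodes identically under most of these codecs, so in that case only the identified language is meaningful.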