In a multi-language information retrieval setting, knowing the language of a user query is important for further processing. We therefore compare the performance of several typical approaches to language detection on very short, query-style texts. The results show that an accuracy of more than 80% can be achieved even for single words; for slightly longer texts, we observed accuracy values close to 100%.
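To illustrate what language detection on a single query word can look like, here is a minimal sketch of one common family of approaches: a character trigram model with add-one smoothing. The abstract does not say which approaches were compared, so this is not a reconstruction of the evaluated methods. The function names (`char_ngrams`, `train`, `detect`), the choice of trigrams, and the four toy training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of the lowercased, space-padded text."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Build one n-gram count profile per language from (language, text) pairs."""
    profiles = {}
    for lang, text in samples:
        profiles.setdefault(lang, Counter()).update(char_ngrams(text, n))
    return profiles


def detect(query, profiles, n=3):
    """Return the language whose profile gives the query the highest
    add-one-smoothed log-likelihood."""
    vocab = set()
    for counts in profiles.values():
        vocab.update(counts)
    best_lang, best_score = None, -math.inf
    for lang, counts in profiles.items():
        total = sum(counts.values())
        # The extra +1 in the denominator reserves mass for unseen n-grams.
        score = sum(
            math.log((counts[g] + 1) / (total + len(vocab) + 1))
            for g in char_ngrams(query, n)
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


# Toy training data (hypothetical, far too small for real use).
TOY_SAMPLES = [
    ("en", "the weather is nice today and the search results are good"),
    ("en", "where can i find cheap flights to the city"),
    ("de", "das wetter ist heute schön und die suche ergibt gute treffer"),
    ("de", "wo finde ich günstige flüge in die stadt"),
]
PROFILES = train(TOY_SAMPLES)
```

With these toy profiles, `detect("weather", PROFILES)` returns `"en"` and `detect("wetter", PROFILES)` returns `"de"`. The padding spaces let word-initial and word-final trigrams contribute evidence, which matters most when the input is a single word. A realistic setup would train on much larger corpora and many more languages.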