We present a simple method for language-independent and task-independent text categorization learning, based on character-level n-gram language models. Our approach uses simple information-theoretic principles and performs well across a variety of languages and tasks without requiring feature selection or extensive pre-processing. To demonstrate the language and task independence of the proposed technique, we present experimental results on several languages (Greek, English, Chinese, and Japanese) in several text categorization problems: language identification, authorship attribution, text genre classification, and topic detection. Our experimental results show that this simple approach achieves state-of-the-art performance in each case.
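The abstract does not spell out the classification rule. One common way to combine character-level n-gram language models with information-theoretic principles is to train one model per category and assign a document to the category whose model gives it the lowest per-character cross-entropy (equivalently, the highest likelihood). The following is a minimal sketch of that idea under stated assumptions. The class name `CharNGramClassifier`, the default order `n=3`, the `\x02` padding character, and add-one (Laplace) smoothing are hypothetical choices made here for brevity. They are not taken from the paper, which may use a different smoothing scheme.

```python
import math
from collections import Counter


class CharNGramClassifier:
    """Hypothetical sketch: one character n-gram model per category.

    Assumptions (not from the paper): add-one smoothing, a single shared
    bucket for unseen characters, and classification by the lowest
    per-character cross-entropy in bits.
    """

    PAD = "\x02"  # assumed start-of-text padding symbol

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = {}    # category -> Counter of n-grams
        self.context_counts = {}  # category -> Counter of (n-1)-char contexts
        self.vocab = set()        # characters seen in training text

    def _pad(self, text):
        # Prefix n-1 pad symbols so the first real character has a full context.
        return self.PAD * (self.n - 1) + text

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            grams = self.ngram_counts.setdefault(label, Counter())
            ctxs = self.context_counts.setdefault(label, Counter())
            self.vocab.update(text)
            padded = self._pad(text)
            for i in range(self.n - 1, len(padded)):
                ctx = padded[i - self.n + 1:i]
                grams[ctx + padded[i]] += 1
                ctxs[ctx] += 1
        return self

    def cross_entropy(self, text, label):
        """Average bits per character of `text` under the model for `label`."""
        grams = self.ngram_counts[label]
        ctxs = self.context_counts[label]
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen characters
        padded = self._pad(text)
        total = 0.0
        for i in range(self.n - 1, len(padded)):
            ctx = padded[i - self.n + 1:i]
            # Add-one estimate of P(char | previous n-1 chars).
            p = (grams[ctx + padded[i]] + 1) / (ctxs[ctx] + v)
            total -= math.log2(p)
        return total / max(1, len(text))

    def predict(self, text):
        # Choose the category whose model compresses the text best.
        return min(self.ngram_counts, key=lambda c: self.cross_entropy(text, c))


if __name__ == "__main__":
    train = ["the cat sat on the mat", "a dog and a cat",
             "le chat est sur le tapis", "un chien et un chat"]
    labels = ["en", "en", "fr", "fr"]
    clf = CharNGramClassifier(n=3).fit(train, labels)
    print(clf.predict("the dog sat"))
```

The model reads raw characters, so it needs no tokenizer or word segmenter. That property is what lets the same code run on languages such as Chinese and Japanese without language-specific pre-processing.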