We present work aimed at compiling an Amharic corpus from the Web and automatically categorizing the texts. Amharic is the second most spoken Semitic language in the world (after Arabic) and is used for countrywide communication in Ethiopia. It is highly inflectional and quite dialectally diversified. We discuss the issues of compiling and annotating a corpus of Amharic news articles from the Web. This corpus was then used in three sets of text classification experiments. Working with a less-researched language highlights a number of practical issues that might otherwise receive less attention or go unnoticed. The purpose of the experiments has not primarily been to develop a cutting-edge text classification system for Amharic, but rather to put the spotlight on some of these issues. The first two sets of experiments investigated the use of Self-Organizing Maps (SOMs) for document classification. Testing on small datasets, we first looked at classifying unseen data into 10 predefined categories of news items, and then at clustering it around query content, taking 16 queries as class labels. The third set of experiments investigated the effect of operations such as stemming and part-of-speech tagging on text classification performance. We compared three representations while constructing classification models based on bagging of decision trees for the 10 predefined news categories. The best accuracy was achieved using the full text as representation. A representation using only the nouns performed almost equally well, confirming the assumption that most of the information required for distinguishing between the categories is actually contained in the nouns, while stemming did not have much effect on the performance of the classifier.
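To make the SOM-based classification idea concrete, the following is a minimal sketch, not the system described above. It assumes a small rectangular map, a Gaussian neighbourhood, linearly decaying learning rate and radius, and majority-vote labelling of map nodes; the class `TinySOM`, the helper names, the grid size, and the toy English documents are all hypothetical stand-ins for the Amharic news data.

```python
import math
import random
from collections import Counter, defaultdict


def bow_vector(text, vocab):
    """L2-normalised term-frequency vector of `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [float(counts[term]) for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TinySOM:
    """Rectangular SOM with a Gaussian neighbourhood (illustrative only)."""

    def __init__(self, rows, cols, dim, seed=0):
        self.rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        # One randomly initialised weight vector per grid node.
        self.weights = {
            (r, c): [self.rng.random() for _ in range(dim)]
            for r in range(rows)
            for c in range(cols)
        }

    def bmu(self, vec):
        """Grid coordinates of the best-matching unit for `vec`."""
        return min(self.weights, key=lambda node: sq_dist(self.weights[node], vec))

    def update(self, vec, lr, radius):
        """Pull the BMU and its grid neighbours towards `vec`."""
        br, bc = self.bmu(vec)
        for (r, c), w in self.weights.items():
            grid_d2 = (r - br) ** 2 + (c - bc) ** 2
            h = math.exp(-grid_d2 / (2.0 * radius * radius))
            for i, x in enumerate(vec):
                w[i] += lr * h * (x - w[i])

    def train(self, data, epochs=50, lr0=0.5):
        """Online training with linearly decaying learning rate and radius."""
        radius0 = max(self.rows, self.cols) / 2.0
        for epoch in range(epochs):
            decay = 1.0 - epoch / epochs
            order = list(data)
            self.rng.shuffle(order)
            for vec in order:
                self.update(vec, lr0 * decay, max(radius0 * decay, 0.5))


def label_nodes(som, vectors, labels):
    """Give each node the majority label of the training documents it wins."""
    votes = defaultdict(Counter)
    for vec, lab in zip(vectors, labels):
        votes[som.bmu(vec)][lab] += 1
    return {node: c.most_common(1)[0][0] for node, c in votes.items()}


def classify(som, node_labels, vec):
    """Label of the nearest node that received a label during training."""
    node = min(node_labels, key=lambda n: sq_dist(som.weights[n], vec))
    return node_labels[node]


if __name__ == "__main__":
    # Toy English stand-ins; the corpus discussed above is Amharic news text.
    train = [
        ("football team wins match", "sport"),
        ("football team scores goal", "sport"),
        ("bank raises interest rate", "economy"),
        ("bank market interest trade", "economy"),
    ]
    vocab = sorted({t for text, _ in train for t in text.split()})
    vectors = [bow_vector(text, vocab) for text, _ in train]
    labels = [lab for _, lab in train]

    som = TinySOM(rows=3, cols=3, dim=len(vocab))
    som.train(vectors)
    node_labels = label_nodes(som, vectors, labels)
    print(classify(som, node_labels, bow_vector("football team goal", vocab)))
```

Labelling nodes by majority vote is only one way to turn an unsupervised map into a classifier; swapping the category labels for query identifiers would give a rough analogue of the query-based setting, again under the assumptions stated above.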