Over the last decades the Web has become the largest repository of digital information. To organize this information, many text categorization methods have been developed, and they achieve accurate results in most cases and across very different domains. With the recent use of the Internet as a communication medium, short texts such as news items, tweets, blog posts, and product reviews are becoming more common every day. These texts pose two main challenges. First, because the documents are short, word frequencies are not informative enough, which makes text categorization even more difficult than usual. Second, topics change constantly and quickly, so adequate amounts of training data are often unavailable. To deal with these two problems, we consider a text classification method based on the idea that similar documents are likely to belong to the same category. Specifically, we propose a neighborhood consensus classification method that classifies each document using its own information together with the categories assigned to similar documents from the same target collection. The short texts used in our evaluation are news titles with an average length of 8 words. Experimental results are encouraging: they indicate that leveraging information from similar documents improves classification accuracy and that the proposed method is especially useful when labeled training data is limited.
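The neighborhood consensus idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the base classifier, the similarity measure, or how the two sources of evidence are combined. The sketch assumes a centroid-style base classifier over bag-of-words vectors, cosine similarity, and a linear mix controlled by a hypothetical weight `lam`. The neighborhood size `k` and all function names are also assumptions made for illustration.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (assumed tokenizer: lowercase, split on spaces)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_prototypes(labeled):
    """Sum the vectors of each category's training texts into one prototype."""
    protos = {}
    for text, label in labeled:
        protos.setdefault(label, Counter()).update(vectorize(text))
    return protos

def neighborhood_consensus(labeled, targets, k=2, lam=0.5):
    """Label each target from its own evidence plus that of its k nearest
    neighbors in the same target collection (lam and k are assumed knobs)."""
    protos = build_prototypes(labeled)
    vecs = [vectorize(t) for t in targets]
    # Own evidence: similarity of each target to every category prototype.
    own = [{c: cosine(v, p) for c, p in protos.items()} for v in vecs]
    predictions = []
    for i, v in enumerate(vecs):
        # The k most similar *other* documents from the target collection.
        neighbors = sorted(
            ((cosine(v, u), j) for j, u in enumerate(vecs) if j != i),
            reverse=True,
        )[:k]
        total = sum(s for s, _ in neighbors)
        scores = {}
        for c in protos:
            # Neighbor evidence: similarity-weighted average of their own scores.
            neigh = sum(s * own[j][c] for s, j in neighbors) / total if total else 0.0
            scores[c] = lam * own[i][c] + (1 - lam) * neigh
        predictions.append(max(scores, key=scores.get))
    return predictions

labeled = [("stock market shares fall", "business"),
           ("team wins football match", "sports")]
targets = ["shares fall on market fears", "market fears grow",
           "football team wins cup", "cup final tonight"]
print(neighborhood_consensus(labeled, targets))
```

In this toy data the last title, "cup final tonight", shares no word with any labeled example, so its own evidence is zero for both categories. It still receives the label "sports", because its nearest neighbor in the target collection ("football team wins cup") is strongly associated with that category. This is the situation the abstract describes, where short texts and scarce training data leave a document's own words uninformative.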