Latent Semantic Indexing (LSI) has been shown to be effective in coping with synonymy and polysemy in text retrieval applications. However, because LSI ignores the class labels of training documents, LSI-generated representations are less effective in classification tasks. To address this limitation, a process called 'sprinkling' is presented. Sprinkling is a simple extension of LSI that augments the feature set with additional terms encoding class knowledge. A limitation of sprinkling, however, is that it treats all classes (and classifiers) in the same way. To overcome this, we propose a more principled extension called Adaptive Sprinkling (AS). AS leverages confusion matrices to emphasise the differences between those classes that are hard to separate. The method is tested on diverse classification tasks, including those where classes share ordinal or hierarchical relationships. These experiments reveal that AS can significantly enhance the performance of instance-based techniques (kNN), making them competitive with the state-of-the-art SVM classifier. The revised representations generated by AS also have a favourable impact on SVM performance.
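The idea of sprinkling can be illustrated with a minimal Python sketch. It assumes a dense documents-by-terms matrix and integer class labels, and uses a plain truncated SVD as the LSI step. The function names (`counts_from_confusion`, `sprinkle`, `lsi_project`), the `max_k` parameter, and in particular the rule that maps a confusion matrix to per-class sprinkle counts are hypothetical choices made for illustration; the abstract does not specify how AS allocates terms, so this should not be read as the authors' procedure.

```python
import numpy as np


def counts_from_confusion(conf, max_k=4):
    """Hypothetical heuristic: give more sprinkled terms to classes that
    are confused more often. Not the allocation rule from the paper."""
    conf = np.asarray(conf, dtype=float)
    row_totals = conf.sum(axis=1)
    errors = row_totals - np.diag(conf)
    # Per-class error rate; classes with no examples get a rate of 0.
    rate = np.divide(errors, row_totals,
                     out=np.zeros_like(errors), where=row_totals > 0)
    return 1 + np.rint(rate * (max_k - 1)).astype(int)


def sprinkle(X, y, counts):
    """Append counts[c] artificial 'class term' columns for each class c.
    A document gets a 1 in the columns of its own class, 0 elsewhere."""
    y = np.asarray(y)
    blocks = []
    for c, k in enumerate(counts):
        indicator = (y == c).astype(float)[:, None]
        blocks.append(np.repeat(indicator, k, axis=1))
    return np.hstack([np.asarray(X, dtype=float)] + blocks)


def lsi_project(X_aug, n_terms, rank):
    """Rank-reduced reconstruction of the sprinkled matrix via SVD,
    keeping only the original term columns afterwards."""
    U, s, Vt = np.linalg.svd(X_aug, full_matrices=False)
    X_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    return X_hat[:, :n_terms]


# Toy usage: 4 documents, 3 terms, 2 classes (made-up data).
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
y = np.array([0, 0, 1, 1])
conf = [[10, 0],   # class 0 is never confused
        [4, 6]]    # class 1 is often mistaken for class 0
counts = counts_from_confusion(conf)
X_aug = sprinkle(X, y, counts)
X_lsi = lsi_project(X_aug, n_terms=X.shape[1], rank=2)
```

With uniform counts this reduces to plain sprinkling, where every class is treated alike. The revised representation `X_lsi` could then be handed to a kNN or SVM classifier in place of the raw term vectors.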