Using LSI for text classification in the presence of background text

Authors: Sarah Zelikovitz; Haym Hirsh
Affiliations: Rutgers University, Piscataway, NJ; Rutgers University, Piscataway, NJ
Venue: Proceedings of the tenth international conference on Information and knowledge management
Year: 2001

Abstract

This paper uses Latent Semantic Indexing (LSI) for text classification. In addition to labeled training data, we use unlabeled data and other available "background" text to improve classification accuracy. Rather than performing LSI's singular value decomposition (SVD) solely on the training data, we apply it to an expanded term-by-document matrix that includes both the labeled data and any available, relevant background text. We report the performance of this approach on data sets with and without the background text, and compare it with other methods that can incorporate unlabeled data and other background text into classification.
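
The idea of decomposing an expanded term-by-document matrix can be illustrated with a minimal Python sketch. It is not the authors' implementation: the raw term-count weighting, the fold-in projection, the cosine nearest-neighbor decision rule, the toy documents, and all function names (`fit_lsi`, `fold_in`, `classify`) are assumptions made for illustration only.

```python
import numpy as np

def build_vocab(docs):
    """Map each distinct lowercase token to a row index."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    return {w: i for i, w in enumerate(vocab)}

def term_doc_matrix(docs, vocab):
    """Term-by-document matrix of raw counts (weighting scheme is an assumption)."""
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            if w in vocab:
                X[vocab[w], j] += 1.0
    return X

def fit_lsi(train_docs, background_docs, k):
    """SVD of the expanded matrix: labeled training columns plus background columns."""
    all_docs = list(train_docs) + list(background_docs)
    vocab = build_vocab(all_docs)
    X = term_doc_matrix(all_docs, vocab)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    # Keep at most k factors, and only those with a nonzero singular value.
    k = min(k, int(np.sum(s > 1e-10)))
    return vocab, U[:, :k], s[:k]

def fold_in(docs, vocab, U_k, s_k):
    """Project documents into the reduced space: d^T U_k S_k^{-1}."""
    X = term_doc_matrix(docs, vocab)
    return (X.T @ U_k) / s_k

def classify(test_doc, train_docs, labels, background_docs=(), k=2):
    """Label of the most cosine-similar training document (hypothetical decision rule)."""
    vocab, U_k, s_k = fit_lsi(train_docs, background_docs, k)
    train_vecs = fold_in(train_docs, vocab, U_k, s_k)
    q = fold_in([test_doc], vocab, U_k, s_k)[0]
    denom = np.linalg.norm(train_vecs, axis=1) * np.linalg.norm(q) + 1e-12
    sims = (train_vecs @ q) / denom
    return labels[int(np.argmax(sims))]

# Toy data (invented for illustration): "kitten" never occurs in the labeled
# documents, but one unlabeled background document links it to "cat".
train_docs = ["stock bond", "cat dog"]
labels = ["finance", "animal"]
background_docs = ["kitten cat"]

prediction = classify("kitten", train_docs, labels, background_docs, k=2)
```

In this toy setting the background document is the only place where "kitten" co-occurs with a term from the labeled data, so it is what allows the test document to be placed near the "cat dog" training example in the reduced space. Without the background text, "kitten" falls outside the vocabulary and the projected query is a zero vector.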