Text categorization by boosting automatically extracted concepts

Authors:
Lijuan Cai;Thomas Hofmann
Affiliations:
Brown University, Providence, RI;Brown University, Providence, RI
Venue:
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2003

Citing 16
Cited 31

OHSUMED: an interactive retrieval evaluation and new large test collection for research

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Noise reduction in a statistical approach to text categorization

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Using linear algebra for intelligent information retrieval

SIAM Review
A decision-theoretic generalization of on-line learning and an application to boosting

Journal of Computer and System Sciences - Special issue: 26th annual ACM symposium on the theory of computing & STOC'94, May 23–25, 1994, and second annual European conference on computational learning theory (EuroCOLT'95), March 13–15, 1995
Distributional clustering of words for text classification

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Probabilistic latent semantic indexing

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Improved Boosting Algorithms Using Confidence-rated Predictions

Machine Learning - The Eleventh Annual Conference on computational Learning Theory
Exploiting generative models in discriminative classifiers

Proceedings of the 1998 conference on Advances in neural information processing systems II
BoosTexter: A Boosting-based System for Text Categorization

Machine Learning - Special issue on information retrieval
Unsupervised learning by probabilistic latent semantic analysis

Machine Learning
On feature distributional clustering for text categorization

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Modern Information Retrieval

Introduction to Modern Information Retrieval
Text Categorization with Support Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Latent Semantic Kernels

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
An Efficient Boosting Algorithm for Combining Preferences

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
ProbMap -- A probabilistic approach for mapping large document collections

Intelligent Data Analysis

Learning the Kernel Matrix with Semidefinite Programming

The Journal of Machine Learning Research
Web taxonomy integration through co-bootstrapping

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Bayesian network model for semi-structured document classification

Information Processing and Management: an International Journal - Special issue: Bayesian networks and information retrieval
NEWPAR: an automatic feature selection and weighting schema for category ranking

Proceedings of the 2006 ACM symposium on Document engineering
Text classification improved through multigram models

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Using bag-of-concepts to improve the performance of support vector machines in text categorization

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Towards automatic extraction of event and place semantics from flickr tags

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Improvement of behavior detection by dynamic threshold

DNCOCO'07 Proceedings of the 9th WSEAS International Conference on Data Networks, Communications, Computers
Learning to classify short and sparse text & web with hidden topics from large-scale data collections

Proceedings of the 17th international conference on World Wide Web
Semantic representation of multimedia content: Knowledge representation and semantic indexing

Multimedia Tools and Applications
Personalizing Threshold Values on Behavior Detection with Collaborative Filtering

UIC '08 Proceedings of the 5th international conference on Ubiquitous Intelligence and Computing
Dynamic threshold determination for stable behavior detection

WSEAS Transactions on Computers
Methods for extracting place semantics from Flickr tags

ACM Transactions on the Web (TWEB)
A density-based method for adaptive LDA model selection

Neurocomputing
Classifying Amharic webnews

Information Retrieval
Using backward elimination with a new model order reduction algorithm to select best double mixture model for document clustering

Expert Systems with Applications: An International Journal
Web Search Clustering and Labeling with Hidden Topics

ACM Transactions on Asian Language Information Processing (TALIP)
Automatic Detecting Documents Containing Personal Health Information

AIME '09 Proceedings of the 12th Conference on Artificial Intelligence in Medicine: Artificial Intelligence in Medicine
Classifying Amharic news text using self-organizing maps

Semitic '05 Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages
Keepin' it real: semi-supervised learning with realistic tuning

SemiSupLearn '09 Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing
Wikipedia-based semantic interpretation for natural language processing

Journal of Artificial Intelligence Research
Weakly-Supervised Violence Detection in Movies with Audio and Video Based Co-training

PCM '09 Proceedings of the 10th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing
Learning to integrate web taxonomies

Web Semantics: Science, Services and Agents on the World Wide Web
An evaluation of text retrieval methods for similarity search of multi-dimensional NMR-spectra

BIRD'07 Proceedings of the 1st international conference on Bioinformatics research and development
Behavior detection based on touched objects with dynamic threshold determination model

EuroSSC'07 Proceedings of the 2nd European conference on Smart sensing and context
Coordinate model for text categorization

Transactions on edutainment V
Boosting for text classification with semantic features

WebKDD'04 Proceedings of the 6th international conference on Knowledge Discovery on the Web: advances in Web Mining and Web Usage Analysis
Application of text categorization to astronomy field

NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems
Knowledge evolution course discovery in a professional virtual community

Knowledge-Based Systems
A three-phase method for patent classification

Information Processing and Management: an International Journal

Abstract

Term-based representations of documents have found widespread use in information retrieval. However, one of the main shortcomings of such methods is that they largely disregard lexical semantics and, as a consequence, are not sufficiently robust with respect to variations in word usage. In this paper we investigate the use of concept-based document representations to supplement word- or phrase-based features. The concepts are automatically extracted from documents via probabilistic latent semantic analysis. We propose to use AdaBoost to optimally combine weak hypotheses based on both types of features. Experimental results on standard benchmarks confirm the validity of our approach, showing that AdaBoost achieves consistent improvements by including additional semantic features in the learned ensemble.
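
The idea in the abstract, boosting weak hypotheses drawn from both term features and concept features, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which AdaBoost variant or weak learner the paper uses, so the sketch assumes standard discrete AdaBoost with threshold stumps. The toy documents, the function names (stump, train_adaboost, predict), and the concept values are all hypothetical; in particular, the concept column is a hand-set stand-in for a PLSA concept posterior, which is not computed here.

```python
import math

def stump(x, j, t, sign):
    """Weak hypothesis: predict `sign` if feature j exceeds threshold t, else -sign."""
    return sign if x[j] > t else -sign

def train_adaboost(X, y, rounds):
    """Discrete AdaBoost over threshold stumps (assumed weak learner).

    X: feature vectors (term features followed by concept features).
    y: labels in {-1, +1}.
    Returns a list of (alpha, feature_index, threshold, sign) tuples.
    """
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n  # uniform initial example weights
    ensemble = []
    for _ in range(rounds):
        best = None
        # Exhaustive search for the stump with the lowest weighted error.
        for j in range(d):
            for t in sorted({x[j] for x in X}):
                for sign in (1, -1):
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump(xi, j, t, sign) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, t, sign)
        err, j, t, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, t, sign))
        # Raise the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump(xi, j, t, sign))
             for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump(x, j, t, sign) for alpha, j, t, sign in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data. Columns: [has "car", has "automobile", concept score].
# The concept score stands in for a PLSA posterior such as P(z | d).
X = [
    [1, 0, 0.9],  # mentions "car"
    [0, 1, 0.8],  # mentions "automobile"
    [0, 0, 0.1],  # unrelated
    [0, 0, 0.2],  # unrelated
]
y = [1, 1, -1, -1]

ensemble = train_adaboost(X, y, rounds=3)
# A document using neither known word, but with a high concept score.
unseen = [0, 0, 0.85]
print(predict(ensemble, unseen))
```

In this toy set neither word feature separates the classes on its own, because the two positive documents use different words, while the concept column does. That mirrors the abstract's motivation: concept features can add robustness to variations in word usage that term features miss.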