Feature generation for text categorization using world knowledge

Authors:
Evgeniy Gabrilovich;Shaul Markovitch
Affiliations:
Computer Science Department, Technion, Haifa, Israel;Computer Science Department, Technion, Haifa, Israel
Venue:
IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Year:
2005

Abstract

We enhance machine learning algorithms for text categorization with generated features based on domain-specific and common-sense knowledge. This knowledge is represented using publicly available ontologies that contain hundreds of thousands of concepts, such as the Open Directory; these ontologies are further enriched by several orders of magnitude through controlled Web crawling. Prior to text categorization, a feature generator analyzes the documents and maps them onto appropriate ontology concepts, which in turn induce a set of generated features that augment the standard bag of words. Feature generation is accomplished through contextual analysis of document text, implicitly performing word sense disambiguation. Coupled with the ability to generalize concepts using the ontology, this approach addresses the two main problems of natural language processing: synonymy and polysemy. Categorizing documents with the aid of knowledge-based features leverages information that cannot be deduced from the documents alone. Experimental results confirm improved performance, breaking through the plateau previously reached in the field.
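
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The tiny ontology, the word-overlap scoring, the fixed-size context window, and the names `ONTOLOGY`, `ancestors`, `best_concept`, and `generate_features` are all assumptions made for illustration; the abstract specifies only that documents are mapped onto ontology concepts through contextual analysis and that the resulting features augment the bag of words.

```python
from collections import Counter

# Hypothetical toy ontology: concept -> (parent, words associated with it).
# In the paper the concepts come from resources such as the Open Directory,
# enriched by Web crawling; the entries below are invented for illustration.
ONTOLOGY = {
    "Top": (None, set()),
    "Science": ("Top", {"research", "study"}),
    "Science/Biology": ("Science", {"cell", "species", "jaguar", "habitat", "predator"}),
    "Recreation": ("Top", {"hobby", "travel"}),
    "Recreation/Autos": ("Recreation", {"car", "engine", "jaguar", "dealer", "sedan"}),
}


def ancestors(concept):
    """Return the concept followed by its ancestors, excluding the root."""
    chain = []
    while concept is not None and concept != "Top":
        chain.append(concept)
        concept = ONTOLOGY[concept][0]
    return chain


def best_concept(context):
    """Pick the concept whose word set overlaps most with the context.

    Plain word overlap is an assumed stand-in for the contextual analysis
    mentioned in the abstract.
    """
    context_words = set(context)
    scores = {c: len(words & context_words) for c, (_, words) in ONTOLOGY.items()}
    concept, score = max(scores.items(), key=lambda kv: kv[1])
    return concept if score > 0 else None


def generate_features(text, window=5):
    """Augment the bag of words with concept features, one context at a time."""
    tokens = text.lower().split()
    features = Counter(tokens)  # standard bag of words
    for start in range(0, len(tokens), window):
        concept = best_concept(tokens[start:start + window])
        if concept is not None:
            # Adding ancestors lets the learner generalize along the hierarchy.
            for c in ancestors(concept):
                features["CONCEPT:" + c] += 1
    return features


autos = generate_features("the jaguar dealer sold a sedan with a new engine")
biology = generate_features("the jaguar is a predator whose habitat is shrinking")
```

In this sketch the ambiguous word "jaguar" yields an Autos concept in the first text and a Biology concept in the second, because the neighbouring words decide which concept wins. That mirrors the implicit word sense disambiguation the abstract describes, while the ancestor features stand in for generalization through the ontology.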