Building semantic kernels for text classification using Wikipedia

Authors:
Pu Wang;Carlotta Domeniconi
Affiliations:
George Mason University, Fairfax, USA;George Mason University, Fairfax, USA
Venue:
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2008

Abstract

Document classification is difficult because text data is sparse and high-dimensional, and because natural language has complex semantics. The traditional document representation is a word-based vector (Bag of Words, or BOW), in which each dimension corresponds to a term of the dictionary containing all the words that appear in the corpus. Although simple and commonly used, this representation has several limitations. Embedding semantic information and conceptual patterns is essential to enhance the prediction capabilities of classification algorithms. In this paper, we overcome the shortcomings of the BOW approach by embedding background knowledge derived from Wikipedia into a semantic kernel, which is then used to enrich the representation of documents. Our empirical evaluation on real data sets shows that our approach achieves higher classification accuracy than the BOW technique and other recently developed methods.
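
As an illustration of the general idea only, the following minimal Python sketch contrasts a plain BOW inner product with a generic semantic kernel of the form k(d1, d2) = d1 S S^T d2^T, where S is a term-proximity matrix. It is not the authors' implementation: the three-word vocabulary, the numeric entries of S, and the helper names `bow_vector` and `semantic_kernel` are assumptions made up for this example. In the paper the background knowledge comes from Wikipedia, whereas here the proximity values are simply hand-picked.

```python
import numpy as np


def bow_vector(tokens, vocabulary):
    """Count how often each vocabulary term occurs in a tokenized document."""
    index = {term: i for i, term in enumerate(vocabulary)}
    vec = np.zeros(len(vocabulary))
    for tok in tokens:
        if tok in index:
            vec[index[tok]] += 1.0
    return vec


def semantic_kernel(d1, d2, S):
    """Generic semantic kernel: inner product of the enriched vectors d1 S and d2 S."""
    return float((d1 @ S) @ (d2 @ S))


# Hypothetical vocabulary; "puma" and "cougar" are treated as near-synonyms.
vocabulary = ["puma", "cougar", "engine"]

# Hypothetical term-proximity matrix (hand-picked values, NOT derived from Wikipedia).
S = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

doc_a = bow_vector(["puma", "puma"], vocabulary)
doc_b = bow_vector(["cougar"], vocabulary)
doc_c = bow_vector(["engine"], vocabulary)

# Plain BOW similarity: the two documents share no term.
plain = float(doc_a @ doc_b)

# Semantic similarity: related terms now contribute to the kernel value.
semantic = semantic_kernel(doc_a, doc_b, S)

# An unrelated document stays dissimilar under both measures.
unrelated = semantic_kernel(doc_a, doc_c, S)

print(plain, semantic, unrelated)
```

Under these assumed values, the BOW inner product of the "puma" and "cougar" documents is zero because they share no term, while the semantic kernel is positive because S links the two words. A kernel of this form could be passed to any kernel-based classifier as a precomputed Gram matrix.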