Automated feature generation from structured knowledge

Authors:
Weiwei Cheng;Gjergji Kasneci;Thore Graepel;David Stern;Ralf Herbrich
Affiliations:
University of Marburg, Marburg, Germany;Microsoft Research, Cambridge, United Kingdom;Microsoft Research, Cambridge, United Kingdom;Microsoft Research, Cambridge, United Kingdom;Microsoft Research, Cambridge, United Kingdom
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

The prediction accuracy of any learning algorithm depends heavily on the quality of the selected features, yet feature construction and selection are often tedious and do not scale. In recent years, however, numerous projects have set out to construct general-purpose or domain-specific knowledge bases of entity-relationship-entity triples, extracted from various Web sources or collected from user communities, e.g., YAGO, DBpedia, Freebase, and UMLS. This paper advocates the simple yet far-reaching idea that the structured knowledge contained in such knowledge bases can be exploited to automatically extract features for general learning tasks. We introduce an expressive graph-based language for extracting features from such knowledge bases and a theoretical framework for constructing feature vectors from the extracted features. Our experimental evaluation on different learning scenarios provides evidence that features derived through our framework can considerably improve prediction accuracy, especially when the labeled data at hand is sparse.
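
To make the core idea concrete, the following is a minimal sketch of how features might be derived from entity-relationship-entity triples and turned into feature vectors. It is not the paper's graph-based language or its theoretical framework, neither of which is specified in the abstract. The toy triples, the path-string feature encoding, and the function names (build_index, extract_features, vectorize) are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy knowledge base of (subject, relation, object) triples.
# These facts are illustrative placeholders, not data from the paper.
TRIPLES = [
    ("Alan_Turing", "bornIn", "London"),
    ("Alan_Turing", "type", "Scientist"),
    ("London", "locatedIn", "United_Kingdom"),
    ("Ada_Lovelace", "bornIn", "London"),
    ("Ada_Lovelace", "type", "Mathematician"),
]


def build_index(triples):
    """Map each subject to its outgoing (relation, object) edges."""
    index = defaultdict(list)
    for subj, rel, obj in triples:
        index[subj].append((rel, obj))
    return index


def extract_features(entity, index, max_hops=2):
    """Collect path features reachable from `entity` within `max_hops` edges.

    Each feature is a string such as "bornIn/locatedIn=United_Kingdom"
    (a hypothetical encoding chosen for this sketch).
    """
    features = set()
    frontier = [(entity, ())]  # (current node, relation path taken so far)
    for _ in range(max_hops):
        next_frontier = []
        for node, path in frontier:
            for rel, obj in index.get(node, []):
                new_path = path + (rel,)
                features.add("/".join(new_path) + "=" + obj)
                next_frontier.append((obj, new_path))
        frontier = next_frontier
    return features


def vectorize(entities, index, max_hops=2):
    """Turn per-entity feature sets into binary vectors over a shared vocabulary."""
    per_entity = {e: extract_features(e, index, max_hops) for e in entities}
    vocab = sorted(set().union(*per_entity.values()))
    vectors = {
        e: [1 if f in feats else 0 for f in vocab]
        for e, feats in per_entity.items()
    }
    return vocab, vectors


INDEX = build_index(TRIPLES)
VOCAB, VECTORS = vectorize(["Alan_Turing", "Ada_Lovelace"], INDEX)
```

In this sketch both entities share the two-hop feature reached through bornIn and locatedIn, which illustrates how knowledge-base structure could supply informative features even when labeled examples are few. The resulting binary vectors could be passed to any standard learner.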