Cluster-based concept invention for statistical relational learning

Authors:
Alexandrin Popescul; Lyle H. Ungar
Affiliations:
University of Pennsylvania, Philadelphia, PA; University of Pennsylvania, Philadelphia, PA
Venue:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2004

Abstract

We use clustering to derive new relations that augment the database schema used for the automatic generation of predictive features in statistical relational learning. Entities derived from clusters increase the expressivity of the feature space by creating new first-class concepts that contribute to new features. For example, in CiteSeer, papers can be clustered by their words or citations, giving "topics", and authors can be clustered by the documents they co-author, giving "communities". Such cluster-derived concepts become part of more complex feature expressions. Of the large number of generated features, those that improve predictive accuracy are kept in the model, as decided by statistical feature selection criteria. We present results demonstrating improved accuracy on two tasks, venue prediction and link prediction, using CiteSeer data.
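
The idea of turning clusters into first-class relations can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the greedy Jaccard clustering, the threshold value, and the names `paper_words`, `cites`, `in_topic`, and `cited_topic_counts` are all assumptions made for illustration. The sketch clusters papers by their words, stores the result as a new relation, and then uses that relation inside an aggregate feature over citations.

```python
from collections import defaultdict

# Hypothetical toy data standing in for CiteSeer-like relations.
# paper_words: paper -> set of words; cites: (citing_paper, cited_paper) pairs.
paper_words = {
    "p1": {"relational", "learning", "logic"},
    "p2": {"relational", "logic", "induction"},
    "p3": {"retrieval", "query", "ranking"},
    "p4": {"retrieval", "ranking", "index"},
    "p5": {"relational", "learning", "induction"},
}
cites = [("p5", "p1"), ("p5", "p2"), ("p4", "p3"), ("p2", "p1")]


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(items, threshold=0.3):
    """Assumed stand-in clustering: each item joins the first cluster whose
    seed item is at least `threshold` similar, otherwise it starts a new one."""
    seeds = []        # (cluster_id, word set of the cluster's first member)
    assignment = {}   # item -> cluster_id
    for key in sorted(items):
        for cluster_id, seed_words in seeds:
            if jaccard(items[key], seed_words) >= threshold:
                assignment[key] = cluster_id
                break
        else:
            cluster_id = f"topic{len(seeds)}"
            seeds.append((cluster_id, items[key]))
            assignment[key] = cluster_id
    return assignment


# New cluster-derived relation added to the schema: in_topic(paper, topic).
in_topic = greedy_cluster(paper_words)


def cited_topic_counts(cites, in_topic):
    """Candidate features that use the new relation: for each citing paper,
    the number of papers it cites in each topic."""
    counts = defaultdict(lambda: defaultdict(int))
    for citing, cited in cites:
        counts[citing][in_topic[cited]] += 1
    return {paper: dict(by_topic) for paper, by_topic in counts.items()}


features = cited_topic_counts(cites, in_topic)
print(in_topic)
print(features)
```

On this toy data, `p5` cites two papers that fall in the same topic cluster, so its count for that topic is 2. In a full pipeline, many such candidate features would be generated and a statistical selection criterion would decide which ones stay in the model; that step is not shown here.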