Social networks are naturally represented as heterogeneous networks with multiple types of objects (e.g., actors and items) and multiple types of links: links between actors that denote social ties (e.g., friendship), and links that connect actors to items (e.g., photos, videos, articles) and denote relationships between actors and items. In this paper, we consider the task of assigning labels to the unlabeled actors (individuals) in a large heterogeneous social network in which labels are available for a subset of actors. Specifically, we seek to learn a predictive model that labels actors based on the attributes of the actors themselves and/or of the items linked to them in the network. Unfortunately, the number of distinct items in real-world networks such as Facebook or Flickr is quite large (in the millions), although only a small subset of them is linked to any specific actor. This leads to data sparsity, which causes over-fitting and hence poor performance in predicting the labels of unlabeled actors. To address this problem, we induce hierarchical taxonomies over items and use the resulting taxonomies as a basis for selecting abstract, and hence parsimonious, representations of network data for learning the predictive models. Our experiments using three different predictors (Iterative Classification Naïve Bayes, Iterative Classification Logistic Regression, and EdgeCluster) on two real-world data sets, Last.fm and Flickr, show that predictive models that take advantage of abstract representations of network data are competitive with, and in some cases outperform, those that do not.
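To make the idea of an abstract representation concrete, the following is a minimal sketch of how an actor's sparse item counts might be rolled up to nodes of an item taxonomy. It is an illustration under stated assumptions, not the method used in the paper: the taxonomy here is written by hand rather than induced from data, and the names `abstract_features`, `parent`, and `cut`, as well as the toy items, are hypothetical.

```python
from collections import Counter


def abstract_features(item_counts, parent, cut):
    """Roll item counts up to the nearest ancestor that lies in `cut`.

    item_counts: dict mapping an item to how often the actor links to it.
    parent:      dict mapping each taxonomy node to its parent (hypothetical,
                 hand-written here; the paper induces taxonomies from data).
    cut:         set of taxonomy nodes chosen as the abstract feature set.
    """
    features = Counter()
    for item, count in item_counts.items():
        node = item
        # Climb the taxonomy until we reach a node in the cut, or the root.
        while node not in cut and node in parent:
            node = parent[node]
        features[node] += count
    return features


# Hypothetical toy taxonomy over Last.fm-style items (child -> parent).
parent = {
    "track_a": "rock", "track_b": "rock",
    "track_c": "jazz", "track_d": "jazz",
    "rock": "music", "jazz": "music",
}

# One actor linked to three of the four items.
actor_items = {"track_a": 2, "track_b": 1, "track_d": 4}

# Representing the actor at the genre level instead of the item level.
features = abstract_features(actor_items, parent, cut={"rock", "jazz"})
print(dict(features))
```

With a cut at the genre level, four possible item features collapse into two, which is the kind of reduction in dimensionality that the abstract credits with mitigating sparsity. Choosing a cut closer to the root gives a coarser representation, and a cut at the leaves recovers the original item-level features.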