On supervised mining of dynamic content-based networks

Authors:
Charu C. Aggarwal;Nan Li
Affiliations:
IBM T. J. Watson Research Center, Hawthorne, NY 10532, USA;Department of Computer Science, University of California, Santa Barbara, Santa Barbara, CA 93106, USA
Venue:
Statistical Analysis and Data Mining
Year:
2012

Citing 18

Random sampling with a reservoir

ACM Transactions on Mathematical Software (TOMS)
Enhanced hypertext categorization using hyperlinks

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Text Classification from Labeled and Unlabeled Documents using EM

Machine Learning - Special issue on information retrieval
An Evaluation of Statistical Approaches to Text Categorization

Information Retrieval
Machine learning in automated text categorization

ACM Computing Surveys (CSUR)
Text Categorization with Support Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Scaling personalized web search

WWW '03 Proceedings of the 12th international conference on World Wide Web
Pattern Classification (2nd Edition)

Pattern Classification (2nd Edition)
On the collective classification of email "speech acts"

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Learning from labeled and unlabeled data on a directed graph

ICML '05 Proceedings of the 22nd international conference on Machine learning
Linear prediction models with graph regularization for web-page categorization

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Classification in Networked Data: A Toolkit and a Univariate Case Study

The Journal of Machine Learning Research
Applying link-based classification to label blogs

Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis
Effective label acquisition for collective classification

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Graph clustering based on structural/attribute similarities

Proceedings of the VLDB Endowment
Managing and Mining Graph Data

Managing and Mining Graph Data
Social Network Data Analytics

Social Network Data Analytics
Discriminative probabilistic models for relational data

UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence


Abstract

In recent years, a large amount of information has become available online in the form of web documents, social networks, or blogs. Such networks are large, heterogeneous, and often contain a huge number of links. This linkage structure encodes rich structural information about the topical behavior of the network. Such networks are also often dynamic and evolve rapidly over time. Much of the work in the literature has focused on classification using either purely text content or purely linkage behavior, and most of it is designed for static networks. However, a given network may be quite diverse, and content or structure may be more or less effective in different parts of the network. In this paper, we examine the problem of node classification in dynamic information networks with both text content and links. Our techniques use a random walk approach in conjunction with the content of the network to facilitate an effective classification process. Our approach is dynamic and can be applied to networks that are updated incrementally. Our results suggest that an approach based on both content and links is extremely robust and effective. We also present methods to perform supervised keyword-based clustering of nodes using this approach. We present experimental results illustrating the effectiveness and efficiency of our classification approach, and show that the approach is able to find effective and coherent clusters. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 5: 16–34, 2012. (This paper is an extended version of Ref. [1].)
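
The abstract only names the ingredients (random walks, text content, links), so the following is a rough illustrative sketch rather than the authors' algorithm. It assumes one possible way to combine content and structure: each word becomes an extra node connected to the documents that contain it, and an unlabeled node is assigned the label seen most often along short random walks. The function names, the toy data, and the parameters `num_walks` and `walk_len` are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_augmented_graph(links, node_words):
    """Undirected graph over the original nodes plus one extra node per word.

    Assumption for illustration: content is folded into the graph by linking
    every node to a ("word", w) node for each word w it contains.
    """
    adj = defaultdict(set)
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    for node, words in node_words.items():
        for w in words:
            word_node = ("word", w)
            adj[node].add(word_node)
            adj[word_node].add(node)
    return adj


def classify(adj, labels, start, num_walks=200, walk_len=6, seed=0):
    """Label `start` by majority vote over labeled nodes hit by random walks."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(num_walks):
        current = start
        for _ in range(walk_len):
            # Sort neighbors so the walk is reproducible for a given seed.
            neighbors = sorted(adj[current], key=repr)
            if not neighbors:
                break
            current = rng.choice(neighbors)
            if current in labels:
                votes[labels[current]] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy network: two topical groups joined by a single cross link (a3-b1).
links = [("a1", "a2"), ("a2", "a3"), ("a3", "b1"),
         ("b1", "b2"), ("b2", "b3"), ("y", "b1")]
node_words = {
    "a1": ["goal"], "a2": ["goal", "match"], "a3": ["match"],
    "b1": ["vote"], "b2": ["vote", "election"], "b3": ["election"],
    "x": ["goal", "match"],   # unlabeled, has content but no links
    "y": [],                  # unlabeled, has one link but no content
}
labels = {"a1": "sports", "a2": "sports", "a3": "sports",
          "b1": "politics", "b2": "politics", "b3": "politics"}

adj = build_augmented_graph(links, node_words)
label_x = classify(adj, labels, "x")
label_y = classify(adj, labels, "y")
print(label_x, label_y)
```

In this toy setup, node `x` can only be reached through its words and node `y` only through its single link, which mirrors the abstract's point that content and structure may each be more or less informative in different parts of a network. Because new links or words only add entries to `adj`, the graph can be updated incrementally before the next call to `classify`; how the paper itself handles updates is not specified in the abstract.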