Scalable community discovery on textual data with relations

Authors:
Huajing Li; Zaiqing Nie; Wang-Chien Lee; Lee Giles; Ji-Rong Wen
Affiliations:
The Pennsylvania State University, University Park, PA, USA; Microsoft Research Asia, Beijing, China; The Pennsylvania State University, University Park, PA, USA; The Pennsylvania State University, University Park, PA, USA; Microsoft Research Asia, Beijing, China
Venue:
Proceedings of the 17th ACM conference on Information and knowledge management
Year:
2008

Abstract

Every piece of textual data is written to convey its authors' opinions on specific topics. Authors deliberately organize their writing and create links, e.g., references and acknowledgments, to express themselves better. It is therefore of interest to study texts together with their relations in order to understand the underlying topics and communities. Although the literature contains many efforts in data clustering and topic mining, they are not applicable to community discovery on large document corpora, for several reasons. First, few of them consider both textual attributes and relations. Second, scalability remains a significant issue for large-scale datasets. Third, most algorithms rely on a set of initial parameters that are hard to determine and tune. Motivated by these observations, this paper proposes a hierarchical community model that distinguishes community cores from affiliated members. We present our efforts to develop a scalable community discovery solution for large-scale document corpora. Our proposal aims to quickly identify potential cores as seeds of communities through relation analysis. To eliminate the influence of initial parameters, an innovative attribute-based core merge process is introduced so that the algorithm promises to return consistent communities regardless of the initial parameters. Experimental results suggest that the proposed method scales well with corpus size and feature dimensionality, with more than 15% improvement in topical precision compared with popular clustering techniques.
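
The abstract names three stages (core identification through relation analysis, attribute-based core merging, and attachment of affiliated members) but gives no algorithmic detail. The following is a minimal sketch of how such a pipeline could be arranged, not the authors' method. Every specific here is an assumption made for illustration: the degree-based rule in `find_cores`, the cosine-similarity rule and `merge_threshold` in `merge_cores`, the nearest-centroid rule in `attach_members`, and the toy documents and links.

```python
from collections import Counter, defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def centroid(docs, members):
    """Summed term counts of the member documents."""
    total = Counter()
    for m in members:
        total.update(docs[m])
    return total


def find_cores(links, min_degree=2):
    """Relation analysis (assumed rule): connected groups of well-linked documents."""
    neighbors = defaultdict(set)
    for u, v in links:
        neighbors[u].add(v)
        neighbors[v].add(u)
    dense = {d for d, ns in neighbors.items() if len(ns) >= min_degree}
    cores, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first search restricted to well-linked documents
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend((neighbors[node] & dense) - component)
        seen |= component
        cores.append(component)
    return cores


def merge_cores(docs, cores, merge_threshold=0.8):
    """Attribute-based merge (assumed rule): join cores with similar centroids."""
    cores = [set(c) for c in cores]
    merged = True
    while merged:  # restart the scan after each merge until nothing changes
        merged = False
        for i in range(len(cores)):
            for j in range(i + 1, len(cores)):
                sim = cosine(centroid(docs, cores[i]), centroid(docs, cores[j]))
                if sim >= merge_threshold:
                    cores[i] |= cores.pop(j)
                    merged = True
                    break
            if merged:
                break
    return cores


def attach_members(docs, cores):
    """Attach each non-core document to its most similar core as an affiliated member."""
    centroids = [centroid(docs, c) for c in cores]
    in_core = set().union(*cores)
    communities = [{"core": c, "affiliated": set()} for c in cores]
    for d in sorted(docs):
        if d in in_core or not cores:
            continue
        best = max(range(len(cores)), key=lambda k: cosine(docs[d], centroids[k]))
        communities[best]["affiliated"].add(d)
    return communities


# Hypothetical toy corpus: term counts per document and undirected links.
docs = {
    "a": Counter(graph=3, cluster=2), "b": Counter(graph=2, cluster=3),
    "c": Counter(graph=3, cluster=1), "d": Counter(graph=2, cluster=2),
    "e": Counter(graph=1, cluster=2), "f": Counter(graph=2, cluster=1),
    "x": Counter(topic=3, model=2), "y": Counter(topic=2, model=3),
    "z": Counter(topic=1, model=1), "p": Counter(graph=1), "q": Counter(topic=1),
}
links = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"),
         ("x", "y"), ("y", "z"), ("x", "z"), ("p", "a"), ("q", "x")]

communities = attach_members(docs, merge_cores(docs, find_cores(links)))
for community in communities:
    print(sorted(community["core"]), sorted(community["affiliated"]))
```

In this toy corpus the two link triangles that share vocabulary start as separate cores and are joined by the merge step, which illustrates how an attribute-based merge could make the final communities less sensitive to how the initial cores were seeded.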