Topic mining based on graph local clustering

Authors:
Sara Elena Garza Villarreal; Ramón F. Brena
Affiliations:
Universidad Autónoma de Nuevo León, NL, Mexico; Tec de Monterrey, Monterrey, NL, Mexico
Venue:
MICAI'11 Proceedings of the 10th international conference on Artificial Intelligence: advances in Soft Computing - Volume Part II
Year:
2011

Abstract

This paper introduces an approach for discovering groups of thematically related documents (a topic mining task) in massive document collections by means of graph local clustering. The document collection is viewed as a directed graph in which vertices represent documents and arcs represent connections between them (e.g., hyperlinks). Because a document is likely to have more connections to documents on the same theme, we assume that topics have the structure of a graph cluster, i.e., a group of vertices with more arcs pointing inside the group than outside it. Topics can therefore be discovered by clustering the document graph; we use a local clustering approach so that the method scales to large collections. We also extract properties (keywords and the most representative documents) from each cluster to summarize its topic. The approach was tested on the Wikipedia collection, and we observed that the resulting clusters do correspond to topics, which shows that topic mining can be treated as a graph clustering problem.
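
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the cluster quality function or the search strategy, so the sketch assumes a simple relative-density score (internal arcs divided by all arcs touching the cluster) and a greedy expansion from a seed document. The names `build_graph`, `relative_density`, `local_cluster`, the `max_size` parameter, and the toy arc list are all hypothetical.

```python
from collections import defaultdict


def build_graph(arcs):
    """Return successor and predecessor maps for a directed document graph."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in arcs:
        succ[u].add(v)
        pred[v].add(u)
    return succ, pred


def relative_density(cluster, succ, pred):
    """Assumed score: internal arcs / (internal arcs + arcs crossing the boundary)."""
    internal = external = 0
    for u in cluster:
        for v in succ[u]:
            if v in cluster:
                internal += 1   # arc stays inside the cluster
            else:
                external += 1   # arc leaves the cluster
        for v in pred[u]:
            if v not in cluster:
                external += 1   # arc enters the cluster from outside
    total = internal + external
    return internal / total if total else 0.0


def local_cluster(seed, succ, pred, max_size=50):
    """Greedily grow a cluster around `seed`, looking only at its neighborhood."""
    cluster = {seed}
    score = relative_density(cluster, succ, pred)
    while len(cluster) < max_size:
        # Candidates are the vertices adjacent to the current cluster.
        frontier = set()
        for u in cluster:
            frontier |= succ[u] | pred[u]
        frontier -= cluster
        best, best_score = None, score
        for v in sorted(frontier):  # sorted for deterministic tie-breaking
            s = relative_density(cluster | {v}, succ, pred)
            if s > best_score:
                best, best_score = v, s
        if best is None:            # no neighbor improves the score: stop
            break
        cluster.add(best)
        score = best_score
    return cluster, score


# Toy collection: documents a, b, c link to each other; d, e, f form another
# group; a single hyperlink c -> d bridges the two themes.
ARCS = [("a", "b"), ("b", "c"), ("c", "a"), ("b", "a"),
        ("d", "e"), ("e", "f"), ("f", "d"), ("c", "d")]

succ, pred = build_graph(ARCS)
cluster, score = local_cluster("a", succ, pred)
print(sorted(cluster), round(score, 2))  # ['a', 'b', 'c'] 0.8
```

Because each step inspects only the vertices adjacent to the current cluster, the cost of finding one topic depends on the size of its neighborhood rather than on the size of the whole collection, which is the scalability argument for a local approach.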