This paper introduces an approach for discovering groups of thematically related documents (a topic mining task) in massive document collections by means of graph local clustering. The document collection is viewed as a directed graph in which vertices represent documents and arcs represent connections among them (e.g., hyperlinks). Because a document is likely to have more connections to documents on the same theme, we assume that topics have the structure of a graph cluster, i.e., a group of vertices with more arcs inside the group than to the outside of it. Topics can therefore be discovered by clustering the document graph; we use a local approach to cope with scalability. We also extract properties (keywords and the most representative documents) from each cluster to provide a summary of its topic. The approach was tested on the Wikipedia collection, and we observed that the resulting clusters do correspond to topics, which shows that topic mining can be treated as a graph clustering problem.
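To make the idea of a graph cluster concrete, the following is a minimal sketch of one possible local clustering procedure, not the method used in the paper. It assumes a simple score (the fraction of arcs touching the cluster that stay inside it) and a greedy expansion from a seed document; the names `build_graph`, `cluster_score`, `local_cluster`, and the `max_size` parameter are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_graph(arcs):
    """Build out- and in-neighbour maps from (source, target) arcs."""
    out_nbrs = defaultdict(set)
    in_nbrs = defaultdict(set)
    for u, v in arcs:
        out_nbrs[u].add(v)
        in_nbrs[v].add(u)
    return out_nbrs, in_nbrs


def cluster_score(cluster, out_nbrs, in_nbrs):
    """Fraction of arcs touching the cluster that have both ends inside it.

    This score is an assumption made for the sketch, not the paper's measure.
    """
    internal = 0
    external = 0
    for u in cluster:
        for v in out_nbrs[u]:
            if v in cluster:
                internal += 1  # arc u -> v stays inside the cluster
            else:
                external += 1  # arc leaves the cluster
        for v in in_nbrs[u]:
            if v not in cluster:
                external += 1  # arc enters the cluster from outside
    total = internal + external
    return internal / total if total else 0.0


def local_cluster(seed, out_nbrs, in_nbrs, max_size=50):
    """Greedily grow a cluster around `seed`.

    Only the cluster and its immediate neighbours are inspected at each
    step, so the rest of the graph is never visited (the "local" part).
    """
    cluster = {seed}
    score = cluster_score(cluster, out_nbrs, in_nbrs)
    while len(cluster) < max_size:
        frontier = set()
        for u in cluster:
            frontier |= out_nbrs[u] | in_nbrs[u]
        frontier -= cluster
        best, best_score = None, score
        for v in sorted(frontier):  # sorted for deterministic tie-breaking
            s = cluster_score(cluster | {v}, out_nbrs, in_nbrs)
            if s > best_score:
                best, best_score = v, s
        if best is None:  # no neighbour improves the score: stop
            break
        cluster.add(best)
        score = best_score
    return cluster


# Toy document graph: two densely linked groups joined by a single arc.
arcs = [
    ("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"), ("b", "c"), ("c", "b"),
    ("d", "e"), ("e", "d"), ("d", "f"), ("f", "d"), ("e", "f"), ("f", "e"),
    ("c", "d"),
]
out_nbrs, in_nbrs = build_graph(arcs)
topic_a = local_cluster("a", out_nbrs, in_nbrs)
topic_d = local_cluster("d", out_nbrs, in_nbrs)
```

In this toy graph, expansion from either seed stops at the boundary of its own densely linked group, because adding a vertex from the other group would lower the share of internal arcs. Recomputing the score from scratch for every candidate keeps the sketch short; a scalable implementation would update the internal and external arc counts incrementally.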