Detecting research topics via the correlation between graphs and texts

Authors:
Yookyung Jo;Carl Lagoze;C. Lee Giles
Affiliations:
Cornell University;Cornell University;Pennsylvania State University
Venue:
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2007

Citing 16
Cited 9

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Efficient identification of Web communities

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Bursty and hierarchical structure in streams

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Natural communities in large linked networks

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
The automated acquisition of topic signatures for text summarization

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
A graph-theoretic approach to extract storylines from search results

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Probabilistic author-topic models for information discovery

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Partitioning of Web graphs by community topology

WWW '05 Proceedings of the 14th international conference on World Wide Web
Using term informativeness for named entity detection

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Discovering evolutionary theme patterns from text: an exploration of temporal text mining

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Detection of emerging space-time clusters

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Probabilistic models for discovering e-communities

Proceedings of the 15th international conference on World Wide Web
Bibliometric impact measures leveraging topic analysis

Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Graph-based text classification: learn from your neighbors

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Topics over time: a non-Markov continuous-time model of topical trends

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Clustering scientific literature using sparse citation graph analysis

PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases

Studying the history of ideas using topic models

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Detecting topic evolution in scientific literature: how can citations help?

Proceedings of the 18th ACM conference on Information and knowledge management
Automatic topic detection with an incremental clustering algorithm

WISM'10 Proceedings of the 2010 international conference on Web information systems and mining
The web of topics: discovering the topology of topic evolution in a corpus

Proceedings of the 20th international conference on World wide web
DVD: a model for event diversified versions discovery

APWeb'11 Proceedings of the 13th Asia-Pacific web conference on Web technologies and applications
Phrases as subtopical concepts in scholarly text

Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
An efficient algorithm for topic ranking and modeling topic evolution

DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part I
Indices of novelty for emerging topic detection

Information Processing and Management: an International Journal
Using 'core documents' for detecting and labelling new emerging topics

Scientometrics

Abstract

In this paper we address the problem of detecting topics in large-scale linked document collections. Recently, topic detection has become a very active area of research due to its utility for information navigation, trend analysis, and high-level description of data. We present a unique approach that uses the correlation between the distribution of a term that represents a topic and the link distribution in the citation graph whose nodes are limited to the documents containing the term. This tight coupling between term and graph analysis distinguishes our approach from others, such as those that focus on language models. We develop a topic score measure for each term, using the likelihood ratio of binary hypotheses based on a probabilistic description of graph connectivity. Our approach is based on the intuition that if a term is relevant to a topic, the documents containing the term have denser connectivity than a random selection of documents. We extend our algorithm to detect a topic represented by a set of terms, using the intuition that if the co-occurrence of terms represents a new topic, the citation pattern should exhibit a synergistic effect. We test our algorithm on two electronic research literature collections, arXiv and Citeseer. Our evaluation shows that the approach is effective and reveals some novel aspects of topic detection.
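
The intuition in the abstract (documents containing a topical term are more densely linked than a random selection) can be illustrated with a minimal sketch. This is not the paper's actual topic score, whose probabilistic model is not given here. It assumes a simple stand-in: a binomial log-likelihood ratio comparing the link density of the term's subgraph against the density of the whole citation graph. The names `topic_score` and `log_likelihood_ratio`, the data layout, and the toy corpus are all hypothetical.

```python
import math


def log_likelihood_ratio(k, m, p0):
    """Binomial log-likelihood ratio of H1 (density = k/m) against H0 (density = p0).

    k  : observed links inside the subgraph
    m  : number of possible links inside the subgraph
    p0 : background link probability (assumed here: density of the whole graph)
    Returns 0.0 when the subgraph is no denser than the background.
    """
    if m == 0:
        return 0.0
    p1 = k / m
    if p1 <= p0:
        return 0.0
    llr = k * math.log(p1 / p0)
    if k < m:
        llr += (m - k) * math.log((1 - p1) / (1 - p0))
    return llr


def topic_score(term, docs, citations):
    """Score a term by how much denser its documents are linked than average.

    docs      : dict mapping document id -> set of terms (hypothetical layout)
    citations : set of directed (citing, cited) pairs, no self-citations
    """
    n_all = len(docs)
    possible_all = n_all * (n_all - 1)  # ordered pairs of distinct documents
    if possible_all == 0:
        return 0.0
    p0 = len(citations) / possible_all

    # Restrict the citation graph to documents that contain the term.
    members = {doc_id for doc_id, terms in docs.items() if term in terms}
    k = sum(1 for src, dst in citations if src in members and dst in members)
    m = len(members) * (len(members) - 1)
    return log_likelihood_ratio(k, m, p0)


# Toy corpus (fabricated purely for illustration, not data from the paper).
docs = {
    "a": {"quantum", "model"},
    "b": {"quantum", "model"},
    "c": {"quantum"},
    "d": {"model"},
    "e": {"model"},
    "f": {"model"},
}
citations = {("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")}

for term in ("quantum", "model"):
    print(term, round(topic_score(term, docs, citations), 3))
```

In this toy corpus the documents containing "quantum" cite each other heavily, so the term receives a positive score, while "model" appears in loosely connected documents and scores zero. Under the same assumptions, a set of terms could be handled by restricting `members` to documents containing every term in the set, although the synergy test described in the abstract would also need a comparison against the individual terms.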