In this paper we address the problem of detecting topics in large-scale linked document collections. Topic detection has recently become a very active area of research because of its utility for information navigation, trend analysis, and high-level description of data. We present a unique approach that uses the correlation between the distribution of a term that represents a topic and the link distribution in the citation graph whose nodes are restricted to the documents containing that term. This tight coupling between term and graph analysis distinguishes our approach from others, such as those that focus on language models. We develop a topic score measure for each term, using the likelihood ratio of binary hypotheses based on a probabilistic description of graph connectivity. The approach rests on the intuition that if a term is relevant to a topic, the documents containing the term are more densely connected than a random selection of documents. We extend the algorithm to detect a topic represented by a set of terms, using the intuition that if the co-occurrence of the terms represents a new topic, the citation pattern should exhibit a synergistic effect. We test the algorithm on two electronic research literature collections, arXiv and Citeseer. Our evaluation shows that the approach is effective and reveals some novel aspects of topic detection.
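The abstract does not give the exact form of the topic score, so the following is only a rough sketch of the underlying intuition, not the paper's measure. It assumes a simple binomial model in which every ordered pair of documents is linked independently: under the null hypothesis the documents containing a term are linked at the collection-wide density, and under the alternative they are linked at their own observed density. The function name `topic_score`, the data layout, and the toy collection are all hypothetical.

```python
import math


def topic_score(term, doc_terms, citations):
    """One-sided log-likelihood ratio that documents containing `term`
    are more densely linked than the collection as a whole.

    Assumptions (not from the paper): `doc_terms` maps a document id to
    its set of terms; `citations` is a set of distinct (citing, cited)
    pairs with no self-citations; links are independent Bernoulli trials.
    """
    n_docs = len(doc_terms)
    total_pairs = n_docs * (n_docs - 1)
    if total_pairs == 0:
        return 0.0
    # Null hypothesis: link probability equals the global link density.
    p0 = len(citations) / total_pairs

    # Restrict the citation graph to documents that contain the term.
    docs = {d for d, terms in doc_terms.items() if term in terms}
    pairs = len(docs) * (len(docs) - 1)
    if pairs == 0:
        return 0.0
    links = sum(1 for src, dst in citations if src in docs and dst in docs)

    # Alternative hypothesis: maximum-likelihood density of the subgraph.
    p1 = links / pairs
    if p1 <= p0:
        return 0.0  # no denser than chance, so no evidence of a topic

    def xlogy(x, y):
        # Treat 0 * log(0) as 0 so fully connected subgraphs are handled.
        return 0.0 if x == 0 else x * math.log(y)

    return xlogy(links, p1 / p0) + xlogy(pairs - links, (1 - p1) / (1 - p0))


# Toy collection: "graph" appears only in three mutually citing documents,
# while "the" appears everywhere.
doc_terms = {
    "a": {"graph", "the"},
    "b": {"graph", "the"},
    "c": {"graph", "the"},
    "d": {"the"},
    "e": {"the"},
    "f": {"the"},
}
citations = {("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")}

graph_score = topic_score("graph", doc_terms, citations)
the_score = topic_score("the", doc_terms, citations)
```

In this toy collection the term "graph" receives a positive score because its three documents cite one another far more often than the collection average, while "the" scores zero because its documents are the whole collection. A set of terms could be scored the same way by restricting the graph to documents that contain all of them, though how the paper measures the synergistic effect is not stated in the abstract.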