On-line new event detection and tracking
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Foundations of statistical natural language processing
Foundations of statistical natural language processing
The Journal of Machine Learning Research
Corpus structure, language models, and ad hoc information retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Probabilistic author-topic models for information discovery
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Bibliometric impact measures leveraging topic analysis
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
ICML '06 Proceedings of the 23rd international conference on Machine learning
Pachinko allocation: DAG-structured mixture models of topic correlations
ICML '06 Proceedings of the 23rd international conference on Machine learning
Respect my authority!: HITS without hyperlinks, utilizing cluster-based language models
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Topics over time: a non-Markov continuous-time model of topical trends
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Topic sentiment mixture: modeling facets and opinions in weblogs
Proceedings of the 16th international conference on World Wide Web
Unsupervised prediction of citation influences
Proceedings of the 24th international conference on Machine learning
Information genealogy: uncovering the flow of ideas in non-hyperlinked document databases
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Topic and role discovery in social networks
IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Probabilistic latent semantic analysis
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Citation recommendation without author supervision
Proceedings of the fourth ACM international conference on Web search and data mining
Hi-index | 0.00 |
One major goal of text mining is to provide automatic methods to help humans grasp the key ideas in ever-increasing text corpora. To this effect, we propose a statistically well-founded method for identifying the original ideas that a document contributes to a corpus, focusing on self-referential diachronic corpora such as research publications, blogs, email, and news articles. Our statistical model of passage impact defines (interesting) original content through a combination of impact and novelty, and the model is used to identify each document's most original passages. Unlike heuristic approaches, the statistical model is extensible and open to analysis. We evaluate the approach both on synthetic data and on real data in the domains of research publications and news, showing that the passage impact model outperforms a heuristic baseline method.