An investigation of linguistic features and clustering algorithms for topical document clustering
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
The Journal of Machine Learning Research
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
NLTK: the Natural Language Toolkit
ETMTNLP '02 Proceedings of the ACL-02 Workshop on Effective tools and methodologies for teaching natural language processing and computational linguistics - Volume 1
Tracking and summarizing news on a daily basis with Columbia's Newsblaster
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Introduction to Information Retrieval
Introduction to Information Retrieval
Multi-view clustering via canonical correlation analysis
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
A comparison of extrinsic clustering evaluation metrics based on formal constraints
Information Retrieval
Streaming first story detection with application to Twitter
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Automatic generation of story highlights
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
An approach for using Wikipedia to measure the flow of trends across countries
Proceedings of the 22nd international conference on World Wide Web companion
Hi-index | 0.00 |
We establish a novel task in the spirit of news summarization and topic detection and tracking (TDT): daily determination of the topics newly popular with Wikipedia readers. Central to this effort is a new public dataset consisting of the hourly page view statistics of all Wikipedia articles over the last three years. We give baseline results for the tasks of: discovering individual pages of interest, clustering these pages into coherent topics, and extracting the most relevant summarizing sentence for the reader. When compared to human judgements, our system shows the viability of this task, and opens the door to a range of exciting future work.