WikiTopics: what is popular on Wikipedia and why

Authors:
Byung Gyu Ahn;Benjamin Van Durme;Chris Callison-Burch
Affiliations:
Johns Hopkins University;Johns Hopkins University;Johns Hopkins University
Venue:
WASDGML '11 Proceedings of the Workshop on Automatic Summarization for Different Genres, Media, and Languages
Year:
2011

Abstract

We establish a novel task in the spirit of news summarization and topic detection and tracking (TDT): daily determination of the topics newly popular with Wikipedia readers. Central to this effort is a new public dataset consisting of the hourly page view statistics of all Wikipedia articles over the last three years. We give baseline results for three tasks: discovering individual pages of interest, clustering these pages into coherent topics, and extracting the most relevant summarizing sentence for the reader. Compared against human judgements, our system shows the viability of this task and opens the door to a range of exciting future work.
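To make the first task concrete, the following is a minimal sketch of one way "newly popular" pages could be picked out of hourly page view counts. The abstract does not state the scoring rule, so the rule used here (today's views minus the mean of a few preceding days) is an assumption for illustration only, as are the function names, the row layout, and the toy data.

```python
from collections import defaultdict

def daily_totals(hourly_rows):
    """Sum hourly (page, date, hour, views) rows into {page: {date: views}}."""
    totals = defaultdict(lambda: defaultdict(int))
    for page, date, _hour, views in hourly_rows:
        totals[page][date] += views
    return totals

def newly_popular(totals, today, previous_days, top_k=3):
    """Rank pages by today's views minus their mean over previous_days.

    Assumed scoring rule, not taken from the paper.
    """
    scores = {}
    for page, by_date in totals.items():
        baseline = sum(by_date.get(d, 0) for d in previous_days) / len(previous_days)
        scores[page] = by_date.get(today, 0) - baseline
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy, made-up rows in an assumed (page, date, hour, views) layout.
toy_rows = [
    ("Page_A", "d1", 0, 100), ("Page_A", "d2", 0, 110), ("Page_A", "d3", 0, 105),
    ("Page_B", "d1", 0, 5), ("Page_B", "d2", 0, 5),
    ("Page_B", "d3", 0, 900), ("Page_B", "d3", 1, 300),
    ("Page_C", "d1", 0, 50), ("Page_C", "d2", 0, 50), ("Page_C", "d3", 0, 40),
]

top_pages = newly_popular(daily_totals(toy_rows), "d3", ["d1", "d2"], top_k=2)
print(top_pages)
```

Scoring by increase rather than by raw count keeps perennially popular pages (Page_A in the toy data) from crowding out pages whose readership has just spiked (Page_B). Clustering the selected pages and choosing a summarizing sentence would be separate steps that are not sketched here.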