Identifying featured articles in Wikipedia: writing style matters

Authors:
Nedim Lipka; Benno Stein
Affiliations:
Bauhaus-Universität Weimar, Weimar, Germany (both authors)
Venue:
Proceedings of the 19th International Conference on World Wide Web
Year:
2010

Abstract

Wikipedia provides an information quality assessment model with criteria that human peer reviewers use to identify featured articles. For the classification task "Is an article featured or not?" we present a machine learning approach that exploits an article's character trigram distribution. Our approach differs from existing research in that it targets writing style rather than meta features such as the edit history. The approach is robust, straightforward to implement, and outperforms existing solutions. We support these claims with an experimental design that analyzes, among other things, domain transferability. The achieved F-measure for featured articles is 0.964 within a single Wikipedia domain and 0.880 in a domain transfer setting.
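
To make the two technical notions in the abstract concrete, here is a minimal Python sketch of a character trigram distribution and of the F-measure for the "featured" class. It is an illustration under stated assumptions, not the authors' implementation: the function names `char_trigram_distribution` and `f_measure` are hypothetical, the trigrams are taken over the raw text without any preprocessing, and the abstract does not say which classifier consumes the features, so none is shown.

```python
from collections import Counter


def char_trigram_distribution(text: str) -> dict[str, float]:
    """Relative frequency of each overlapping character trigram in `text`.

    Assumption: no normalisation (case folding, markup removal) is applied;
    the abstract does not specify any preprocessing.
    """
    trigrams = [text[i:i + 3] for i in range(len(text) - 2)]
    if not trigrams:
        return {}
    counts = Counter(trigrams)
    total = len(trigrams)
    return {gram: n / total for gram, n in counts.items()}


def f_measure(true_labels, predicted_labels, positive=True) -> float:
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In such a setup, each article would be mapped to its trigram distribution and the resulting vectors passed to a classifier of choice; `f_measure` would then be computed on held-out articles, with `True` standing for "featured".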