Revision graph extraction in Wikipedia based on supergram decomposition

Authors:
Jianmin Wu;Mizuho Iwaihara
Affiliations:
Waseda University, Kitakyushu, Fukuoka, Japan;Waseda University, Kitakyushu, Fukuoka, Japan
Venue:
Proceedings of the 9th International Symposium on Open Collaboration
Year:
2013

Abstract

Wikipedia, the collaborative encyclopedia, has become one of the most popular social media in recent years, and it provides information that adheres more closely to a "Neutral Point of View" than other media do. In pursuit of this core principle, much effort has gone into collaborative contribution and editing. The trajectories of how such collaboration unfolds through revisions are valuable for research on group dynamics and social media, which suggests that the underlying derivation relationships among revisions should be extracted precisely from the chronologically sorted revision history. In this paper, we propose a revision graph extraction method based on supergram decomposition over a document collection of near-duplicates. The plain text of each revision is characterized by its frequency distribution of supergrams, where a supergram is a variable-length token sequence that remains unchanged across revisions. We show that this method performs the task more effectively than existing methods.
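
To make the idea of deriving a revision graph from shared token sequences more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that a supergram can be approximated by a maximal run of tokens shared between two revisions, found here with Python's standard `difflib.SequenceMatcher`, and that each revision's parent is the earlier revision with the largest share of tokens covered by such runs. The names `shared_runs`, `similarity`, `extract_parents`, and the `min_len` threshold are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def shared_runs(a, b, min_len=2):
    """Return contiguous token runs common to token lists a and b.

    Assumption: these runs stand in for supergrams; the paper's actual
    decomposition is not reproduced here.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        tuple(a[m.a:m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_len  # also skips the zero-length sentinel block
    ]


def similarity(a, b, min_len=2):
    """Fraction of the longer revision covered by shared runs."""
    if not a or not b:
        return 0.0
    covered = sum(len(run) for run in shared_runs(a, b, min_len))
    return covered / max(len(a), len(b))


def extract_parents(revisions, min_len=2):
    """Map each revision index to the index of its most similar predecessor.

    Revisions are given in chronological order as plain-text strings.
    Ties are broken in favour of the most recent predecessor.
    """
    tokens = [text.split() for text in revisions]
    parents = {}
    for i in range(1, len(tokens)):
        # Scan predecessors from newest to oldest so max() keeps the newest on ties.
        parents[i] = max(
            range(i - 1, -1, -1),
            key=lambda j: similarity(tokens[j], tokens[i], min_len),
        )
    return parents


if __name__ == "__main__":
    history = [
        "the cat sat on the mat",
        "the cat sat on the red mat",
        "a dog ran in the park",
        "the cat sat on the red mat today",
    ]
    print(extract_parents(history))
```

In this toy history the fourth revision is linked back to the second rather than to the unrelated third, which is the kind of non-linear derivation relationship that a purely chronological reading of the history would miss.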