Despite the popularity of Twitter for research, very few corpora are publicly available, and those that are available are either too small or unsuitable for tasks such as event detection. This is partly due to a number of issues associated with the creation of Twitter corpora, including restrictions on the distribution of tweets and the difficulty of creating relevance judgements at such a large scale. Creating relevance judgements for event detection is made harder still by ambiguity in the definition of an event. In this paper, we propose a methodology for the creation of an event detection corpus. Specifically, we first create a new corpus that covers a period of 4 weeks and contains over 120 million tweets, which we make available for research. We then propose a definition of event that fits the characteristics of Twitter and, using this definition, generate a set of relevance judgements aimed specifically at the task of event detection. To do so, we use existing state-of-the-art event detection approaches and Wikipedia to generate a set of candidate events with associated tweets. We then use crowdsourcing to gather relevance judgements, and discuss the quality of the results, including how we ensured integrity and prevented spam. As a result of this process, along with our Twitter corpus, we release relevance judgements containing over 150,000 tweets, covering more than 500 events, which can be used for the evaluation of event detection approaches.