A study of automation from seed URL generation to focused web archive development: the CTRnet context

  • Authors:
  • Seungwon Yang;Kiran Chitturi;Gregory Wilson;Mohamed Magdy;Edward A. Fox

  • Affiliations:
  • Virginia Tech, Blacksburg, VA, USA;Virginia Tech, Blacksburg, VA, USA;Virginia Tech, Blacksburg, VA, USA;Virginia Tech, Blacksburg, VA, USA;Virginia Tech, Blacksburg, VA, USA

  • Venue:
  • Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

In the event of emergencies and disasters, massive amounts of web resources are generated and shared. Due to the rapidly changing nature of those resources, it is important to start archiving them as soon as a disaster occurs. This led us to develop a prototype system for constructing archives with minimum human intervention using the seed URLs extracted from tweet collections. We present the details of our prototype system. We applied it to five tweet collections that had been developed in advance, for evaluation. We also identify five categories of non- relevant files and conclude with a discussion of findings from the evaluation.