Search the Past with the Portuguese Web Archive

  • Authors:
  • Daniel Gomes; David Cruz; João Miranda; Miguel Costa; Simão Fontes

  • Affiliations:
  • Foundation for National Scientific Computing, Lisbon, Portugal (all authors)

  • Venue:
  • Proceedings of the 22nd International Conference on World Wide Web Companion
  • Year:
  • 2013

Abstract

The web was invented to let scientists exchange data quickly, but it has become a crucial tool for connecting the world. However, the web is extremely ephemeral: most of the information published online quickly becomes unavailable and is lost forever. Several initiatives worldwide struggle to archive web content before it vanishes. However, the search mechanisms for accessing this archived information are still limited and do not satisfy users, who expect performance similar to that of live-web search engines. This demo presents the Portuguese Web Archive, which enables search over 1.2 billion files archived from 1996 to 2012. It is the largest publicly available full-text searchable web archive [17]. The software developed to support this service is also publicly available as a free, open-source project on Google Code, so that other web archivists can reuse and enhance it. A short video about the Portuguese Web Archive is available at vimeo.com/59507267. The service can be tried live at archive.pt.