Search the past with the Portuguese Web Archive

Authors:
Daniel Gomes;David Cruz;João Miranda;Miguel Costa;Simão Fontes
Affiliations:
Foundation for National Scientific Computing, Lisbon, Portugal;Foundation for National Scientific Computing, Lisbon, Portugal;Foundation for National Scientific Computing, Lisbon, Portugal;Foundation for National Scientific Computing, Lisbon, Portugal;Foundation for National Scientific Computing, Lisbon, Portugal
Venue:
Proceedings of the 22nd international conference on World Wide Web companion
Year:
2013

References:

Building Nutch: Open Source Search. Queue - Search Engines.
What's new on the web?: the evolution of the web from a search engine perspective. Proceedings of the 13th international conference on World Wide Web.
Trend detection through temporal link analysis. Journal of the American Society for Information Science and Technology - Special issue: Webometrics.
Prioritizing Web Usability.
Search User Interfaces.
Clustering and exploring search results using timeline constructions. Proceedings of the 18th ACM conference on Information and knowledge management.
A survey on web archiving initiatives. TPDL'11 Proceedings of the 15th international conference on Theory and practice of digital libraries: research and advanced technology for digital libraries.
Evaluating web archive search systems. WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering.

Abstract

The web was invented to quickly exchange data between scientists, but it became a crucial communication tool that connects the world. However, the web is extremely ephemeral: most of the information published online quickly becomes unavailable and is lost forever. Several initiatives worldwide struggle to archive information from the web before it vanishes. However, the search mechanisms for accessing this information are still limited and do not satisfy users, who demand performance similar to that of live-web search engines. This demo presents the Portuguese Web Archive, which enables search over 1.2 billion files archived from 1996 to 2012. It is the largest full-text searchable web archive publicly available [17]. The software developed to support this service is also publicly available as a free open source project at Google Code, so that it can be reused and enhanced by other web archivists. A short video about the Portuguese Web Archive is available at vimeo.com/59507267. The service can be tried live at archive.pt.