Web-preservation organizations such as the Internet Archive not only capture the history of born-digital content but also reflect the zeitgeist of different time periods over more than a decade. This longitudinal data is a potential gold mine for researchers such as sociologists, political scientists, media and market analysts, and experts on intellectual property. The LAWA project (Longitudinal Analytics of Web Archive data) is developing an Internet-based experimental testbed for large-scale data analytics on Web archive collections. Its emphasis is on scalable methods for this specific kind of big-data analytics, and on software tools for aggregating, querying, mining, and analyzing Web contents over long epochs. In this paper, we highlight our research on entity-level analytics in Web archive data, which lifts Web analytics from plain text to the entity level by detecting named entities, resolving ambiguous names, extracting temporal facts, and visualizing entities over extended time periods. Our results provide key assets for tracking named entities in the evolving Web, news, and social media.