Corporator: a tool for creating RSS-based specialized corpora

  • Authors: Cédrick Fairon
  • Affiliations: Centre de traitement automatique du langage, UCLouvain, Belgium
  • Venue: WAC '06: Proceedings of the 2nd International Workshop on Web as Corpus
  • Year: 2006

Abstract

This paper presents a new approach, and a software tool, for collecting specialized corpora from the Web. The approach takes advantage of a very popular XML-based standard used on the Web for sharing content among websites: RSS (Really Simple Syndication). After a brief introduction to RSS, we explain why this type of data source is of interest for corpus development. Finally, we present Corporator, an open-source program designed for collecting corpora from RSS feeds.
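
To make the idea of an RSS-based corpus source concrete, the following is a minimal sketch of how the items of an RSS 2.0 feed could be read into plain-text records. It is not Corporator's implementation, which the abstract does not describe. The sample feed, its URLs, and the function name `extract_items` are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document, embedded so the sketch runs offline.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Health News</title>
    <link>http://example.org/health</link>
    <description>Hypothetical feed used only for illustration</description>
    <item>
      <title>First headline</title>
      <link>http://example.org/health/1</link>
      <description>Summary of the first article.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.org/health/2</link>
      <description>Summary of the second article.</description>
    </item>
  </channel>
</rss>"""


def extract_items(rss_xml):
    """Return one dict per <item> with its title, link, and description."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None when the child element is missing.
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


if __name__ == "__main__":
    for entry in extract_items(SAMPLE_FEED):
        print(entry["title"], "|", entry["link"])
```

Because a feed is typically dedicated to a single topic or section of a site, the records gathered this way are already grouped by theme, which is what makes feeds attractive as a source for specialized corpora. A real collector would also need to fetch each feed periodically and follow the item links to retrieve the full article text.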