The Lixto project: exploring new frontiers of Web data extraction

  • Authors:
  • Julien Carme; Michal Ceresna; Oliver Frölich; Georg Gottlob; Tamir Hassan; Marcus Herzog; Wolfgang Holzinger; Bernhard Krüpl

  • Affiliations:
  • Database and Artificial Intelligence Group, Vienna University of Technology, Wien, Austria (all authors except Georg Gottlob); Oxford University Computing Laboratory, Oxford, United Kingdom (Georg Gottlob)

  • Venue:
  • BNCOD'06 Proceedings of the 23rd British National Conference on Databases, conference on Flexible and Efficient Information Handling
  • Year:
  • 2006

Abstract

The Lixto project is an ongoing research effort in the area of Web data extraction. The project originally set out to develop a logic-based extraction language and a tool for visually defining extraction programs from sample Web pages, but its scope has been extended over time. Today, new issues are being investigated, such as employing learning algorithms to define extraction programs, automatically extracting data from Web pages with a table-centric visual appearance, and extracting from alternative document formats such as PDF.