Web object identification for web automation and meta-search

  • Authors: Iraklis Kordomatis; Christoph Herzog; Ruslan R. Fayzrakhmanov; Bernhard Krüpl-Sypien; Wolfgang Holzinger; Robert Baumgartner
  • Affiliations: Vienna University of Technology (all authors)
  • Venue: Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics
  • Year: 2013

Abstract

Web object identification plays an important role in research fields such as information extraction, web automation, and web form understanding for building meta-search engines. In contrast to other work, we approach this problem by analyzing various spatial, visual, functional, and textual characteristics of web pages. We compute 49 unique features for every visible web page element and pass them to machine learning classifiers in order to identify similar elements on other, previously unseen web pages. We evaluate our approach in several scenarios, analyzing both the relevance of the chosen features and the classification rate of the applied classifiers. These scenarios focus on understanding search forms from the transportation domain, in particular flight, train, and bus connections. The evaluation results are very promising.
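The pipeline described in the abstract (compute features per visible element, then classify elements on unseen pages) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Element` record, the nine toy features, the keyword lists, the fixed page size, and the nearest-neighbour classifier are hypothetical stand-ins, not the 49 features or the classifiers used by the authors.

```python
from dataclasses import dataclass
from math import dist
from typing import List

# Hypothetical keyword lists for the textual features (not from the paper).
FROM_WORDS = {"from", "origin", "departure"}
TO_WORDS = {"to", "destination", "arrival"}
SEARCH_WORDS = {"search", "find"}


@dataclass
class Element:
    """A visible page element; all fields are illustrative assumptions."""
    tag: str        # functional: HTML tag name
    x: float        # spatial: left edge in pixels
    y: float        # spatial: top edge in pixels
    width: float    # visual: rendered width in pixels
    height: float   # visual: rendered height in pixels
    text: str       # textual: visible label or nearby text
    label: str = "" # known class for training elements, empty otherwise


def features(e: Element, page_w: float = 1280.0, page_h: float = 800.0) -> List[float]:
    """Map an element to a small numeric feature vector (9 toy features)."""
    words = set(e.text.lower().split())
    return [
        e.x / page_w,                            # normalized position
        e.y / page_h,
        e.width / page_w,                        # normalized size
        e.height / page_h,
        1.0 if e.tag == "input" else 0.0,        # functional role
        1.0 if e.tag == "button" else 0.0,
        1.0 if words & FROM_WORDS else 0.0,      # textual cues
        1.0 if words & TO_WORDS else 0.0,
        1.0 if words & SEARCH_WORDS else 0.0,
    ]


def classify(unseen: Element, training: List[Element]) -> str:
    """Return the label of the training element closest in feature space."""
    target = features(unseen)
    nearest = min(training, key=lambda e: dist(features(e), target))
    return nearest.label


# Labeled elements from one (made-up) transportation search form.
training = [
    Element("input", 100, 200, 300, 40, "From", "origin_field"),
    Element("input", 450, 200, 300, 40, "To", "destination_field"),
    Element("button", 800, 200, 120, 40, "Search", "submit_button"),
]

# An element from a different, previously unseen form.
unseen = Element("input", 80, 260, 280, 36, "Departure")
print(classify(unseen, training))
```

A nearest-neighbour rule is used here only because it needs no external library; any classifier that accepts fixed-length numeric vectors could be swapped in at the `classify` step.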