Reformatting web documents via header trees

  • Authors: Minoru Yoshida; Hiroshi Nakagawa
  • Affiliations: University of Tokyo, Tokyo, Japan (both authors)
  • Venue: ACLdemo '05 Proceedings of the ACL 2005 on Interactive poster and demonstration sessions
  • Year: 2005

Abstract

We propose a new method for reformatting web documents by extracting semantic structures from web pages. Our approach is to extract trees that describe hierarchical relations in documents. We developed an algorithm for this task by employing the EM algorithm and clustering techniques. Preliminary experiments showed that our approach was more effective than baseline methods.