Style and branding elements extraction from business web sites

  • Authors:
  • Limei Jiao;Suk Hwan Lim;Nina Bhatti;Yuhong Xiong;Jerry Liu

  • Affiliations:
  • Hewlett-Packard Laboratories, Beijing, China;Hewlett-Packard Laboratories, Palo Alto, CA, USA;Hewlett-Packard Laboratories, Palo Alto, CA, USA;Innovation Works, Beijing, China;Hewlett-Packard Laboratories, Palo Alto, CA, USA

  • Venue:
  • Proceedings of the 10th ACM symposium on Document engineering
  • Year:
  • 2010

Abstract

We describe a method for extracting style and branding elements from multiple web pages of a given site for content repurposing. Style and branding elements convey the values of the site owner and help connect with target prospects. They are manifested through logos, graphical elements, background colors, font styles, font colors, and other illustrations. Our method automatically extracts color and image elements that appear frequently and prominently on multiple pages throughout the site. We rely on a DOM tree matching method to obtain the frequency of recurring elements, and we use the relative sizes and positions of elements to determine their types. The approximate locations of these elements also give the content repurposing engine a clue as to where to place them in the repurposed document. The results show that the proposed method can efficiently extract style and branding elements with high accuracy.
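
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each page has already been rendered and flattened into a list of elements with a DOM path and normalized geometry. In place of real DOM tree matching it uses a much simpler exact match on the pair (DOM path, source). The `Element` type, the function names, and every threshold are hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical flattened page element; geometry is normalized to 0..1."""
    dom_path: str  # e.g. "html/body/div[1]/img"
    src: str       # image URL (a color value could be used the same way)
    x: float       # left edge as a fraction of page width
    y: float       # top edge as a fraction of page height
    w: float       # width as a fraction of page width
    h: float       # height as a fraction of page height


def recurring_elements(pages, min_fraction=0.5):
    """Group elements by (dom_path, src) and keep those found on enough pages.

    Exact key matching is a stand-in for DOM tree matching (an assumption).
    """
    seen = defaultdict(list)
    for page in pages:
        first_on_page = {}
        for el in page:
            # Count each key at most once per page.
            first_on_page.setdefault((el.dom_path, el.src), el)
        for key, el in first_on_page.items():
            seen[key].append(el)
    threshold = min_fraction * len(pages)
    return {key: els for key, els in seen.items() if len(els) >= threshold}


def classify(el):
    """Guess an element type from relative size and position.

    All thresholds here are illustrative assumptions.
    """
    area = el.w * el.h
    if area >= 0.5:
        return "background"
    if el.y < 0.15 and el.w >= 0.6:
        return "banner"
    if el.y < 0.15 and area < 0.05:
        return "logo"
    return "graphic"


def extract_branding(pages, min_fraction=0.5):
    """Return recurring elements with a guessed type and approximate position."""
    results = []
    for (_, src), els in recurring_elements(pages, min_fraction).items():
        rep = els[0]  # first occurrence serves as the representative
        results.append({
            "src": src,
            "type": classify(rep),
            "pages": len(els),
            "position": (rep.x, rep.y),
        })
    # Most frequent first; ties broken by source name for stable output.
    return sorted(results, key=lambda r: (-r["pages"], r["src"]))


if __name__ == "__main__":
    logo = Element("html/body/div[1]/img", "logo.png", 0.02, 0.01, 0.10, 0.05)
    banner = Element("html/body/div[2]/img", "banner.jpg", 0.0, 0.08, 1.0, 0.10)
    photo = Element("html/body/div[3]/img", "photo.jpg", 0.3, 0.5, 0.3, 0.2)
    for item in extract_branding([[logo, banner], [logo, banner, photo], [logo]]):
        print(item)
```

The returned `position` field mirrors the abstract's remark that approximate locations can hint at where to place an element in the repurposed document. A real system would need tolerant tree matching, since exact DOM paths rarely stay identical across the pages of a site.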