Extracting position relations from the web

  • Authors: Yanhong Liu; Peiquan Jin; Lihua Yue

  • Affiliations: University of Science and Technology of China, Hefei, China (all three authors)

  • Venue: Proceedings of the eleventh international workshop on Web information and data management
  • Year: 2009

Abstract

In this paper, we present a new algorithm for extracting people's positions in a corporation from the Web. We use the term position relation for the association between a person and the position that person holds in a corporation; such relations are a significant kind of competitive intelligence for enterprises. Our algorithm is based on a structural feature of position relations in Web content: they are usually presented in Web pages as a table or a list. To capture this structural feature, we first introduce a structural coefficient for each Web page. This coefficient is then used to generate structural file segments from Web pages, where a structural file segment consists of all position relation candidates that share a similar structure. We then employ a pattern-matching method to extract position relations from the structural file segments. Finally, we conduct experiments on a real data set of 6028 Chinese Web pages gathered through the Baidu search engine and evaluate the precision and recall of our approach. The experimental results show that our algorithm achieves a precision above 96% and a recall above 87%.
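
The pipeline described in the abstract (group similarly structured candidates, pattern-match within each group, then score precision and recall) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the structural coefficient or the matching patterns, so the sketch assumes a much simpler stand-in, grouping text rows by the separator characters they contain. The names `POSITION_TITLES`, `ROW_PATTERN`, `signature`, `group_by_structure`, `extract_relations`, and `precision_recall` are all hypothetical, as is the sample data.

```python
import re
from collections import defaultdict

# Hypothetical title vocabulary; the paper's actual patterns are not given.
POSITION_TITLES = ("CEO", "CTO", "CFO", "Chairman", "President", "Director")

# Assumed row shape: "<name> <separator> <title>", separator one of , : |
ROW_PATTERN = re.compile(
    r"^\s*(?P<name>[^,:|]+?)\s*[,:|]\s*(?P<title>"
    + "|".join(POSITION_TITLES)
    + r")\s*$"
)


def signature(row):
    """Stand-in for a structural feature: the separators found in a row."""
    return "".join(ch for ch in row if ch in ",:|")


def group_by_structure(rows):
    """Group rows sharing a signature, loosely mimicking a 'segment'."""
    groups = defaultdict(list)
    for row in rows:
        groups[signature(row)].append(row)
    return groups


def extract_relations(rows, min_group_size=2):
    """Pattern-match only inside groups of similarly structured rows."""
    relations = []
    for sig, segment in group_by_structure(rows).items():
        # Skip unstructured rows and isolated rows (assumed to be noise).
        if not sig or len(segment) < min_group_size:
            continue
        for row in segment:
            match = ROW_PATTERN.match(row)
            if match:
                relations.append((match.group("name"), match.group("title")))
    return relations


def precision_recall(extracted, gold):
    """Standard precision and recall over sets of (name, title) pairs."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented sample rows, as if already pulled from a table or list.
    rows = [
        "Alice Wang | CEO",
        "Bob Li | CTO",
        "Contact us: sales",
        "Carol Zhao, Director",
    ]
    gold = [("Alice Wang", "CEO"), ("Bob Li", "CTO"), ("Carol Zhao", "Director")]
    found = extract_relations(rows)
    print(found)
    print(precision_recall(found, gold))
```

In this toy input, the isolated row "Carol Zhao, Director" is skipped because no other row shares its structure. That shows the trade-off a structure-first approach makes: filtering by structure can keep precision high, while relations outside tables and lists cost recall.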