Extracting position relations from the web

Authors: Yanhong Liu, Peiquan Jin, Lihua Yue
Affiliation: University of Science and Technology of China, Hefei, China (all authors)
Venue: Proceedings of the eleventh international workshop on Web information and data management
Year: 2009

Abstract

In this paper, we present a new algorithm for extracting people's positions in corporations from the Web. A person's position in a corporation, which we call a position relation, is a significant kind of competitive intelligence for enterprises. Our algorithm exploits a structural feature of position relations in Web content: they are usually presented in Web pages as a table or a list. To capture this feature, we first introduce a structural coefficient for each Web page. The coefficient is then used to generate structural file segments from Web pages, where each segment consists of all position-relation candidates that share a similar structure. We then apply a pattern-matching method to extract position relations from the structural file segments. Finally, we conduct experiments on a real data set of 6028 Chinese Web pages gathered through the Baidu search engine and evaluate the precision and recall of our approach. The experimental results show that our algorithm achieves a precision above 96% and a recall above 87%.
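As a rough illustration of the last two steps (pattern matching over list-like rows, then scoring by precision and recall), the following minimal Python sketch may help. It is not the authors' implementation: the abstract does not define the structural coefficient or the actual patterns, so the title lexicon `POSITION_TITLES`, the row pattern `ROW_PATTERN`, and the sample rows are all hypothetical assumptions chosen for the example.

```python
import re

# Hypothetical title lexicon; the paper's real lexicon is not given in the abstract.
POSITION_TITLES = ("Chairman", "CEO", "CFO", "General Manager", "Director")

# Assumed row shape: "<name> <separator> <title>", one candidate per row,
# as might appear in a table row or list item of a structural segment.
ROW_PATTERN = re.compile(
    r"^\s*(?P<name>[^,:|\t]+?)\s*[,:|\t]\s*(?P<title>"
    + "|".join(re.escape(t) for t in POSITION_TITLES)
    + r")\s*$"
)


def extract_relations(rows):
    """Return the set of (name, title) pairs matched in the given rows."""
    found = set()
    for row in rows:
        match = ROW_PATTERN.match(row)
        if match:
            found.add((match.group("name"), match.group("title")))
    return found


def precision_recall(extracted, gold):
    """Standard precision and recall of an extracted set against a gold set."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Made-up rows standing in for one structural file segment.
rows = ["Zhang Wei | CEO", "Li Na: CFO", "Contact us today", "Wang Fang, Director"]
gold = {
    ("Zhang Wei", "CEO"),
    ("Li Na", "CFO"),
    ("Wang Fang", "Director"),
    ("Chen Jie", "Chairman"),
}
extracted = extract_relations(rows)
precision, recall = precision_recall(extracted, gold)
```

In this toy run, three of the four gold relations are found and nothing spurious is extracted, so precision is 1.0 and recall is 0.75. The figures reported in the abstract come from the authors' own data set and method, not from anything resembling this sketch.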