Representing a web page as sets of named entities of multiple types: a model and some preliminary applications

Authors:
Nan Di;Conglei Yao;Mengcheng Duan;Jonathan Zhu;Xiaoming Li
Affiliations:
Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China;City University of Hong Kong, Kowloon, Hong Kong;Peking University, Beijing, China
Venue:
Proceedings of the 17th international conference on World Wide Web
Year:
2008

Abstract

In contrast to the "bag of words" document representation used in most information retrieval applications, we propose a model that represents a web page as sets of named entities of multiple types. Specifically, four types of named entities are extracted: person, geographic location, organization, and time. The relations among these entities are also extracted, weighted, classified, and labeled. On top of this model, we demonstrate some preliminary applications. In particular, we introduce the notion of a person-activity, which comprises four elements: person, location, time, and activity. Using this notion and a reasonably large set of web pages, we show how one person's activities can be characterized by time and location, which gives a good picture of the mobility of the person in question.
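
To make the representation concrete, the following is a minimal Python sketch of how such a page model might be held in memory. It is an illustration under stated assumptions, not the authors' implementation: the class names (PageEntities, Relation, PersonActivity), the helper mobility_timeline, the ISO-formatted time strings, and all sample values are hypothetical, and the extraction step itself (named entity recognition and relation labeling) is assumed to have happened elsewhere.

```python
from dataclasses import dataclass, field

# The four entity types named in the abstract.
ENTITY_TYPES = ("person", "location", "organization", "time")


@dataclass(frozen=True)
class Relation:
    """A weighted, labeled relation between two entities (hypothetical layout)."""
    source: str
    target: str
    label: str
    weight: float


@dataclass(frozen=True)
class PersonActivity:
    """The four-element person-activity notion: person, location, time, activity."""
    person: str
    location: str
    time: str  # assumed to be an ISO date string so that sorting is chronological
    activity: str


@dataclass
class PageEntities:
    """A web page represented as one set of entity names per entity type."""
    url: str
    entities: dict = field(
        default_factory=lambda: {t: set() for t in ENTITY_TYPES}
    )
    relations: list = field(default_factory=list)

    def add_entity(self, entity_type: str, name: str) -> None:
        # Reject types outside the four the model covers.
        if entity_type not in self.entities:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.entities[entity_type].add(name)

    def add_relation(self, source: str, target: str, label: str, weight: float) -> None:
        self.relations.append(Relation(source, target, label, weight))


def mobility_timeline(activities, person):
    """Return one person's (time, location, activity) tuples in time order."""
    return sorted(
        (a.time, a.location, a.activity) for a in activities if a.person == person
    )


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    page = PageEntities(url="http://example.org/news/1")
    page.add_entity("person", "Alice Example")
    page.add_entity("location", "Beijing")
    page.add_relation("Alice Example", "Beijing", "visited", 0.8)

    activities = [
        PersonActivity("Alice Example", "Hong Kong", "2008-04-21", "gave a talk"),
        PersonActivity("Alice Example", "Beijing", "2008-01-10", "attended a meeting"),
    ]
    for time, location, activity in mobility_timeline(activities, "Alice Example"):
        print(time, location, activity)
```

Keeping one set per entity type mirrors the "sets of named entities of multiple types" wording, while the person-activity records are kept separate so that they can be aggregated across many pages before being sorted by time.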