Named entity recognition and identification for finding the owner of a home page

Authors:
Vassilis Plachouras; Matthieu Rivière; Michalis Vazirgiannis
Affiliations:
LIX, École Polytechnique, Palaiseau, France, and PRESANS, X-TEC, École Polytechnique, Palaiseau, France; PRESANS, X-TEC, École Polytechnique, Palaiseau, France; LIX, École Polytechnique, Palaiseau, France, and Dept of Informatics, AUEB, Athens, Greece
Venue:
PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
Year:
2012


Abstract

Entity-based applications, such as expert search or online social networks in which users search for people, require high-quality datasets of named entity references. Such datasets can be obtained by automatically extracting metadata from Web pages. In this work, we focus on identifying the named entity that corresponds to the owner of a particular Web page, for example a home page or an organizational staff page. More specifically, given a set of named entities that have already been extracted from a Web page, we identify the one that corresponds to the owner of the page. First, we develop a set of features that are combined in a scoring function to select the named entity of the Web page owner. Second, we formulate the task as a classification problem in which a pair consisting of a Web page and a named entity is classified as associated or not. We evaluate both approaches on a set of Web pages in which named entities have previously been identified. Our experimental results show that the named entity corresponding to the owner of a home page can be identified with accuracy above 90%.
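
The first approach, a scoring function over candidate entities, can be illustrated with a minimal Python sketch. The abstract does not list the features or their weights, so the three features below (`in_title`, `in_url`, `frequency`), the `Candidate` type, and the `WEIGHTS` values are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Candidate:
    """A named entity already extracted from a Web page (hypothetical structure)."""
    name: str
    in_title: bool   # assumed feature: the entity appears in the page title
    in_url: bool     # assumed feature: a token of the name appears in the URL
    frequency: int   # assumed feature: number of mentions on the page


# Hypothetical weights; the actual features and weights are not given in the abstract.
WEIGHTS = {"in_title": 2.0, "in_url": 1.5, "frequency": 0.1}


def score(c: Candidate) -> float:
    """Combine the candidate's features into one score as a weighted sum."""
    return (WEIGHTS["in_title"] * c.in_title
            + WEIGHTS["in_url"] * c.in_url
            + WEIGHTS["frequency"] * c.frequency)


def select_owner(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Return the highest-scoring candidate, or None if there are no candidates."""
    return max(candidates, key=score, default=None)


candidates = [
    Candidate("Jane Doe", in_title=True, in_url=True, frequency=5),
    Candidate("John Smith", in_title=False, in_url=False, frequency=8),
]
owner = select_owner(candidates)
```

With these made-up weights, a name that appears in the title and the URL outranks one that is merely mentioned more often. The second formulation in the abstract would instead pass the same kind of feature vector for each page and entity pair to a binary classifier.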