Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Nymble: a high-performance learning name-finder
ANLC '97 Proceedings of the fifth conference on Applied natural language processing
Improving machine learning approaches to coreference resolution
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
2D Conditional Random Fields for Web information extraction
ICML '05 Proceedings of the 22nd international conference on Machine learning
Use of support vector machines in extended named entity recognition
COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Named entity recognition with a maximum entropy approach
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Incorporating non-local information into information extraction systems by Gibbs sampling
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Extracting personal names from email: applying named entity recognition to informal text
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Extracting Author Meta-Data from Web Using Visual Features
ICDMW '07 Proceedings of the Seventh IEEE International Conference on Data Mining Workshops
Social Network Extraction of Academic Researchers
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Extracting the author of web pages
Proceedings of the 2nd ACM workshop on Information credibility on the web
A dual-layer CRFs based joint decoding method for cascaded segmentation and labeling tasks
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Automatic Web Pages Author Extraction
FQAS '09 Proceedings of the 8th International Conference on Flexible Query Answering Systems
LIBSVM: A library for support vector machines
ACM Transactions on Intelligent Systems and Technology (TIST)
On identifying academic homepages for digital libraries
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
Hi-index | 0.00 |
Entity-based applications, such as expert search or online social networks where users search for persons, require high-quality datasets of named entity references. Obtaining such high-quality datasets can be achieved by automatically extracting metadata from Web pages. In this work, we focus on the identification of the named entity that corresponds to the owner of a particular Web page, for example, a home page or an organizational staff Web page. More specifically, from a set of named entities that have already been extracted from a Web page, we identify the one which corresponds to the owner of the home page. First, we develop a set of features which are combined in a scoring function to select the named entity of the Web page owner. Second, we formulate the problem as a classification problem in which a pair of a Web page and named entity is classified as being associated or not. We evaluate the proposed approaches on a set of Web pages in which we have previously identified named entities. Our experimental results show that we can identify the named entity corresponding to the owner of a home page with accuracy over 90%.