Extracting the author of web pages

Authors:
Yoshikiyo Kato;Daisuke Kawahara;Kentaro Inui;Sadao Kurohashi;Tomohide Shibata
Affiliations:
National Institute of Information and Communications Technology, Seika, Soraku, Kyoto, Japan;National Institute of Information and Communications Technology, Seika, Soraku, Kyoto, Japan;National Institute of Information and Communications Technology, Seika, Soraku, Kyoto, Japan;National Institute of Information and Communications Technology / Kyoto University, Kyoto, Japan;Kyoto University, Kyoto, Japan
Venue:
Proceedings of the 2nd ACM workshop on Information credibility on the web
Year:
2008


Abstract

In this paper, we define the problem of identifying the author of a Web page as a sub-problem of identifying the information sender configuration of a Web page. We propose a method that extracts author name candidates from a Web page based on linguistic features and ranks the candidates based on local features such as distance from the main content. The evaluation shows that the method achieves more than 75% precision when candidates ranked within the top five are considered.
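
The ranking and evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract names only "distance from the main content" as one local feature, so the sketch ranks by that single feature alone. The Candidate class, the token-offset representation of positions, and the reading of "precision within top five" as the fraction of pages whose correct author appears among the five highest-ranked candidates are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical representation: a candidate name and its token offset in the page.
    name: str
    position: int


def rank_by_distance(candidates, content_start, content_end):
    """Order candidates by distance to the main-content span, closest first.

    Assumption: the main content is a known token span [content_start, content_end].
    Candidates inside the span get distance 0.
    """
    def distance(c):
        if c.position < content_start:
            return content_start - c.position
        if c.position > content_end:
            return c.position - content_end
        return 0

    # sorted() is stable, so candidates with equal distance keep their input order.
    return sorted(candidates, key=distance)


def hit_within_top_k(ranked, gold_name, k=5):
    """True if the correct author name is among the k highest-ranked candidates."""
    return any(c.name == gold_name for c in ranked[:k])


def top_k_precision(pages, k=5):
    """Fraction of pages whose correct author is ranked within the top k.

    `pages` is a list of (ranked_candidates, gold_name) pairs.
    """
    hits = sum(hit_within_top_k(ranked, gold, k) for ranked, gold in pages)
    return hits / len(pages)
```

A real system would combine several local features into one score; the single-feature sort above only shows the overall shape of a rank-then-evaluate pipeline.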