Extracting Author Meta-Data from Web Using Visual Features

Authors:
Shuyi Zheng, Ding Zhou, Jia Li, C. Lee Giles
Venue:
ICDMW '07 Proceedings of the Seventh IEEE International Conference on Data Mining Workshops
Year:
2007

Cited by:
On identifying academic homepages for digital libraries. Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries.
Named entity recognition and identification for finding the owner of a home page. PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I.

Abstract

Enriching the author meta-data of digital libraries can enable valuable services and applications. This paper addresses the problem of extracting authors' information from their homepages, which is essentially a multiclass classification problem: a homepage can be treated as a group of information pieces, each of which must be classified into a field such as Name, Title, Affiliation, or Email. In this problem, not only can each information piece be viewed as a point in a feature space, but certain patterns can also be observed among the different fields on a page. To improve extraction accuracy, this paper argues that the visual features of information pieces on a homepage should be fully exploited. In addition, it proposes an inter-fields probability model that captures the relations among different fields; this model can be combined with feature-space based classification. Experimental results demonstrate that utilizing visual features and applying the inter-fields probability model can significantly improve extraction accuracy.
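The abstract does not give the form of the inter-fields probability model, so the following is only a minimal sketch of one way per-piece classification could be combined with relations among fields. It assumes a first-order model in which the field of each information piece depends on the field of the previous piece in reading order, and it decodes the most likely field sequence with the Viterbi algorithm (an HMM-style technique swapped in for illustration, not the paper's model). The per-piece probabilities stand in for the output of some classifier over textual and visual features (e.g. font size, position); the names `FIELDS`, `decode_fields`, `TRANS`, and `START`, and all numbers, are hypothetical.

```python
import math

# Hypothetical field set taken from the examples in the abstract.
FIELDS = ["Name", "Title", "Affiliation", "Email"]


def _log(p, floor=1e-12):
    """Log of a probability, floored so that zero entries do not raise."""
    return math.log(max(p, floor))


def decode_fields(piece_probs, trans, start):
    """Assign one field per information piece (in reading order) by Viterbi decoding.

    piece_probs: list of dicts, field -> P(field | features of that piece), e.g. from a
                 classifier over textual and visual features (used as a heuristic emission score).
    trans:       dict (previous_field, field) -> P(field | previous_field), an assumed
                 first-order inter-field model.
    start:       dict field -> P(field) for the first piece on the page.
    """
    if not piece_probs:
        return []
    # best[i][f]: best log-score of any labeling of pieces 0..i that ends with field f
    best = [{f: _log(start[f]) + _log(piece_probs[0][f]) for f in FIELDS}]
    back = [{}]
    for i in range(1, len(piece_probs)):
        best.append({})
        back.append({})
        for f in FIELDS:
            prev = max(FIELDS, key=lambda p: best[i - 1][p] + _log(trans[(p, f)]))
            best[i][f] = best[i - 1][prev] + _log(trans[(prev, f)]) + _log(piece_probs[i][f])
            back[i][f] = prev
    # Trace back from the best-scoring final field.
    path = [max(FIELDS, key=lambda f: best[-1][f])]
    for i in range(len(piece_probs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Made-up numbers for illustration only.
START = {"Name": 0.7, "Title": 0.1, "Affiliation": 0.1, "Email": 0.1}
_ROWS = {
    "Name":        {"Name": 0.05, "Title": 0.7,  "Affiliation": 0.15, "Email": 0.1},
    "Title":       {"Name": 0.05, "Title": 0.05, "Affiliation": 0.8,  "Email": 0.1},
    "Affiliation": {"Name": 0.05, "Title": 0.05, "Affiliation": 0.2,  "Email": 0.7},
    "Email":       {"Name": 0.25, "Title": 0.25, "Affiliation": 0.25, "Email": 0.25},
}
TRANS = {(p, f): q for p, row in _ROWS.items() for f, q in row.items()}

# Hypothetical classifier outputs for four pieces: a name, a job title,
# a department line, and an email address.
piece_probs = [
    {"Name": 0.7,  "Title": 0.1,  "Affiliation": 0.1,  "Email": 0.1},
    {"Name": 0.5,  "Title": 0.4,  "Affiliation": 0.05, "Email": 0.05},
    {"Name": 0.05, "Title": 0.5,  "Affiliation": 0.4,  "Email": 0.05},
    {"Name": 0.05, "Title": 0.05, "Affiliation": 0.1,  "Email": 0.8},
]

print([max(p, key=p.get) for p in piece_probs])  # → ['Name', 'Name', 'Title', 'Email']
print(decode_fields(piece_probs, TRANS, START))  # → ['Name', 'Title', 'Affiliation', 'Email']
```

With these illustrative numbers, classifying each piece independently mislabels the second and third pieces, while the assumed field-to-field probabilities (a Title usually follows a Name, an Affiliation usually follows a Title) pull the joint decoding toward a consistent labeling. A first-order chain is only one simple choice for modelling relations among fields; other dependency structures would be equally plausible readings of the abstract.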