Person Retrieval on XML Documents by Coreference Analysis Utilizing Structural Features

Authors:
Yumi Yonei;Mizuho Iwaihara;Masatoshi Yoshikawa
Affiliations:
Department of Social Informatics, Graduate School of Informatics, Kyoto University, Kyoto, Japan 606-8501;Department of Social Informatics, Graduate School of Informatics, Kyoto University, Kyoto, Japan 606-8501;Department of Social Informatics, Graduate School of Informatics, Kyoto University, Kyoto, Japan 606-8501
Venue:
DEXA '08 Proceedings of the 19th international conference on Database and Expert Systems Applications
Year:
2008

Abstract

Present-day keyword retrieval exploits the frequencies and positions of search keywords in target documents. When retrieving with two or more keywords, the semantic relation between the keywords is important. To retrieve information about a person, it is common to search with a pair of keywords consisting of the person's name and an attribute of interest. Dependency analysis and coreference analysis make it possible to retrieve correct occurrences of such person-attribute pairs. However, existing natural language analysis does not take into account the fact that the logical structure of a document strongly influences the probabilistic patterns of coreference. In this paper, we propose a new person retrieval method that computes a maximum entropy model from linguistic features and structural features, where the structural features are learned from the probability distribution of coreference over XML document structures. Our method exploits the strong correlation between XML document structure and coreference, and thus achieves higher accuracy than existing methods.
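The combination of linguistic and structural features can be illustrated with a small sketch. For a binary decision (coreferent or not), a conditional maximum entropy model whose features fire only for the positive label reduces to logistic regression. The sketch therefore fits such a model by batch gradient ascent. Everything in it is an assumption for illustration and is not the authors' feature set or training procedure. This covers the feature names (`pronoun`, `gender_match`, `name_match`), the hypothetical structural relations between the XML elements that contain two mentions (`same_element`, `sibling`, `other_section`), the toy training pairs, and all helper functions. The structural side follows the abstract's idea in simplified form. A smoothed coreference rate is estimated per structural relation from the labelled pairs and fed to the model as a feature, alongside an indicator for the relation itself.

```python
import math
from collections import defaultdict

# Hypothetical labelled candidate pairs: (linguistic features, structural relation
# between the XML elements holding the two mentions, 1 if coreferent else 0).
TRAIN = [
    ({"pronoun", "gender_match"}, "same_element", 1),
    ({"pronoun"}, "same_element", 1),
    ({"name_match"}, "same_element", 1),
    ({"pronoun", "gender_match"}, "other_section", 0),
    ({"pronoun"}, "other_section", 0),
    ({"name_match"}, "other_section", 1),
    ({"pronoun", "gender_match"}, "sibling", 1),
    ({"pronoun"}, "sibling", 0),
]


def structural_prior(pairs):
    """Estimate P(coreferent | structural relation) with add-one smoothing."""
    pos, tot = defaultdict(int), defaultdict(int)
    for _, rel, label in pairs:
        tot[rel] += 1
        pos[rel] += label
    return {rel: (pos[rel] + 1) / (tot[rel] + 2) for rel in tot}


def featurize(ling, rel, prior):
    """Map one candidate pair to a sparse real-valued feature vector."""
    x = {"bias": 1.0, "struct:" + rel: 1.0}  # indicator for the structural relation
    for name in ling:
        x["ling:" + name] = 1.0              # binary linguistic features
    # Learned coreference rate of the relation; 0.5 if the relation was never seen.
    x["struct_prior"] = prior.get(rel, 0.5)
    return x


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    """P(coreferent | x) under the binary maximum entropy (logistic) model."""
    return sigmoid(sum(w.get(k, 0.0) * v for k, v in x.items()))


def train(pairs, epochs=2000, lr=0.5, l2=0.01):
    """Batch gradient ascent on the L2-regularised average conditional log-likelihood."""
    prior = structural_prior(pairs)
    data = [(featurize(f, r, prior), y) for f, r, y in pairs]
    keys = {k for x, _ in data for k in x}
    w = {k: 0.0 for k in keys}
    for _ in range(epochs):
        grad = {k: 0.0 for k in keys}
        for x, y in data:
            err = y - score(w, x)  # observed minus expected feature contribution
            for k, v in x.items():
                grad[k] += err * v
        for k in keys:
            w[k] += lr * (grad[k] / len(data) - l2 * w[k])
    return w, prior


def coref_probability(model, ling, rel):
    w, prior = model
    return score(w, featurize(ling, rel, prior))


if __name__ == "__main__":
    model = train(TRAIN)
    for rel in ("same_element", "sibling", "other_section"):
        p = coref_probability(model, {"pronoun", "gender_match"}, rel)
        print(f"{rel:>14}: {p:.3f}")
```

With this toy data, the model gives a gender-matching pronoun a higher coreference probability when both mentions sit in the same XML element than when they lie in different sections. A model with only linguistic features cannot express that distinction, because the linguistic features of the two cases are identical. A realistic system would use far more features and would typically train with a standard maximum entropy optimizer such as GIS or L-BFGS rather than plain gradient ascent.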