Cross-Language Latent Relational Search between Japanese and English Languages Using a Web Corpus

  • Authors:
  • Nguyen Tuan Duc;Danushka Bollegala;Mitsuru Ishizuka

  • Affiliations:
  • The University of Tokyo;The University of Tokyo;The University of Tokyo

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2012

Abstract

Latent relational search is a novel entity retrieval paradigm based on the proportional analogy between two entity pairs. Given a latent relational search query {(Japan, Tokyo), (France, ?)}, a latent relational search engine is expected to retrieve and rank the entity “Paris” as the first answer in the result list. A latent relational search engine extracts entities and the relations between those entities from a corpus, such as the Web. Moreover, from supporting sentences in the corpus (e.g., “Tokyo is the capital of Japan” and “Paris is the capital and biggest city of France”), the search engine must recognize the relational similarity between the two entity pairs. In cross-language latent relational search, the first entity pair and its supporting sentences are in a different language from the second entity pair and its supporting sentences. Therefore, the search engine must recognize similar semantic relations across languages. In this article, we study the problem of cross-language latent relational search between Japanese and English using Web data. To perform cross-language latent relational search at high speed, we propose a multilingual indexing method for storing entities and the lexical patterns that represent the semantic relations extracted from Web corpora. We then propose a hybrid lexical pattern clustering algorithm to capture the semantic similarity between lexical patterns across languages. Using this algorithm, we can precisely measure the relational similarity between entity pairs across languages, thereby achieving high precision in the task of cross-language latent relational search. Experiments show that the proposed method achieves a mean reciprocal rank (MRR) of 0.605 on Japanese-English cross-language latent relational search query sets and that it also achieves reasonable performance on the INEX Entity Ranking task.