Extracting paraphrases of Japanese action word of sentence ending part from web and mobile news articles

Authors:
Hiroshi Nakagawa;Hidetaka Masuda
Affiliations:
Information Technology Center, The University of Tokyo, Tokyo, Japan;Tokyo Denki University, Tokyo, Japan
Venue:
AIRS'04 Proceedings of the 2004 international conference on Asian Information Retrieval Technology
Year:
2004

Abstract

In this research, we extract paraphrases from two kinds of Japanese news articles: Web news articles, which are long and intended for display on personal computer screens, and mobile news articles, which are short, compact, and intended for the small screens of mobile terminals. We collected both kinds of articles for more than two years and aligned them first at the article level and then at the sentence level. As a result, we obtained more than 88,000 pairs of aligned sentences. From this aligned corpus, we then extract paraphrases of the final part of sentences. The paraphrases we try to extract are the sentence-final nouns of mobile article sentences and their counterpart expressions in Web article sentences. We extract character strings and word sequences as paraphrase candidates based on branching factor, frequency, and string length. For the 100 most frequently used action nouns, the precision is 90% for the highest-ranked candidate and ranges from 83% to 59% for each of the top three candidates.
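
To illustrate how branching factor, frequency, and string length might be combined, the following is a minimal sketch and not the authors' actual procedure. It assumes that candidates are character suffixes of Web sentence endings, that the branching factor is the number of distinct characters immediately preceding a suffix, and that the three values are combined by a simple product. The function names, the scoring formula, and the toy sentence endings are all hypothetical.

```python
from collections import defaultdict


def suffix_stats(endings, max_len=8):
    """Return {suffix: (frequency, branching_factor, length)}.

    Assumption: branching factor = number of distinct characters seen
    immediately to the left of the suffix across all endings.
    """
    freq = defaultdict(int)
    left_chars = defaultdict(set)
    for text in endings:
        for n in range(1, min(max_len, len(text)) + 1):
            suffix = text[-n:]
            freq[suffix] += 1
            if n < len(text):
                # Character directly before the suffix in this ending.
                left_chars[suffix].add(text[-n - 1])
    return {s: (freq[s], len(left_chars[s]), len(s)) for s in freq}


def rank_candidates(endings, max_len=8):
    """Rank suffixes by a hypothetical score: frequency * branching * length."""
    stats = suffix_stats(endings, max_len)
    return sorted(
        stats.items(),
        key=lambda item: item[1][0] * item[1][1] * item[1][2],
        reverse=True,
    )


# Toy Web-article sentence endings (illustrative data, not from the corpus).
endings = ["を発表した", "と発表した", "が発表した", "を発表する"]
ranked = rank_candidates(endings)
print(ranked[0])
```

On this toy input, the suffix 発表した ranks first because it is frequent, relatively long, and preceded by several different characters, which suggests a unit boundary to its left. A real system would need a principled weighting of the three factors, which the abstract does not specify.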