Extracting 5W1H event semantic elements from Chinese online news

  • Authors:
  • Wei Wang;Dongyan Zhao;Lei Zou;Dong Wang;Weiguo Zheng

  • Affiliations:
  • Institute of Computer Science & Technology, Peking University, Beijing, China and Engineering College of Armed Police of People's Republic of China, Xi'an, China;Institute of Computer Science & Technology, Peking University, Beijing, China and Key Laboratory of Computational Linguistics, Peking University, Ministry of Education, China;Institute of Computer Science & Technology, Peking University, Beijing, China;Institute of Computer Science & Technology, Peking University, Beijing, China;Institute of Computer Science & Technology, Peking University, Beijing, China

  • Venue:
  • WAIM'10 Proceedings of the 11th international conference on Web-age information management
  • Year:
  • 2010

Abstract

This paper proposes a verb-driven approach to extracting 5W1H (Who, What, Whom, When, Where and How) event semantic information from Chinese online news. The main contributions of our work are twofold. First, given the usual structure of a news story, we propose a novel algorithm that extracts topic sentences by stressing the importance of the news headline. Second, we extract event facts (i.e., 5W1H) from these topic sentences by applying a rule-based (verb-driven) method and a supervised machine-learning method (SVM). This method significantly improves the predicate-argument structure used in the Automatic Content Extraction (ACE) Event Extraction (EE) task by considering the valency of a Chinese verb (its capacity to govern noun phrases). Extensive experiments on the ACE 2005 datasets confirm its effectiveness, and it also shows very high scalability, since we consider only the topic sentences and surface text features. Based on this method, we build a prototype system named Chinese News Fact Extractor (CNFE). CNFE is evaluated on a real-world corpus containing 30,000 newspaper documents. Experimental results show that CNFE can extract event facts efficiently.