Development of an automatic trend exploration system using the MuST data collection

  • Authors:
  • Masaki Murata;Qing Ma;Toshiyuki Kanamaru;Hitoshi Isahara;Koji Ichii;Tamotsu Shirado;Sachiyo Tsukawaki

  • Affiliations:
  • National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto, Japan;Ryukoku University, Otsu, Shiga, Japan and National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto, Japan;National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto, Japan and Kyoto University, Yoshida-nihonmatsu-cho, Sakyo-ku, Kyoto, Japan;National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto, Japan;Hiroshima University, Higashi-hiroshima, Hiroshima, Japan;National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto, Japan;National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto, Japan

  • Venue:
  • IEBeyondDoc '06 Proceedings of the Workshop on Information Extraction Beyond The Document
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

The automatic extraction of trend information from text documents such as newspaper articles would be useful for exploring and examining trends. To enable this, we used data sets provided by a workshop on multimodal summarization for trend information (the MuST Workshop) to construct an automatic trend exploration system. This system first extracts units, temporals, and item expressions from newspaper articles, then it extracts sets of expressions as trend information, and finally it arranges the sets and displays them in graphs. For example, when documents concerning the politics are given, the system extracts "%" and "Cabinet approval rating" as a unit and an item expression including temporal expressions. It next extracts values related to "%". Finally, it makes a graph where temporal expressions are used for the horizontal axis and the value of percentage is shown on the vertical axis. This graph indicates the trend of Cabinet approval rating and is useful for investigating Cabinet approval rating. Graphs are obviously easy to recognize and useful for understanding information described in documents. In experiments, when we judged the extraction of a correct graph as the top output to be correct, the system accuracy was 0.2500 in evaluation A and 0.3334 in evaluation B. (In evaluation A, a graph where 75% or more of the points were correct was judged to be correct; in evaluation B, a graph where 50% or more of the points were correct was judged to be correct.) When we judged the extraction of a correct graph in the top five outputs to be correct, accuracy rose to 0.4167 in evaluation A and 0.6250 in evaluation B. Our system is convenient and effective because it can output a graph that includes trend information at these levels of accuracy when given only a set of documents as input.