Context-Based Text Mining for Insights in Long Documents

  • Authors:
  • Hironori Takeuchi; Shiho Ogino; Hideo Watanabe; Yoshiko Shirata

  • Affiliations:
  • Tokyo Research Laboratory, IBM Japan, Ltd., IBM Research, Kanagawa, Japan; Tokyo Research Laboratory, IBM Japan, Ltd., IBM Research, Kanagawa, Japan; Tokyo Research Laboratory, IBM Japan, Ltd., IBM Research, Kanagawa, Japan; Graduate School of Business Science, University of Tsukuba, Tokyo, Japan

  • Venue:
  • PAKM '08 Proceedings of the 7th International Conference on Practical Aspects of Knowledge Management
  • Year:
  • 2008

Abstract

This paper addresses the problem of finding differences between collections of long documents. In collections such as project status reports or annual reports, both the documents and their individual sentences tend to be relatively long. As a result, it can be difficult to derive insights by looking only for concepts that a divergence metric identifies as representative of the selected collection. We propose an analysis approach based on contextual information: we extract pairs consisting of a topic word and a keyword, and assess how representative each pair is of the selected document collection. Applying the proposed method to a comparison of the annual reports of bankrupt companies with those of sound companies, we derived insights that could not be extracted with conventional methods.
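
As a rough illustration of the pair-based idea described in the abstract, the following Python sketch collects (topic word, keyword) pairs that co-occur within a sentence and scores each pair by how much more often it appears in a target collection than in a reference collection. This is not the authors' implementation. The abstract does not specify how pairs are extracted or which divergence metric is used, so the sentence-level co-occurrence rule, the naive tokenization, the smoothed log-ratio score, and the function names `extract_pairs` and `pair_scores` are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def extract_pairs(document, topic_words, keywords):
    """Return the set of (topic, keyword) pairs that share a sentence.

    Assumption: sentences are split on '.' and tokens on whitespace;
    a real system would use a proper parser or tokenizer.
    """
    pairs = set()
    for sentence in document.split("."):
        tokens = set(sentence.lower().split())
        topics = tokens & topic_words
        keys = tokens & keywords
        pairs.update((t, k) for t, k in product(topics, keys) if t != k)
    return pairs


def pair_scores(target_docs, reference_docs, topic_words, keywords):
    """Score each pair by a smoothed log-ratio of document frequencies.

    Assumption: this log-ratio is a stand-in for the divergence metric
    mentioned in the abstract. Positive scores mark pairs that are more
    typical of the target collection than of the reference collection.
    """
    target = Counter()
    reference = Counter()
    for doc in target_docs:
        target.update(extract_pairs(doc, topic_words, keywords))
    for doc in reference_docs:
        reference.update(extract_pairs(doc, topic_words, keywords))

    scores = {}
    for pair in set(target) | set(reference):
        # Add-one smoothing keeps the ratio finite for unseen pairs.
        p_target = (target[pair] + 1) / (len(target_docs) + 2)
        p_reference = (reference[pair] + 1) / (len(reference_docs) + 2)
        scores[pair] = math.log(p_target / p_reference)
    return scores
```

Under these assumptions, a call such as `pair_scores(bankrupt_reports, sound_reports, {"sales"}, {"declined", "increased"})` would return a dictionary keyed by pairs like `("sales", "declined")`; sorting it by score in descending order lists the pairs most characteristic of the first collection. The collection names in that call are hypothetical.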