Text analysis for detecting terrorism-related articles on the web

  • Authors:
  • Dongjin Choi;Byeongkyu Ko;Heesun Kim;Pankoo Kim

  • Venue:
  • Journal of Network and Computer Applications
  • Year:
  • 2014

Abstract

Classifying web documents is one of the most important tasks in revealing terrorism-related documents. The Internet provides a great deal of valuable information to users, and the amount of web content is steadily increasing, which makes it very difficult to identify potentially dangerous documents. Simply extracting keywords from documents is not enough to classify their contents. Many techniques have been studied for building automated document classification systems, but most are statistical or knowledge-based approaches. These methods do not yield satisfactory results because of the complexity of natural language. To overcome this deficiency, we propose a method that uses word similarity based on the WordNet hierarchy together with frequencies from n-gram data. The method was tested on sampled New York Times articles by querying four distinct words from four different areas. Experimental results show that the proposed method effectively extracts context words from the text and identifies terrorism-related documents.
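
The abstract does not give the scoring formula, so the following is only a rough sketch of how hierarchy-based word similarity and n-gram frequency might be combined to rank context words around a query word. Everything in it is an assumption made for illustration: the toy `HYPERNYM` tree stands in for WordNet, the Wu-Palmer-style measure stands in for whatever similarity the authors used, bigram counts stand in for the n-gram data, and the names `wup_similarity` and `context_scores` are hypothetical.

```python
from collections import Counter

# Toy hypernym tree (child -> parent). A hypothetical stand-in for WordNet,
# not real WordNet data.
HYPERNYM = {
    "bomb": "weapon",
    "gun": "weapon",
    "weapon": "artifact",
    "car": "vehicle",
    "vehicle": "artifact",
    "artifact": "entity",
    "attack": "event",
    "election": "event",
    "event": "entity",
}


def path_to_root(word):
    """Return the chain [word, parent, ..., root] in the toy hierarchy."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def wup_similarity(a, b):
    """Wu-Palmer-style similarity: 2*depth(lcs) / (depth(a) + depth(b)).

    The root has depth 1. Words that share no ancestor score 0.0.
    """
    path_a, path_b = path_to_root(a), path_to_root(b)
    if path_a[-1] != path_b[-1]:
        return 0.0
    ancestors_b = set(path_b)
    # First node on a's path that is also an ancestor of b (or b itself).
    lcs = next(node for node in path_a if node in ancestors_b)
    return 2 * len(path_to_root(lcs)) / (len(path_a) + len(path_b))


def context_scores(tokens, query):
    """Score words adjacent to `query` by bigram count times similarity.

    This weighting is an assumption for illustration only.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = Counter()
    for (left, right), count in bigrams.items():
        if query in (left, right):
            other = right if left == query else left
            scores[other] += count * wup_similarity(query, other)
    return scores


tokens = "bomb attack gun bomb gun car bomb".split()
ranking = context_scores(tokens, "bomb").most_common()
```

In this toy run, `gun` ranks above `car` and `attack` as a context word for `bomb`, because it both co-occurs more often and sits closer in the hierarchy. A real system would draw the hierarchy from WordNet and the counts from a large n-gram corpus.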