AUTOMATIC MACHINE LEARNING OF KEYPHRASE EXTRACTION FROM SHORT HTML DOCUMENTS WRITTEN IN HEBREW

  • Authors:
  • Yaakov HaCohen-Kerner;Ittay Stern;David Korkus;Erick Fredj

  • Affiliations:
  • Department of Computer Science, Jerusalem College of Technology (Machon Lev), Jerusalem, Israel;Department of Computer Science, Jerusalem College of Technology (Machon Lev), Jerusalem, Israel;Department of Computer Science, Jerusalem College of Technology (Machon Lev), Jerusalem, Israel;Department of Computer Science, Jerusalem College of Technology (Machon Lev), Jerusalem, Israel

  • Venue:
  • Cybernetics and Systems
  • Year:
  • 2007

Abstract

Keyphrases extracted from documents can save valuable time in tasks such as filtering, summarization, and categorization. A few keyphrase extraction systems are available for documents written in English. In this paper, we propose a model called LEH_KEY (Learning to Extract Hebrew KEYphrases) that, for the first time, learns to extract keyphrases from documents written in Hebrew. We introduce a relatively large number (15) of baseline extraction methods. Whereas related systems combine only a small number (two or three) of baseline extraction methods, we have investigated various combinations of a larger number of baseline methods and have tested various machine learning methods. The best results were achieved by a combination of six baseline methods using J48 (an improved variant of C4.5). Our results were found to be at least equal in quality to those achieved by extraction systems for documents written in English that are regarded as state of the art.
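
To illustrate the general idea of scoring candidate keyphrases with several baseline methods, here is a minimal Python sketch. It is not the LEH_KEY implementation: the abstract does not name the 15 baseline methods, so the three features used here (term frequency, relative first position, occurrence in the title) are assumptions chosen for illustration, and the function names `candidate_phrases` and `baseline_features` are hypothetical. In a learning setup such as the one described, feature vectors like these would be passed to a decision tree learner (J48, for example); that training step is not shown.

```python
import re
from collections import Counter


def candidate_phrases(text, max_len=2):
    """Return (phrase, start_index) pairs for every word n-gram of 1..max_len words."""
    # \w matches Unicode letters, so Hebrew words are tokenized as well.
    words = re.findall(r"\w+", text.lower())
    candidates = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            candidates.append((" ".join(words[i:i + n]), i))
    return candidates, len(words)


def baseline_features(title, body, max_len=2):
    """Compute three illustrative (assumed) baseline features per candidate phrase."""
    candidates, total_words = candidate_phrases(body, max_len)
    frequency = Counter(phrase for phrase, _ in candidates)

    # Record the word index at which each phrase first appears.
    first_index = {}
    for phrase, i in candidates:
        first_index.setdefault(phrase, i)

    # Pad with spaces so that only whole-word matches in the title count.
    padded_title = " " + " ".join(re.findall(r"\w+", title.lower())) + " "

    features = {}
    for phrase, count in frequency.items():
        features[phrase] = {
            "term_frequency": count,
            "first_position": first_index[phrase] / total_words,
            "in_title": f" {phrase} " in padded_title,
        }
    return features
```

Each candidate phrase ends up with one feature vector. Labeling those vectors as keyphrase or non-keyphrase against author-assigned keyphrases would give the kind of training data a decision tree learner needs.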