Automatic expansion of domain-specific lexicons by term categorization

  • Authors: Henri Avancini; Alberto Lavelli; Fabrizio Sebastiani; Roberto Zanoli
  • Affiliations: Consiglio Nazionale delle Ricerche, Pisa, Italy; ITC-irst, Povo (TN), Italy; Consiglio Nazionale delle Ricerche, Pisa, Italy; ITC-irst, Povo (TN), Italy
  • Venue: ACM Transactions on Speech and Language Processing (TSLP)
  • Year: 2006

Abstract

We discuss an approach to the automatic expansion of domain-specific lexicons, that is, to the problem of extending, for each c_i in a predefined set C = {c_1, …, c_m} of semantic domains, an initial lexicon L^i_0 into a larger lexicon L^i_1. Our approach relies on term categorization, defined as the task of labeling previously unlabeled terms according to a predefined set of domains. We approach this as a supervised learning problem in which term classifiers are built using the initial lexicons as training data. Dually to classic text categorization tasks in which documents are represented as vectors in a space of terms, we represent terms as vectors in a space of documents. We present the results of a number of experiments in which we use a boosting-based learning device for training our term classifiers. We test the effectiveness of our method by using WordNetDomains, a well-known large set of domain-specific lexicons, as a benchmark. Our experiments are performed using the documents in the Reuters Corpus Volume 1 as implicit representations for our terms.
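
The dual representation described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the six-document toy corpus, the seed terms, and the function names are hypothetical, and plain binary AdaBoost over one-document decision stumps stands in for the boosting-based learner, whose exact variant the abstract does not specify. Each term is represented as a binary vector with one component per document, the seed lexicon supplies the training labels, and the trained classifier then scores terms that were not in the seed lexicon.

```python
import math

# Hypothetical toy corpus: each document is a list of tokens.
DOCS = [
    "bank loan interest".split(),
    "bank stock interest".split(),
    "loan stock interest".split(),
    "goal team striker".split(),
    "goal coach striker".split(),
    "team coach striker".split(),
]

def term_vector(term):
    """Represent a term as a binary vector over documents (1 = term occurs)."""
    return [1 if term in doc else 0 for doc in DOCS]

# Assumed seed lexicon for a single domain ("finance"); terms taken from
# another domain's lexicon serve as negative training examples.
SEED_POS = ["bank", "loan", "stock"]
SEED_NEG = ["goal", "team", "coach"]

def stump(x, j, sign):
    """Weak learner: vote `sign` if the term occurs in document j, else -sign."""
    return sign if x[j] else -sign

def train_adaboost(X, y, rounds=3):
    """Plain binary AdaBoost over one-document stumps (illustrative stand-in)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            for sign in (1, -1):
                preds = [stump(x, j, sign) for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, sign, preds)
        err, j, sign, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, sign))
        # Increase the weight of misclassified terms, then renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, x):
    """Weighted vote of all stumps; a positive score means 'in the domain'."""
    return sum(alpha * stump(x, j, sign) for alpha, j, sign in model)

X = [term_vector(t) for t in SEED_POS + SEED_NEG]
y = [1] * len(SEED_POS) + [-1] * len(SEED_NEG)
model = train_adaboost(X, y)

# Terms absent from the seed lexicon are candidates for the expanded lexicon.
for term in ["interest", "striker"]:
    s = score(model, term_vector(term))
    print(term, "finance" if s > 0 else "not finance")
```

On this toy data, "interest" receives a positive score and would be added to the expanded lexicon, while "striker" receives a negative one. A realistic setting would replace the binary occurrence vectors with weighted ones and train one classifier per domain.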