Pattern mining across domain-specific text collections

  • Authors:
  • Lee Gillam; Khurshid Ahmad

  • Affiliations:
  • Department of Computing, School of Electronics and Physical Sciences, University of Surrey, Guildford, United Kingdom;Department of Computing, School of Electronics and Physical Sciences, University of Surrey, Guildford, United Kingdom

  • Venue:
  • MLDM'05 Proceedings of the 4th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2005

Abstract

This paper discusses a consistency in patterns of language use across domain-specific collections of text. We present a method for the automatic identification of domain-specific keywords – specialist terms – based on comparing language use in scientific domain-specific text collections with language use in texts intended for a more general audience. The method supports automatic production of collocational networks, and of networks of concepts – thesauri, or so-called ontologies. The method involves a novel combination of existing metrics from work in computational linguistics, which can enable extraction, or learning, of these kinds of networks. Creation of ontologies or thesauri is informed by international (ISO) standards in terminology science, and the resulting resource can be used to support a variety of work, including data-mining applications.
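
The abstract does not name the metrics the authors combine, so the following is only a minimal sketch of the general idea of comparing word use in a domain-specific collection against a general-language collection. It assumes a simple smoothed ratio of relative frequencies as a stand-in score; the function names (`tokenize`, `frequency_ratio`, `top_keywords`), the `smoothing` parameter, and the toy texts are all hypothetical and not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_ratio(domain_text, general_text, smoothing=1.0):
    """Score each domain word by its relative frequency in the domain
    text divided by its (smoothed) relative frequency in the general text.

    Assumption: this ratio is an illustrative stand-in, not the
    combination of metrics used in the paper.
    """
    domain_counts = Counter(tokenize(domain_text))
    general_counts = Counter(tokenize(general_text))
    domain_total = sum(domain_counts.values())
    general_total = sum(general_counts.values())

    scores = {}
    for word, count in domain_counts.items():
        domain_rel = count / domain_total
        # Smoothing keeps the ratio finite for words absent from the general text.
        general_rel = (general_counts[word] + smoothing) / (general_total + smoothing)
        scores[word] = domain_rel / general_rel
    return scores


def top_keywords(domain_text, general_text, n=5):
    """Return the n highest-scoring words, with ties broken alphabetically."""
    scores = frequency_ratio(domain_text, general_text)
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]


if __name__ == "__main__":
    # Toy texts, invented purely for illustration.
    domain = "the neutron flux in the reactor core the neutron"
    general = "the cat sat on the mat in the house the dog"
    print(top_keywords(domain, general, n=3))
```

Words that are frequent in the domain text but rare or absent in the general text receive high scores, while common function words such as "the" score low. A realistic version would need far larger collections and a principled threshold before any candidate terms are used to build collocational networks or thesauri.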