Ontology generation for large email collections

  • Authors:
  • Hui Yang; Jamie Callan

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • dg.o '08: Proceedings of the 2008 International Conference on Digital Government Research
  • Year:
  • 2008

Abstract

This paper presents a new approach to identifying the concepts expressed in a collection of email messages and organizing them into an ontology or taxonomy for browsing. The approach combines techniques from text mining, information retrieval, natural language processing, and machine learning to generate a concept ontology. Nominal N-gram mining is used to identify candidate concepts. WordNet and surface text pattern matching are used to identify relationships among the concepts. A supervised clustering algorithm is then used to group the concepts further. The experiments show that the approach is effective.
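
The first two steps named in the abstract, candidate concept mining and surface text pattern matching, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic detail, so the stopword filter below is only an assumed stand-in for nominal (noun-based) N-gram selection, and the single "X such as Y" pattern is one well-known example of a surface pattern. The function names, the stopword list, and the sample messages are all hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny stopword list stands in for real noun-phrase detection.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "on",
             "is", "are", "such", "as"}

def candidate_ngrams(messages, max_n=2, min_count=2):
    """Count word n-grams that contain no stopwords and keep the frequent ones."""
    counts = Counter()
    for text in messages:
        tokens = re.findall(r"[a-z]+", text.lower())
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if not any(tok in STOPWORDS for tok in gram):
                    counts[" ".join(gram)] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}

# One illustrative surface pattern: "X such as Y" suggests that Y is a kind of X.
SUCH_AS = re.compile(r"(\w+) such as (\w+)")

def is_a_pairs(messages):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = set()
    for text in messages:
        for hypernym, hyponym in SUCH_AS.findall(text.lower()):
            pairs.add((hyponym, hypernym))
    return pairs

# Hypothetical sample messages, not taken from the paper's data.
messages = [
    "Pollutants such as mercury harm fish.",
    "Mercury emissions rules affect power plants.",
    "Power plants emit mercury.",
]
concepts = candidate_ngrams(messages)
relations = is_a_pairs(messages)
```

In this sketch, `concepts` holds the N-grams that recur across messages and `relations` holds candidate is-a links. A fuller pipeline along the lines of the abstract would also consult WordNet for relationships and apply a supervised clustering step, neither of which is shown here.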