A formal concept analysis-based domain-specific thesaurus and its application in document representation

  • Authors:
  • Jihn-Chang Jehng; Shihchieh Chou; Chin-Yi Cheng

  • Affiliations:
  • Institute of Human Resource Management, National Central University, Jhongli City, Taoyuan County, Taiwan; Department of Information Management, National Central University, Jhongli City, Taoyuan County, Taiwan; Department of Information Management, National Central University, Jhongli City, Taoyuan County, Taiwan

  • Venue:
  • ICCSA'10 Proceedings of the 2010 international conference on Computational Science and Its Applications - Volume Part III
  • Year:
  • 2010

Abstract

Many document retrieval and clustering techniques build on the vector space model and represent documents as term vectors. They treat a document as a bag of terms and ignore conceptual relationships between terms, such as synonymy, hypernymy and hyponymy. Previous studies have shown that exploiting these relationships improves document clustering results. Those studies relied on general thesauri such as WordNet for information about relationships between terms. However, domain-specific terms like "query expansion" and "document clustering" cannot be found in such thesauri, even though they are regarded as important features of domain-specific documents. In this paper, we propose an approach based on Formal Concept Analysis (FCA) that automatically builds a domain-specific thesaurus to address this limitation of general thesauri. We also apply the domain-specific thesaurus as background knowledge to represent documents by concept dimension vectors. In the evaluation, our method yields improved results compared with traditional approaches.
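
As a rough illustration of the general idea (not the authors' actual procedure, which the abstract does not spell out), the sketch below derives formal concepts from a toy term-by-document context and then describes a document by how many of its terms fall into each concept. The context, the choice of terms as objects and documents as attributes, the function names, and the counting scheme are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy context (an assumption): term -> documents that contain it.
CONTEXT = {
    "query expansion": {"d1", "d2"},
    "document clustering": {"d2", "d3"},
    "thesaurus": {"d1", "d2", "d3"},
}


def common_docs(terms, context):
    """Documents shared by every term in `terms` (all documents if empty)."""
    result = set().union(*context.values())
    for term in terms:
        result &= context[term]
    return result


def common_terms(docs, context):
    """Terms that occur in every document in `docs`."""
    return {term for term, term_docs in context.items() if docs <= term_docs}


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every subset of terms.

    This brute-force enumeration is exponential in the number of terms,
    so it is only suitable for toy inputs.
    """
    terms = sorted(context)
    seen = set()
    for size in range(len(terms) + 1):
        for subset in combinations(terms, size):
            intent = common_docs(subset, context)
            extent = common_terms(intent, context)
            seen.add((frozenset(extent), frozenset(intent)))
    # Deterministic order: smaller extents first, then alphabetical.
    return sorted(seen, key=lambda c: (len(c[0]), sorted(c[0])))


def concept_vector(doc_terms, concepts):
    """One dimension per concept: number of the document's terms in its extent."""
    return [sum(1 for t in doc_terms if t in extent) for extent, _ in concepts]


if __name__ == "__main__":
    concepts = formal_concepts(CONTEXT)
    for extent, intent in concepts:
        print(sorted(extent), sorted(intent))
    print(concept_vector(["query expansion", "thesaurus"], concepts))
```

In this toy setting, terms that share a concept extent end up contributing to the same dimension, which is one simple way a concept-based representation can capture relatedness that a plain bag of terms misses.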