A formal concept analysis-based domain-specific thesaurus and its application in document representation

Authors:
Jihn-Chang Jehng;Shihchieh Chou;Chin-Yi Cheng
Affiliations:
Institute of Human Resource Management, National Central University, Jhongli City, Taoyuan County, Taiwan;Department of Information Management, National Central University, Jhongli City, Taoyuan County, Taiwan;Department of Information Management, National Central University, Jhongli City, Taoyuan County, Taiwan
Venue:
ICCSA'10 Proceedings of the 2010 international conference on Computational Science and Its Applications - Volume Part III
Year:
2010

Abstract

Many document retrieval and clustering techniques are based on the vector space model and represent documents as term vectors. They treat a document as a bag of terms and ignore conceptual relationships between terms, such as synonymy, hypernymy and hyponymy. Previous studies have shown that exploiting these relationships improves document clustering results. Those studies used general thesauri such as WordNet to supply the relationships between terms. However, domain-specific terms such as "query expansion" and "document clustering" cannot be found in these thesauri, even though such terms are important features of domain-specific documents. In this paper, we propose an approach based on Formal Concept Analysis (FCA) that automatically builds a domain-specific thesaurus to address this limitation of general thesauri. We also apply the domain-specific thesaurus as background knowledge to represent documents by concept dimension vectors. In the evaluation, our method yields improved results compared with traditional approaches.
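The abstract does not spell out how the thesaurus is built or how the concept dimension vectors are weighted, so the following is only a minimal sketch of the general FCA idea, not the authors' method. It assumes a toy document-term context (the documents `d1` to `d3` and their terms are invented for illustration), enumerates formal concepts by brute force, and uses a hypothetical binary weighting in which a document scores 1 on a concept dimension when it belongs to that concept's extent.

```python
from itertools import combinations

# Toy formal context (an assumption for illustration): document -> set of terms.
context = {
    "d1": {"query expansion", "thesaurus"},
    "d2": {"query expansion", "retrieval"},
    "d3": {"document clustering", "thesaurus"},
}

def common_terms(docs, context):
    """Derivation operator: terms shared by every document in `docs`."""
    result = set().union(*context.values())  # start from all terms
    for d in docs:
        result &= context[d]
    return result

def common_docs(terms, context):
    """Derivation operator: documents that contain every term in `terms`."""
    return {d for d, ts in context.items() if terms <= ts}

def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) by brute force.

    For each subset A of documents, (A'', A') is a formal concept, and
    every concept arises this way. Exponential in the number of documents,
    so this is only suitable for tiny examples.
    """
    concepts = set()
    docs = sorted(context)
    for r in range(len(docs) + 1):
        for subset in combinations(docs, r):
            intent = common_terms(subset, context)
            extent = common_docs(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

def concept_vector(doc, ordered_concepts):
    """Hypothetical binary weighting: 1 if `doc` lies in the concept's extent."""
    return [1 if doc in extent else 0 for extent, _ in ordered_concepts]

concepts = formal_concepts(context)
# Fix a deterministic order so that each concept is one vector dimension.
ordered = sorted(concepts, key=lambda c: (sorted(c[0]), sorted(c[1])))

for doc in sorted(context):
    print(doc, concept_vector(doc, ordered))
```

In this sketch, `d1` and `d2` share the concept whose intent is "query expansion", so their vectors overlap on that dimension even though their remaining terms differ. A practical system would replace the brute-force enumeration with a dedicated lattice construction algorithm and would likely use a graded rather than binary weighting.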