Many document retrieval and clustering techniques based on the vector space model represent documents as term vectors. They treat a document as a bag of terms and ignore conceptual relationships between terms, such as synonymy, hypernymy and hyponymy. Previous studies have shown that exploiting these relationships improves document clustering results. Those studies used thesauri such as WordNet to supply the relationships between terms. However, some domain-specific terms, such as "query expansion" and "document clustering", cannot be found in these general thesauri, even though such terms are important features of domain-specific documents. In this paper, we propose an approach based on Formal Concept Analysis (FCA) that automatically builds a domain-specific thesaurus to address this limitation of general thesauri. We also apply the domain-specific thesaurus as background knowledge to represent documents by concept dimension vectors. In the evaluation, our method yields improved results compared with traditional approaches.
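
To make the idea of concept dimension vectors concrete, here is a minimal Python sketch. It is not the paper's implementation: the toy thesaurus, the concept labels, and the function name `concept_vector` are all assumptions made for illustration, and the thesaurus is written by hand here rather than derived with FCA. It only shows how two documents that share no terms can still receive the same representation once terms are mapped to concepts.

```python
from collections import Counter

# Hypothetical toy thesaurus (an assumption, not taken from the paper):
# each domain-specific term is mapped to a single concept label.
THESAURUS = {
    "query expansion": "retrieval",
    "document retrieval": "retrieval",
    "document clustering": "clustering",
    "text clustering": "clustering",
}


def concept_vector(terms, thesaurus, concepts):
    """Count how often each concept is triggered by a document's terms.

    Terms that are missing from the thesaurus are skipped.
    """
    counts = Counter(thesaurus[t] for t in terms if t in thesaurus)
    return [counts[c] for c in concepts]


# Fixed ordering of the concept dimensions.
concepts = sorted(set(THESAURUS.values()))

# Two documents with no terms in common.
doc_a = ["query expansion", "document clustering", "wordnet"]
doc_b = ["document retrieval", "text clustering"]

vec_a = concept_vector(doc_a, THESAURUS, concepts)
vec_b = concept_vector(doc_b, THESAURUS, concepts)
print(concepts, vec_a, vec_b)
```

In a plain bag-of-terms model these two documents would have no overlapping dimensions, whereas in this sketch both map to the same point in concept space.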