Query refinement through lexical clustering of scientific textual databases

Authors:
Eric SanJuan
Affiliations:
LITA Université Paul Verlaine & URI-INIST/CNRS, Metz, France
Venue:
NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems
Year:
2005

Abstract

The TermWatch system automatically extracts multi-word terms from scientific texts using morphological analysis and relates them through linguistic variations. The resulting terminological network is clustered with a 3-level hierarchical graph algorithm and mapped onto a 2D space. Clusters are automatically labeled according to their variation activity. After a detailed review of the methodology, this paper evaluates, in the context of querying a scientific textual database, the overlap between the extracted terms and cluster labels and the keywords selected by human indexers, as well as the set of possible queries derived from the clustering output. The results show that the linguistic variation paradigm is a robust way to automatically extract and structure a user-comprehensible terminological resource for query refinement.
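The pipeline above (terms linked by variations, clustered, labeled by variation activity, then compared with indexer keywords) can be illustrated with a toy sketch. It is not TermWatch's method: it assumes the multi-word terms are already extracted, treats word-set inclusion (e.g. "gene expression" vs. "gene expression data") as a crude stand-in for a linguistic variation, uses plain connected components instead of the 3-level hierarchical graph algorithm, labels each cluster by its highest-degree term as a proxy for "variation activity", and measures overlap as the fraction of indexer keywords matched exactly. The functions `is_variant`, `cluster_terms` and `keyword_overlap` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def is_variant(a: str, b: str) -> bool:
    """Hypothetical stand-in for a linguistic variation: one term's words
    form a proper subset of the other's (an expansion-like relation)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return wa < wb or wb < wa


def cluster_terms(terms):
    """Cluster terms as connected components of the variation graph and
    label each cluster with its most connected term."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    degree = defaultdict(int)  # number of variation links per term
    for a, b in combinations(terms, 2):
        if is_variant(a, b):
            degree[a] += 1
            degree[b] += 1
            parent[find(a)] = find(b)  # merge the two components

    clusters = defaultdict(list)
    for t in terms:
        clusters[find(t)].append(t)

    # Label = highest-degree member; ties broken alphabetically.
    labeled = {}
    for members in clusters.values():
        label = min(members, key=lambda t: (-degree[t], t))
        labeled[label] = sorted(members)
    return labeled


def keyword_overlap(keywords, candidates):
    """Fraction of indexer keywords that exactly match a candidate string
    (case-insensitive); 0.0 if there are no keywords."""
    cand = {c.lower() for c in candidates}
    kws = {k.lower() for k in keywords}
    return len(kws & cand) / len(kws) if kws else 0.0


# Toy data, invented for illustration only.
terms = [
    "gene expression", "gene expression data", "gene expression pattern",
    "cluster analysis", "hierarchical cluster analysis", "query refinement",
]
keywords = ["gene expression", "query expansion", "cluster analysis"]

labels = cluster_terms(terms)
for label, members in labels.items():
    print(label, "->", members)
print("term overlap:", keyword_overlap(keywords, terms))
print("label overlap:", keyword_overlap(keywords, labels))
```

Exact string matching is the simplest overlap criterion; a more faithful evaluation would also count keywords that are variants of extracted terms, which is closer to the variation-based view taken in the paper.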