This research demonstrates the development of a 'concept-clumping algorithm' designed to improve the clustering of technical concepts. The algorithm first identifies technically relevant noun phrases from a cleaned, extracted list, and then applies a rule-based procedure that identifies synonymous terms based on the words they share. An assessment found that the algorithm has an 89-91% precision rate, moves technically important terms higher in the term frequency list, and improves the technical specificity of term clusters.
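
As a rough illustration of clumping terms by shared words, the sketch below groups noun phrases that have at least a given number of content words in common. It is not the algorithm evaluated in this research: the stop list, the `min_shared` threshold, the transitive grouping, and the names `content_words` and `clump_terms` are all assumptions made for the example, and the sample terms are invented.

```python
from collections import defaultdict

# Hypothetical stop list; the abstract does not specify one.
STOP_WORDS = {"of", "the", "a", "an", "and", "for", "in"}


def content_words(term):
    """Return the lowercased words of a term, minus the illustrative stop words."""
    return frozenset(w for w in term.lower().split() if w not in STOP_WORDS)


def clump_terms(terms, min_shared=2):
    """Group terms that share at least `min_shared` content words.

    Assumed rule, for illustration only. Grouping is transitive: if A clumps
    with B and B with C, all three land in one clump (union-find).
    """
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent links to the clump representative, compressing the path.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    words = {t: content_words(t) for t in parent}
    unique = list(parent)
    for i, a in enumerate(unique):
        for b in unique[i + 1:]:
            if len(words[a] & words[b]) >= min_shared:
                parent[find(b)] = find(a)  # merge the two clumps

    clumps = defaultdict(list)
    for t in unique:
        clumps[find(t)].append(t)
    return sorted(sorted(c) for c in clumps.values())


if __name__ == "__main__":
    sample = [
        "fuel cell stack",
        "fuel cell system",
        "stack of fuel cell",
        "lithium battery",
        "battery pack",
    ]
    for clump in clump_terms(sample):
        print(clump)
```

With the default threshold of two shared words, the three fuel cell phrases form one clump while the two battery phrases stay separate; lowering `min_shared` to 1 merges the battery phrases as well, which shows how sensitive such a rule is to the threshold chosen.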