Free access to scientific papers in major digital libraries and other web repositories is limited to their abstracts. Current keyword-based techniques fail on narrow-domain libraries, e.g., those containing only documents on high energy physics, such as the hep-ex collection of CERN. We propose a simple procedure for clustering abstracts that applies the transition point technique during term selection. This technique indexes the documents with mid-frequency terms, because these terms carry high semantic content. In our experiments, the transition point approach was compared with well-known unsupervised term selection techniques, and the results show that it can achieve better performance than these traditional methods. Moreover, we propose an approach to analyse the stability of the transition point term selection method.
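The term selection step can be illustrated with a minimal Python sketch. It assumes the transition point formula commonly used in this line of work, TP = (sqrt(8 * I1 + 1) - 1) / 2, where I1 is the number of terms that occur exactly once (a formula commonly attributed to Booth's law of low-frequency words). It then keeps the terms whose frequency falls in a neighbourhood [(1 - u) * TP, (1 + u) * TP] around TP. The width u = 0.4, the helper names `transition_point` and `select_mid_frequency_terms`, and applying the computation to a single token list (rather than per document or over the whole collection) are illustrative assumptions, not the authors' exact settings.

```python
import math
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def transition_point(freqs: Mapping[str, int]) -> float:
    """Compute TP = (sqrt(8 * I1 + 1) - 1) / 2, where I1 is the number
    of terms with frequency exactly 1 (assumed formula, see lead-in)."""
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (math.sqrt(8 * i1 + 1) - 1) / 2


def select_mid_frequency_terms(
    tokens: Iterable[str], u: float = 0.4
) -> Tuple[float, List[str]]:
    """Return (TP, terms) where terms are those whose frequency lies in
    [(1 - u) * TP, (1 + u) * TP]. The default u = 0.4 is a hypothetical
    choice for illustration."""
    freqs = Counter(tokens)
    tp = transition_point(freqs)
    low, high = (1 - u) * tp, (1 + u) * tp
    # Mid-frequency terms: neither very frequent (function words)
    # nor hapax legomena (frequency 1) unless TP itself is small.
    selected = sorted(t for t, f in freqs.items() if low <= f <= high)
    return tp, selected


if __name__ == "__main__":
    tokens = ("the the the the the energy energy physics physics "
              "quark boson collider").split()
    print(select_mid_frequency_terms(tokens))
```

On this toy input, three terms occur once, so TP = (sqrt(25) - 1) / 2 = 2.0, and the script prints `(2.0, ['energy', 'physics'])`: the very frequent "the" and the three single-occurrence terms are discarded. On real abstracts, which are short, I1 and therefore TP are small, so the choice of u (and whether TP is computed per abstract or over the collection) would need tuning.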