Clustering abstracts of scientific texts using the transition point technique

Authors:
David Pinto; Héctor Jiménez-Salazar; Paolo Rosso
Affiliations:
Faculty of Computer Science, BUAP, Puebla, Mexico; Faculty of Computer Science, BUAP, Puebla, Mexico; Department of Information Systems and Computation, UPV, Valencia, Spain
Venue:
CICLing'06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2006

Abstract

Free access to scientific papers in major digital libraries and other web repositories is limited to their abstracts. Current keyword-based techniques fail on narrow domain-oriented libraries, e.g., those containing only documents on high energy physics, such as the hep-ex collection of CERN. We propose a simple procedure for clustering abstracts, which consists of applying the transition point technique during the term selection process. This technique uses mid-frequency terms to index the documents because they have high semantic content. In our experiments, the transition point approach was compared with well-known unsupervised term selection techniques. The transition point technique showed that it is possible to obtain better performance than traditional methods. Moreover, we propose an approach to analyse the stability of the transition point term selection method.
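The abstract does not give the formula for the transition point, so the following is only a minimal sketch under stated assumptions. It uses one commonly cited formulation, TP = (sqrt(8 * I1 + 1) - 1) / 2, where I1 is the number of terms that occur exactly once, and it keeps terms whose frequency lies within a neighbourhood of TP. The function names and the neighbourhood width `u` are hypothetical choices made for illustration, not values taken from the paper.

```python
import math
from collections import Counter


def transition_point(freqs):
    """Estimate the transition point from a term -> frequency mapping.

    Assumed formulation: TP = (sqrt(8 * I1 + 1) - 1) / 2, where I1 is the
    number of terms with frequency exactly 1.
    """
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (math.sqrt(8 * i1 + 1) - 1) / 2


def select_mid_frequency_terms(tokens, u=0.4):
    """Keep terms whose frequency falls in [TP * (1 - u), TP * (1 + u)].

    `u` is a hypothetical neighbourhood width chosen for this sketch.
    """
    freqs = Counter(tokens)
    tp = transition_point(freqs)
    low, high = tp * (1 - u), tp * (1 + u)
    return {term for term, f in freqs.items() if low <= f <= high}


# Toy vocabulary: one very frequent term, three mid-frequency terms,
# and six terms that occur once (so I1 = 6 and TP = 3).
tokens = (
    ["the"] * 10
    + ["boson"] * 4
    + ["quark"] * 3
    + ["decay"] * 2
    + ["a", "b", "c", "d", "e", "f"]
)
selected = select_mid_frequency_terms(tokens)
```

In this toy example the very frequent term and the single-occurrence terms are both discarded, and only the mid-frequency terms remain as index terms for clustering.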