Free access to full-text scientific papers in major digital libraries and other web repositories is limited to their abstracts, which consist of no more than several dozen words. Current keyword-based techniques can cluster such short texts only when the data set is multi-category, e.g., some documents are devoted to sport, others to medicine, others to politics. However, they fail on narrow-domain libraries, e.g., those whose documents are all on physics, all on geology, or all on computational linguistics. Yet such data sets are precisely the most frequent and most interesting ones. We propose a simple procedure for clustering abstracts, which consists of grouping keywords and using a more adequate document similarity measure. We use Stein's MajorClust method for clustering both keywords and documents. We illustrate our approach on texts from the proceedings of a narrow-topic conference. Limitations of our approach are also discussed. Our preliminary experiments show that abstracts cannot be clustered with the same quality as full texts, though the achieved quality is adequate for many applications; accordingly, we support Makagonov's proposal that digital libraries should provide document images of the full texts of papers (and not only abstracts) for open access via the Internet, in order to help in search, classification, clustering, selection, and proper referencing of papers.
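The abstract names MajorClust but does not spell out its steps. As a rough illustration only, the following Python sketch follows the commonly described idea of the method: every node starts in its own cluster and repeatedly adopts the cluster label with the largest total edge weight among its neighbours, until no label changes. The function name `majorclust`, the tie-breaking rule, the `max_iter` cap, and the toy similarity values are assumptions made for this example; they are not taken from the paper, and the paper's own similarity measure is not reproduced here.

```python
from collections import defaultdict


def majorclust(weights, max_iter=100):
    """Cluster a weighted graph by majority label adoption.

    weights: symmetric dict of dicts; weights[u][v] is the similarity
    between nodes u and v (hypothetical input format for this sketch).
    Returns a sorted list of clusters, each a sorted list of nodes.
    """
    nodes = sorted(weights)
    # Every node starts in its own cluster.
    label = {node: index for index, node in enumerate(nodes)}

    for _ in range(max_iter):
        changed = False
        for node in nodes:
            # Total edge weight from this node toward each neighbouring cluster.
            attraction = defaultdict(float)
            for neighbour, weight in weights[node].items():
                if neighbour != node:
                    attraction[label[neighbour]] += weight
            if not attraction:
                continue  # isolated node keeps its own label
            best = max(attraction.values())
            # Assumption: on ties the node keeps its current label,
            # which avoids needless label flipping.
            if attraction.get(label[node], 0.0) < best:
                label[node] = min(c for c, a in attraction.items() if a == best)
                changed = True
        if not changed:
            break

    clusters = defaultdict(list)
    for node, cluster_id in label.items():
        clusters[cluster_id].append(node)
    return sorted(sorted(members) for members in clusters.values())


# Toy similarity graph (made-up values): two tight groups joined by a weak link.
toy = {
    "a": {"b": 0.9, "c": 0.8},
    "b": {"a": 0.9, "c": 0.7},
    "c": {"a": 0.8, "b": 0.7, "d": 0.1},
    "d": {"c": 0.1, "e": 0.9, "f": 0.8},
    "e": {"d": 0.9, "f": 0.7},
    "f": {"d": 0.8, "e": 0.7},
}
clusters = majorclust(toy)
print(clusters)
```

In the setting the abstract describes, the same routine could in principle be applied twice, first to a keyword similarity graph and then to a document similarity graph built on the resulting keyword groups; how those graphs are weighted is left open here because the abstract does not specify it.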