Comparison of similarity measures for clustering Turkish documents

Authors:
Ainura Madylova; Şule Gündüz Öğüdücü
Affiliations:
(Corresponding author. Tel.: +90 212 2853682; Fax: +90 212 2853679; E-mail: madylova@itu.edu.tr) Department of Computer Engineering, Istanbul Technical University, Maslak, Istanbul TR34469, Turkey; Department of Computer Engineering, Istanbul Technical University, Maslak, Istanbul TR34469, Turkey
Venue:
Intelligent Data Analysis
Year:
2009

Abstract

Text clustering has become an important part of web data organization with the rapid growth of the World Wide Web (WWW). Clustering simplifies the work of web search engines by grouping the large numbers of documents retrieved for a given query. The similarity measure used in clustering directly affects the resulting groups. Most document clustering techniques rely on single-term analysis of text, such as the vector space model. To improve the grouping of Turkish documents, we investigate several similarity measures based on the semantic similarity of terms. We also study several techniques for calculating document similarity. The aim of this paper is to study the effects of semantic and single-term similarity measures on the clustering results of Turkish documents. All experiments are carried out on Turkish web sites, taking into account the relationships between terms given by an ontology for the Turkish language.
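
As an illustration of single-term analysis only, and not the measures evaluated in the paper, the sketch below computes cosine similarity over raw term-frequency vectors. The function names, the whitespace tokenization, and the sample Turkish phrases are assumptions made for this example. The two phrases differ only in the synonyms "araba" and "otomobil" (both meaning "car"), which a single-term measure treats as unrelated terms.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector from whitespace-separated tokens.

    Hypothetical helper: real Turkish text would also need morphological
    analysis (stemming), which is omitted here.
    """
    return Counter(text.split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


doc1 = term_vector("araba hızlı gidiyor")     # "the car is going fast"
doc2 = term_vector("otomobil hızlı gidiyor")  # same meaning, synonym for "car"

# Two of three terms match exactly, so the score is about 0.667
# even though the two phrases mean the same thing.
print(round(cosine_similarity(doc1, doc2), 3))
```

A semantic measure of the kind the abstract describes would instead consult an ontology to recognize that the two differing terms are related, and would score the pair higher.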