In this paper we present a new semantic smoothing vector space kernel (S-VSM) for text document clustering. In the suggested approach, semantic relatedness between words is used to smooth both the similarity and the representation of text documents. The basic hypothesis examined is that considering the semantic relatedness between two text documents may improve the performance of the text document clustering task. For our experimental evaluation we analyze the performance of several semantic relatedness measures when embedded in the proposed S-VSM, and present results with respect to different experimental conditions: (i) the datasets used, (ii) the underlying knowledge sources of the utilized measures, and (iii) the clustering algorithms employed. To the best of our knowledge, the current study is the first to systematically compare, analyze and evaluate the impact of semantic smoothing in text clustering based on the 'wisdom of linguists', e.g., WordNet, the 'wisdom of crowds', e.g., Wikipedia, and the 'wisdom of corpora', e.g., large text corpora represented with the traditional Bag of Words (BoW) model. Three semantic relatedness measures for text are considered: two knowledge-based (Omiotis [1], which uses WordNet, and WLM [2], which uses Wikipedia) and one corpus-based (PMI [3], trained on a semantically tagged version of SemCor). To compare the different experimental conditions we use the BCubed F-Measure evaluation metric, which satisfies all formal constraints for a good clustering quality metric. The experimental results show that clustering based on the S-VSM outperforms the traditional VSM and compares favorably against the standard GVSM kernel, which uses word co-occurrences to compute the latent similarities between document terms.
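The idea of smoothing document similarity with word relatedness can be illustrated with a generic GVSM-style bilinear form, sim(d1, d2) = d1^T S d2, where S holds pairwise term relatedness. The sketch below is an illustration under stated assumptions, not the paper's exact S-VSM formulation: the three-word vocabulary, the relatedness values in `REL`, and the function names are all hypothetical. In practice the entries of S would come from a measure such as Omiotis, WLM, or PMI, and S should be positive semi-definite for the result to be a valid kernel.

```python
import math

def smoothed_dot(u, v, rel):
    """Bilinear form u^T S v, with S given as a nested list `rel`."""
    return sum(u[i] * rel[i][j] * v[j]
               for i in range(len(u))
               for j in range(len(v)))

def smoothed_cosine(u, v, rel):
    """Cosine similarity in the space induced by the relatedness matrix."""
    denom = math.sqrt(smoothed_dot(u, u, rel) * smoothed_dot(v, v, rel))
    return smoothed_dot(u, v, rel) / denom if denom > 0 else 0.0

# Hypothetical vocabulary: ["car", "automobile", "banana"].
# Hypothetical relatedness values (symmetric, ones on the diagonal).
REL = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# The identity matrix recovers the plain VSM cosine (terms treated as orthogonal).
IDENTITY = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

doc_a = [1.0, 0.0, 0.0]  # mentions only "car"
doc_b = [0.0, 1.0, 0.0]  # mentions only "automobile"
doc_c = [0.0, 0.0, 1.0]  # mentions only "banana"

plain_ab = smoothed_cosine(doc_a, doc_b, IDENTITY)
smooth_ab = smoothed_cosine(doc_a, doc_b, REL)
smooth_ac = smoothed_cosine(doc_a, doc_c, REL)
```

With the identity matrix the two documents share no terms and their similarity is zero. With the relatedness matrix, the "car" and "automobile" documents become similar while the unrelated "banana" document stays at zero, which is the effect semantic smoothing is meant to have on clustering.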