Centroids are key components in many data analysis algorithms such as clustering or microaggregation. A centroid is the central value that minimises the distance to all the objects in a dataset or cluster. Methods for centroid construction are mainly devoted to datasets with numerical and categorical attributes, and focus on the numerical and distributional properties of the data. Textual attributes, by contrast, consist of term lists referring to concepts with a specific semantic content (i.e., meaning), which cannot be evaluated by means of classical numerical operators. Hence, the centroid of a dataset with textual attributes should be the term that minimises the semantic distance to the members of the set. Semantically-grounded methods that construct centroids for datasets with textual attributes are scarce and, as will be discussed in this paper, they are hampered by their limited semantic analysis of the data. In this paper, we propose a method that exploits the knowledge provided by background ontologies (like WordNet) to construct the centroid of multivariate datasets described by means of textual attributes. Particular effort has been put into minimising the semantic distance between the centroid and the input data. As a result, our method provides centroids that are optimal (i.e., they minimise the distance to all the objects in the dataset) with respect to the exploited background ontology and a given semantic similarity measure. Our proposal has been evaluated on a real dataset consisting of short textual answers provided by visitors to a natural park. Results show that our centroids retain the semantic content of the input data better than those of related works.
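The idea of a centroid as the term that minimises the semantic distance to a set of terms can be illustrated with a minimal sketch for a single textual attribute. It is not the method proposed in the paper: the taxonomy `PARENT` is a hypothetical toy stand-in for a background ontology such as WordNet, the distance is a plain edge count through the lowest common ancestor (one of several possible semantic measures), and the function names are assumptions made for illustration. Candidate centroids are taken to be the input terms plus all their taxonomic ancestors.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent), standing in for a real ontology.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "activity": "entity",
    "sport": "activity",
    "hiking": "sport",
    "cycling": "sport",
    "climbing": "sport",
    "relaxation": "activity",
    "picnic": "relaxation",
}


def ancestors(term: str) -> List[str]:
    """Return the path from `term` up to the root, `term` included."""
    path: List[str] = []
    node: Optional[str] = term
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def path_distance(a: str, b: str) -> int:
    """Edge count between `a` and `b` through their lowest common ancestor."""
    depth_from_b = {node: steps for steps, node in enumerate(ancestors(b))}
    for steps_from_a, node in enumerate(ancestors(a)):
        if node in depth_from_b:
            return steps_from_a + depth_from_b[node]
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


def semantic_centroid(values: List[str]) -> str:
    """Pick the candidate term with the smallest summed distance to `values`.

    Candidates are the input terms and all of their ancestors, so the
    centroid may be a generalisation that does not occur in the input.
    """
    candidates = set()
    for value in values:
        candidates.update(ancestors(value))

    def cost(candidate: str) -> int:
        return sum(path_distance(candidate, value) for value in values)

    # Sorting makes tie-breaking deterministic (alphabetical order).
    return min(sorted(candidates), key=cost)


if __name__ == "__main__":
    answers = ["hiking", "cycling", "climbing", "picnic"]
    print(semantic_centroid(answers))
```

With this toy taxonomy, the answers `hiking`, `cycling`, `climbing` and `picnic` yield `sport` (summed distance 6), a term that does not appear in the input. A purely distributional centroid such as the mode could only return one of the observed terms, which is why the candidate set here includes ancestors. A multivariate dataset could be handled by applying the same search to each attribute separately, though that too is an assumption of this sketch.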