Ontology learning integrates many complementary techniques, including machine learning, natural language processing, and data mining. In particular, clustering techniques help build interrelationships between terms by exploiting similarities between concepts. With the rapid growth of the Web, online information has become one of the major information sources. However, ontology learning that relies on traditional clustering algorithms tends to be slow and computationally expensive when the dataset is as large as the Web. To address this problem, we present an efficient concept clustering technique for ontology learning that reduces the number of required pairwise term similarity computations without a loss of quality. Our approach first identifies relevant terms using a computationally inexpensive similarity metric based on the event life cycle in online news articles, and then performs more sophisticated similarity computations. As a result, we can build clusters quickly and with high precision and recall. Without a loss of clustering quality, our framework reduces the number of required computations from $O(N^2)$ to $O(N + L^2)$ ($L \ll N$), where $N$ is the number of candidate concepts. Our experimental results show that clustering based on our similarity framework constructs concept clusters 1541.07% faster than clustering with all term-pair similarity computations.
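The two-stage idea can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm. $L$ is read as the number of candidates retained by the cheap filter. The cheap stage compares each of the $N$ candidates against a single seed concept. A term's "event life cycle" is modeled as the first-to-last day on which it appears in a daily news-frequency series, and the cheap metric is the overlap of those intervals. Cosine similarity of the full series stands in for the more sophisticated metric, and single-link threshold clustering groups the retained terms. All function names, thresholds, and toy data here are hypothetical and are not the authors' method.

```python
import math
from itertools import combinations


def life_cycle(series):
    """Hypothetical life cycle: (first, last) day with nonzero frequency, or None."""
    active = [day for day, count in enumerate(series) if count > 0]
    return (active[0], active[-1]) if active else None


def cheap_similarity(a, b):
    """Assumed inexpensive metric: Jaccard overlap of two active-day intervals."""
    la, lb = life_cycle(a), life_cycle(b)
    if la is None or lb is None:
        return 0.0
    inter = min(la[1], lb[1]) - max(la[0], lb[0]) + 1
    union = max(la[1], lb[1]) - min(la[0], lb[0]) + 1
    return max(inter, 0) / union


def expensive_similarity(a, b):
    """Stand-in for the costlier metric: cosine similarity of daily-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_concepts(seed, candidates, filter_threshold=0.5, link_threshold=0.8):
    """Return (clusters, cheap_calls, expensive_calls) for a two-stage clustering."""
    # Stage 1: one cheap comparison per candidate (N calls) keeps L relevant terms.
    cheap_calls = 0
    relevant = []
    for term, series in candidates.items():
        cheap_calls += 1
        if cheap_similarity(seed, series) >= filter_threshold:
            relevant.append(term)

    # Stage 2: expensive pairwise comparisons only among the L retained terms,
    # merged with union-find (single-link clustering).
    parent = {t: t for t in relevant}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    expensive_calls = 0
    for s, t in combinations(relevant, 2):  # L*(L-1)/2 pairs, i.e. O(L^2)
        expensive_calls += 1
        if expensive_similarity(candidates[s], candidates[t]) >= link_threshold:
            parent[find(s)] = find(t)

    groups = {}
    for t in relevant:
        groups.setdefault(find(t), []).append(t)
    clusters = sorted(sorted(g) for g in groups.values())
    return clusters, cheap_calls, expensive_calls


if __name__ == "__main__":
    # Toy daily article counts over eight days (hypothetical data).
    seed = [0, 3, 9, 4, 1, 0, 0, 0]
    candidates = {
        "earthquake": [0, 2, 8, 5, 1, 0, 0, 0],
        "tsunami": [0, 1, 7, 6, 2, 0, 0, 0],
        "aftershock": [0, 0, 3, 4, 2, 0, 0, 0],
        "election": [0, 0, 0, 0, 0, 5, 9, 3],
        "concert": [0, 9, 0, 0, 1, 0, 0, 0],
    }
    print(cluster_concepts(seed, candidates))
```

On this toy data, "election" falls outside the seed's life cycle and is filtered out by the cheap stage. The run therefore makes 5 cheap calls and 6 expensive calls, whereas an all-pairs approach over the 5 candidates would make 10 expensive calls. "concert" shares the seed's active interval but not its shape, so it ends up in its own cluster. The savings in practice depend entirely on how aggressively the cheap filter prunes, that is, on how small $L$ is relative to $N$.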