This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. We first create a concept-based document representation by mapping the terms and phrases within documents to their corresponding articles (or concepts) in Wikipedia. We then develop a similarity measure that evaluates the semantic relatedness between the concept sets of two documents. We test the concept-based representation and the similarity measure on two standard text document datasets. Empirical results show that, although further optimizations could be performed, our approach already improves upon related techniques.
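As a rough illustration of the two steps described in the abstract, the sketch below maps phrases to concepts and then compares two concept sets. It is not the paper's actual method: the `ANCHORS` lexicon, the `RELATEDNESS` scores, the greedy longest-match strategy, and the averaged best-match similarity are all assumptions chosen only to make the idea concrete. A real system would derive the phrase-to-article mapping and the relatedness scores from Wikipedia itself.

```python
import re
from typing import Dict, FrozenSet, List, Set

# Hypothetical toy lexicon: surface phrase -> Wikipedia article title (concept).
ANCHORS: Dict[str, str] = {
    "machine learning": "Machine learning",
    "learning": "Learning",
    "neural network": "Artificial neural network",
    "jaguar": "Jaguar",
    "big cat": "Big cat",
}

# Hypothetical pairwise relatedness scores in [0, 1]; unlisted pairs count as 0.
RELATEDNESS: Dict[FrozenSet[str], float] = {
    frozenset({"Machine learning", "Artificial neural network"}): 0.8,
    frozenset({"Jaguar", "Big cat"}): 0.9,
}


def map_to_concepts(text: str, max_len: int = 3) -> Set[str]:
    """Greedily match the longest known phrase at each token position."""
    tokens: List[str] = re.findall(r"[a-z0-9]+", text.lower())
    concepts: Set[str] = set()
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in ANCHORS:
                concepts.add(ANCHORS[phrase])
                i += n
                break
        else:
            i += 1  # no phrase starts here; skip this token
    return concepts


def relatedness(a: str, b: str) -> float:
    """Look up the assumed relatedness of two concepts (1.0 if identical)."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset({a, b}), 0.0)


def concept_set_similarity(c1: Set[str], c2: Set[str]) -> float:
    """Symmetric average of each concept's best match in the other set."""
    if not c1 or not c2:
        return 0.0

    def directed(src: Set[str], dst: Set[str]) -> float:
        return sum(max(relatedness(a, b) for b in dst) for a in src) / len(src)

    return 0.5 * (directed(c1, c2) + directed(c2, c1))


doc_a = map_to_concepts("Machine learning for beginners")
doc_b = map_to_concepts("Training a neural network")
score = concept_set_similarity(doc_a, doc_b)
```

Here the two documents share no words, yet they receive a nonzero similarity because their concepts are related in the assumed table. That is the kind of effect a concept-based representation is meant to capture, which a plain bag-of-words comparison would miss.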