Exploiting Wikipedia as external knowledge for document clustering
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Most traditional text clustering methods rely on a "bag of words" (BOW) representation built from term-frequency statistics over a document collection. BOW, however, ignores important information about the semantic relationships between key terms. To address this problem, several methods have previously been proposed that enrich the text representation with external resources such as WordNet. However, many of these approaches suffer from two limitations: 1) WordNet has limited coverage and lacks an effective word-sense disambiguation capability; 2) most text representation enrichment strategies, which either append hypernyms and synonyms to the document terms or replace the terms with them, are overly simple. In this paper, to overcome these deficiencies, we first propose a way to build a concept thesaurus based on the semantic relations (synonym, hypernym, and associative relation) extracted from Wikipedia. Then, we develop a unified framework that leverages these semantic relations to enhance the traditional content similarity measure for text clustering. Experimental results on the Reuters and OHSUMED datasets show that, with the help of the Wikipedia thesaurus, our method achieves better clustering performance than previous methods. In addition, when the weights for hypernym, synonym, and associative concepts are tuned using a small amount of user-provided labeled data, clustering performance improves further.
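To make the idea of a thesaurus-enhanced similarity measure concrete, here is a minimal sketch, not the authors' implementation. It assumes a hand-written toy thesaurus (`SYNONYMS`, `HYPERNYMS`, `ASSOCIATIVE`) standing in for the relations mined from Wikipedia. It also assumes a simple linear mix, controlled by a hypothetical parameter `lam`, of the BOW cosine similarity and a cosine similarity over concept vectors. The relation weights `w_syn`, `w_hyp`, and `w_assoc` and their default values are placeholders, and the paper's actual combination formula may differ.

```python
import math
from collections import Counter

# HYPOTHETICAL toy thesaurus, hand-written for illustration only. The paper
# mines these relations from Wikipedia; none of these entries come from it.
# Synonym relation: surface term -> canonical concept.
SYNONYMS = {"car": "automobile", "auto": "automobile",
            "automobile": "automobile", "truck": "truck"}
# Hypernym relation: concept -> broader concepts.
HYPERNYMS = {"automobile": ["vehicle"], "truck": ["vehicle"]}
# Associative relation: concept -> related, non-hierarchical concepts.
ASSOCIATIVE = {"automobile": ["engine"], "truck": ["cargo"]}


def bow(text):
    """Bag-of-words term-frequency vector over lower-cased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def concept_vector(tf, w_syn=1.0, w_hyp=0.5, w_assoc=0.3):
    """Map terms to concepts via the synonym relation, then spread weighted
    mass to hypernyms and associated concepts. Default weights are arbitrary
    placeholders, not values from the paper."""
    vec = Counter()
    for term, freq in tf.items():
        concept = SYNONYMS.get(term)
        if concept is None:  # term not covered by the toy thesaurus
            continue
        vec[concept] += w_syn * freq
        for broader in HYPERNYMS.get(concept, []):
            vec[broader] += w_hyp * freq
        for related in ASSOCIATIVE.get(concept, []):
            vec[related] += w_assoc * freq
    return vec


def enhanced_similarity(doc_a, doc_b, lam=0.5, **weights):
    """ASSUMED linear mix of BOW cosine and concept-vector cosine.
    `lam` is a hypothetical mixing parameter introduced for this sketch."""
    tf_a, tf_b = bow(doc_a), bow(doc_b)
    term_sim = cosine(tf_a, tf_b)
    concept_sim = cosine(concept_vector(tf_a, **weights),
                         concept_vector(tf_b, **weights))
    return lam * term_sim + (1 - lam) * concept_sim


# Usage: "car" and "automobile" share a concept; "truck" only shares the
# hypernym "vehicle" with them.
d1 = "the car needs repair"
d2 = "the automobile needs repair"
d3 = "the truck needs repair"
print(cosine(bow(d1), bow(d2)))      # 0.75 (plain BOW)
print(enhanced_similarity(d1, d2))   # roughly 0.875
print(enhanced_similarity(d1, d3))   # roughly 0.468
```

Setting `w_hyp` and `w_assoc` to zero reduces the concept side to pure synonym matching. These weights are the knobs one might tune against a few labeled document pairs, for example by a small grid search. That tuning procedure is an assumption, and the paper's own method may differ.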