Traditional text clustering methods represent documents as "bags of words" and ignore the semantic information in each document. For instance, if two documents use different sets of core words to describe the same topic, they may be wrongly assigned to different clusters because they share no core words, even though those words are probably synonyms or semantically related in some other way. The most common way to solve this problem is to enrich the document representation with background knowledge from an ontology. This approach has two major issues: (1) the coverage of the ontology is limited, even for WordNet or MeSH; and (2) using ontology terms as replacement or additional features may cause information loss or introduce noise. In this paper, we present a novel text clustering method that addresses both issues by enriching the document representation with Wikipedia concept and category information. We develop two approaches, exact-match and relatedness-match, to map text documents to Wikipedia concepts and, from there, to Wikipedia categories. The documents are then clustered using a similarity metric that combines document content information, concept information, and category information. Experimental results with the proposed clustering framework on three datasets (20-newsgroup, TDT2, and LA Times) show that clustering performance improves significantly when the document representation is enriched with Wikipedia concepts and categories.
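The pipeline described above (map documents to concepts, map concepts to categories, then combine three sources of similarity) can be illustrated with a minimal sketch. The abstract does not give the form of the combined metric, so the weighted sum of cosine similarities used here, the weights `alpha` and `beta`, the greedy longest-phrase lookup, and the toy dictionaries `concept_titles` and `concept_to_categories` are all assumptions made for illustration, not the authors' implementation. Relatedness-match is not shown.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def exact_match_concepts(tokens, concept_titles, max_len=3):
    """Hypothetical exact-match step: greedily look up the longest
    n-gram (up to max_len tokens) in a title -> concept dictionary."""
    found = Counter()
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in concept_titles:
                found[concept_titles[phrase]] += 1
                i += n
                break
        else:
            i += 1  # no phrase starting here matched; move on
    return found


def categories_of(concepts, concept_to_categories):
    """Propagate concept counts to the categories each concept belongs to."""
    cats = Counter()
    for concept, count in concepts.items():
        for cat in concept_to_categories.get(concept, ()):
            cats[cat] += count
    return cats


def represent(tokens, concept_titles, concept_to_categories):
    """Build the three views of a document: words, concepts, categories."""
    concepts = exact_match_concepts(tokens, concept_titles)
    return {
        "words": Counter(tokens),
        "concepts": concepts,
        "categories": categories_of(concepts, concept_to_categories),
    }


def combined_similarity(d1, d2, alpha=0.5, beta=0.5):
    """Assumed combination: content similarity plus weighted concept
    and category similarities. alpha and beta are illustrative."""
    return (cosine(d1["words"], d2["words"])
            + alpha * cosine(d1["concepts"], d2["concepts"])
            + beta * cosine(d1["categories"], d2["categories"]))


# Toy knowledge base (made up for this example).
concept_titles = {"car": "Automobile", "automobile": "Automobile",
                  "new york": "New York City"}
concept_to_categories = {"Automobile": ["Vehicles"],
                         "New York City": ["Cities"]}

doc_a = represent(["car", "dealer"], concept_titles, concept_to_categories)
doc_b = represent(["automobile", "shop"], concept_titles, concept_to_categories)
```

In this toy case `doc_a` and `doc_b` share no words, so their bag-of-words similarity is zero, yet both map to the same concept and category and therefore receive a non-zero combined score. This is the synonym situation the abstract uses as motivation.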