Short text conceptualization using a probabilistic knowledgebase

Authors:
Yangqiu Song;Haixun Wang;Zhongyuan Wang;Hongsong Li;Weizhu Chen
Affiliations:
Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China
Venue:
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three
Year:
2011

Abstract

Most text mining tasks, including clustering and topic detection, are based on statistical methods that treat text as bags of words. Semantics in the text is largely ignored in the mining process, and mining results often have low interpretability. One particular challenge faced by such approaches lies in short text understanding, as short texts lack enough content from which statistical conclusions can be drawn easily. In this paper, we improve text understanding by using a probabilistic knowledgebase that is as rich as our mental world in terms of the concepts (of worldly facts) it contains. We then develop a Bayesian inference mechanism to conceptualize words and short text. We conducted comprehensive experiments on conceptualizing textual terms, and clustering short pieces of text such as Twitter messages. Compared to purely statistical methods such as latent semantic topic modeling or methods that use existing knowledge-bases (e.g., WordNet, Freebase and Wikipedia), our approach brings significant improvements in short text understanding as reflected by the clustering accuracy.
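
The abstract mentions a Bayesian inference mechanism for conceptualizing terms but gives no details. The sketch below illustrates one plausible reading: a naive Bayes posterior P(c | terms) ∝ P(c) · ∏ P(term | c) computed over concept-instance co-occurrence counts. It is not the authors' implementation. The `COUNTS` table, the `conceptualize` function, and the add-alpha smoothing are all assumptions made for illustration, and the counts are invented toy values rather than data from the knowledgebase.

```python
# Hypothetical toy counts n(concept, instance); invented for illustration only.
COUNTS = {
    "company": {"apple": 60, "microsoft": 80, "ibm": 60},
    "fruit": {"apple": 90, "pear": 70, "banana": 40},
    "country": {"china": 100, "india": 90, "brazil": 10},
}


def conceptualize(terms, counts=COUNTS, alpha=1e-3):
    """Rank concepts by a naive Bayes posterior P(concept | terms).

    Assumes terms are conditionally independent given the concept and
    uses add-alpha smoothing so unseen terms do not zero out a concept.
    """
    vocabulary = {inst for insts in counts.values() for inst in insts}
    grand_total = sum(sum(insts.values()) for insts in counts.values())

    scores = {}
    for concept, insts in counts.items():
        concept_total = sum(insts.values())
        score = concept_total / grand_total  # prior P(concept)
        for term in terms:
            # smoothed likelihood P(term | concept)
            score *= (insts.get(term, 0) + alpha) / (
                concept_total + alpha * len(vocabulary)
            )
        scores[concept] = score

    norm = sum(scores.values())
    ranked = [(concept, score / norm) for concept, score in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # A single ambiguous term versus the same term with disambiguating context.
    print(conceptualize(["apple"]))
    print(conceptualize(["apple", "microsoft"]))
```

With these toy counts, "apple" alone ranks "fruit" first, while "apple" together with "microsoft" ranks "company" first, which is the kind of context-driven disambiguation that short text conceptualization aims for.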