Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project
Cluster ensembles — a knowledge reuse framework for combining multiple partitions
The Journal of Machine Learning Research
TextTiling: segmenting text into multi-paragraph subtopic passages
Computational Linguistics
Web-scale information extraction in KnowItAll: (preliminary results)
Proceedings of the 13th international conference on World Wide Web
Automatic acquisition of hyponyms from large text corpora
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Yago: a core of semantic knowledge
Proceedings of the 16th international conference on World Wide Web
Proceedings of the 17th international conference on World Wide Web
Freebase: a collaboratively created graph database for structuring human knowledge
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Enhancing text clustering by leveraging Wikipedia semantics
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Exploiting Wikipedia as external knowledge for document clustering
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Deriving a large scale taxonomy from Wikipedia
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
Concept-based feature generation and selection for information retrieval
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
Open information extraction from the web
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Feature generation for text categorization using world knowledge
IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Unsupervised modeling of Twitter conversations
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Probase: a probabilistic taxonomy for text understanding
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Optimizing index for taxonomy keyword search
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Automatic taxonomy construction from keywords
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
A system for extracting top-K lists from the web
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Understanding tables on the web
ER'12 Proceedings of the 31st international conference on Conceptual Modeling
Identifying users' topical tasks in web search
Proceedings of the sixth ACM international conference on Web search and data mining
Harnessing linked knowledge sources for topic classification in social media
Proceedings of the 24th ACM Conference on Hypertext and Social Media
Probabilistic semantic similarity measurements for noisy short texts using Wikipedia entities
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Context-dependent conceptualization
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Hybrid entity clustering using crowds and data
The VLDB Journal — The International Journal on Very Large Data Bases
Most text mining tasks, including clustering and topic detection, are based on statistical methods that treat text as a bag of words. Semantics in the text is largely ignored during mining, and the results often have low interpretability. Short texts pose a particular challenge for such approaches, because they lack enough content to support reliable statistical conclusions. In this paper, we improve text understanding by using a probabilistic knowledge base that is as rich as our mental world in terms of the concepts of worldly facts it contains. We then develop a Bayesian inference mechanism to conceptualize words and short texts. We conducted comprehensive experiments on conceptualizing textual terms and on clustering short pieces of text such as Twitter messages. Compared with purely statistical methods such as latent semantic topic modeling, and with methods that use existing knowledge bases (e.g., WordNet, Freebase, and Wikipedia), our approach yields significant improvements in short text understanding, as reflected by clustering accuracy.
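The Bayesian conceptualization the abstract describes can be sketched as naive-Bayes inference over an isA knowledge base: given observed terms, rank candidate concepts by P(concept | terms) ∝ P(concept) · Π P(term | concept). The toy probability tables below are invented for illustration and are not real knowledge-base data; the paper's actual knowledge base and smoothing are far richer.

```python
# Minimal sketch of Bayesian conceptualization over a toy isA knowledge base.
# All probability values here are assumed for illustration only.

# P(term | concept): how typical each term is for a concept (toy values)
p_term_given_concept = {
    "fruit":   {"apple": 0.4, "pear": 0.3, "banana": 0.3},
    "company": {"apple": 0.6, "google": 0.4},
}

# P(concept): prior popularity of each concept (toy values)
p_concept = {"fruit": 0.5, "company": 0.5}

def conceptualize(terms):
    """Rank concepts by P(concept | terms) ∝ P(concept) * Π P(term | concept),
    assuming terms are conditionally independent given the concept.
    Unseen term/concept pairs get a small smoothing probability."""
    scores = {}
    for concept, prior in p_concept.items():
        likelihood = 1.0
        for term in terms:
            likelihood *= p_term_given_concept[concept].get(term, 1e-6)
        scores[concept] = prior * likelihood
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}  # normalized posterior

# Context disambiguates "apple": paired with "pear" it reads as a fruit,
# paired with "google" it reads as a company.
print(conceptualize(["apple", "pear"]))
print(conceptualize(["apple", "google"]))
```

This is the essence of why a concept-level representation helps with short texts: even two or three terms, mapped into concept space, carry enough signal to disambiguate and cluster, where a bag-of-words model sees too few features.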