Automatic taxonomy construction from keywords

Authors:
Xueqing Liu; Yangqiu Song; Shixia Liu; Haixun Wang
Affiliations:
Microsoft Research Asia & Tsinghua University, Beijing, China; Microsoft Research Asia, Beijing, China; Microsoft Research Asia, Beijing, China; Microsoft Research Asia, Beijing, China
Venue:
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2012

Abstract

Taxonomies, especially the ones in specific domains, are becoming indispensable to a growing number of applications. State-of-the-art approaches assume that there exists a text corpus that accurately characterizes the domain of interest, and that a taxonomy can be derived from the text corpus using information extraction techniques. In reality, neither assumption is valid, especially for highly focused or fast-changing domains. In this paper, we study a challenging problem: deriving a taxonomy from a set of keyword phrases. A solution can benefit many real-life applications because i) keywords give users the flexibility and ease to characterize a specific domain; and ii) in many applications, such as online advertisements, the domain of interest is already represented by a set of keywords. However, it is impossible to create a taxonomy out of a keyword set itself. We argue that additional knowledge and contexts are needed. To this end, we first use a general-purpose knowledgebase and keyword search to supply the required knowledge and context. Then we develop a Bayesian approach to build a hierarchical taxonomy for a given set of keywords. We reduce the complexity of previous hierarchical clustering approaches from O(n^2 log n) to O(n log n), so that we can derive a domain-specific taxonomy from one million keyword phrases in less than an hour. Finally, we conduct comprehensive large-scale experiments to show the effectiveness and efficiency of our approach. A real-life example of building an insurance-related query taxonomy illustrates the usefulness of our approach for specific domains.
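
To make the input and output of the problem concrete, the sketch below builds a small binary hierarchy from keyword phrases. It is not the authors' method: a plain greedy agglomerative clustering with cosine similarity is swapped in for the Bayesian model, and the context vectors are hand-written stand-ins for the knowledge and context that the abstract obtains from a knowledgebase and keyword search. The names `cosine`, `build_taxonomy`, and `contexts` are hypothetical. This naive version rescans every pair of clusters at each merge, so it is slower than either complexity bound quoted in the abstract; it only illustrates the shape of the task.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_taxonomy(contexts):
    """Greedily merge the two most similar clusters until one tree remains.

    contexts: dict mapping keyword phrase -> Counter of context terms
              (assumed input; in practice it would come from an external
              knowledge source, which this sketch does not model).
    Returns a nested tuple whose leaves are the keyword phrases.
    """
    # Each cluster is a pair: (subtree, aggregated context vector).
    clusters = [(kw, Counter(vec)) for kw, vec in contexts.items()]
    while len(clusters) > 1:
        # Naive step: examine every pair and pick the most similar one.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cosine(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (tree_i, vec_i), (tree_j, vec_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j), vec_i + vec_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Toy, hand-made context vectors (illustrative data only).
contexts = {
    "car insurance": Counter(insurance=3, auto=2, vehicle=1),
    "auto insurance": Counter(insurance=3, auto=3),
    "life insurance": Counter(insurance=3, life=2),
    "term life": Counter(insurance=1, life=3, policy=1),
}

tree = build_taxonomy(contexts)
print(tree)
```

On this toy input the two vehicle-related phrases are grouped under one node and the two life-related phrases under another, which is the kind of structure a keyword taxonomy is meant to expose. The pairwise rescan is the part that does not scale, which is why avoiding exhaustive pairwise comparison matters at the million-keyword scale the abstract mentions.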