Taxonomies, especially those for specific domains, are becoming indispensable to a growing number of applications. State-of-the-art approaches assume that a text corpus exists that accurately characterizes the domain of interest, and that a taxonomy can be derived from that corpus using information extraction techniques. In practice, neither assumption holds, especially for highly focused or fast-changing domains. In this paper, we study a challenging problem: deriving a taxonomy from a set of keyword phrases. A solution can benefit many real-life applications because i) keywords give users a flexible and easy way to characterize a specific domain; and ii) in many applications, such as online advertising, the domain of interest is already represented by a set of keywords. However, a taxonomy cannot be built from the keyword set alone; we argue that additional knowledge and context are needed. To this end, we first use a general-purpose knowledgebase and keyword search to supply the required knowledge and context. We then develop a Bayesian approach to build a hierarchical taxonomy for a given set of keywords. We reduce the complexity of previous hierarchical clustering approaches from O(n^2 log n) to O(n log n), so that a domain-specific taxonomy can be derived from one million keyword phrases in less than an hour. Finally, we conduct comprehensive large-scale experiments to show the effectiveness and efficiency of our approach. A real-life example of building an insurance-related query taxonomy illustrates the usefulness of our approach for specific domains.
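To make the baseline cost concrete, the following is a minimal sketch of plain agglomerative clustering over keyword concept vectors, driven by a heap of pairwise similarities. It is not the Bayesian method described above, and it does not reach O(n log n); it only illustrates the kind of quadratic procedure that the faster approach replaces. The names `TOY_CONCEPTS`, `cosine`, and `build_taxonomy`, as well as the concept weights, are hypothetical stand-ins for what a general-purpose knowledgebase might return.

```python
import heapq
import math
from collections import Counter

# Hypothetical toy "knowledgebase" output: keyword -> weighted concepts.
# The weights are made up for illustration only.
TOY_CONCEPTS = {
    "car insurance": {"insurance": 0.9, "vehicle": 0.6},
    "auto insurance quote": {"insurance": 0.8, "vehicle": 0.7},
    "life insurance": {"insurance": 0.9, "life event": 0.5},
    "term life policy": {"insurance": 0.6, "life event": 0.8},
}


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_taxonomy(concepts):
    """Greedy agglomerative clustering; returns a nested tuple (binary tree)."""
    # Each live cluster: id -> (subtree, summed concept vector).
    clusters = {i: (kw, Counter(vec)) for i, (kw, vec) in enumerate(concepts.items())}
    # All O(n^2) initial pairs go on a max-heap (negated similarity).
    heap = [(-cosine(clusters[i][1], clusters[j][1]), i, j)
            for i in clusters for j in clusters if i < j]
    heapq.heapify(heap)
    next_id = len(clusters)
    while len(clusters) > 1:
        _, i, j = heapq.heappop(heap)
        if i not in clusters or j not in clusters:
            continue  # stale entry: one side was already merged away
        (ti, vi), (tj, vj) = clusters.pop(i), clusters.pop(j)
        merged_vec = vi + vj
        # Each merge pushes O(n) new pairs, giving roughly O(n^2 log n) heap work.
        for k, (_, vk) in clusters.items():
            heapq.heappush(heap, (-cosine(merged_vec, vk), k, next_id))
        clusters[next_id] = ((ti, tj), merged_vec)
        next_id += 1
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    print(build_taxonomy(TOY_CONCEPTS))
```

On the toy input, the two vehicle-related phrases and the two life-related phrases end up in separate subtrees under a common root. The sketch yields a binary tree from a similarity heuristic; a probabilistic, multi-branch construction would need a different merge criterion.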