Employing topic models for pattern-based semantic class discovery

Authors:
Huibin Zhang;Mingjie Zhu;Shuming Shi;Ji-Rong Wen
Affiliations:
Nankai University;University of Science and Technology of China;Microsoft Research Asia;Microsoft Research Asia
Venue:
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1
Year:
2009

Abstract

A semantic class is a collection of items (words or phrases) that stand in a semantic peer or sibling relationship. This paper studies the use of topic models to construct semantic classes automatically, taking as source data a collection of raw semantic classes (RASCs) extracted by applying predefined patterns to web pages. The primary requirement (and challenge) is handling multi-membership: an item may belong to multiple semantic classes, and we need to discover as many of those classes as possible. To adopt topic models, we treat RASCs as "documents", items as "words", and the final semantic classes as "topics". Appropriate preprocessing and postprocessing are performed to improve result quality, reduce computation cost, and address the fixed-k constraint of a typical topic model (the number of topics must be set in advance). Experiments conducted on 40 million web pages show that our approach can yield better results than alternative approaches.
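The mapping described in the abstract (RASCs as "documents", items as "words", semantic classes as "topics") can be illustrated with a minimal sketch. It assumes latent Dirichlet allocation fitted by collapsed Gibbs sampling as the topic model; the abstract does not say which topic model or inference method is used, and the preprocessing and postprocessing steps it mentions are not reproduced here. The toy RASCs, the function name `lda_gibbs`, and the values of `k`, `alpha`, `beta`, and `iters` are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, not optimized).

    docs: list of RASCs, each a list of item strings ("words").
    k:    number of topics, fixed in advance (the fixed-k constraint).
    Returns (doc_topic, topic_item):
      doc_topic[d][t]  = estimated share of topic t in RASC d
      topic_item[t][w] = estimated probability of item w under topic t
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)

    n_dt = [[0] * k for _ in docs]               # RASC-topic counts
    n_tw = [defaultdict(int) for _ in range(k)]  # topic-item counts
    n_t = [0] * k                                # items assigned per topic

    # Assign a random initial topic to every item occurrence.
    z = []
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            t = rng.randrange(k)
            assignments.append(t)
            n_dt[d][t] += 1
            n_tw[t][w] += 1
            n_t[t] += 1
        z.append(assignments)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                t = z[d][i]
                n_dt[d][t] -= 1
                n_tw[t][w] -= 1
                n_t[t] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (n_dt[d][j] + alpha) * (n_tw[j][w] + beta) / (n_t[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                n_dt[d][t] += 1
                n_tw[t][w] += 1
                n_t[t] += 1

    doc_topic = [
        [(c + alpha) / (len(doc) + k * alpha) for c in row]
        for row, doc in zip(n_dt, docs)
    ]
    topic_item = [
        {w: (n_tw[t][w] + beta) / (n_t[t] + v * beta) for w in vocab}
        for t in range(k)
    ]
    return doc_topic, topic_item


# Hypothetical RASCs: "apple" appears among both fruits and companies,
# so it is a candidate for multi-membership.
rascs = [
    ["apple", "banana", "orange"],
    ["banana", "orange", "grape"],
    ["apple", "orange", "grape", "banana"],
    ["apple", "microsoft", "google"],
    ["microsoft", "google", "ibm"],
    ["apple", "google", "ibm", "microsoft"],
]

doc_topic, topic_item = lda_gibbs(rascs, k=2)

# Show the three most probable items of each topic (candidate semantic class).
for t, probs in enumerate(topic_item):
    top = sorted(probs, key=probs.get, reverse=True)[:3]
    print(f"topic {t}: {top}")
```

In this sketch, an item with non-negligible probability under more than one topic is a candidate member of more than one semantic class, which is how a topic model can express multi-membership. Because `k` must be chosen before fitting, the sketch also shows why the fixed-k constraint needs separate handling.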