Learning dictionaries for information extraction by multi-level bootstrapping
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Snowball: extracting relations from large plain-text collections
DL '00 Proceedings of the fifth ACM conference on Digital libraries
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Automatic acquisition of hyponyms from large text corpora
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Shallow parsing with conditional random fields
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Semi-supervised learning with graphs
Semi-supervised learning with graphs
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Espresso: leveraging generic patterns for automatically harvesting semantic relations
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Chinese segmentation and new word detection using conditional random fields
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Learning query intent from regularized click graphs
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
WebTables: exploring the power of tables on the web
Proceedings of the VLDB Endowment
Extracting structured information from user queries with semi-supervised conditional random fields
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
A context pattern induction method for named entity extraction
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Methods for domain-independent information extraction from the web: an experimental comparison
AAAI'04 Proceedings of the 19th national conference on Artifical intelligence
Weakly-supervised acquisition of labeled class instances using graph random walks
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Automatic set expansion for list question answering
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Semantic lexicon adaptation for use in query interpretation
Proceedings of the 19th international conference on World wide web
Learning 5000 relational extractors
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
SEISA: set expansion by iterative similarity aggregation
Proceedings of the 20th international conference on World wide web
Fine-grained class label markup of search queries
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Labeling queries for a people search engine
NLDB'12 Proceedings of the 17th international conference on Applications of Natural Language Processing and Information Systems
A joint model for discovery of aspects in utterances
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Hi-index | 0.00 |
Understanding intents from search queries can improve a user's search experience and boost a site's advertising profits. Query tagging via statistical sequential labeling models has been shown to perform well, but annotating the training set for supervised learning requires substantial human effort. Domain-specific knowledge, such as semantic class lexicons, reduces the amount of needed manual annotations, but much human effort is still required to maintain these as search topics evolve over time. This paper investigates semi-supervised learning algorithms that leverage structured data (HTML lists) from the Web to automatically generate semantic-class lexicons, which are used to improve query tagging performance - even with far less training data. We focus our study on understanding the correct objectives for the semi-supervised lexicon learning algorithms that are crucial for the success of query tagging. Prior work on lexicon acquisition has largely focused on the precision of the lexicons, but we show that precision is not important if the lexicons are used for query tagging. A more adequate criterion should emphasize a trade-off between maximizing the recall of semantic class instances in the data, and minimizing the confusability. This ensures that the similar levels of precision and recall are observed on both training and test set, hence prevents over-fitting the lexicon features. Experimental results on retail product queries show that enhancing a query tagger with lexicons learned with this objective reduces word level tagging errors by up to 25% compared to the baseline tagger that does not use any lexicon features. In contrast, lexicons obtained through a precision-centric learning algorithm even degrade the performance of a tagger compared to the baseline. Furthermore, the proposed method outperforms one in which semantic class lexicons have been extracted from a database.