Many tasks related to or supporting information retrieval, such as query expansion, automated question answering, reasoning, or heterogeneous database integration, involve verifying membership in a semantic category (e.g., "coffee" is a drink and "red" is a color, while "steak" is not a drink and "big" is not a color). We present a novel framework that automatically validates membership in an arbitrary semantic category, one that has not been trained a priori, up to a desired level of accuracy. Our approach does not rely on any manually codified knowledge but instead capitalizes on the diversity of topics and word usage in a large corpus (e.g., the World Wide Web). Using TREC factoid questions that expect the answer to belong to a specific semantic category, we show that a very high level of accuracy can be reached by automatically identifying more training seeds and more training patterns when needed. We develop a specific quantitative validation model that takes uncertainty and redundancy in the training data into consideration. We empirically confirm the important aspects of our model through ablation studies.