Learning dictionaries for information extraction by multi-level bootstrapping
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Discovering word senses from text
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Clustering by committee
Automatic acquisition of hyponyms from large text corpora
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
A geometric interpretation and analysis of R-precision
Proceedings of the 14th ACM international conference on Information and knowledge management
Named entity recognition through classifier combination
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Proceedings of the 16th international conference on World Wide Web
Weakly-supervised discovery of named entities using web search queries
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Unsupervised query segmentation using generative language models and wikipedia
Proceedings of the 17th international conference on World Wide Web
Context-aware query suggestion by mining click-through and session data
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Exploiting web search to generate synonyms for entities
Proceedings of the 18th international conference on World wide web
Understanding user's query intent with wikipedia
Proceedings of the 18th international conference on World wide web
A joint information model for n-best ranking
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Automatic set expansion for list question answering
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Semi-automatic entity set refinement
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Locating complex named entities in web text
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Web-scale distributional similarity and entity set expansion
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Not all seeds are equal: measuring the quality of text mining seeds
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
An active learning approach to finding related terms
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Materializing multi-relational databases from the web using taxonomic queries
Proceedings of the fourth ACM international conference on Web search and data mining
Insights from network structure for text mining
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
HITS-based seed selection and stop list construction for bootstrapping
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Class label enhancement via related instances
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Ensemble semantics for large-scale unsupervised relation extraction
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Bootstrapping biomedical ontologies for scientific text using NELL
BioNLP '12 Proceedings of the 2012 Workshop on Biomedical Natural Language Processing
Active learning for relation type extension with local and global data views
Proceedings of the 21st ACM international conference on Information and knowledge management
Hi-index | 0.00 |
Sets of named entities are used heavily at commercial search engines such as Google, Yahoo and Bing. Acquiring sets of entities typically consists of combining semi-supervised expansion algorithms with manual cleaning of the resulting expanded sets. In this paper, we study the effects of different seed sets in a state-of-the-art semi-supervised expansion system and show a tremendous variation in expansion performance depending on the choice of seeds. We further show that human editors, in general, provide very bad seed sets, which perform well-below the average random seed set. We identify three factors of seed set composition, namely prototypicality, ambiguity and coverage, and we investigate their effects on expansion performance. Finally, we propose various automatic systems for improving editor-generated seed sets, which seek to remove ambiguous and other error-prone seed instances. An extensive experimental analysis shows that expansion quality, measured in R-precision, can be improved on average by a maximum of 46% by removing the right seeds from a seed set. Our automatic methods outperform the human editors seed sets and on average improve expansion performance by up to 34% over the original seed sets.