Textual patterns have been used effectively to extract information from large text collections. However, they rely heavily on textual redundancy: facts have to be mentioned in a similar manner in order to be generalized to a textual pattern. Data sparseness thus becomes a problem when extracting information from sources with little redundancy, such as corporate intranets, encyclopedic works, or scientific databases.

We present results on applying a weakly supervised pattern induction algorithm to Wikipedia to extract instances of arbitrary relations. In particular, we apply different configurations of a basic pattern induction algorithm to seven different datasets. We show that the lack of redundancy creates a need for a large amount of training data, but that integrating Web extraction into the process significantly reduces the required training data while maintaining the accuracy of Wikipedia. In particular, we show that although using the Web can have effects similar to those of increasing the number of seeds, it leads to better results overall. Our approach thus makes it possible to combine the advantages of two sources: the high reliability of a closed corpus and the high redundancy of the Web.
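To make the redundancy argument concrete, the following is a minimal toy sketch of seed-based pattern induction in Python. It is not the algorithm evaluated in the paper: the function names (`induce_patterns`, `apply_patterns`), the `min_support` threshold, the regular expressions, and the example sentences are all assumptions chosen for illustration. It only shows why a pattern may fail to reach sufficient support in a closed corpus and why additional, more redundant sentences can help.

```python
import re
from collections import Counter


def induce_patterns(sentences, seeds, min_support=2):
    """Toy pattern induction (illustrative only, not the paper's method).

    For every seed pair (x, y) found in a sentence, the text between x and y
    is taken as a candidate pattern. A candidate is kept only if it was
    observed at least `min_support` times.
    """
    counts = Counter()
    for sentence in sentences:
        for x, y in seeds:
            match = re.search(re.escape(x) + r"(.{1,40}?)" + re.escape(y), sentence)
            if match:
                counts[match.group(1)] += 1
    return [pattern for pattern, n in counts.items() if n >= min_support]


def apply_patterns(sentences, patterns):
    """Extract (x, y) pairs of capitalized words joined by an induced pattern."""
    found = set()
    for sentence in sentences:
        for pattern in patterns:
            regex = r"([A-Z]\w+)" + re.escape(pattern) + r"([A-Z]\w+)"
            for match in re.finditer(regex, sentence):
                found.add((match.group(1), match.group(2)))
    return found


# Hypothetical data: a small closed corpus in which each fact is phrased once,
# plus a few redundant sentences standing in for Web text.
corpus = [
    "Paris is the capital of France.",
    "Rome, capital city of Italy, is old.",
    "Madrid is the capital of Spain.",
]
web = [
    "Rome is the capital of Italy.",
    "Berlin is the capital of Germany.",
]
seeds = {("Paris", "France"), ("Rome", "Italy")}

# Closed corpus only: each candidate pattern is seen once, so none survives.
patterns_corpus_only = induce_patterns(corpus, seeds)

# Corpus plus Web sentences: one pattern now reaches the support threshold.
patterns_with_web = induce_patterns(corpus + web, seeds)

# Patterns learned with Web support are applied back to the closed corpus.
extracted = apply_patterns(corpus, patterns_with_web)
```

In this toy setting the same two seeds yield no usable pattern from the closed corpus alone, while adding the redundant sentences lets one pattern pass the threshold and be applied back to the corpus, mirroring the trade-off between seed count and Web redundancy described in the abstract.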