Information extraction from Wikipedia using pattern learning

Authors:
Márton Miháltz
Affiliations:
Péter Pázmány Catholic University
Venue:
Acta Cybernetica
Year:
2010

Citing 19
Cited 0

Snowball: extracting relations from large plain-text collections

DL '00 Proceedings of the fifth ACM conference on Digital libraries
Extracting Patterns and Relations from the World Wide Web

WebDB '98 Selected papers from the International Workshop on The World Wide Web and Databases
Kernel methods for relation extraction

The Journal of Machine Learning Research
Web-scale information extraction in knowitall: (preliminary results)

Proceedings of the 13th international conference on World Wide Web
Combining linguistic and statistical analysis to extract relations from web documents

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Dependency tree kernels for relation extraction

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Incorporating non-local information into information extraction systems by Gibbs sampling

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
A shortest path dependency kernel for relation extraction

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Integrating probabilistic extraction models and data mining to discover relations and patterns in text

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Open information extraction from the web

Communications of the ACM - Surviving the data deluge
YAGO: A Large Ontology from Wikipedia and WordNet

Web Semantics: Science, Services and Agents on the World Wide Web
Using Wikipedia to bootstrap open information extraction

ACM SIGMOD Record
SOFIE: a self-organizing framework for information extraction

Proceedings of the 18th international conference on World wide web
Relation extraction from wikipedia using subtree mining

AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
Unsupervised relation extraction by mining Wikipedia texts using information from the web

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
PORE: positive-only relation extraction from wikipedia text

ISWC'07/ASWC'07 Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference
DBpedia: a nucleus for a web of open data

ISWC'07/ASWC'07 Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference
Automatically generating extraction patterns from untagged text

AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 2
Automatic assignment of wikipedia encyclopedic entries to wordnet synsets

AWIC'05 Proceedings of the Third international conference on Advances in Web Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper we present solutions for the crucial task of extracting structured information from massive free-text resources, such as Wikipedia, for the sake of semantic databases serving upcoming Semantic Web technologies. We demonstrate both a verb frame-based approach using deep natural language processing techniques with extraction patterns developed by human knowledge experts and machine learning methods using shallow linguistic processing. We also propose a method for learning verb frame-based extraction patterns automatically from labeled data. We show that labeled training data can be produced with only minimal human effort by utilizing existing semantic resources and the special characteristics of Wikipedia. Custom solutions for named entity recognition are also possible in this scenario. We present evaluation and comparison of the different approaches for several different relations.