Adaptive information extraction from text by rule induction and generalisation

Authors:
Fabio Ciravegna
Affiliations:
Department of Computer Science, University of Sheffield, Sheffield, UK
Venue:
IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
Year:
2001

Abstract

(LP)² is a covering algorithm for adaptive Information Extraction (IE) from text. It induces symbolic rules that insert SGML tags into texts by learning from examples found in a user-defined tagged corpus. Training is performed in two steps: first a set of tagging rules is learned; then additional rules are induced to correct mistakes and imprecision in tagging. Induction is performed by bottom-up generalization of examples in the training corpus, and shallow Natural Language Processing (NLP) knowledge is used in the generalization process. The algorithm has been successful both scientifically and in applications. From a scientific point of view, experiments report excellent results with respect to the current state of the art on two publicly available corpora. From an application point of view, a successful industrial IE tool has been based on (LP)². Real-world applications have been developed, and licenses have been released to external companies for building other applications. This paper presents (LP)², experimental results and applications, and discusses the role of shallow NLP in rule induction.
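
The first training step described above (learning tagging rules by bottom-up generalization inside a covering loop) can be illustrated with a toy sketch. This is not the paper's implementation: the window size, the three-way lexical class used as a stand-in for shallow NLP knowledge, the error threshold, the `<stime>` tag, and all function names are assumptions made for illustration, and the second step (inducing correction rules) is left out.

```python
from itertools import product

def shallow_class(token):
    """Hypothetical stand-in for shallow NLP knowledge: a crude lexical class."""
    if token.isdigit():
        return "DIGIT"
    if token[:1].isupper():
        return "CAP"
    return "LOWER"

def matches(rule, tokens, pos):
    """True if `rule` fires at insertion point `pos` (the tag goes before tokens[pos])."""
    window = len(rule) // 2
    for (kind, value), i in zip(rule, range(pos - window, pos + window)):
        if kind == "any":
            continue
        if not 0 <= i < len(tokens):
            return False
        if kind == "word" and tokens[i] != value:
            return False
        if kind == "class" and shallow_class(tokens[i]) != value:
            return False
    return True

def score(rule, corpus):
    """Count correct and wrong tag insertions of `rule` over the tagged corpus."""
    correct = wrong = 0
    for tokens, gold in corpus:
        for pos in range(len(tokens)):
            if matches(rule, tokens, pos):
                if pos in gold:
                    correct += 1
                else:
                    wrong += 1
    return correct, wrong

def induce(corpus, window=1, max_error=0.0):
    """Covering loop: seed a rule from an uncovered example, generalise it, repeat."""
    uncovered = {(d, p) for d, (_, gold) in enumerate(corpus) for p in gold}
    rules = []
    while uncovered:
        d, p = min(uncovered)
        tokens = corpus[d][0]
        # Per-slot options, ordered from most specific to most general.
        options = []
        for i in range(p - window, p + window):
            if 0 <= i < len(tokens):
                w = tokens[i]
                options.append([("word", w), ("class", shallow_class(w)), ("any", None)])
            else:
                options.append([("any", None)])
        best = tuple(o[0] for o in options)  # fall back to the fully specific seed rule
        best_correct = 0
        for cand in product(*options):
            correct, wrong = score(cand, corpus)
            if wrong / (correct + wrong) <= max_error and correct > best_correct:
                best, best_correct = cand, correct
        rules.append(best)
        # Remove every positive example the chosen rule covers.
        uncovered = {(dd, pp) for dd, pp in uncovered
                     if not matches(best, corpus[dd][0], pp)}
    return rules

def apply_rules(rules, tokens, tag="<stime>"):
    """Insert `tag` before every position where some rule fires."""
    out = []
    for pos, tok in enumerate(tokens):
        if any(matches(r, tokens, pos) for r in rules):
            out.append(tag)
        out.append(tok)
    return out

# Tiny tagged corpus: (tokens, positions where the tag should be inserted).
corpus = [
    ("Seminar at 3 pm in Room 5".split(), {2}),
    ("Talk at 11 am in Hall 2".split(), {2}),
]
rules = induce(corpus)
print(apply_rules(rules, "Lecture at 9 am".split()))
```

In this sketch, exhaustive enumeration of generalisations is only feasible because the window is tiny; a practical system would need to prune the search.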