Machine Learning for Information Extraction in Informal Domains

Authors:
Dayne Freitag
Affiliations:
Justsystem Pittsburgh Research Center, 4616 Henry Street, Pittsburgh, PA 15213, USA. dayne@justresearch.com
Venue:
Machine Learning - Special issue on information retrieval
Year:
2000

Abstract

We consider the problem of learning to perform information extraction in domains where linguistic processing is problematic, such as Usenet posts, email, and finger plan files. In place of syntactic and semantic information, other sources of information can be used, such as term frequency, typography, formatting, and mark-up. We describe four learning approaches to this problem, each drawn from a different paradigm: a rote learner, a term-space learner based on Naive Bayes, an approach using grammatical induction, and a relational rule learner. Experiments on 14 information extraction problems defined over four diverse document collections demonstrate the effectiveness of these approaches. Finally, we describe a multistrategy approach which combines these learners and yields performance competitive with or better than the best of them. This technique is modular and flexible, and could find application in other machine learning problems.