Mining information extraction rules from datasheets without linguistic parsing

Authors:
Rakesh Agrawal;Howard Ho;François Jacquenet;Marielle Jacquenet
Affiliations:
IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;Université de Saint-Etienne, Saint-Etienne Cedex, France;Université de Saint-Etienne, Saint-Etienne Cedex, France
Venue:
IEA/AIE'2005 Proceedings of the 18th international conference on Innovations in Applied Artificial Intelligence
Year:
2005


Abstract

In the context of the Pangea project at IBM, we needed to design an information extraction module to extract information from datasheets. Unlike several machine-learning-based information extraction systems that require linguistic parsing of the documents, we propose a hybrid approach based on association rule mining and decision tree learning that does not require any linguistic processing. The system can be parameterized in various ways that influence the efficiency of the discovered information extraction rules. Experiments show that the system does not need a large training set to perform well.