Machine-generated documents containing semistructured text are rapidly coming to form the bulk of the data stored in organisations. Given a feature-based representation of such data, methods like SVMs are able to construct good models for information extraction (IE). But how are the feature definitions to be obtained in the first place? (We are referring here to the representation problem; selecting good features from the ones defined comes later.) So far, features have been defined manually or by special-purpose programs, and neither approach scales well to the heterogeneity of the data or to new domain-specific information. We suggest that Inductive Logic Programming (ILP) could assist in this. Specifically, we demonstrate the use of ILP to define features for seven IE tasks using two disparate sources of information. Our findings are as follows: (1) the ILP system is able to efficiently identify large numbers of good features, and the time taken to identify them is typically comparable to the time taken to construct the predictive model; and (2) SVM models constructed with these ILP features are better than the best models reported to date that rely heavily on hand-crafted features. For the ILP practitioner, we also present evidence supporting the claim that, for IE tasks, using an ILP system to assist in constructing an extensional representation of text data (in the form of features and their values) is better than using it to construct intensional models for the tasks (in the form of rules for information extraction).
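To make the idea of an extensional representation concrete, the following is a minimal sketch in which each feature is a Boolean test on a token in its sentence, and every token is mapped to a vector of feature values that a learner such as an SVM could consume. The three predicates, the token format, and the example sentence are hypothetical illustrations written by hand; they are not features reported in the paper, and no ILP system is invoked here.

```python
from typing import Callable, Dict, List

# Hypothetical token record: surface word plus part-of-speech tag.
Token = Dict[str, str]
Feature = Callable[[List[Token], int], bool]

# Hand-written stand-ins for the kind of clause bodies an ILP system
# might propose. Each is a Boolean test on the token at position i.
def is_capitalised(sent: List[Token], i: int) -> bool:
    return sent[i]["word"][:1].isupper()

def preceded_by_title(sent: List[Token], i: int) -> bool:
    return i > 0 and sent[i - 1]["word"] in {"Mr.", "Dr.", "Prof."}

def is_proper_noun(sent: List[Token], i: int) -> bool:
    return sent[i]["pos"] == "NNP"

FEATURES: List[Feature] = [is_capitalised, preceded_by_title, is_proper_noun]

def featurise(sent: List[Token]) -> List[List[int]]:
    """Return one 0/1 vector per token: the extensional representation."""
    return [[int(f(sent, i)) for f in FEATURES] for i in range(len(sent))]

# Illustrative sentence (assumed data, not from the paper).
sentence = [
    {"word": "Dr.", "pos": "NNP"},
    {"word": "Smith", "pos": "NNP"},
    {"word": "arrived", "pos": "VBD"},
]
vectors = featurise(sentence)
for tok, vec in zip(sentence, vectors):
    print(tok["word"], vec)
```

In this framing the logic-based component only supplies the feature definitions; the predictive model is built separately from the resulting vectors, in contrast to an intensional approach in which the learned rules themselves perform the extraction.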