Information extraction from HTML: application of a general machine learning approach
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Learning Information Extraction Rules for Semi-Structured and Free Text
Machine Learning - Special issue on natural language learning
Relational learning of pattern-match rules for information extraction
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
ECML '93 Proceedings of the European Conference on Machine Learning
Learning information extraction patterns from examples
Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing
Toward general-purpose learning for information extraction
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Automatic identification of protagonist in fairy tales using verb
PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
VAHA: verbs associate with human activity --- a study on fairy tales
IEA/AIE'12 Proceedings of the 25th international conference on Industrial Engineering and Other Applications of Applied Intelligent Systems: advanced research in applied artificial intelligence
Hi-index | 0.00 |
Named entity recognition (NER) is the process of seeking to locate atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, and percentages. It is useful in applying NER to other natural language tasks such as question-answering, text summarization, building semantic web, etc. This paper presents a system, called BKIE, that uses SRV -- an inductive logic program - to extract name entities in Vietnamese text. New predicates and features are added to SRV to deal with characteristics of Vietnamese language. Also, several strategies are proposed in this paper to improve the efficiency of the SRV algorithm. The data set using in experiments is 80 homepages of scientists in Vietnamese language that were tagged manually. The experiments give us the best F-score of 83% for extracting the "name" entity. It shows that SRV is an efficient NER algorithm given its advantages of generality and flexibility. In order to increase the system's performance, our future work includes (i) building a larger set of training data to improve system's performance; (ii) implementing BKIE using parallel programming to increase system efficiency; and (iii) testing BKIE with other application domains to get a more accurate evaluation of the system.