Distributed fax message processing system
Journal of Network and Computer Applications
This paper presents the first research on automatic processing of business documents in Turkish. Traditional information extraction systems process input text as a linear sequence of words and focus on semantic aspects. In contrast, the proposed approach takes document layout information into account and benefits from the hints provided by layout analysis. In addition, the approach not only checks the relations between entities across the document to verify its integrity, but also verifies the extracted information against real-world data (e.g. a customer database). This rule-based approach uses a morphological analyzer for Turkish, a lexicon-integrated domain ontology, a document layout analyzer, an extraction ontology and a template mining module. Because it is based on the extraction ontology, conceptual sentence analysis increases portability: it requires only domain concepts, whereas comparable information extraction systems rely on large sets of linguistic patterns.
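The two verification steps mentioned in the abstract (an internal integrity check across extracted entities, and a check against external data such as a customer database) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the system described in the paper: the record fields, the `CUSTOMER_DB` dictionary, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExtractedOrder:
    """Hypothetical record of fields extracted from one business document."""
    customer_id: str
    customer_name: str
    quantity: int
    unit_price: float
    total: float


# Hypothetical stand-in for a real customer database (assumption).
CUSTOMER_DB = {"C-1001": "Acme Ltd", "C-1002": "Beta AS"}


def check_integrity(order, tolerance=0.01):
    """Internal check: the stated total should equal quantity * unit_price."""
    return abs(order.quantity * order.unit_price - order.total) <= tolerance


def check_against_db(order, db):
    """External check: the extracted id and name must agree with the database."""
    return db.get(order.customer_id) == order.customer_name


def verify(order, db=CUSTOMER_DB):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    if not check_integrity(order):
        problems.append("total does not match quantity * unit_price")
    if not check_against_db(order, db):
        problems.append("customer not found or name mismatch")
    return problems
```

Calling `verify` on an extracted record returns the list of failed checks, so a caller could route documents with a non-empty list to manual review.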