Named entity recognition is an important subfield of information extraction from textual data. Yet named entity recognition research on Turkish texts is still rare compared with research on other languages such as English, Spanish, Chinese, and Japanese. In this study, we present a hybrid named entity recognizer for Turkish, built on a manually engineered rule-based recognizer that we proposed earlier. Since rule-based systems for specific domains require their knowledge sources to be manually revised when ported to other domains, we enrich our rule-based recognizer and turn it into a hybrid recognizer that learns from annotated data when available and improves its knowledge sources accordingly. The hybrid recognizer was originally engineered for generic news texts, but its learning capability makes it applicable to financial news texts, historical texts, and child stories as well, without human intervention. Both the hybrid recognizer and its rule-based predecessor are evaluated on the same corpora, and the hybrid recognizer achieves better results than its predecessor. The proposed recognizer is significant because it is the first hybrid recognizer proposed for Turkish that addresses the above porting problem; Turkish has structural properties different from those of widely studied languages such as English, and very little information extraction research has been conducted on Turkish texts. Moreover, the use of the hybrid recognizer for semantic video indexing is demonstrated in a case study on Turkish news videos. The textual and video corpora used throughout the paper were compiled and annotated by the authors, owing to the lack of publicly available annotated corpora for information extraction research on Turkish texts.
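The idea of a recognizer that improves its knowledge sources from annotated data can be illustrated with a minimal sketch. This is not the authors' system: the class name `HybridRecognizer`, the methods `learn` and `recognize`, the entity type labels, and the sample sentence are all hypothetical, and the knowledge source is reduced to a plain lexicon matched by substring search, which ignores Turkish morphology and any pattern rules a real recognizer would use.

```python
from collections import defaultdict


class HybridRecognizer:
    """Toy lexicon-based recognizer that can extend its lexicons from annotated data.

    Illustrative assumption only; not the recognizer described in the paper.
    """

    def __init__(self, lexicons):
        # lexicons: mapping from entity type to a set of known names
        # (hypothetical seed resources, e.g. built for generic news text).
        self.lexicons = defaultdict(set)
        for entity_type, names in lexicons.items():
            self.lexicons[entity_type].update(names)

    def learn(self, annotated_entities):
        # annotated_entities: iterable of (name, entity_type) pairs taken
        # from an annotated corpus of the new domain.
        for name, entity_type in annotated_entities:
            self.lexicons[entity_type].add(name)

    def recognize(self, text):
        # Report the first occurrence of every lexicon entry found in the text,
        # ordered by position. Plain substring matching is a simplification.
        found = []
        for entity_type, names in self.lexicons.items():
            for name in names:
                start = text.find(name)
                if start != -1:
                    found.append((name, entity_type, start))
        return sorted(found, key=lambda item: item[2])


# Seed lexicon for the original domain (made-up example data).
recognizer = HybridRecognizer({"LOCATION": {"Ankara"}, "PERSON": set()})
text = "Evliya Çelebi Ankara'ya gitti."

before = recognizer.recognize(text)

# Porting step: learn a new name from annotated data, with no manual rule edits.
recognizer.learn([("Evliya Çelebi", "PERSON")])
after = recognizer.recognize(text)
```

In this sketch, `before` contains only the location known to the seed lexicon, while `after` also contains the person name learned from the annotation, which mirrors the porting scenario in a very reduced form.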