Bayan: an Arabic text database management system

  • Authors:
  • Roger King; Ali Morfeq

  • Affiliations:
  • Department of Computer Science, University of Colorado, Boulder, Colorado; Department of Computer Science, University of Colorado, Boulder, Colorado

  • Venue:
  • SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
  • Year:
  • 1990

Abstract

Most existing databases lack features that allow for the convenient manipulation of text. They are even more difficult to use if the language of the text is not based on the Roman alphabet. The Arabic language is a very good example of this case. Many projects have attempted to use conventional database systems for Arabic data manipulation (including text data), but because of Arabic's many differences from English, these projects have met with limited success. In the Bayan project, the approach has been different. Instead of simply trying to adapt an existing environment to Arabic, the properties of the Arabic language were the starting point and everything was designed to meet the needs of Arabic, thus avoiding the shortcomings of other projects. A text database management system was designed to overcome the shortcomings of conventional database management systems in manipulating text data. Bayan's data model is based on an object-oriented approach, which makes the system easier to extend for future use. In Bayan, we designed the database with the properties of Arabic text in mind, so that it supports the way Arabic words are derived, classified, and constructed. Furthermore, linguistic algorithms (for word generation and morphological decomposition of words) were designed, leading to a formalization of the rules of Arabic writing and sentence construction. A user interface was built on top of this environment: a new representation of the Arabic characters was designed, a complete Arabic keyboard layout was created, and a window-based Arabic user interface was developed.