Handling the taxonomic structure of biological data

  • Authors:
  • Robert Allkin; Richard J. White; Peter J. Winfield

  • Affiliations:
  • Computing, Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE, U.K.; Biology Department, 462, Southampton University, Southampton, SO9 3TU, U.K.; Scientific Services, S.O.A.F.D., East Craigs, Edinburgh, EH12 8NJ, U.K.

  • Venue:
  • Mathematical and Computer Modelling: An International Journal
  • Year:
  • 1992

Abstract

The taxonomic principles underlying the organization of biological data are even more relevant today to the construction of biological computer databases than they were to the classification and arrangement of museum specimens or to publishing the results of biological studies. Biological information incorporates a rulebase describing the complex inter-relationships between (i) the names of organisms, (ii) those names and the organisms to which they refer, and (iii) the organisms themselves. Such a rulebase provides a core (an inventory of organisms with all their alternative names) to which descriptive data about organisms, such as their molecular biology, chemistry and distribution, may be added. Different "views" of this taxonomic core are necessary for diverse purposes. The descriptive data itself is far more complex than is apparent to biologists, who are accustomed to using their experience and knowledge to interpret and manage their data manually. If biological information systems are to be equally effective, that knowledge must be made explicit and incorporated within databases. Any biologically related information system should incorporate data storage structures modelling these logical relationships, algorithms that mirror biological practice to manipulate these structures, and an interface allowing biologists to use familiar concepts and terminology. Biological database designs frequently oversimplify their data structures, whether deliberately or unwittingly. Biologists are consequently discouraged from building databases because of the labour and difficulties that arise, unnecessarily, through a lack of suitable software. We describe the design for such a system ('Baobab') that could be used as a taxonomic data management system or stand as the core module of other biological information systems. The data structures provide essential facilities not provided in existing taxonomically organized databases or in other biological information systems.
The Baobab design has been partially implemented in the Alice system, which combines a taxonomic core with geographical and descriptive information about the organisms it records. Using an interface intended for ordinary biologists, Alice enables end users to tailor their database to match their own requirements. Biologists are hampered in building databases and in benefiting from existing technology by the lack of suitable software. Database projects continue to develop programs for their own immediate needs, though this is rarely cost-effective in the medium to long term. The resulting duplication of effort within our community is a waste of scant resources. The development of reliable and easily used software is expensive and labour-intensive, but the potential benefits are enormous. If the current dearth of software for ordinary biologists is to end, software development requires not only more funds but, still more crucially, better coordination and direction.
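The idea of a taxonomic core, an inventory of organisms with all their alternative names to which descriptive data is attached, can be illustrated with a minimal Python sketch. This is not the Baobab or Alice design, whose data structures the abstract does not specify. The class names `Taxon` and `TaxonomicCore`, their methods, and the organism names are all hypothetical, and the sketch assumes each name refers to exactly one taxon, which is itself one of the oversimplifications the abstract warns against.

```python
from dataclasses import dataclass, field


@dataclass
class Taxon:
    """One entry in the inventory of organisms (hypothetical structure)."""
    accepted_name: str
    synonyms: set[str] = field(default_factory=set)
    # Descriptive data (distribution, chemistry, ...) is attached to the
    # organism, not to any one of its names.
    descriptors: dict[str, list[str]] = field(default_factory=dict)


class TaxonomicCore:
    """Maps every known name, accepted or synonym, to a single taxon.

    Simplifying assumption: no name is shared between two taxa.
    """

    def __init__(self) -> None:
        self._by_name: dict[str, Taxon] = {}

    def add_taxon(self, accepted_name: str, synonyms=()) -> Taxon:
        names = {accepted_name, *synonyms}
        # Check all names first so a clash leaves the index unchanged.
        clashes = sorted(n for n in names if n in self._by_name)
        if clashes:
            raise ValueError(f"names already in use: {clashes}")
        taxon = Taxon(accepted_name, set(synonyms))
        for name in names:
            self._by_name[name] = taxon
        return taxon

    def resolve(self, name: str) -> Taxon:
        """Return the taxon a name refers to, whichever name is given."""
        return self._by_name[name]

    def describe(self, name: str, key: str, value: str) -> None:
        """Attach a descriptive fact to the organism behind a name."""
        self.resolve(name).descriptors.setdefault(key, []).append(value)


# Invented names, used only to exercise the structure.
core = TaxonomicCore()
core.add_taxon("Exemplum primum", synonyms=["Aliud vetus"])
core.describe("Aliud vetus", "distribution", "Region A")
record = core.resolve("Exemplum primum")
```

Because the descriptive data hangs off the taxon rather than off a name, a fact recorded under a synonym is found again when the organism is looked up by its accepted name. A fuller model would also need to handle one name applied to different organisms and alternative classifications, which is where the different "views" mentioned in the abstract come in.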