Towards a SOA infrastructure for statistically analysing public health data

  • Authors:
  • Pierpaolo Vittorini;Stefano Necozione;Ferdinando di Orio

  • Affiliations:
  • University of L'Aquila, Coppito, ITALY;University of L'Aquila, Coppito, ITALY;University of L'Aquila, Coppito, ITALY

  • Venue:
  • Proceedings of the ACM first workshop on CyberInfrastructure: information management in eScience
  • Year:
  • 2007

Abstract

To respond to the need for interoperable information systems in public health, several proposals based on XML-related technologies are currently available. For instance, the CDA [8] is an architecture developed by the HL7 organization for representing and managing clinical documents, while the PHIN [12] is a CDC infrastructure that aims to automatically exchange XML data between public health partners through ebXML-compliant SOAP web services [16, 11]. Despite the considerable effort spent on developing standards and infrastructures that, though not conclusive, help achieve more effective interoperability among public health information systems, to the best of our knowledge no research has so far addressed the statistical analysis of biomedical data represented as XML documents. Among the languages that can query XML documents, XPath can compute only basic statistics (e.g. mean, minimum, maximum [13]), whereas higher-level statistical tests are known to be necessary for common analyses. Thus, the only current option is to convert the data stored in such documents into a tabular format and to use a standard statistical package to perform the analysis. To overcome this limitation, the paper proposes a complete SOA infrastructure, focusing on the following components: (i) the XFNSE web service, which exposes a list of operations implementing the statistical analyses reported in [6]; (ii) a prototype service consumer, called JXFNSE, which uses the operations exposed by XFNSE to statistically analyse a dataset; (iii) an eXist [14] module which extends XPath by adding a list of functions corresponding to the XFNSE operations. Finally, on the basis of several performance measurements, the authors discuss the drawbacks of using services when analysing large datasets, and show a possible improvement based on a caching mechanism.
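The workaround the abstract mentions (flattening XML data into a tabular form before handing it to a statistical tool) can be illustrated with a minimal Python sketch. This is not the authors' XFNSE or JXFNSE code: the `<patients>` document, its element names, and the `xml_to_rows` helper are all hypothetical, invented here only to show the conversion step.

```python
import statistics
import xml.etree.ElementTree as ET

# Hypothetical dataset: element and attribute names are assumptions,
# not taken from the paper or from any CDA/PHIN schema.
XML_DATA = """
<patients>
  <patient id="1"><age>34</age><sbp>118</sbp></patient>
  <patient id="2"><age>51</age><sbp>135</sbp></patient>
  <patient id="3"><age>47</age><sbp>127</sbp></patient>
</patients>
"""


def xml_to_rows(xml_text):
    """Flatten each <patient> element into one dict (one table row)."""
    root = ET.fromstring(xml_text)
    rows = []
    for patient in root.findall("patient"):
        row = {"id": patient.get("id")}
        # Each child element becomes a numeric column named after its tag.
        for child in patient:
            row[child.tag] = float(child.text)
        rows.append(row)
    return rows


rows = xml_to_rows(XML_DATA)

# Once the data is tabular, any statistical routine can consume a column.
ages = [row["age"] for row in rows]
mean_age = statistics.mean(ages)
print(f"rows: {len(rows)}, mean age: {mean_age}")
```

The point of the sketch is the extra conversion step itself: every analysis requires leaving the XML representation first, which is the limitation the proposed infrastructure is meant to remove.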