CONTENTUS--technologies for next generation multimedia libraries

  • Authors:
  • Jan Nandzik;Berenike Litz;Nicolas Flores-Herr;Aenne Löhden;Iuliu Konya;Doris Baum;André Bergholz;Dirk Schönfuß;Christian Fey;Johannes Osterhoff;Jörg Waitelonis;Harald Sack;Ralf Köhler;Patrick Ndjiki-Nya

  • Affiliations:
  • Acosta Consult GmbH, Frankfurt am Main, Germany 60318;Deutsche Nationalbibliothek, Informationstechnik, Frankfurt am Main, Germany 60322;Acosta Consult GmbH, Frankfurt am Main, Germany 60318;Deutsche Nationalbibliothek, Informationstechnik, Frankfurt am Main, Germany 60322;Fraunhofer IAIS, Sankt Augustin, Germany 53754;Fraunhofer IAIS, Sankt Augustin, Germany 53754;Fraunhofer IAIS, Sankt Augustin, Germany 53754;mufin GmbH, Büro Dresden, Dresden, Germany 01219;Institut für Rundfunktechnik GmbH, Production Systems TV, München, Germany 80939;Hasso-Plattner-Institut für Softwaresystemtechnik GmbH, Potsdam, Germany 14482;Hasso-Plattner-Institut für Softwaresystemtechnik GmbH, Potsdam, Germany 14482;Hasso-Plattner-Institut für Softwaresystemtechnik GmbH, Potsdam, Germany 14482;Technicolor - Corporate Research Division, Hanover Image Processing Lab, Deutsche Thomson OHG, Hannover, Germany 30625;Fraunhofer-Institut für Nachrichtentechnik Heinrich-Hertz-Institut, Berlin, Germany 10587

  • Venue:
  • Multimedia Tools and Applications
  • Year:
  • 2013

Abstract

An ever-growing amount of digitized content urges libraries and archives to integrate new media types from a large number of sources, such as publishers, record labels and film archives, into their existing collections. This is a challenging task, since both the multimedia content itself and the associated metadata are inherently heterogeneous: the different sources lead to different data structures, data quality and trustworthiness. This paper presents the CONTENTUS approach towards an automated media processing chain for cultural heritage organizations and content holders. Our workflow allows for unattended processing from media ingest to availability through our search and retrieval interface. We aim to provide a set of tools for processing digitized print media, audio/visual material, speech and musical recordings. Media-specific functionalities include quality control for the digitization of still images and audio/visual media, and restoration of the most common quality issues encountered with these media. Furthermore, the CONTENTUS tools include content analysis modules such as segmentation of printed, audio and audio/visual media, optical character recognition (OCR), speech-to-text transcription, speaker recognition and the extraction of musical features from audio recordings, all aimed at a textual representation of the information inherent in the media assets. Once the information has been extracted and transcribed into textual form, media-independent processing modules provide extraction and disambiguation of named entities as well as text classification. All CONTENTUS modules are designed to be flexibly recombined within a scalable workflow environment using cloud computing techniques. In the next step, analyzed media assets can be retrieved and consumed through a search interface using all available metadata.
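The abstract describes media-specific modules that turn each asset into text, followed by media-independent modules that work only on that text, all recombinable within one workflow. The following is a minimal sketch of that composition idea, assuming each module is a plain function over a shared asset record. The module names, the stub behaviour and the toy gazetteer are hypothetical placeholders, not the actual CONTENTUS components or its cloud-based workflow engine.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MediaAsset:
    asset_id: str
    media_type: str               # e.g. "print" or "audio" (hypothetical labels)
    payload: str                  # stand-in for the raw scan or recording
    text: str = ""                # textual representation filled in by analysis modules
    entities: List[Tuple[str, str]] = field(default_factory=list)


# A module takes an asset and returns it enriched; a chain is a plain list of modules.
Module = Callable[[MediaAsset], MediaAsset]


def ocr_stub(asset: MediaAsset) -> MediaAsset:
    # Hypothetical placeholder for OCR: the "scan" payload is already plain text.
    asset.text = asset.payload
    return asset


def transcription_stub(asset: MediaAsset) -> MediaAsset:
    # Hypothetical placeholder for speech-to-text: the "recording" is already a transcript.
    asset.text = asset.payload
    return asset


# Toy gazetteer standing in for real named-entity extraction and disambiguation.
GAZETTEER: Dict[str, str] = {"Goethe": "person", "Frankfurt": "location"}


def extract_entities(asset: MediaAsset) -> MediaAsset:
    # Media-independent step: it only looks at the textual representation.
    tokens = re.findall(r"\w+", asset.text)
    asset.entities = [(tok, GAZETTEER[tok]) for tok in tokens if tok in GAZETTEER]
    return asset


# Media-specific front ends; every step after them is shared across media types.
MEDIA_SPECIFIC: Dict[str, List[Module]] = {
    "print": [ocr_stub],
    "audio": [transcription_stub],
}


def build_chain(media_type: str) -> List[Module]:
    # Recombine modules: media-specific front end + shared text-processing tail.
    return MEDIA_SPECIFIC[media_type] + [extract_entities]


def run_chain(asset: MediaAsset, chain: List[Module]) -> MediaAsset:
    # Apply each module in order, passing the enriched asset along.
    for module in chain:
        asset = module(asset)
    return asset


if __name__ == "__main__":
    doc = run_chain(MediaAsset("a1", "print", "Goethe lived in Frankfurt."), build_chain("print"))
    print(doc.entities)  # [('Goethe', 'person'), ('Frankfurt', 'location')]
```

Because every module shares one signature, adding a new media type only means registering another front end in `MEDIA_SPECIFIC`. The text-based tail stays untouched.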
The search engine combines Semantic Web technologies, which represent relations between the media and entities such as persons, locations and organizations, with a full-text approach for searching within the information transcribed in the preceding processing steps. The CONTENTUS unified search interface integrates text, images, audio and audio/visual content. Queries can be narrowed and expanded in an exploratory manner, and search results can be refined by disambiguating entities and topics. Furthermore, semantic relationships not only become apparent but can also be navigated.
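The following is a minimal sketch of how entity relations and full-text search can be combined to refine and expand a query. It assumes toy in-memory data: RDF-style (subject, predicate, object) triples linking assets to entities, and an inverted index over transcribed text. The triples, the `mentions`/`friendOf` predicates and all function names are hypothetical. A production system would more likely use an RDF store queried with SPARQL and a dedicated search engine.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy data: transcribed text per asset, plus RDF-style triples.
DOCS: Dict[str, str] = {
    "book1": "Letters written by Goethe in Weimar",
    "film7": "Documentary about Weimar classicism",
    "audio3": "Interview recorded in Frankfurt",
}
TRIPLES: List[Triple] = [
    ("book1", "mentions", "Goethe"),
    ("film7", "mentions", "Schiller"),
    ("Goethe", "friendOf", "Schiller"),
]


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    # Inverted index: lower-cased token -> ids of assets whose text contains it.
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index


def full_text(index: Dict[str, Set[str]], query: str) -> Set[str]:
    # Conjunctive (AND) keyword match over the inverted index.
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits


def assets_mentioning(entity: str, triples: List[Triple]) -> Set[str]:
    return {s for s, p, o in triples if p == "mentions" and o == entity}


def related_entities(entity: str, triples: List[Triple]) -> Set[str]:
    # One hop along entity-to-entity relations, in either direction.
    outgoing = {o for s, p, o in triples if p != "mentions" and s == entity}
    incoming = {s for s, p, o in triples if p != "mentions" and o == entity}
    return outgoing | incoming


def search(index: Dict[str, Set[str]], triples: List[Triple], query: str,
           entity: Optional[str] = None, expand: bool = False) -> Set[str]:
    hits = full_text(index, query)
    if entity is None:
        return hits
    entities = {entity}
    if expand:
        entities |= related_entities(entity, triples)
    linked = set().union(*(assets_mentioning(e, triples) for e in entities))
    # Refinement = keep only full-text hits that are linked to the chosen entities.
    return hits & linked


if __name__ == "__main__":
    idx = build_index(DOCS)
    print(sorted(search(idx, TRIPLES, "weimar", entity="Goethe")))               # ['book1']
    print(sorted(search(idx, TRIPLES, "weimar", entity="Goethe", expand=True)))  # ['book1', 'film7']
```

Refining by an entity intersects the full-text hits with the assets linked to that entity. Expanding first follows one hop of entity-to-entity relations, which is one simple way to let users navigate semantic relationships from a result set.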