Web-assisted annotation, semantic indexing and search of television and radio news

  • Authors:
  • Mike Dowman;Valentin Tablan;Hamish Cunningham;Borislav Popov

  • Affiliations:
  • University of Sheffield, Sheffield, UK;University of Sheffield, Sheffield, UK;University of Sheffield, Sheffield, UK;Sirma AI EAD, Sofia, Bulgaria

  • Venue:
  • WWW '05 Proceedings of the 14th international conference on World Wide Web
  • Year:
  • 2005

Abstract

The Rich News system, which automatically annotates radio and television news with the aid of resources retrieved from the World Wide Web, is described. Automatic speech recognition (ASR) gives a temporally precise but conceptually inaccurate annotation model. Information extraction from related web news sites gives the opposite: conceptual accuracy but no temporal data. Our approach combines the two to produce temporally accurate conceptual semantic annotation of broadcast news. First, low-quality transcripts of the broadcasts are produced using speech recognition, and these are then automatically divided into sections corresponding to individual news stories. A key-phrase extraction component finds key phrases for each story and uses them to search for web pages reporting the same event. The text and metadata of the web pages are then used to create index documents for the stories in the original broadcasts, and these documents are semantically annotated using the KIM knowledge management platform. A web interface then allows conceptual search and browsing of news stories, and playback of the parts of the media files corresponding to each news story. The use of material from the World Wide Web allows much higher quality textual descriptions and semantic annotations to be produced than would have been possible using the ASR transcript directly. The semantic annotations can form a part of the Semantic Web, and an evaluation shows that the system operates with high precision and a moderate level of recall.
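The core idea of the pipeline, pairing a timed but noisy ASR story segment with a clean web page about the same event, can be illustrated with a minimal sketch. This is not the Rich News implementation: the names `Story`, `WebPage`, `key_terms`, `best_page`, and `index_document` are hypothetical, frequency-based term selection stands in for the key-phrase extraction component, an in-memory list of candidate pages stands in for web search, and the semantic annotation step with KIM is omitted.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "was",
             "for", "at", "by", "with", "has", "have", "that", "as"}


@dataclass
class Story:
    start: float      # seconds from the start of the broadcast
    end: float
    transcript: str   # noisy ASR text for this story segment


@dataclass
class WebPage:
    url: str
    title: str
    text: str


def tokens(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def key_terms(text, k=5):
    """Toy stand-in for key-phrase extraction: the k most frequent tokens."""
    counts = Counter(tokens(text))
    # Sort by descending count, then alphabetically so results are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


def best_page(story, pages, k=5, min_overlap=2):
    """Return the candidate page sharing the most key terms with the story,
    or None if no page reaches min_overlap shared terms."""
    terms = set(key_terms(story.transcript, k))
    scored = [(len(terms & set(tokens(p.title + " " + p.text))), p) for p in pages]
    score, page = max(scored, key=lambda sp: sp[0], default=(0, None))
    return page if score >= min_overlap else None


def index_document(story, page):
    """Combine timing from the ASR segment with text and metadata from the web page."""
    return {
        "start": story.start,
        "end": story.end,
        "title": page.title,
        "url": page.url,
        "text": page.text,
    }
```

The `min_overlap` threshold reflects the trade-off reported in the evaluation: rejecting weak matches favours precision at the cost of recall, since stories without a sufficiently similar page are left without an index document.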