A large speech database for Brazilian Portuguese spoken language research

  • Authors:
  • Carlos Alberto Ynoguti; Plínio Almeida Barbosa; Fábio Violaro

  • Affiliations:
  • Instituto Nacional de Telecomunicações, Departamento de Telecomunicações, Santa Rita do Sapucaí, MG, Brazil; Universidade Estadual de Campinas, Instituto de Estudos da Linguagem, Campinas, SP, Brazil; Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica, Campinas, SP, Brazil

  • Venue:
  • PROPOR'03: Proceedings of the 6th International Conference on Computational Processing of the Portuguese Language
  • Year:
  • 2003

Abstract

Speech recognition systems rely on statistically based algorithms and therefore need many training samples to perform properly. Consequently, such systems require very large databases for training and testing. The development of large speech corpora in Europe and the USA was possible only through cooperation among research centers, universities, private companies, and government. There, the availability of such databases provided the resources responsible for the great improvements in speech technology over the last few years. In Brazil, such consortia have not even been proposed, and researchers must work with small, locally developed databases. In this article, we report an effort to develop a large speech corpus for Brazilian Portuguese to fill this crucial gap.