Speecon manually pitch-marked reference database for Spanish
This database is intended for the development and evaluation of noise-robust pitch marking (PMA) and/or pitch determination (PDA) algorithms. The audio data used to construct the database was selected as a subset of the Speecon Spanish database (see ELRA-S0160).
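As an illustration of how a pitch-marked reference could be used to evaluate a pitch marking algorithm, the following is a minimal sketch, not the procedure used by the database authors. It pairs detected marks with reference marks that lie within a tolerance and counts hits, misses, and false alarms. The function name `match_pitch_marks`, the 1 ms tolerance, and the example mark times are all assumptions made for this sketch.

```python
def match_pitch_marks(reference, detected, tolerance_s=0.001):
    """Greedily pair detected pitch marks with reference marks.

    Both inputs are mark times in seconds. The 1 ms default tolerance
    is a hypothetical value chosen for illustration only.
    """
    ref = sorted(reference)
    det = sorted(detected)
    i = j = hits = 0
    while i < len(ref) and j < len(det):
        diff = det[j] - ref[i]
        if abs(diff) <= tolerance_s:
            # Detected mark is close enough to the reference mark.
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            # Detected mark precedes the reference mark: false alarm.
            j += 1
        else:
            # Reference mark has no detected partner: miss.
            i += 1
    return {
        "hits": hits,
        "misses": len(ref) - hits,
        "false_alarms": len(det) - hits,
    }


# Made-up mark times (seconds), purely for demonstration.
reference_marks = [0.010, 0.020, 0.030, 0.040]
detected_marks = [0.0102, 0.0215, 0.0301, 0.050]
result = match_pitch_marks(reference_marks, detected_marks)
print(result)
```

A greedy two-pointer match over sorted mark times keeps the sketch short. A real evaluation might instead define the tolerance relative to the local pitch period.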
The acoustic environments in this database are the car interior, the office, and living rooms. The office environment is mostly quiet and only slightly affected by stationary, white noise from computer fans or air-conditioning devices. However, in some of the offices the recordings also contain background voices. The living room recordings (entertainment environment) contain a wider range of noises, less stationary and more colored than the office noises. In some utterances the radio or TV set is on, so the recordings may also contain voices, music, and similar sounds. Reverberation is mostly present in the office and entertainment environments.
The Speecon Spanish database was recorded at a 16 kHz sampling frequency and quantized using 16-bit linear coding. From this database the recordings of 60 speakers were selected (30 male and 30 female speakers, aged from 19 to 79 years). To construct the reference pitch-marked database manually under low-noise conditions and without reverberation, 1 minute of close-talking microphone recordings was selected per speaker. The reference database thus comprises 60 minutes of pitch-marked speech signal. In the first step, the 60 minutes of selected close-talking channel speech signal were automatically pitch-marked (epoch-marked). In the second step, the pitch marks were carefully rechecked and corrected by hand, yielding the reference pitch-marked database.
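The recording parameters above imply a rough data volume for the reference material. The sketch below works through that arithmetic, assuming a single (close-talking) channel stored as raw PCM with no file headers. The distribution format is not stated here, so the byte counts are only an estimate, and all names are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000      # 16 kHz sampling frequency
BYTES_PER_SAMPLE = 2         # 16-bit linear coding
SPEAKERS = 60
SECONDS_PER_SPEAKER = 60     # 1 minute of speech per speaker


def raw_pcm_bytes(seconds, sample_rate=SAMPLE_RATE_HZ,
                  bytes_per_sample=BYTES_PER_SAMPLE):
    """Size of mono raw PCM audio in bytes (assumes no header)."""
    return seconds * sample_rate * bytes_per_sample


samples_per_speaker = SAMPLE_RATE_HZ * SECONDS_PER_SPEAKER
bytes_per_speaker = raw_pcm_bytes(SECONDS_PER_SPEAKER)
total_bytes = bytes_per_speaker * SPEAKERS
total_minutes = SPEAKERS * SECONDS_PER_SPEAKER // 60

print(samples_per_speaker)   # 960000 samples per speaker
print(bytes_per_speaker)     # 1920000 bytes per speaker
print(total_bytes)           # 115200000 bytes in total
print(total_minutes)         # 60 minutes of speech
```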
Each session consists of 17 utterances:
• 1 isolated digit sequence
• 1 money amount
• 10 phonetically rich sentences
• 5 phonetically rich isolated words
The following age distribution has been obtained:
40 speakers are aged 15 to 30, 11 speakers are aged 31 to 45, 8 speakers are aged 46 to 60, and 1 speaker is over 60.
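As a quick consistency check, the session composition and age distribution listed above can be tallied. This is a small sketch whose dictionary names and age-band labels are chosen for illustration. The totals should come to 17 utterances per session and 60 speakers.

```python
# Counts copied from the description above; the keys are illustrative labels.
utterances_per_session = {
    "isolated digit sequence": 1,
    "money amount": 1,
    "phonetically rich sentences": 10,
    "phonetically rich isolated words": 5,
}
speakers_by_age = {
    "15-30": 40,
    "31-45": 11,
    "46-60": 8,
    "over 60": 1,
}

total_utterances = sum(utterances_per_session.values())
total_speakers = sum(speakers_by_age.values())

# Share of speakers in each age band, as a fraction of all speakers.
age_shares = {band: count / total_speakers
              for band, count in speakers_by_age.items()}

print(total_utterances)  # 17
print(total_speakers)    # 60
for band, share in age_shares.items():
    print(f"{band}: {share:.1%}")
```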