FIXED0IT - DB2 – META-SHARE

Last view: 2024-07-08


FIXED0IT - DB2

http://catalog.elra.info/product_info.php?products_id=186

ID:

ELRA-S0053

DB1 Phonetically rich sentences & application-oriented utterances

The Italian Fixed Network SpeechDat(M) Corpus version 1.0 was recorded within the scope of the SpeechDat(M) project (LRE-63314), funded by the European Commission. Recordings were made through a primary rate ISDN interface, yielding an 8 kHz, 8-bit-per-sample, A-law coded signal. The data files are formatted according to the standards of the European SAM project, and the speech data are compressed with the GNU gzip program. All software needed to use the corpus is provided on the CDs.
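At 8,000 samples per second and 8 bits per sample, each recording is a 64 kbit/s (8,000 bytes per second) stream. The sketch below shows one way to turn such a file back into playable 16-bit PCM in Python. It assumes each gzip-compressed speech file holds headerless, single-channel A-law bytes; the actual SAM file layout (headers, label files, file names) should be checked against the corpus documentation. The A-law expansion follows the standard ITU-T G.711 decoding rule and is written out by hand because Python's `audioop` module was removed in Python 3.13. The function names and file paths are illustrative, not part of the software shipped with the corpus.

```python
import gzip
import struct
import wave

SAMPLE_RATE = 8000  # 8 kHz, as stated in the corpus description


def alaw_to_linear(a_val: int) -> int:
    """Decode one ITU-T G.711 A-law byte to a signed 16-bit linear sample."""
    a_val ^= 0x55  # undo the even-bit inversion applied by A-law encoders
    t = (a_val & 0x0F) << 4  # quantization (mantissa) bits
    seg = (a_val & 0x70) >> 4  # segment (exponent) bits
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t += 0x108
        t <<= seg - 1
    # In A-law, a set sign bit means a positive sample.
    return t if (a_val & 0x80) else -t


def decode_alaw(data: bytes) -> list[int]:
    """Decode a buffer of raw A-law bytes to 16-bit PCM samples."""
    return [alaw_to_linear(b) for b in data]


def load_gzipped_alaw(path: str) -> list[int]:
    """Read a gzip-compressed file of headerless A-law bytes (assumed layout)."""
    with gzip.open(path, "rb") as f:
        return decode_alaw(f.read())


def write_wav(samples: list[int], path: str, rate: int = SAMPLE_RATE) -> None:
    """Write mono 16-bit little-endian PCM samples to a WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

For example, `write_wav(load_gzipped_alaw("item.gz"), "item.wav")` (hypothetical file names) would produce a standard WAV file, and `len(samples) / SAMPLE_RATE` gives the item's duration in seconds.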

The corpus contains speech from about 1,000 speakers (about 500 male and 500 female) and was designed to support the development of voice-driven teleservices. Each caller spoke at least 39 items, comprising:

* isolated and connected digits
* natural numbers
* money amounts
* spelled words
* time and date phrases
* yes/no questions
* city names
* common application words
* application words in phrases
* phonetically rich sentences
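As a rough scale check, about 1,000 speakers each speaking at least 39 items gives an approximate lower bound on the number of recorded items in DB1 (approximate because the speaker count is itself approximate):

$$
N_{\text{items}} \gtrsim 1000 \times 39 = 39\,000
$$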

Most items are read, some are spontaneously spoken.

The recordings come with extensive and standardised documentation. All speech is carefully transcribed at the orthographic level; in addition, a number of clearly audible non-speech events are marked in the transcription. The age and regional background of the speakers are also provided. A pronunciation dictionary is included, containing every word that occurs in the corpus with a corresponding SAMPA broad-class phonemic transcription.
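The exact file format of the pronunciation dictionary is not given here. The sketch below assumes a hypothetical plain-text layout of one entry per line, `WORD<TAB>PHONEMES`, with SAMPA symbols separated by spaces. It shows how such a lexicon could be loaded, keeping several pronunciations per word, and how the set of phoneme symbols it uses could be collected. The example entries are illustrative only.

```python
def parse_lexicon(lines):
    """Map each word to a list of pronunciations, each a list of SAMPA symbols.

    Assumes a hypothetical layout: WORD<TAB>PHON PHON PHON ...
    """
    lexicon: dict[str, list[list[str]]] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        word, _, pron = line.partition("\t")
        # A word may appear on several lines if it has pronunciation variants.
        lexicon.setdefault(word, []).append(pron.split())
    return lexicon


def phone_inventory(lexicon):
    """Return the set of SAMPA symbols used anywhere in the lexicon."""
    return {phone for prons in lexicon.values() for pron in prons for phone in pron}


example = ["uno\tu n o", "due\td u e"]  # illustrative entries only
lex = parse_lexicon(example)
print(lex["due"])                    # [['d', 'u', 'e']]
print(sorted(phone_inventory(lex)))  # ['d', 'e', 'n', 'o', 'u']
```

Storing a list of pronunciations per word, rather than a single string, keeps pronunciation variants without extra bookkeeping.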

Validation and premastering of the CD-ROMs were performed by the Speech Processing Expertise Centre (SPEX), Leidschendam, The Netherlands.

DB2 Phonetically rich sentences subset (S0053)

See ELRA-S0052 for the full description. DB2 is a subset of DB1; it contains only the phonetically rich sentence items.
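The subset relationship can be pictured as a simple filter over DB1 items. This is a minimal sketch under stated assumptions: the `Item` record, its fields and the category label string are hypothetical and do not reflect how the corpus actually encodes item types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One recorded item (hypothetical record layout for illustration)."""
    speaker_id: str
    category: str  # one of the DB1 item types, e.g. "isolated digit"
    prompt: str


PHONETICALLY_RICH = "phonetically rich sentence"  # illustrative label


def db2_subset(db1_items):
    """Keep only the phonetically rich sentence items."""
    return [item for item in db1_items if item.category == PHONETICALLY_RICH]


db1 = [
    Item("spk001", "isolated digit", "prompt A"),
    Item("spk001", PHONETICALLY_RICH, "prompt B"),
    Item("spk002", "city name", "prompt C"),
]
print(len(db2_subset(db1)))  # 1
```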


Version 1.0 of the Italian fixed-network speech corpus was recorded within the framework of the SpeechDat(M) project (LRE-63314), supported by the European Commission. Recording was carried out through an ISDN (Numéris) interface, with a sampling rate of 8 kHz, 8 bits per sample and an A-law coded signal. The data were formatted according to the standards of the European SAM project and are compressed with the GNU gzip program. The software needed to use the corpus is supplied with the data on the CDs.

The corpus contains recordings of about 1,000 speakers (500 men and 500 women). It is intended to support the development of voice-driven telematic services. Each speaker uttered more than 39 items, including:

* isolated and connected digits
* natural numbers
* money amounts
* spelled words
* dates and phrases expressing time
* yes/no questions
* city names
* common command words
* command words embedded in phrases
* phonetically rich sentences

Most of these items are read; some are spontaneous.

The recordings are supplied with complete documentation following the SpeechDat(M) standards. All recordings have been carefully transcribed at the orthographic level; in addition, a number of clearly audible acoustic events are included in the transcription. The age and regional origin of the speakers are also provided. A pronunciation dictionary is included on the CDs. It contains all the words occurring in the corpus, with a corresponding phonemic transcription (using SAMPA).

Validation and preparation of the CD-ROMs were carried out by SPEX, Leidschendam, The Netherlands.


Distribution

Availability

Available - Restricted Use

Start date: 07/08/1998

Licence

ELRA END USER (Restrictions: Academic - Non Commercial Use)

* For Members of ELRA; User Nature: Academic
* For Members of ELRA; User Nature: Commercial
* For Non Members of ELRA; User Nature: Academic
* For Non Members of ELRA; User Nature: Commercial

ELRA VAR (Restrictions: Commercial Use)

* For Members of ELRA; User Nature: Academic
* For Members of ELRA; User Nature: Commercial
* For Non Members of ELRA; User Nature: Academic
* For Non Members of ELRA; User Nature: Commercial

Contact Person

Mapelli Valérie

audio

Monolingual audio corpus

Languages

Italian

Linguality

Linguality type: Monolingual

Size

no size available

Resource Creation

Funding Project

SpeechDat(M)

Funding Type: EU Funds

Metadata

Created: 05/12/2005

Version

Version: 1.0

Last Updated: 02/22/2007
