Persian Speech Corpus

http://catalog.elra.info/product_info.php?products_id=1307

ID: ELRA-S0393

This single-speaker speech corpus of about 2.5 hours was developed using the same methodology as the PhD work carried out by Nawar Halabi at the University of Southampton. It was recorded in Persian (Tehrani accent) by one male speaker in a professional studio, using a "Blubbery" model microphone from the "Blue" brand, with a "Presonus Studio Channel" as preamp and compressor. The recording was made with "Reaper" software and some plugins to enhance the voice. Speech synthesized from this corpus has produced a high-quality, natural voice.

This package includes:
- 399 .wav files containing spoken utterances.
- 399 .lab files containing phonetic transcriptions of the utterances.
- 399 .TextGrid files containing the phoneme labels with time stamps of the boundaries where these occur in the .wav files. These files can be opened using Praat software (see http://www.fon.hum.uva.nl/praat).
- aligned.mlf, which contains the HTS-friendly alignments.
- orthographic-transcript.txt, a single text file gathering the orthographic transcriptions, with the form "[wav_filename]" "[Orthographic Transcript]" on every line.
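
The line format described for orthographic-transcript.txt can be read with a few lines of Python. This is only a minimal sketch: the UTF-8 encoding, the exact quoting and spacing, and the function names parse_transcript_lines and load_transcripts are assumptions, not something the package documents.

```python
import re
from pathlib import Path

# Assumed line shape: "<wav_filename>" "<orthographic transcript>"
LINE_RE = re.compile(r'^\s*"([^"]*)"\s+"(.*)"\s*$')


def parse_transcript_lines(lines):
    """Map each wav filename to its transcript; skip lines that do not match."""
    entries = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            filename, text = match.groups()
            entries[filename] = text
    return entries


def load_transcripts(path, encoding="utf-8"):
    """Read the transcript file (UTF-8 is an assumption) and parse it."""
    with Path(path).open(encoding=encoding) as handle:
        return parse_transcript_lines(handle)
```

Calling load_transcripts("orthographic-transcript.txt") would then return a dictionary keyed by wav filename. If the real file uses different quoting or an encoding other than UTF-8, the pattern and the encoding argument need adjusting.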

Persian Speech Corpus by Nawar Halabi is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Distribution

Availability

Available - Restricted Use

Start date: 07/05/2017

Licence

| Licence | Restrictions | ELRA membership | User nature | Fee |
|---|---|---|---|---|
| ELRA END USER | Academic - Non Commercial Use | Member | Academic | 0.00 |
| ELRA END USER | Academic - Non Commercial Use | Member | Commercial | 0.00 |
| ELRA END USER | Academic - Non Commercial Use | Non-member | Academic | 0.00 |
| ELRA END USER | Academic - Non Commercial Use | Non-member | Commercial | 0.00 |
| ELRA VAR | Commercial Use | Member | Academic | 4,000.00 |
| ELRA VAR | Commercial Use | Member | Commercial | 4,000.00 |
| ELRA VAR | Commercial Use | Non-member | Academic | 5,000.00 |
| ELRA VAR | Commercial Use | Non-member | Commercial | 5,000.00 |

Contact Person

Mapelli Valérie

audio

Monolingual audio corpus

Languages

Persian

Linguality

Linguality type: Monolingual

Size

no size available

Metadata

Created: 05/12/2005

Version

Version: 1.0

Last Updated: 07/05/2017
