Speech Synthesis in the “Mother Tongue”: Designing, Training, and Evaluating a Text-to-Speech System for Yiddish

In: Journal of Jewish Languages
Authors:
Isaac L. Bleaman, Assistant Professor, Department of Linguistics, University of California, Berkeley, Berkeley, CA, USA

https://orcid.org/0000-0003-0410-7369
Jacob J. Webber, Doctoral Student, Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK

Samuel K. Lo, Data Linguist, Speech Graphics, Edinburgh, UK


Abstract

Although few linguistic corpora are available in Yiddish, there are numerous sources of so-called “found data” that can be adapted for language research, pedagogy, and resource development. We describe the steps taken to create the first speech synthesis (text-to-speech) program in Yiddish. A state-of-the-art TTS model, FastSpeech 2, was trained on a hand-corrected data set consisting of literary texts paired with audio narrations by native speakers of the Polish and Lithuanian dialects. A quantitative evaluation by listeners found that the system produced speech that was both intelligible and natural-sounding. To demonstrate the system’s applications for language pedagogy, we offer a qualitative evaluation of Yiddish phonological features that are present or absent in a sample of synthesized recordings. We hope that the success of speech synthesis in Yiddish will inspire future projects to enable technological support for other minority languages in which transcribed recordings are available.
