Unsupervised and phonologically controlled interpolation of Austrian German language varieties for speech synthesis

by Markus Toman, Michael Pucher, Sylvia Moosmüller and Dietmar Schabus
Abstract:
This paper presents an unsupervised method that allows for gradual interpolation between language varieties in statistical parametric speech synthesis using Hidden Semi-Markov Models (HSMMs). We apply dynamic time warping using Kullback–Leibler divergence on two sequences of HSMM states to find adequate interpolation partners. The method operates on state sequences with explicit durations and also on expanded state sequences where each state corresponds to one feature frame. In an intelligibility and dialect rating subjective evaluation of synthesized test sentences, we show that our method can generate intermediate varieties for three Austrian dialects (Viennese, Innervillgraten, Bad Goisern). We also provide an extensive phonetic analysis of the interpolated samples. The analysis includes input-switch rules, which cover historically different phonological developments of the dialects versus the standard language; and phonological processes, which are phonetically motivated, gradual, and common to all varieties. We present an extended method which linearly interpolates phonological processes but uses a step function for input-switch rules. Our evaluation shows that the integration of this kind of phonological knowledge improves dialect authenticity judgment of the synthesized speech, as performed by dialect speakers. Since gradual transitions between varieties are an existing phenomenon, we can use our methods to adapt speech output systems accordingly.
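The alignment and interpolation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each HSMM state is reduced to a single diagonal-covariance Gaussian given as a `(mean, variance)` pair, uses a symmetrised Kullback–Leibler divergence as the local DTW cost, interpolates means and variances linearly, and switches input-switch rules at a threshold of 0.5. All function names, the symmetrisation, the variance interpolation, and the threshold are hypothetical choices made for illustration.

```python
import math


def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kl(state_a, state_b):
    """Symmetrised KL divergence (an assumption of this sketch)."""
    (ma, va), (mb, vb) = state_a, state_b
    return kl_diag_gauss(ma, va, mb, vb) + kl_diag_gauss(mb, vb, ma, va)


def dtw_align(seq_a, seq_b, dist=symmetric_kl):
    """Return the DTW path as a list of (index_in_a, index_in_b) pairs."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Fill the accumulated-cost table with the standard three-way recursion.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )
    # Backtrack from the end of both sequences to the start.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def interpolate_states(state_a, state_b, alpha,
                       input_switch=False, threshold=0.5):
    """Blend two aligned states; alpha=0 gives state_a, alpha=1 gives state_b.

    Phonological processes are interpolated linearly. For input-switch
    rules, alpha is first passed through a step function, so the result
    is one variety or the other and never a blend.
    """
    if input_switch:
        alpha = 0.0 if alpha < threshold else 1.0
    (ma, va), (mb, vb) = state_a, state_b
    mean = [(1.0 - alpha) * x + alpha * y for x, y in zip(ma, mb)]
    var = [(1.0 - alpha) * x + alpha * y for x, y in zip(va, vb)]
    return mean, var


# Toy one-dimensional state sequences for two varieties.
standard = [([0.0], [1.0]), ([5.0], [1.0])]
dialect = [([0.1], [1.0]), ([0.2], [1.0]), ([5.1], [1.0])]
path = dtw_align(standard, dialect)
blended = [interpolate_states(standard[i], dialect[j], 0.5) for i, j in path]
```

Each pair in `path` is one set of interpolation partners. The same routine applies to expanded sequences in which every state stands for one feature frame, since only the sequence lengths change.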
Reference:
Unsupervised and phonologically controlled interpolation of Austrian German language varieties for speech synthesis (Markus Toman, Michael Pucher, Sylvia Moosmüller and Dietmar Schabus), In Speech Communication, volume 72, 2015.
Bibtex Entry:
@Article{Toman2015,
  Title                    = {Unsupervised and phonologically controlled interpolation of Austrian German language varieties for speech synthesis},
  Author                   = {Markus Toman and Michael Pucher and Sylvia Moosmüller and Dietmar Schabus},
  Journal                  = {Speech Communication},
  Year                     = {2015},

  Month                    = sep,
  Pages                    = {176--193},
  Volume                   = {72},

  Abstract                 = {This paper presents an unsupervised method that allows for gradual interpolation between language varieties in statistical parametric speech synthesis using Hidden Semi-Markov Models (HSMMs). We apply dynamic time warping using Kullback–Leibler divergence on two sequences of {HSMM} states to find adequate interpolation partners. The method operates on state sequences with explicit durations and also on expanded state sequences where each state corresponds to one feature frame. In an intelligibility and dialect rating subjective evaluation of synthesized test sentences, we show that our method can generate intermediate varieties for three Austrian dialects (Viennese, Innervillgraten, Bad Goisern). We also provide an extensive phonetic analysis of the interpolated samples. The analysis includes input-switch rules, which cover historically different phonological developments of the dialects versus the standard language; and phonological processes, which are phonetically motivated, gradual, and common to all varieties. We present an extended method which linearly interpolates phonological processes but uses a step function for input-switch rules. Our evaluation shows that the integration of this kind of phonological knowledge improves dialect authenticity judgment of the synthesized speech, as performed by dialect speakers. Since gradual transitions between varieties are an existing phenomenon, we can use our methods to adapt speech output systems accordingly.},
  Comment                  = {<br><a href="http://mtoman.neuratec.com/thesis/interpolation/">Speech samples</a>},
  Doi                      = {10.1016/j.specom.2015.06.005},
  File                     = {http://dx.doi.org/10.1016/j.specom.2015.06.005},
  ISSN                     = {0167-6393},
  Keywords                 = {HMM-based speech synthesis},
  Owner                    = {schabus},
  Timestamp                = {2016.02.16}
}