The MMASCS multi-modal annotated synchronous corpus of audio, video, facial motion and tongue motion data of normal, fast and slow speech

by Dietmar Schabus, Michael Pucher, Phil Hoole
Abstract:
In this paper, we describe and analyze a corpus of speech data that we have recorded in multiple modalities simultaneously: facial motion via optical motion capturing, tongue motion via electromagnetic articulography, as well as conventional video and high-quality audio. The corpus consists of 320 phonetically diverse sentences uttered by a male Austrian German speaker at normal, fast and slow speaking rates. We analyze the influence of speaking rate on phone durations and on tongue motion. Furthermore, we investigate the correlation between tongue and facial motion. The data corpus is available free of charge for research use, including phonetic annotations and playback software that visualizes the 3D data, from the website http://cordelia.ftw.at/mmascs
Reference:
Dietmar Schabus, Michael Pucher, Phil Hoole, “The MMASCS multi-modal annotated synchronous corpus of audio, video, facial motion and tongue motion data of normal, fast and slow speech”, In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), Reykjavik, Iceland, pp. 3411–3416, 2014.
Bibtex Entry:
@InProceedings{Schabus2014,
  Title                    = {The MMASCS multi-modal annotated synchronous corpus of audio, video, facial motion and tongue motion data of normal, fast and slow speech},
  Author                   = {Dietmar Schabus and Michael Pucher and Phil Hoole},
  Booktitle                = {Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC)},
  Year                     = {2014},

  Address                  = {Reykjavik, Iceland},
  Month                    = may,
  Pages                    = {3411--3416},

  Abstract                 = {In this paper, we describe and analyze a corpus of speech data that we have recorded in multiple modalities simultaneously: facial motion via optical motion capturing, tongue motion via electromagnetic articulography, as well as conventional video and high-quality audio. The corpus consists of 320 phonetically diverse sentences uttered by a male Austrian German speaker at normal, fast and slow speaking rates. We analyze the influence of speaking rate on phone durations and on tongue motion. Furthermore, we investigate the correlation between tongue and facial motion. The data corpus is available free of charge for research use, including phonetic annotations and playback software that visualizes the 3D data, from the website http://cordelia.ftw.at/mmascs},
  Url                      = {http://www.lrec-conf.org/proceedings/lrec2014/summaries/192.html}
}