Project Description

Speech Synthesis Based on Fluid Dynamic Principles

The human vocal system, which produces the acoustic signals of speech, is a complex physiological mechanism. To date, models of sound generation in this system have relied on linear acoustics, on one-dimensional sound propagation in conduits, and on very primitive approximations of the non-linear and flow-dependent factors in sound generation.

Speech synthesis technologies can be classified into three categories: concatenative synthesis, formant synthesis and articulatory synthesis. Concatenative synthesis is currently the most popular technique in commercial and research text-to-speech (TTS) systems. This method relies on extracting model parameters from speech data and concatenating segmental units (such as diphones) to create new utterances. Formant synthesis is a parametric approach that applies a set of rules to control the frequencies and amplitudes of the formants and the characteristics of the excitation source. Although these approaches have achieved a remarkable level of success, the speech they produce still has limitations (e.g., the plane-wave propagation assumption, the linear approximation of non-linear speech production, and the need to improve the naturalness and intelligibility of the synthesized speech).
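To make the formant approach concrete, the following is a minimal sketch, not taken from this project, of a source-filter synthesizer: an impulse train stands in for the excitation source, and a cascade of two-pole digital resonators imposes the formants. The function names, the sample rate, the pitch, and the formant frequencies and bandwidths are all illustrative assumptions; a real formant synthesizer varies these parameters over time by rule.

```python
import math


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients of a two-pole resonator y[n] = a*x[n] + b*y[n-1] + c*y[n-2].

    The gain is normalized so that a + b + c = 1 (unity gain at 0 Hz).
    """
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius
    c = -r * r
    b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    a = 1.0 - b - c
    return a, b, c


def resonate(signal, freq_hz, bandwidth_hz, sample_rate):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, sample_rate):
    """Crude voiced excitation: one unit impulse per pitch period."""
    n = int(round(duration_s * sample_rate))
    period = int(round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesize_vowel(formants, f0_hz=120.0, duration_s=0.3, sample_rate=16000):
    """Pass the excitation through a cascade of formant resonators.

    `formants` is a list of (frequency_hz, bandwidth_hz) pairs.
    The result is peak-normalized to the range [-1, 1].
    """
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]


# Illustrative (assumed) formant values for an open vowel; not from the project.
vowel = synthesize_vowel([(730.0, 80.0), (1090.0, 90.0), (2440.0, 120.0)])
```

Every stage in this sketch is a linear filter and the source is independent of the filter, which is exactly the kind of linear approximation that the limitations listed above refer to.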

An alternative approach is based on direct computational solution of the more fundamental Navier-Stokes equations, which describe fluid flow from first principles of physics. Our research involves modeling speech generation from these first principles. A major focus is speech synthesis from an articulatory description in which the physical mechanisms of sound generation and spectral modification are represented in detail. In this project, we investigate three major problems. How can we build a compact yet complete articulatory model? How can we find the articulatory movement patterns and design the motor control strategy for the articulators? How can we investigate the energy exchange between the convective and propagative components of the fluid flow and the effect of nonlinearity on speech production? The computational power needed for this new approach to speech synthesis is immense. To accommodate this, the Cray Origin2000 supercomputer at the NCSA of UIUC is used to run a Reynolds-Averaged Navier-Stokes (RANS) solver developed at Electric Boat Corporation.
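For orientation, the idea behind Reynolds averaging can be written out in its textbook incompressible form. This is an illustrative sketch only: the formulation actually used by the solver in this project is not stated here, and a solver intended to capture sound propagation may well use a compressible formulation. The symbols are assumptions of this sketch: u_i is the velocity component, p the pressure, rho the density, nu the kinematic viscosity, an overbar denotes an averaged quantity, and a prime denotes the fluctuation about that average.

```latex
% Reynolds decomposition into mean and fluctuating parts
u_i = \bar{u}_i + u_i', \qquad p = \bar{p} + p'

% Averaged continuity equation
\frac{\partial \bar{u}_i}{\partial x_i} = 0

% Averaged momentum equation
\frac{\partial \bar{u}_i}{\partial t}
  + \bar{u}_j \frac{\partial \bar{u}_i}{\partial x_j}
  = -\frac{1}{\rho} \frac{\partial \bar{p}}{\partial x_i}
  + \nu \frac{\partial^2 \bar{u}_i}{\partial x_j \, \partial x_j}
  - \frac{\partial \overline{u_i' u_j'}}{\partial x_j}
```

The last term, the divergence of the Reynolds stress, is not determined by the mean flow alone and has to be supplied by a turbulence model. The non-linear convective term on the left-hand side is the part of the dynamics that linear acoustic models leave out.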