Speech synthesis is the imitation of human speech by a computer. It consists of acoustic and visual speech synthesis. Joint audio-visual synthesis is known as a talking head or a TTAVS system. Acoustic speech synthesis produces the component of speech that can be heard, while visual speech synthesis produces the component of speech that can be seen. The scheme of the system is depicted in Fig. 1. The input of the system is a sequence of phonemes with prosodic information. The output of the system is an audio-visual animation of speech.
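To make this input/output contract concrete, here is a minimal Python sketch. It assumes that the prosodic information of each phoneme is simply a duration and a mean pitch. The class `Phoneme`, the function `synthesize` and the `VISEMES` table are hypothetical illustrations, not part of the system described here. The sketch only shows how a single phoneme sequence can drive both the acoustic and the visual component on a shared time axis, which is what keeps the two synchronized.

```python
from dataclasses import dataclass


@dataclass
class Phoneme:
    symbol: str        # phoneme label, e.g. "a"
    duration_ms: int   # assumed prosodic information: segment duration
    f0_hz: float       # assumed prosodic information: mean pitch


# Hypothetical, heavily simplified phoneme-to-viseme table (illustration only).
VISEMES = {
    "p": "closed_lips", "b": "closed_lips", "m": "closed_lips",
    "a": "open_jaw", "o": "rounded_lips", "u": "rounded_lips",
}


def synthesize(phonemes):
    """Split one phoneme sequence into time-aligned acoustic and visual tracks.

    Hypothetical sketch: returns segment descriptions, not audio or video.
    """
    acoustic, visual = [], []
    t = 0
    for ph in phonemes:
        start, end = t, t + ph.duration_ms
        # Acoustic branch: what a waveform generator would need for this segment.
        acoustic.append((start, end, ph.symbol, ph.f0_hz))
        # Visual branch: the mouth shape shown during the same time interval.
        visual.append((start, end, VISEMES.get(ph.symbol, "neutral")))
        t = end
    return acoustic, visual
```

For example, `synthesize([Phoneme("m", 80, 120.0), Phoneme("a", 150, 130.0)])` yields a visual track `[(0, 80, "closed_lips"), (80, 230, "open_jaw")]` whose segment boundaries coincide with those of the acoustic track. A real system would map each track to a waveform and to a head-model animation respectively.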
When we use a 3D model of a head, reconstructed from recordings of a real person's head, together with synthesized speech imitating the voice of the same person, we can also call the TTAVS system a virtual double.