Performance analysis of different intonation models in Kannada speech synthesis
Sadashiva Veerappa Chakrasali, Krishnappa Indira, Sunitha Yariyur Narasimhaiah, Shadaksharaiah Chandraiah
Abstract
Text-to-speech (TTS) is a system that generates artificial speech from text input. Prosodic models improve the quality of the synthesized speech, especially its naturalness and intelligibility. Prosody includes intonation, which refers to the variation of the pitch frequency (F0) over time within an utterance. This work focuses on building a feedback neural network model that predicts the F0 contour of an utterance, using Fujisaki intonation model parameters as the input features to the network, since the Fujisaki intonation model is data driven rather than rule based. A 4-layer feedback neural network was built in the Festival framework. The Kannada speech synthesized with the neural network model is then compared with speech produced using the classification and regression tree (CART) model and the Tilt model. A database of simple declarative Kannada sentences created by Carnegie Mellon University was used in this work. The study shows that F0 contours can be predicted accurately with both the CART and the neural network models, whereas naturalness and intelligibility are higher with the CART model than with the neural network model.
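The abstract uses Fujisaki model parameters as network inputs and compares against the Tilt model. The sketch below shows what those two parameterizations look like, using their standard textbook forms rather than the authors' code. The Fujisaki command-response model gives ln F0(t) = ln Fb + Σ Ap,i·Gp(t − T0,i) + Σ Aa,j·[Ga(t − T1,j) − Ga(t − T2,j)], where Gp(t) = α²t·e^(−αt) and Ga(t) = min[1 − (1 + βt)e^(−βt), γ] for t ≥ 0 (both zero for t < 0). Taylor's Tilt model summarizes a pitch event by its rise and fall amplitudes and durations. The function names, the default constants α = 2.0, β = 20.0 and γ = 0.9 (typical values from the literature), and the command values in the usage block are illustrative assumptions. They are not taken from the paper, and the paper's feature extraction and network architecture are not reproduced here.

```python
import math


def phrase_response(t, alpha):
    """Fujisaki phrase-control response Gp(t) = alpha^2 * t * exp(-alpha * t), zero for t < 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta, gamma):
    """Fujisaki accent-control response Ga(t) = min(1 - (1 + beta*t) exp(-beta*t), gamma), zero for t < 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrases, accents, alpha=2.0, beta=20.0, gamma=0.9):
    """F0 in Hz at time t (seconds).

    fb      -- base frequency Fb in Hz
    phrases -- list of (T0, Ap): phrase command onset time and magnitude
    accents -- list of (T1, T2, Aa): accent command onset, offset and amplitude
    alpha, beta, gamma -- assumed typical constants, not values from the paper
    """
    ln_f0 = math.log(fb)
    for t0, ap in phrases:
        ln_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accents:
        ln_f0 += aa * (accent_response(t - t1, beta, gamma)
                       - accent_response(t - t2, beta, gamma))
    return math.exp(ln_f0)


def tilt_parameters(a_rise, a_fall, d_rise, d_fall):
    """Tilt description of one pitch event: (amplitude, duration, tilt).

    tilt averages the amplitude tilt and the duration tilt, giving +1 for a
    pure rise, -1 for a pure fall and 0 for a symmetric rise-fall.
    """
    amplitude = abs(a_rise) + abs(a_fall)
    duration = d_rise + d_fall
    tilt_amp = (abs(a_rise) - abs(a_fall)) / amplitude if amplitude else 0.0
    tilt_dur = (d_rise - d_fall) / duration if duration else 0.0
    return amplitude, duration, 0.5 * (tilt_amp + tilt_dur)


if __name__ == "__main__":
    # Hypothetical commands for a short declarative utterance (illustrative only).
    phrases = [(0.0, 0.5)]
    accents = [(0.2, 0.5, 0.3), (0.8, 1.1, 0.2)]
    contour = [fujisaki_f0(i * 0.01, 120.0, phrases, accents) for i in range(150)]
    print(f"peak F0 = {max(contour):.1f} Hz")
    print("symmetric rise-fall tilt:", tilt_parameters(30.0, 30.0, 0.1, 0.1)[2])
```

In this setting, a predictor maps linguistic context to values such as (T0, Ap) and (T1, T2, Aa). A contour generator like `fujisaki_f0` then turns those values into an F0 track that a synthesizer can use.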
Keywords
CART model; Fujisaki parameters; Intonation models; Kannada TTS; Neural network; Pitch frequency; Tilt model
DOI:
http://doi.org/10.11591/ijeecs.v26.i1.pp243-252
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).