Unit Selection Synthesis System

1 Unit selection synthesis (USS) based approach
1.1 Unit selection used in the CHATR synthesis system
1.2 Unit selection used in the Festival synthesis system
1.3 Building a unit selection synthesizer using the Festival framework

1 Unit selection synthesis

Although speech synthesized by concatenating subword units such as diphones is clear, it lacks naturalness, mainly because each diphone has only a single example. First, signal processing inevitably introduces distortion, and speech quality degrades when pitch and duration are stretched by large amounts. Furthermore, many other subtle effects lie outside the scope of most signal processing algorithms. For instance, the amount of vocal…
The goal of this method is to select the best sequence of units from all the possibilities in the database and to concatenate them to produce the final speech. Selecting units close to the target reduces the amount of signal processing required to produce the desired prosodic characteristics, and thus minimizes distortion of the natural waveforms. Unit selection is based on two cost functions. The target cost, Ct(u_i, t_i), is an estimate of the difference between a database unit, u_i, and the target, t_i, which it is supposed to represent. The concatenation cost, Cc(u_{i-1}, u_i), is an estimate of the quality of a join between consecutive units u_{i-1} and u_i. The sequence of units that minimizes the two costs together is selected. In this section we outline two unit selection techniques used in different speech synthesis systems…
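
To make the two cost functions concrete, the following is a minimal Python sketch, not the formulation of any particular system. It assumes that the overall cost of a candidate sequence is simply the sum of its target costs and concatenation costs. The `Unit` and `Target` classes, the choice of pitch and duration as features, the weights, and the Euclidean join distance are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """Hypothetical database unit (illustrative fields only)."""
    pitch: float                    # mean F0 in Hz
    duration: float                 # length in seconds
    start_frame: Tuple[float, ...]  # spectral features at the left edge
    end_frame: Tuple[float, ...]    # spectral features at the right edge


@dataclass(frozen=True)
class Target:
    """Hypothetical target specification for one position in the utterance."""
    pitch: float
    duration: float


def target_cost(unit: Unit, target: Target,
                w_pitch: float = 1.0, w_dur: float = 1.0) -> float:
    """Ct(u_i, t_i): weighted relative mismatch in pitch and duration (assumed form)."""
    pitch_diff = abs(unit.pitch - target.pitch) / target.pitch
    dur_diff = abs(unit.duration - target.duration) / target.duration
    return w_pitch * pitch_diff + w_dur * dur_diff


def concatenation_cost(prev: Unit, unit: Unit) -> float:
    """Cc(u_{i-1}, u_i): Euclidean distance across the join (assumed form)."""
    return math.dist(prev.end_frame, unit.start_frame)


def total_cost(units: Sequence[Unit], targets: Sequence[Target]) -> float:
    """Sum of all target costs plus all concatenation costs along the sequence."""
    cost = sum(target_cost(u, t) for u, t in zip(units, targets))
    cost += sum(concatenation_cost(a, b) for a, b in zip(units, units[1:]))
    return cost
```

A sequence whose units match their targets exactly and whose boundary frames line up has a total cost of zero under these assumed definitions; any prosodic mismatch or spectral discontinuity at a join raises it.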
The speech database containing the candidate units can be viewed as a state transition network in which each unit in the database is represented by a separate state. Because any unit can potentially be followed by any other, the network is fully connected. The task of picking the best set of units is performed using the Viterbi algorithm, in a similar way to HMM speech recognition. Here the target cost plays the role of the observation probability and the concatenation cost plays the role of the transition probability.

1.2 Unit selection used in the Festival synthesis system

The Festival [24] synthesis system uses a cluster unit selection technique for selecting speech units from a speech database. In this method, the speech inventory is divided into clusters, where each cluster holds units of the same phone class, grouped by their phonetic and prosodic context. The appropriate cluster is selected for each target unit, offering a small set of candidate units. This process is equivalent to finding the units with the lowest target cost, as described in the previous section. An optimal path is then found through the candidate units based on their distance from the cluster center and an acoustically based join cost.
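
The Viterbi search over candidate units described above can be sketched in Python as follows. This is a generic dynamic-programming illustration under stated assumptions, not the CHATR or Festival implementation: the function name `select_units` and its arguments are hypothetical, costs are minimized directly rather than probabilities being maximized, and every target is assumed to have at least one candidate. The list `candidates[i]` could hold every unit in the database, as in a fully connected network, or only the small set offered by a selected cluster.

```python
from typing import Callable, List, Sequence, TypeVar

U = TypeVar("U")  # type of a database unit
T = TypeVar("T")  # type of a target specification


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[U, T], float],
    concat_cost: Callable[[U, U], float],
) -> List[U]:
    """Return the candidate sequence with the lowest summed cost (Viterbi search)."""
    if not targets:
        return []

    # best[i][j]: lowest cumulative cost of any path ending in candidates[i][j]
    best = [[target_cost(u, targets[0]) for u in candidates[0]]]
    # back[i][j]: index of the predecessor in candidates[i-1] on that best path
    back = [[-1] * len(candidates[0])]

    for i in range(1, len(targets)):
        row, ptr = [], []
        for u in candidates[i]:
            # Cheapest way to reach u from any unit at the previous position.
            k, cost = min(
                ((k, best[i - 1][k] + concat_cost(p, u))
                 for k, p in enumerate(candidates[i - 1])),
                key=lambda pair: pair[1],
            )
            row.append(cost + target_cost(u, targets[i]))
            ptr.append(k)
        best.append(row)
        back.append(ptr)

    # Trace back from the cheapest final state.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path
```

As a toy usage example with numbers standing in for units, `select_units([0, 0], [[0, 10], [11, 3]], lambda u, t: abs(u - t), lambda a, b: 10 * abs(a - b))` picks 10 followed by 11: neither unit has the lowest target cost at its position, but the smooth join between them gives the lowest total.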
