20 UniSyn synthesizer

Since 1.3 a new general synthesizer module has been included. This is designed to replace the older diphone synthesizer described in the next chapter. A redesign was made in order to have a generalized waveform synthesizer, a signal processing module that could be used even when the units being concatenated are not diphones. Also at this stage the full diphone (or other) database pre-processing functions were added to the Speech Tools library.

20.1 UniSyn database format

The UniSyn synthesis modules can use databases in two basic formats, separate and grouped. Separate is when all files (signal, pitchmark and coefficient files) are accessed individually during synthesis. This is the standard use during database development. Group format is when a database is collected together into a single special file containing all information necessary for waveform synthesis. This format is designed to be used for distribution and general use of the database.

A database should consist of a set of waveforms (which may be translated into a set of coefficients if the desired signal processing method requires it), a set of pitchmarks and an index. The pitchmarks are necessary as most of our current signal processing methods are pitch synchronous.

20.1.1 Generating pitchmarks
Pitchmarks may be derived from laryngograph files using the `pitchmark' program distributed with the speech tools. The actual parameters to this program are still a bit of an art form. The first major issue is which way up the lar files are. We have seen both, though it does seem that CSTR's ones are most often upside down while others (e.g. OGI's) are the right way up. An example command, using `-inv' for an upside-down file, is

    pitchmark -inv lar/file001.lar -o pm/file001.pm -otype est \
        -min 0.005 -max 0.012 -fill -def 0.01 -wave_end

The `-min', `-max' and `-def' (fill value for unvoiced regions) values may need to be changed depending on the speaker's pitch range. The above is suitable for a male speaker. The `-fill' option states that unvoiced sections should be filled with equally spaced pitchmarks.

20.1.2 Generating LPC coefficients

LPC coefficients are generated using the `sig2fv' command. Two stages are required, generating the LPC coefficients and generating the residual. The prototypical commands for these are

    sig2fv wav/file001.wav -o lpc/file001.lpc -otype est -lpc_order 16 \
        -coefs "lpc" -pm pm/file001.pm -preemph 0.95 -factor 3 \
        -window_type hamming
    sigfilter wav/file001.wav -o lpc/file001.res -otype nist \
        -lpcfilter lpc/file001.lpc -inv_filter

For some databases you may need to normalize the power. Properly normalizing power is difficult but we provide a simple function which may do the job acceptably. You should do this on the waveform before LPC analysis (and ensure you also do the residual extraction on the normalized waveform rather than the original).

    ch_wave -scaleN 0.5 wav/file001.wav -o file001.Nwav

This normalizes the power by maximizing the signal first then multiplying it by the given factor. If the database waveforms are clean (i.e. no clicks) this can give reasonable results.

20.2 Generating a diphone index

The diphone index consists of a short header followed by an ascii list of each diphone, the file it comes from, and its start, middle and end times in seconds. For most databases this file needs to be generated by some database-specific script.

An example header is

    EST_File index
    DataType ascii
    NumEntries 2005
    IndexName rab_diphone
    EST_Header_End

The most notable part is the number of entries, which can get out of sync with the actual number of entries if you hand edit the file. I.e. if you add an entry and the system still can't find it, check that the number of entries is right.

The entries themselves may take on one of two forms, full entries or index entries. Full entries consist of a diphone name, where the phones are separated by "-"; a file name which is used to index into the pitchmark, LPC and waveform files; and the start, middle (change-over point between phones) and end of the diphone in the file, in seconds. For example

    r-uh edx_1001 0.225 0.261 0.320
    r-e edx_1002 0.224 0.273 0.326
    r-i edx_1003 0.240 0.280 0.321
    r-o edx_1004 0.212 0.253 0.320

The second form of entry is an index entry, which simply states that reference to that diphone should actually be made to another. For example

    aa-ll &aa-l
This states that the diphone `aa-ll' should actually use the entry for `aa-l'. Some checks are made on reading this index to ensure times etc. are reasonable, but multiple entries for the same diphone are not checked for; in that case the later one will be selected.

20.3 Database declaration

There are two major types of database, grouped and ungrouped. Grouped databases come as a single file containing the diphone index, coefficients and residuals for the diphones. This is the standard way databases are distributed as voices in Festival. Ungrouped databases access diphones from individual files and are designed as a method for debugging and testing databases before distribution. Using an ungrouped database is slower but allows quicker changes to the index and the associated coefficient files and residuals without rebuilding the group file.
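Because an ungrouped index is often hand edited, it can help to check it before loading. The following is a minimal sketch in Python of such a check for the ascii index format of section 20.2. It is not part of Festival or the Speech Tools: the function name `parse_diphone_index`, the returned structure, and the reading of `NumEntries' as a count of entry lines are all assumptions made for illustration.

```python
def parse_diphone_index(text):
    """Parse an ascii diphone index; return (header, entries, warnings).

    Hypothetical helper for illustration only; not a Festival API.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if "EST_Header_End" not in lines:
        raise ValueError("missing EST_Header_End")
    end = lines.index("EST_Header_End")

    # Header lines after the first ("EST_File index") are "Key value" pairs.
    header = {}
    for ln in lines[1:end]:
        key, _, value = ln.partition(" ")
        header[key] = value.strip()

    entries, warnings = {}, []
    body = lines[end + 1:]
    for ln in body:
        fields = ln.split()
        name = fields[0]
        if name in entries:
            # Mirrors the behaviour described in the text: later entry wins.
            warnings.append(f"duplicate entry for {name}; later one kept")
        if len(fields) == 2 and fields[1].startswith("&"):
            # Index entry: refer to another diphone.
            entries[name] = {"alias": fields[1][1:]}
        elif len(fields) == 5:
            start, mid, end_t = (float(x) for x in fields[2:])
            if not start <= mid <= end_t:
                warnings.append(f"{name}: times are not in order")
            entries[name] = {"file": fields[1], "start": start,
                             "mid": mid, "end": end_t}
        else:
            warnings.append(f"unrecognised line: {ln}")

    # Assumption: NumEntries counts entry lines (full and index entries).
    declared = int(header.get("NumEntries", -1))
    if declared != len(body):
        warnings.append(
            f"NumEntries is {declared} but {len(body)} entries found")
    return header, entries, warnings


# A deliberately inconsistent sample: the header claims 4 entries, only 3 exist.
SAMPLE_INDEX = """EST_File index
DataType ascii
NumEntries 4
IndexName rab_diphone
EST_Header_End
r-uh edx_1001 0.225 0.261 0.320
r-e edx_1002 0.224 0.273 0.326
aa-ll &aa-l
"""

if __name__ == "__main__":
    _, _, problems = parse_diphone_index(SAMPLE_INDEX)
    for problem in problems:
        print(problem)
```

Run on the sample, the sketch reports the mismatch between the declared and actual number of entries, which is the hand-editing mistake warned about in section 20.2.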
A database is declared to the system through the command `us_diphone_init'.

An example database definition is

    (set! rab_diphone_dir "/projects/festival/lib/voices/english/rab_diphone")
    (set! rab_lpc_group
      (list
       '(name "rab_lpc_group")
       (list 'index_file
             (path-append rab_diphone_dir "group/rablpc16k.group"))
       '(alternates_left ((i ii) (ll l) (u uu) (i@ ii) (uh @) (a aa)
                          (u@ uu) (w @) (o oo) (e@ ei) (e ei) (r @)))
       '(alternates_right ((i ii) (ll l) (u uu) (i@ ii)
                           (y i) (uh @) (r @) (w @)))
       '(default_diphone @-@@)
       '(grouped "true")))
    (us_diphone_init rab_lpc_group)

20.4 Making groupfiles
The function
20.5 UniSyn module selection

In a voice selection a UniSyn database may be selected as follows

    (set! UniSyn_module_hooks (list rab_diphone_const_clusters))
    (set! us_abs_offset 0.0)
    (set! window_factor 1.0)
    (set! us_rel_offset 0.0)
    (set! us_gain 0.9)

    (Parameter.set 'Synth_Method 'UniSyn)
    (Parameter.set 'us_sigpr 'lpc)
    (us_db_select rab_db_name)
An optional implementation of TD-PSOLA moulines90 has been written, but fear of legal problems unfortunately prevents it from being in the public distribution. This policy should not be taken as acknowledging or not acknowledging any alleged patent violation.

20.6 Diphone selection
Diphone names are constructed for each phone-phone pair in the Segment
relation in an utterance. If a segment has the feature in forming a
diphone name UniSyn first checks for the feature
This feature is used to specify consonant cluster diphone names
for our English voices. The hook
Once the diphone name is created it is used to select the diphone from
the database. If it is not found the name is converted using the list
of `alternates_left' and `alternates_right' given in the database declaration.
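The lookup order described above can be sketched as follows. This is a Python illustration rather than Festival's own code: the function `select_diphone`, the dictionary form of the alternates, the choice to rewrite both sides at once, and the final fall-back to the default diphone are assumptions based on the `alternates_left', `alternates_right' and `default_diphone' entries in the example declaration in section 20.3.

```python
def select_diphone(left, right, database, alternates_left=None,
                   alternates_right=None, default_diphone=None):
    """Return the name of the diphone to use for the pair left-right.

    Illustrative only: tries the direct name, then the name rewritten
    with the alternates tables, then (assumed) the default diphone.
    """
    alternates_left = alternates_left or {}
    alternates_right = alternates_right or {}

    # 1. The directly requested diphone name.
    name = f"{left}-{right}"
    if name in database:
        return name

    # 2. Rewrite each side with its alternate, if one is listed.
    alt_name = "{}-{}".format(alternates_left.get(left, left),
                              alternates_right.get(right, right))
    if alt_name in database:
        return alt_name

    # 3. Assumed last resort: the declared default diphone.
    if default_diphone is not None and default_diphone in database:
        return default_diphone
    raise KeyError(f"no diphone found for {name}")


# Toy data; the phone names are taken from the examples in this chapter.
DATABASE = {"aa-l", "r-uh", "@-@@"}
ALT_LEFT = {"a": "aa", "uh": "@"}
ALT_RIGHT = {"ll": "l", "uh": "@"}

if __name__ == "__main__":
    for pair in [("r", "uh"), ("a", "ll"), ("zz", "qq")]:
        print(pair, select_diphone(*pair, DATABASE, ALT_LEFT, ALT_RIGHT,
                                   default_diphone="@-@@"))
```

Keeping the three steps separate makes it easy to log which stage supplied the diphone, which is useful when testing an ungrouped database.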