21 Diphone synthesizer
NOTE: use of this diphone synthesizer is deprecated and it will probably be removed from future versions; all of its functionality has been replaced by the UniSyn synthesizer. It is not compiled by default, if required add

A basic diphone synthesizer offers a method for making speech from segments, durations and intonation targets. This module was mostly written by Alistair Conkie, but the base diphone format is compatible with previous CSTR diphone synthesizers. The synthesizer offers residual excited LPC based synthesis (hunt89) and PSOLA (TM) (moulines90) (PSOLA is not available for distribution).

21.1 Diphone database format

A diphone database consists of a dictionary file, a set of waveform files, and a set of pitch mark files. These files are in the same format as those of the previous CSTR (Osprey) synthesizer.

The dictionary file consists of one entry per line. Each entry consists of five fields: a diphone name of the form P1-P2, a filename (without extension), a floating point start position in the file in milliseconds, a mid position in milliseconds (the change in phone), and an end position in milliseconds. Lines starting with a semicolon and blank lines are ignored. The list may be in any order. For example, a partial list of phones may look like:

    ch-l r021 412.035 463.009 518.23
    jh-l d747 305.841 382.301 446.018
    h-l d748 356.814 403.54 437.522
    #-@ d404 233.628 297.345 331.327
    @-# d001 836.814 938.761 1002.48

Waveform files may be in any form, headered or unheadered, as long as every file is of the same type and the format is supported by the speech tools wave reading functions. These may be standard linear PCM waveform files in the case of PSOLA, or LPC coefficients and residual when using the residual LPC synthesizer (see section 21.2 LPC databases).

Pitch mark files consist of a simple list of pitch mark positions in milliseconds (with decimal places), in order, one per line. For high quality diphone synthesis these should be derived from laryngograph data. During unvoiced sections pitch marks should be artificially created at reasonable intervals (e.g. 10 ms).
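As an illustration of the two text formats above, the following Python sketch parses dictionary entries and pitch mark files and generates evenly spaced artificial marks for an unvoiced stretch. It is not part of Festival: the names `DiphoneEntry`, `parse_dictionary`, `read_pitch_marks` and `unvoiced_marks` are hypothetical. It also assumes that fields are whitespace separated, that the diphone name splits at its first hyphen, and that positions satisfy start <= mid <= end.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DiphoneEntry:
    left: str        # P1, the first phone of the diphone
    right: str       # P2, the second phone
    filename: str    # basename shared by waveform and pitch mark files
    start_ms: float  # start position in the file
    mid_ms: float    # phone boundary (the change in phone)
    end_ms: float    # end position in the file


def parse_dictionary(lines: Iterable[str]) -> List[DiphoneEntry]:
    """Parse dictionary lines, skipping blank lines and ';' comment lines."""
    entries = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith(";"):
            continue
        fields = line.split()
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 fields, got {len(fields)}")
        name, filename, start, mid, end = fields
        left, sep, right = name.partition("-")
        if not sep:
            raise ValueError(f"line {lineno}: diphone name {name!r} is not P1-P2")
        start_ms, mid_ms, end_ms = float(start), float(mid), float(end)
        # Ordering check is an assumption of this sketch, not a stated rule.
        if not start_ms <= mid_ms <= end_ms:
            raise ValueError(f"line {lineno}: positions out of order")
        entries.append(DiphoneEntry(left, right, filename, start_ms, mid_ms, end_ms))
    return entries


def read_pitch_marks(lines: Iterable[str]) -> List[float]:
    """Read one pitch mark position (ms) per line and check they are in order."""
    marks = [float(s) for s in (raw.strip() for raw in lines) if s]
    if any(b <= a for a, b in zip(marks, marks[1:])):
        raise ValueError("pitch marks must be strictly increasing")
    return marks


def unvoiced_marks(start_ms: float, end_ms: float, step_ms: float = 10.0) -> List[float]:
    """Artificial pitch marks every step_ms after start_ms, up to end_ms."""
    count = int((end_ms - start_ms) // step_ms)
    return [start_ms + k * step_ms for k in range(1, count + 1)]
```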
In the current format there is no way to distinguish the "real" pitch marks from the "unvoiced" ones.

It is normal to hold a diphone database in a directory with a number of sub-directories, namely `dic/' containing the dictionary file, `wave/' for the waveform files, typically of whole nonsense words (sometimes this directory is called `vox/' for historical reasons), and `pm/' for the pitch mark files. The filename in the dictionary entry should be the same for the waveform file and the pitch mark file (with different extensions).

21.2 LPC databases

The standard method for diphone resynthesis in the released system is residual excited LPC (hunt89). The actual method of resynthesis isn't important to the database format, but if residual LPC synthesis is to be used then it is necessary to make the LPC coefficient files and their corresponding residuals. Previous versions of the system used a "host of hacky little scripts" to do this, but now that the Edinburgh Speech Tools support LPC analysis we can provide a walk-through for generating them. We assume that the waveform files of nonsense words are in a directory called `wave/'. The LPC coefficients and residuals will be, in this example, stored in `lpc16k/' with extensions `.lpc' and `.res' respectively.
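The directory layout described above can be captured by a small helper that maps a dictionary entry's filename to the files it refers to. This is a sketch only, with hypothetical names `entry_files` and `missing_files`: the `.wav` extension matches the `wave/*.wav` pattern used for LPC analysis in this section, `.lpc` and `.res` in `lpc16k/` follow the example above, and the `.pm` pitch mark extension is an assumption, since the format only requires the basenames to match.

```python
import os
from typing import Dict, List


def entry_files(db_root: str, filename: str, wave_dir: str = "wave",
                pm_ext: str = ".pm") -> Dict[str, str]:
    """Return the paths of the files sharing one dictionary filename.

    wave_dir may be "vox" for older databases; pm_ext is an assumption.
    """
    return {
        "wave": os.path.join(db_root, wave_dir, filename + ".wav"),
        "pm": os.path.join(db_root, "pm", filename + pm_ext),
        "lpc": os.path.join(db_root, "lpc16k", filename + ".lpc"),
        "res": os.path.join(db_root, "lpc16k", filename + ".res"),
    }


def missing_files(paths: Dict[str, str]) -> List[str]:
    """List which of an entry's files do not exist on disk."""
    return [kind for kind, path in paths.items() if not os.path.exists(path)]
```

Running `missing_files` over every dictionary entry is a cheap consistency check before building group files.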
Before starting, it is worth considering power normalization. We have found this important for all of the databases we have collected so far.

The program `lpc_analysis', found in `speech_tools/bin', can be used to generate the LPC coefficients and residual. Note that these should be reflection coefficients so that they may be quantised (as they are in group files). The following shell command generates the files:

    mkdir -p lpc16k
    for i in wave/*.wav
    do
        fname=$(basename "$i" .wav)
        echo "$i"
        # 18th-order reflection coefficients at a 10 ms shift (HTK format),
        # plus the residual waveform (NIST format)
        lpc_analysis -reflection -shift 0.01 -order 18 -o "lpc16k/$fname.lpc" \
            -r "lpc16k/$fname.res" -otype htk -rtype nist "$i"
    done

It is said that the LPC order should be the sample rate divided by one thousand, plus 2 (18 for 16 kHz data, as above). This may or may not be appropriate, and if you are particularly worried about the database size it is worth experimenting.
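The rule of thumb for the LPC order, and the per-file command in the loop above, can be written out as follows. This Python sketch only builds the `lpc_analysis` argument list using the same flags as the shell loop; the helper names `lpc_order` and `lpc_analysis_args` are hypothetical.

```python
import os
from typing import List


def lpc_order(sample_rate: int) -> int:
    """Rule of thumb quoted in the text: sample rate / 1000, plus 2."""
    return sample_rate // 1000 + 2


def lpc_analysis_args(wav_path: str, out_dir: str = "lpc16k",
                      sample_rate: int = 16000, shift: float = 0.01) -> List[str]:
    """Build the lpc_analysis command line used in the shell loop for one file."""
    fname = os.path.splitext(os.path.basename(wav_path))[0]
    return [
        "lpc_analysis", "-reflection",
        "-shift", str(shift),
        "-order", str(lpc_order(sample_rate)),
        "-o", os.path.join(out_dir, fname + ".lpc"),
        "-r", os.path.join(out_dir, fname + ".res"),
        "-otype", "htk", "-rtype", "nist",
        wav_path,
    ]
```

If `speech_tools/bin` is on the PATH, `subprocess.run(lpc_analysis_args("wave/r021.wav"), check=True)` would analyse a single file. Passing a list avoids the quoting pitfalls of the shell version.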
The coefficient and residual files produced by different LPC analysis programs may start at different offsets. For example, Entropic's ESPS functions generate LPC coefficients that are offset by one frame shift (e.g. 0.01 seconds). Our own `lpc_analysis' routine has no offset. For databases analysed with `lpc_analysis', the description should therefore contain

    (lpc_frame_offset 0)
    (lpc_res_offset 0.0)

while when the files are generated using ESPS routines, the description should be

    (lpc_frame_offset 1)
    (lpc_res_offset 0.01)
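To make the frame offset concrete, here is a Python sketch of looking up LPC coefficients at a pitch mark from fixed-shift frames. It assumes, purely for illustration, that frame i describes time (i + lpc_frame_offset) × shift, and it linearly interpolates between the two nearest frames. This is not Festival's internal code, `coeffs_at` is a hypothetical name, and the residual offset is not modelled.

```python
import math
from typing import List, Sequence


def coeffs_at(frames: Sequence[Sequence[float]], t: float,
              shift: float = 0.01, frame_offset: int = 0) -> List[float]:
    """Interpolate LPC coefficient frames at time t (seconds).

    Assumption of this sketch: frame i describes time (i + frame_offset) * shift,
    so frame_offset=1 models coefficients delayed by one frame shift.
    """
    if not frames:
        raise ValueError("no LPC frames")
    x = t / shift - frame_offset        # fractional frame index for time t
    if x <= 0:
        return list(frames[0])          # before the first frame: clamp
    if x >= len(frames) - 1:
        return list(frames[-1])         # after the last frame: clamp
    i = math.floor(x)
    frac = x - i
    a, b = frames[i], frames[i + 1]
    return [(1.0 - frac) * ca + frac * cb for ca, cb in zip(a, b)]
```

With the same pitch mark time, changing `frame_offset` from 0 to 1 shifts the lookup by a whole frame. An error of that size degrades the output without obviously breaking it.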
The defaults actually follow the ESPS form, that is (lpc_frame_offset 1) and (lpc_res_offset 0.01).

Note that the biggest problem we had in implementing the residual excited LPC resynthesizer was getting the right part of the residual to line up with the right LPC coefficients describing the pitch mark. Errors here degrade the synthesized waveform noticeably, but not seriously, which makes it difficult to determine whether a fault is an offset problem or some other bug.

Although we have started investigating whether extracting pitch synchronous LPC parameters rather than fixed shift parameters gives better performance, we haven't finished this work. `lpc_analysis' supports pitch synchronous analysis, but the raw "ungrouped" access method does not yet. At present the LPC parameters at a particular pitch mark are extracted by interpolating over the closest LPC parameters. The "group" files hold these interpolated parameters pitch synchronously.

The American English voice `kd' was created using the speech tools `lpc_analysis' program, and its setup should be looked at if you are going to copy it. The British English voice `rb' was constructed using ESPS routines.

21.3 Group files

Databases may be accessed directly, but this is usually too inefficient for any purpose except debugging. It is expected that group files will be built, which contain a binary representation of the database. A group file is a compact, efficient representation of the diphone database. Group files are byte order independent, so they may be shared between machines of different byte orders and word sizes. Certain information in a group file may be changed at load time, so a database name, access strategy etc. may be changed from what was set originally in the group file.

A group file contains the basic parameters, the diphone index, the signal (original waveform or LPC residual), the LPC coefficients, and the pitch marks. It is all you need for a run-time synthesizer. Various compression mechanisms are supported to allow smaller databases if desired.
A full English LPC plus residual database at 8k ulaw is about 3 megabytes, while a full 16 bit version at 16k is about 8 megabytes.
Group files are created with the

Group files may be partially loaded at run time (see 21.5 Access strategies) for quicker start-up and to minimise run-time memory requirements.

21.4 Diphone_Init
The basic method for describing a database is through `Diphone_Init'.
Examples of general setup, of making group files, and of general use are in `lib/voices/english/rab_diphone/festvox/rab_diphone.scm'.

21.5 Access strategies

Three basic access strategies are available when using diphone databases. They are designed to optimise access time, start-up time and space requirements.
Note that in group files pitch marks (and LPC coefficients) are always fully loaded.

21.6 Diphone selection

The appropriate diphone is selected based on the name of the phone identified in the segment stream. However, for better diphone synthesis it is useful to augment the diphone database with diphones beyond those derived directly from the phoneme set: for example, dark and light l's, or consonants distinguished by their consonant cluster form and their isolated form. There are two methods for identifying such a modification of the basic name.
When the diphone module is called, the functions in the hook `diphone_module_hooks' are applied to the utterance.

For example, suppose we wish to use a dark l (ll) in coda position. Such segments could be relabelled with the following function:

    (define (fix_dark_ls utt)
    "(fix_dark_ls UTT)
    Identify ls in coda position and relabel them as ll."
      (mapcar
       (lambda (seg)
         (if (and (string-equal "l" (item.name seg))
                  (string-equal "+" (item.feat seg "p.ph_vc"))
                  (item.relation.prev seg "SylStructure"))
             (item.set_feat seg "diphone_phone_name" "ll")))
       (utt.relation.items utt 'Segment))
      utt)

Then, when we wish to use this for a particular voice, we need to add

    (set! diphone_module_hooks (list fix_dark_ls))

in the voice selection function.
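The effect of setting `diphone_phone_name` can be sketched outside Festival. When forming diphone names, a segment's override name, if present, is used in place of its own name. The Python below models segments as plain dictionaries, which simplifies Festival's utterance structures considerably. `phone_for_diphone` and `diphone_names` are hypothetical helpers.

```python
from typing import Dict, List


def phone_for_diphone(seg: Dict[str, str]) -> str:
    """Prefer the relabelled name (e.g. "ll" set by fix_dark_ls) over the segment name."""
    return seg.get("diphone_phone_name", seg["name"])


def diphone_names(segments: List[Dict[str, str]]) -> List[str]:
    """Build the P1-P2 names needed to synthesize a sequence of segments."""
    phones = [phone_for_diphone(s) for s in segments]
    return [f"{p1}-{p2}" for p1, p2 in zip(phones, phones[1:])]
```

For "feel" with a coda l relabelled as ll, the requested diphones become `#-f`, `f-i`, `i-ll`, `ll-#`.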
For a more complex example including consonant cluster identification
see the American English voice `ked' in
`festival/lib/voices/english/ked/festvox/kd_diphone.scm'. The
function
The second method for changing a name is during the actual lookup of a
diphone in the database. The list of alternates is given by the
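The second method can be sketched as falling back to alternate names at lookup time. The alternates table, which maps an augmented phone such as `ll` to a basic phone such as `l`, is an assumption made for illustration, as is the order in which substitutions are tried. `find_diphone` is a hypothetical name.

```python
from typing import Dict, Optional, Set


def find_diphone(index: Set[str], p1: str, p2: str,
                 alternates: Dict[str, str]) -> Optional[str]:
    """Return the first diphone name present in index, trying alternates if needed."""
    alt1 = alternates.get(p1, p1)
    alt2 = alternates.get(p2, p2)
    # Try the exact name first, then substitute one side, then both.
    for left, right in ((p1, p2), (alt1, p2), (p1, alt2), (alt1, alt2)):
        name = f"{left}-{right}"
        if name in index:
            return name
    return None
```

This lets a database that lacks some augmented diphones degrade gracefully to the basic ones instead of failing.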