13 Lexicons

A lexicon in Festival is a subsystem that provides pronunciations for words. It can consist of three distinct parts: an addenda, typically short, consisting of hand-added words; a compiled lexicon, typically large (10,000s of words), which sits on disk somewhere; and a method for dealing with words not in either list.

13.1 Lexical entries

Lexical entries consist of three basic parts: a head word, a part of speech and a pronunciation. The head word is what you might normally think of as a word, e.g. `walk', `chairs' etc., but it might be any token. The part-of-speech field currently consists of a simple atom (or nil if none is specified). Of course there are many part of speech tag sets, and whatever you mark in your lexicon must be compatible with the subsystems that use that information. You can optionally set a part of speech tag mapping for each lexicon. The value should be a reverse assoc-list of the following form

(lex.set.pos.map
   '((( punc fpunc) punc)
     (( nn nnp nns nnps ) n)))

All part of speech tags not appearing in the left hand side of a pos map are left unchanged.

The third field contains the actual pronunciation of the word. This is an arbitrary Lisp S-expression. In many of the lexicons distributed with Festival this entry has internal format, identifying syllable structure, stress markings and of course the phones themselves. In some of our other lexicons we simply list the phones with stress marking on each vowel. Some typical example entries are

( "walkers" n ((( w oo ) 1) (( k @ z ) 0)) )
( "present" v ((( p r e ) 0) (( z @ n t ) 1)) )
( "monument" n ((( m o ) 1) (( n y u ) 0) (( m @ n t ) 0)) )

Note you may have two entries with the same head word, where different part of speech fields allow differentiation. For example

( "lives" n ((( l ai v z ) 1)) )
( "lives" v ((( l i v z ) 1)) )

See section 13.3 Lookup process for a description of how multiple entries with the same head word are used during lookup. By current conventions, single-syllable function words should have no stress marking, while single-syllable content words should be stressed.

NOTE: the POS field may change in future to contain more complex formats. The same lexicon mechanism (but a different lexicon) is used for holding part of speech tag distributions for the POS prediction module.

13.2 Defining lexicons

As stated above, lexicons consist of three basic parts (compiled form, addenda and unknown word method) plus some other declarations. Each lexicon in the system has a name, which allows different lexicons to be selected efficiently when switching between voices during synthesis. The basic steps involved in a lexicon definition are as follows.

First a new lexicon must be created with a new name

(lex.create "cstrlex")

A phone set must be declared for the lexicon, both to allow checks on the entries themselves and to allow phone mapping between different phone sets used in the system

(lex.set.phoneset "mrpa")

The phone set must already be declared in the system. A compiled lexicon, the construction of which is described below, may optionally be specified

(lex.set.compile.file "/projects/festival/lib/dicts/cstrlex.out")

The method for dealing with unknown words (see section 13.4 Letter to sound rules) may be set

(lex.set.lts.method 'lts_rules)
(lex.set.lts.ruleset 'nrl)

In this case we are specifying the use of a set of letter to sound rules originally developed by the U.S. Naval Research Laboratory. The default method is to give an error if a word is not found in the addenda or compiled lexicon.
(This and other options are discussed more fully below.)

Finally, addenda items may be added for words that are known to be common but are not in the lexicon and cannot reasonably be analysed by the letter to sound rules

(lex.add.entry
  '( "awb" n ((( ei ) 1) ((d uh) 1) ((b @ l) 0) ((y uu) 0) ((b ii) 1))))
(lex.add.entry
  '( "cstr" n ((( s ii ) 1) (( e s ) 1) (( t ii ) 1) (( aa ) 1)) ))
(lex.add.entry
  '( "Edinburgh" n ((( e d ) 1) (( i n ) 0) (( b r @ ) 0)) ))
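Once added, entries can be checked interactively with `lex.lookup', which takes a head word and a (possibly nil) part of speech / feature argument and returns the full entry. The printed result below is illustrative of the form returned:

festival> (lex.lookup "cstr" nil)
("cstr" n (((s ii) 1) ((e s) 1) ((t ii) 1) ((aa) 1)))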
Using `lex.add.entry' again for the same word and part of speech will redefine the current pronunciation. Note that entries are added to the current lexicon, so it is a good idea to explicitly select the lexicon before adding addenda entries.

For large lists, compiled lexicons are best. The function `lex.compile' takes two filename arguments, a file containing a list of lexical entries and an output file where the compiled lexicon will be saved.
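For example, assuming the entries have been saved in a file `/projects/festival/lib/dicts/cstrlex.scm' (a hypothetical path), a compiled lexicon matching the `lex.set.compile.file' declaration above could be built with

(lex.compile "/projects/festival/lib/dicts/cstrlex.scm"
             "/projects/festival/lib/dicts/cstrlex.out")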
Compilation can take some time and may require lots of memory, as all entries are loaded in, checked and then sorted before being written out again. During compilation, if some entry is malformed the reading process halts with a not-so-useful message. Note that if any of your entries include quotes or double quotes the entries will probably be misparsed and cause such a weird error. In such cases try setting

(debug_output t)

before compilation. This will print out each entry as it is read in, which should help to narrow down where the error is.

13.3 Lookup process
When looking up a word, either through the C++ interface or the Lisp interface, a word is identified by its head word and part of speech. If no part of speech is specified, nil is assumed, which matches any part of speech in the lexicon.

The lexicon lookup process first checks the addenda: if there is a full match (head word plus part of speech) it is returned. If there is an addenda entry whose head word matches and whose part of speech is nil, that entry is returned.

If no match is found in the addenda, the compiled lexicon, if present, is checked. Again a match is when both head word and part of speech tag match, or when either the word being searched for or the lexical entry has a nil part of speech.
Finally, if the word is not found in the compiled lexicon it is passed to whatever method is defined for unknown words. This is most likely a letter to sound module. See section 13.4 Letter to sound rules.

Optional pre- and post-lookup hooks can be specified for a lexicon, as a single Lisp function or a list of them. The pre-hooks will be called with two arguments (word and features) and should return a pair (word and features). The post-hooks will be given a lexical entry and should return a lexical entry. The pre- and post-hooks do nothing by default.

Compiled lexicons may be created from lists of lexical entries. A compiled lexicon is much more efficient for lookup than the addenda. Compiled lexicons use a binary search method while the addenda is searched linearly. Also, it would take a prohibitively long time to load a typical full lexicon as an addenda. If you have more than a few hundred entries in your addenda you should seriously consider adding them to your compiled lexicon.

Because many publicly available lexicons do not have syllable markings for entries, the compilation method supports automatic syllabification. Thus for lexicon entries for compilation, two forms of the pronunciation field are supported: the standard fully syllabified and stressed form, and a simpler linear form found in at least the BEEP and CMU lexicons. If the pronunciation field is a flat atomic list it is assumed syllabification is required.

Syllabification is done by finding the minimum sonorant position between vowels. It is not guaranteed to be accurate but does give a solution that is sufficient for many purposes. A little work would probably improve this significantly. Of course syllabification requires the entry's phones to be in the current phone set. The sonorant values are calculated from the vc, ctype and cvox features for the current phone set. See `src/arch/festival/Phone.cc:ph_sonority()' for the actual definition.

Additionally, in this flat structure vowels (atoms starting with a, e, i, o or u) may have 1, 2 or 0 appended, marking stress. This again follows the form found in the BEEP and CMU lexicons. Some example entries in the flat form (taken from BEEP) are

("table" nil (t ei1 b l))
("suspicious" nil (s @ s p i1 sh @ s))
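The syllabifier is also callable directly from Lisp as `lex.syllabify.phstress' (it reappears in the Welsh letter to sound example below). As a quick sketch, assuming the mrpa phone set is current

festival> (lex.syllabify.phstress '(t ei1 b l))

should return a structure of the form ((( t ei ) 1) (( b l ) 0)), i.e. the fully syllabified and stressed format described in section 13.1 (the exact output shown here is illustrative).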
Also, if syllabification is required there is an opportunity to run a set of "letter to sound" rules on the input (actually an arbitrary re-write rule system). If the variable `lex_lts_set' is set, the letter to sound rule set of that name is applied to the flat input before syllabification. This allows simple predictable changes, such as conversion of final r into a longer vowel for English RP from American-labelled lexicons.
A list of all matching entries in the addenda and the compiled lexicon may be found by the function `lex.lookup_all'. This function does not apply letter to sound rules to unknown words.
You can optionally intercept words as they are looked up, and after they have been found, through the pre- and post-hooks for each lexicon. For example, suppose we were trying to use a Scottish English voice with the US English (cmu) lexicon. A number of entries will be inappropriate, but we can redefine some entries thus

(set! cmu_us_awb::lexicon_addenda
      '(
        ("edinburgh" n (((eh d) 1) ((ax n) 0) ((b r ax) 0)))
        ("poem" n (((p ow) 1) ((y ax m) 0)))
        ("usual" n (((y uw) 1) ((zh ax l) 0)))
        ("air" n (((ey r) 1)))
        ("hair" n (((hh ey r) 1)))
        ("fair" n (((f ey r) 1)))
        ("chair" n (((ch ey r) 1)))))

We can then define a function that checks to see if the word looked up is in the speaker-specific exception list and uses that entry instead.

(define (cmu_us_awb::cmu_lookup_post entry)
  "(cmu_us_awb::cmu_lookup_post entry)
Speaker specific lexicon addenda."
  (let ((ne (assoc_string (car entry) cmu_us_awb::lexicon_addenda)))
    (if ne
        ne
        entry)))

Then, for the particular voice set up, we need to add both a selection part and a reset part, following the FestVox conventions for voice set up.

(define (cmu_us_awb::select_lexicon)
  ...
  (lex.select "cmu")
  ;; Get old var for reset and to append our function to it
  (set! cmu_us_awb::old_cmu_post_hooks (lex.set.post_hooks nil))
  (lex.set.post_hooks
   (append cmu_us_awb::old_cmu_post_hooks
           (list cmu_us_awb::cmu_lookup_post)))
  ...
)

...

(define (cmu_us_awb::reset_lexicon)
  ...
  ;; reset CMU's post_hooks back to original
  (lex.set.post_hooks cmu_us_awb::old_cmu_post_hooks)
  ...
)

The above isn't the most efficient way, as the word is looked up first and then checked against the speaker-specific list.
13.4 Letter to sound rules

Each lexicon may define what action to take when a word cannot be found in the addenda or the compiled lexicon. There are a number of options, which will hopefully be added to as more general letter to sound rule systems are added. The method is set by the command

(lex.set.lts.method METHOD)

Where METHOD can be any of the following

`Error'
     Throw an error when an unknown word is found (the default).
`lts_rules'
     Use an externally specified set of letter to sound rules (described below). The name of the rule set to use is set with `lex.set.lts.ruleset'.
`none'
     Return an entry with a nil pronunciation field. This will only be valid in very special circumstances.
`FUNCTIONNAME'
     Call the named Lisp function with the word and features; it should return a valid lexical entry.
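For example, the Welsh system described later in this section hands the whole job to a Lisp function by naming it as the method

(lex.set.lts.method 'welsh_lts)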
The basic letter to sound rule system is very simple, but is powerful enough to build reasonably complex letter to sound rules. Although we've found trained LTS rules better than hand-written ones (for complex languages), where no data is available and rules must be written by hand, the following rule formalism is much easier to use than that generated by the LTS training system (described in the next section). The basic form of a rule is as follows

( LEFTCONTEXT [ ITEMS ] RIGHTCONTEXT = NEWITEMS )

The interpretation is that if ITEMS appear in the specified left and right context then the output string is to contain NEWITEMS. Any of LEFTCONTEXT, RIGHTCONTEXT or NEWITEMS may be empty. Note that NEWITEMS is written to a different "tape" and hence cannot feed further rules (within this rule set). An example is

( # [ c h ] C = k )
The special character `#' denotes a word boundary, and `C' here is a declared set containing the consonant letters; the rule states that a `ch' at the start of a word followed by a consonant is rewritten as `k'.

The symbols in the rules are treated as set names if they are declared as such, or as symbols in the input/output alphabets. The symbols may be more than one character long and the names are case sensitive. The rules are tried in order until one matches the first (or more) symbols of the tape. The rule is applied, adding the right hand side to the output tape, and the rules are then applied again from the start of the list of rules.

The function used to apply a set of rules will, if given an atom, explode it into a list of single characters, while if given a list it will use it as is. This reflects the common usage of wishing to re-write the individual letters in a word to phonemes, without excluding the possibility of using the system for more complex manipulations, such as multi-pass LTS systems and phoneme conversion.

From Lisp there are three basic access functions (there are corresponding functions in the C/C++ domain)

(lts.ruleset NAME SETS RULES)
     Define a new rule set with the given name, set definitions and re-write rules.
(lts.apply WORD RULESETNAME)
     Apply the named rule set to WORD, returning the re-written list.
(lts.check_alpha WORD RULESETNAME)
     Check that the symbols in WORD are in the input alphabet of the named rule set.
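As an illustrative sketch (the rule set name here is invented and its coverage deliberately tiny), a rule set using this mechanism might be defined as

(lts.ruleset
 ;; name of the rule set (hypothetical)
 tiny_example
 ;; sets used in the rules: C is the set of consonant letters
 ((C b c d f g h j k l m n p q r s t v w x y z))
 ;; the rules themselves, tried in order
 (
  ( # [ c h ] C = k )   ;; word-initial "ch" before a consonant -> k
  ( [ c h ] = ch )      ;; any other "ch" -> ch
  ( [ c ] = k )
  ( [ a ] = a )
  ( [ t ] = t )
 ))

Given this, (lts.apply "chat" 'tiny_example) explodes the atom into the letters (c h a t) and should return (ch a t), since the word-initial rule fails (the `a' after `ch' is not a consonant) and the general `ch' rule then applies.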
The letter to sound rule system may be used directly from Lisp, and can easily be used to do relatively complex operations for analyzing words without requiring modification of the C/C++ system. For example, the Welsh letter to sound rule system consists of three rule sets: the first explicitly identifies epenthesis, the second identifies stressed vowels, and the third rewrites this augmented letter string to phones. This is achieved by the following function

(define (welsh_lts word features)
  (let (epen str wel)
    (set! epen (lts.apply (downcase word) 'newepen))
    (set! str (lts.apply epen 'newwelstr))
    (set! wel (lts.apply str 'newwel))
    (list word
          nil
          (lex.syllabify.phstress wel))))
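Calling such a function directly shows the shape of the result (the word here is arbitrary):

(welsh_lts "caernarfon" nil)

returns a complete lexical entry: the head word, a nil part of speech and the syllabified, stressed pronunciation.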
The LTS method for the Welsh lexicon is set to `welsh_lts', so this function is called whenever a word is not found in the addenda or compiled lexicon.

13.5 Building letter to sound rules

As writing letter to sound rules by hand is hard and very time consuming, an alternative method is also available whereby a letter to sound system may be built from a lexicon of the language. This technique has successfully been used for English (British and American), French and German. The difficulty and appropriateness of using letter to sound rules is very language dependent.

The following outlines the processes involved in building a letter to sound model for a language, given a large lexicon of pronunciations. This technique is likely to work for most European languages (including Russian) but doesn't seem particularly suitable for very large alphabet languages like Japanese and Chinese. The process described here is not (yet) fully automatic but the hand intervention required is small and may easily be done even by people with only a very little knowledge of the language being dealt with. The process involves the following steps

   * pre-processing the lexicon into a suitable training set
   * defining the set of allowable pairings of letters to phones
   * constructing the probabilities of each letter/phone pair
   * aligning letters to an equal-length set of phones/_epsilons_
   * extracting the data by letter, suitable for training
   * building CART models for predicting phone from letters (and context)
   * building an additional lexical stress assignment model (if necessary)
All except the first two stages of this are fully automatic.

Before building a model it's wise to think a little about what you want it to do. Ideally the model is an auxiliary to the lexicon, so only words not found in the lexicon will require use of the letter to sound rules. Thus only unusual forms are likely to require the rules. More precisely, the most common words, often having the most non-standard pronunciations, should probably always be explicitly listed. It is possible to reduce the size of the lexicon (sometimes drastically) by removing all entries that the trained LTS model correctly predicts.

Before starting it is wise to consider removing some entries from the lexicon before training. I typically remove words under 4 letters, and if part of speech information is available I remove all function words, ideally training only from nouns, verbs and adjectives, as these are the forms most likely to be unknown in text. It is useful to have morphologically inflected and derived forms in the training set, as it is often such variant forms that are not found in the lexicon even though their root morpheme is. Note that in many forms of text proper names are the most common form of unknown word, and even the technique presented here may not adequately cater for that form of unknown word (especially if the unknown words are non-native names). This is all to say that this may or may not be appropriate for your task, but the rules generated by this learning process have, in the examples we've done, been much better than what we could produce by hand writing rules of the form described in the previous section.

First, preprocess the lexicon into a file of lexical entries to be used for training, removing function words and changing the head words to all lower case (this may be language dependent). The entries should be of the form used as input for Festival's lexicon compilation. Specifically, the pronunciations should be simple lists of phones (no syllabification). Depending on the language, you may wish to remove the stressing--for the examples here we have, though later tests suggest that we should keep it in even for English. Thus the training set should look something like

("table" nil (t ei b l))
("suspicious" nil (s @ s p i sh @ s))

It is best to split the data into a training set and a test set if you wish to know how well your training has worked. In our tests we remove every tenth entry and put it in a test set. Note this will mean our test results are probably better than if we removed, say, the last ten in every hundred.

The second stage is to define the set of allowable letter to phone mappings, irrespective of context. This can sometimes be initially done by hand and then checked against the training set. Initially construct a file of the form

(require 'lts_build)
(set! allowables
      '((a _epsilon_)
        (b _epsilon_)
        (c _epsilon_)
        ...
        (y _epsilon_)
        (z _epsilon_)
        (# #)))
All letters that appear in the alphabet should (at least) map to `_epsilon_', allowing them to map to no phone at all; the final entry maps the word boundary symbol `#' to itself.
To incrementally add to this allowables list, run festival as

festival allowables.scm

and at the prompt type

festival> (cummulate-pairs "oald.train")

with the name of your training file. This will print out each lexical entry that couldn't be aligned with the current set of allowables. At the start this will be every entry. Looking at these entries, add to the allowables to make alignment work. For example, if the following word fails

("abate" nil (ah b ey t))
add `ah' and `ey' to the allowables for the letter `a' (and similarly `b' for `b' and `t' for `t'); the final `e' can already map to `_epsilon_'.
It is worthwhile being consistent in defining your set of allowables. (At least) two mappings are possible for a letter sequence such as `ch': having the `c' map to the phone and the `h' map to `_epsilon_', or the other way around. Either is fine, but whichever you choose should be used consistently, as the alignment probabilities are built from these pairings.

It may also be the case that some letters give rise to more than one phone. For example the letter `x' often gives rise to the two phones `k' and `s'. Such multiphones are declared as a single symbol with the phones joined by a dash, e.g. `k-s', and are split back into their component phones after prediction. The allowables for OALD end up being

(set! allowables
      '((a _epsilon_ ei aa a e@ @ oo au o i ou ai uh e)
        (b _epsilon_ b )
        (c _epsilon_ k s ch sh @-k s t-s)
        (d _epsilon_ d dh t jh)
        (e _epsilon_ @ ii e e@ i @@ i@ uu y-uu ou ei aa oi y y-u@ o)
        (f _epsilon_ f v )
        (g _epsilon_ g jh zh th f ng k t)
        (h _epsilon_ h @ )
        (i _epsilon_ i@ i @ ii ai @@ y ai-@ aa a)
        (j _epsilon_ h zh jh i y )
        (k _epsilon_ k ch )
        (l _epsilon_ l @-l l-l)
        (m _epsilon_ m @-m n)
        (n _epsilon_ n ng n-y )
        (o _epsilon_ @ ou o oo uu u au oi i @@ e uh w u@ w-uh y-@)
        (p _epsilon_ f p v )
        (q _epsilon_ k )
        (r _epsilon_ r @@ @-r)
        (s _epsilon_ z s sh zh )
        (t _epsilon_ t th sh dh ch d )
        (u _epsilon_ uu @ w @@ u uh y-uu u@ y-u@ y-u i y-uh y-@ e)
        (v _epsilon_ v f )
        (w _epsilon_ w uu v f u)
        (x _epsilon_ k-s g-z sh z k-sh z g-zh )
        (y _epsilon_ i ii i@ ai uh y @ ai-@)
        (z _epsilon_ z t-s s zh )
        (# #)))

Note this is an exhaustive list and (deliberately) says nothing about the contexts or frequency with which these letter to phone pairs appear. That information will be generated automatically from the training set.
Once the number of failed matches is significantly low enough, let `cummulate-pairs' run over the full training file, ignoring the few remaining failures. Next call

festival> (save-table "oald-")

with the prefix used for your lexicon's files. This converts the cumulation table into probabilities and saves it. Restart festival, loading this new table

festival allowables.scm oald-pl-table.scm

Now each word can be aligned to an equally-lengthed string of phones, epsilons and multiphones.

festival> (aligndata "oald.train" "oald.train.align")

Do this also for your test set. This will produce entries like

aaronson _epsilon_ aa r ah n s ah n
abandon ah b ae n d ah n
abate ah b ey t _epsilon_
abbe ae b _epsilon_ iy

The next stage is to build features suitable for `wagon' to build models. This is done by

festival> (build-feat-file "oald.train.align" "oald.train.feats")

Again, do the same for the test set. Now you need to construct a description file for `wagon' for the given data. This can be done using the script `make_wgn_desc' provided with the speech tools.

Here is an example script for building the models. You will need to modify it for your particular database, but it shows the basic process.

for i in a b c d e f g h i j k l m n o p q r s t u v w x y z
do
   # Stop value for wagon
   STOP=2
   echo letter $i STOP $STOP
   # Find training set for letter $i
   cat oald.train.feats |
    awk '{if ($6 == "'$i'") print $0}' >ltsdataTRAIN.$i.feats
   # split training set to get heldout data for stepwise testing
   traintest ltsdataTRAIN.$i.feats
   # Extract test data for letter $i
   cat oald.test.feats |
    awk '{if ($6 == "'$i'") print $0}' >ltsdataTEST.$i.feats
   # run wagon to predict model
   wagon -data ltsdataTRAIN.$i.feats.train -test ltsdataTRAIN.$i.feats.test \
         -stepwise -desc ltsOALD.desc -stop $STOP -output lts.$i.tree
   # Test the resulting tree against the test data
   wagon_test -heap 2000000 -data ltsdataTEST.$i.feats -desc ltsOALD.desc \
         -tree lts.$i.tree
done

The script `traintest' splits the given file `X' into `X.train' and `X.test', with every tenth line in `X.test' and the rest in `X.train'.

This script can take a significant amount of time to run, about 6 hours on a Sun Ultra 140. Once the models are created they must be collected together into a single list structure. The trees generated by `wagon' contain full probability distributions at each leaf; at this point this information can be removed, as only the most probable phone will actually be predicted. This substantially reduces the size of the trees.

(merge_models 'oald_lts_rules "oald_lts_rules.scm")
To test a set of LTS models, load the saved table and models and call the following function with the test align file

festival oald-table.scm oald_lts_rules.scm
festival> (lts_testset "oald.test.align" oald_lts_rules)

The result (after showing all the failed entries) will be a table showing the results for each letter, for all letters, and for complete words. The failed entries may give some notion of how good or bad the result is. Sometimes the failures will be simple vowel differences, long versus short, schwa versus full vowel; other times whole consonants may be missing. Remember that the ultimate measure of quality of the letter to sound rules is how adequate they are at providing acceptable pronunciations, rather than how good the numeric score is.

For some languages (e.g. English) it is necessary to also find a stress pattern for unknown words. Ultimately, for this to work well you need to know the morphological decomposition of the word. At present we provide a CART-trained system to predict stress patterns for English. It does get 94.6% correct for an unseen test set, but that isn't really very good. Later tests suggest that predicting stressed and unstressed phones directly is actually better for getting whole words correct, even though those models do slightly worse on a per-phone basis black98.
As the lexicon may be a large part of the system, we have also experimented with removing entries from the lexicon if the letter to sound rule system (and stress assignment system) can correctly predict them. For OALD this allows us to halve the size of the lexicon; it could possibly allow more if a certain amount of fuzzy acceptance was allowed (e.g. with schwa). For other languages the gain here can be very significant: for German and French we can reduce the lexicon by over 90%.
The technique described in this section and its relative merits with respect to a number of languages/lexicons and tasks is discussed more fully in black98.

13.6 Lexicon requirements

For English there are a number of assumptions made about the lexicon which are worthy of explicit mention. If you are basically going to use the existing token rules you should try to include at least the following in any lexicon that is to work with them: the letters of the alphabet, for use when a word has to be spelled out; the digits 0 to 9; and entries for common symbols such as "$", "%" and "&".
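As a sketch of what such entries might look like (these pronunciations are illustrative only, in the mrpa phone set; the distributed lexicons already include suitable entries):

("b" n ((( b ii ) 1)))                   ; the letter "b", for spelling out
("%" n ((( p @ ) 0) (( s e n t ) 1)))    ; read as "percent"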
13.7 Available lexicons

Currently Festival supports a number of different lexicons. They are all defined in the file `lib/lexicons.scm', each with a number of common extra words added to their addendas. They include CUVOALD (the Computer Users Version of the Oxford Advanced Learner's Dictionary, British English), cmu (the Carnegie Mellon University pronouncing dictionary, American English), mrpa (a British English lexicon from CSTR) and BEEP (a large British English lexicon).
All of the above lexicons have some distribution restrictions (though mostly pretty light), but as they are mostly freely available we provide programs that can convert the originals into Festival's format. The MOBY lexicon has recently been released into the public domain and will be converted into our format soon.

13.8 Post-lexical rules

It is the lexicon's job to produce a pronunciation of a given word. However, in most languages the most natural pronunciation of a word cannot be found in isolation from the context in which it is to be spoken. This includes such phenomena as reduction, phrase-final devoicing and r-insertion. In Festival this is done by post-lexical rules.
The post-lexical rules are applied as a set of functions appropriate to the current voice, listed in the variable `postlex_rules_hooks'; each takes an utterance and returns the (possibly modified) utterance. Although a rule system could be devised for post-lexical sound rules, it is unclear what their scope should be, so we have left it completely open. Our vowel reduction model uses a CART decision tree to predict which syllables should be reduced, while the "'s" rule is very simple (shown in `festival/lib/postlex.scm').
The "'s" in English may be pronounced in a number of different ways depending on the preceding context.
For our English voices we have a lexical entry for "'s" as a schwa followed by a "z". We use a post-lexical rule function called `postlex_apos_s_check' to modify this basic form when required.
In the following rule we check each segment to see if it is part of a word labelled "'s"; if so, we check whether we are currently looking at the schwa or the z part, and test whether modification is required

(define (postlex_apos_s_check utt)
  "(postlex_apos_s_check UTT)
Deal with possessive s for English (American and British). Delete
schwa of 's if previous is not a fricative or affricate, and
change voiced to unvoiced s if previous is not voiced."
  (mapcar
   (lambda (seg)
     (if (string-equal "'s" (item.feat
                             seg "R:SylStructure.parent.parent.name"))
         (if (string-equal "a" (item.feat seg 'ph_vlng))
             (if (and (member_string (item.feat seg 'p.ph_ctype) '(f a))
                      (not (member_string
                            (item.feat seg "p.ph_cplace")
                            '(d b g))))
                 t ;; don't delete schwa
                 (item.delete seg))
             (if (string-equal "-" (item.feat seg "p.ph_cvox"))
                 (item.set_name seg "s"))))) ;; from "z"
   (utt.relation.items utt 'Segment))
  utt)
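A new post-lexical rule is just another function of this shape. The following skeleton is hypothetical; it assumes, as described above, that the current voice's rules are listed in `postlex_rules_hooks':

(define (postlex_my_rule utt)
  "(postlex_my_rule UTT)
Skeleton post-lexical rule: walk the segments and return the
(possibly modified) utterance."
  (mapcar
   (lambda (seg)
     ;; inspect or change each segment here, e.g. with item.feat,
     ;; item.set_name or item.delete as in postlex_apos_s_check
     seg)
   (utt.relation.items utt 'Segment))
  utt)

;; run the standard possessive-s check, then our rule
(set! postlex_rules_hooks (list postlex_apos_s_check postlex_my_rule))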