Tuesday, August 21, 2012

Ronanki: GSoC 2012 Pronunciation Evaluation Week 6


I have uploaded all of my code (except for a few scripts that are still in progress) to
 http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/branches/speecheval/ronanki/scripts/ and each folder contains a README file with detailed instructions on how to use the scripts.


This week I concentrated on new features for speech recognition. I read a paper on Power-Normalized Cepstral Coefficients (PNCC) [1], which are designed to make speech recognition more robust to noise, and a few papers on phonological features [2], [3]. I hope to investigate mapping the acoustic features of each phoneme, as derived from the machine phonetic transcription, to phonological features. With this mapping, phone-level mispronunciations can be identified using phonological features together with acoustic pronunciation scores and edit distances. I have posted an initial mapping at http://talknicer.net/~ronanki/phonological_features/ based on those papers.
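To make the idea of comparing phones by phonological features concrete, here is a minimal Python sketch. The feature table, the feature names, and the function feature_mismatches are assumptions made for illustration only; they are not the mapping posted at the URL above, and a real system would combine this with acoustic scores and edit distances.

```python
# Illustrative feature table for a few English consonants (ARPAbet-style labels).
# The table is an assumption for this sketch, not the mapping from the post.
PHONOLOGICAL_FEATURES = {
    "P": {"voicing": "voiceless", "place": "bilabial", "manner": "stop"},
    "B": {"voicing": "voiced", "place": "bilabial", "manner": "stop"},
    "T": {"voicing": "voiceless", "place": "alveolar", "manner": "stop"},
    "D": {"voicing": "voiced", "place": "alveolar", "manner": "stop"},
    "S": {"voicing": "voiceless", "place": "alveolar", "manner": "fricative"},
    "Z": {"voicing": "voiced", "place": "alveolar", "manner": "fricative"},
}


def feature_mismatches(expected, recognized):
    """Return the features on which two phones differ.

    The result maps each differing feature name to a pair
    (expected value, recognized value). An empty dict means the
    two phones share every feature in the table.
    """
    exp = PHONOLOGICAL_FEATURES[expected]
    rec = PHONOLOGICAL_FEATURES[recognized]
    return {
        name: (exp[name], rec[name])
        for name in exp
        if exp[name] != rec[name]
    }


if __name__ == "__main__":
    # A learner says "B" where "P" was expected: only voicing differs.
    print(feature_mismatches("P", "B"))
```

A feature-level comparison like this could tell the learner which aspect of the phone was wrong (for example voicing) rather than only that the phone was wrong.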

Ongoing tasks:
1. In the random phrase scoring method, another column has been added to store the position of each phone within its word (begin/middle/end), so that each phone has three statistics, one per position:
http://talknicer.net/~ronanki/phrase_data/all_phrases_stats_position
2. Standard word scores are derived along with the standard phoneme (acoustic + duration) scores in the current forced alignment.
3. Linking the edit-distance algorithm with the pronunciation evaluation website.
4. Completing a full-fledged website at http://talknicer.net/~ronanki/test/ with all test cases (junk speech, silence, misreads, etc.) before the mid-evaluation, and publicizing the system so that it can be tested by a large number of users.
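
Two of the tasks above can be sketched in a few lines of Python: labeling each phone with its position in the word, and computing an edit distance between the expected and recognized phone sequences. The function names phone_positions and edit_distance are hypothetical, the treatment of a single-phone word as "begin" is an assumption, and the distance shown is plain Levenshtein with unit costs, which may differ from the algorithm used in the project scripts.

```python
def phone_positions(phones):
    """Label each phone of one word as 'begin', 'middle', or 'end'.

    Assumption: a word with a single phone is labeled 'begin'.
    """
    last = len(phones) - 1
    labels = []
    for i, phone in enumerate(phones):
        if i == 0:
            labels.append((phone, "begin"))
        elif i == last:
            labels.append((phone, "end"))
        else:
            labels.append((phone, "middle"))
    return labels


def edit_distance(expected, recognized):
    """Levenshtein distance between two phone sequences (unit costs)."""
    # previous[j] holds the distance between the expected phones seen
    # so far and the first j recognized phones.
    previous = list(range(len(recognized) + 1))
    for i, exp in enumerate(expected, start=1):
        current = [i]
        for j, rec in enumerate(recognized, start=1):
            substitution = previous[j - 1] + (0 if exp == rec else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(phone_positions(["K", "AE", "T"]))
    print(edit_distance(["K", "AE", "T"], ["K", "AH", "T"]))
```

Keeping the position label next to each phone makes it possible to look up a separate score statistic for the same phone at the beginning, middle, or end of a word.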

References:

[1] Chanwoo Kim and Richard M. Stern, "Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition", ICASSP 2012.

[2] Katrin Kirchhoff, Gernot A. Fink, and Gerhard Sagerer, "Combining acoustic and articulatory feature information for robust speech recognition", Speech Communication, vol. 37, pp. 303–319, 2002.

[3] S. King and P. Taylor, "Detection of phonological features in continuous speech using neural networks", Computer Speech and Language, vol. 14, no. 4, pp. 333–353, 2000.
