Saturday, November 17, 2012

UPDATED: Status update

I wish that this were a progress report instead of a status update, but so far we haven't raised enough to begin data collection with Mechanical Turk. We have had a paper accepted for publication, and we are trying to get into the Google Compute Engine to reduce the huge Amazon bill for asking people who claim to have good pronunciation and reading skills to record exemplars. The problem is that the number of such exemplars needs to be relatively large. For those of you familiar with the TalkNicer demo, this is the "exemplar sufficiency index," and it needs to meet a certain threshold for at least 5,000 words of instructional material before I feel comfortable committing to an expensive data collection effort.

So in summary, please donate more, or if you have already donated, please ask multiple people to at least match your donation. It will be worth it.

Update: How much more do we need? About $4,000, based on the preliminary per-phoneme exemplar sufficiency index (including English homographs) and estimates of expected Mechanical Turk performance. Also updated: cmusphinx.sourceforge.net/wiki/pronunciation_evaluation

Further update: I am very sorry for delaying Troy's posts here (the delay was due to WebRTC and related questions), but they have been available elsewhere, e.g. at cmusphinx.sourceforge.net/2012/08/gsoc-2012-pronunciation-evaluation-troy-project-conclusions