Lattice Based Mispronunciation Detection For The Assessment Of The Childhood Apraxia Of Speech

Mostafa Ali Shahin; Beena Ahmed; Kirrie Ballard

doi:10.5339/qfarc.2014.HBPP0441

Abstract

Background and Objectives: Childhood Apraxia of Speech (CAS) is a speech disorder characterized by articulation errors, i.e. the replacement of certain phonemes with alternatives. In previous work we proposed a simple method to evaluate the child's speech as correct or incorrect with an overall accuracy of 88.2%. In this work we present an enhanced method that increases the accuracy of the correct/incorrect evaluation to 92.7% and additionally identifies the incorrect phonemes with an accuracy of 60%.

Method: The goal of the mispronunciation detection system is to compare each phoneme in the child's production to the given prompt and identify mispronunciations. Figure 1 shows the block diagram of the system, which uses a search lattice for each prompt in the child's speech therapy treatment protocol to identify the errors made. Each prompt is transcribed into its corresponding phoneme sequence using the CMU pronunciation dictionary and then passed to the lattice generator, along with the expected mispronunciation rules, to generate the search lattice. Mel Frequency Cepstral Coefficients (MFCC) are extracted from the speech signal together with their delta and acceleration coefficients to produce a 39-dimensional feature vector per frame. The extracted features are then fed to the speech recognizer, along with the created lattice and the Hidden Markov Model (HMM) acoustic models, to generate a sequence of phones from the child's utterance. An evaluation report is then generated by matching the recognized phoneme sequence with the correct phoneme sequence and specifying the errors made by the child. We use a search lattice with a specific number of alternative pronunciations for each phoneme; this limits the decoder search, making it faster and more accurate.
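The report-generation step, which matches the recognized phoneme sequence against the correct one, could be sketched roughly as below. This is only an illustrative sketch, not the authors' implementation: it assumes a plain Levenshtein (edit-distance) alignment, and the function names `align` and `is_correct` as well as the ARPAbet-style phone labels in the usage lines are hypothetical.

```python
from typing import List, Tuple

Step = Tuple[str, str, str]  # (label, expected phone, recognized phone)


def align(expected: List[str], recognized: List[str]) -> List[Step]:
    """Align two phone sequences by edit distance and label each step.

    Assumption: unit cost for substitution, deletion and insertion.
    """
    n, m = len(expected), len(recognized)
    # cost[i][j] = edit distance between expected[:i] and recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(expected[i - 1] != recognized[j - 1])
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # expected phone missing
                cost[i][j - 1] + 1,             # extra recognized phone
            )

    # Trace back from the end to recover one optimal alignment.
    report: List[Step] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(expected[i - 1] != recognized[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                label = "substitution" if mismatch else "correct"
                report.append((label, expected[i - 1], recognized[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            report.append(("deletion", expected[i - 1], "-"))
            i -= 1
        else:
            report.append(("insertion", "-", recognized[j - 1]))
            j -= 1
    report.reverse()
    return report


def is_correct(report: List[Step]) -> bool:
    """The utterance counts as correct only if every phone matched."""
    return all(label == "correct" for label, _, _ in report)


# Hypothetical usage: the initial phone of the prompt is replaced.
example = align(["R", "AE", "B", "AH", "T"], ["W", "AE", "B", "AH", "T"])
for label, exp, rec in example:
    print(f"{exp:>3} -> {rec:<3} {label}")
```

In a lattice-constrained decoder the alignment is largely implied by the lattice itself, so a generic edit-distance alignment like this one is mainly useful as a decoder-independent way to produce the per-phoneme report.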
Each phoneme in the correct phoneme sequence is compared with expected mispronunciation rules developed by a therapist after an assessment of 20 children with CAS; if a rule is matched, the pronunciation variants are added as alternative arcs for the current phoneme. The mispronunciation rules depend on the type of the phoneme (consonant/vowel), the position of the phoneme in the word (initial/medial/final) and the context of the phoneme. The lattice is then created using the matched rules as shown in Figure 2, where the garbage model absorbs any mispronounced phoneme not in the lattice. PA and PG are insertion penalties added to the alternative and the garbage arcs respectively, so that the decoder does not align the speech to the alternative error phonemes or the garbage node unless it is confident enough.

Results: The overall system accuracy is 92.7%, with a Correct Acceptance (CA) rate of 97.6% and a Correct Rejection (CR) rate of 83.1%. The system also detects phoneme errors made by the child with 60% accuracy.

Conclusion: In this paper we proposed a mispronunciation detection tool that can detect phonemes mispronounced by children with CAS and specify the errors made.
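The rule-driven lattice construction described in the Method can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the `Rule` structure, the `build_lattice` function, the example rules, the numeric values of the PA and PG penalties, and the choice to place one garbage arc in parallel with every phoneme (Figure 2 is not reproduced here, so the actual topology may differ). The prompt is treated as a single word, and the consonant/vowel condition is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Arc = Tuple[str, float]  # (phone label, penalty applied when the arc is taken)


@dataclass(frozen=True)
class Rule:
    """Hypothetical encoding of one expected-mispronunciation rule."""
    target: str                   # phoneme the rule applies to
    position: str                 # "initial", "medial", "final" or "any"
    alternatives: Tuple[str, ...]  # expected erroneous realizations
    prev: Optional[str] = None    # required preceding phoneme; None = any


def position_of(index: int, length: int) -> str:
    """Position of a phoneme in the word (a one-phone word is 'initial')."""
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


def build_lattice(
    phonemes: Sequence[str],
    rules: Sequence[Rule],
    pa: float = -10.0,  # assumed penalty on alternative arcs (PA)
    pg: float = -20.0,  # assumed penalty on garbage arcs (PG)
) -> List[List[Arc]]:
    """Return one list of parallel arcs per phoneme of the prompt."""
    lattice: List[List[Arc]] = []
    for i, phone in enumerate(phonemes):
        pos = position_of(i, len(phonemes))
        prev = phonemes[i - 1] if i > 0 else None
        arcs: List[Arc] = [(phone, 0.0)]  # the correct pronunciation
        for rule in rules:
            if rule.target != phone:
                continue
            if rule.position not in ("any", pos):
                continue
            if rule.prev is not None and rule.prev != prev:
                continue
            for alt in rule.alternatives:
                if all(alt != label for label, _ in arcs):
                    arcs.append((alt, pa))
        # Garbage arc absorbs any realization not covered above.
        arcs.append(("<garbage>", pg))
        lattice.append(arcs)
    return lattice


# Hypothetical rules and prompt, for illustration only.
rules = [
    Rule(target="R", position="initial", alternatives=("W",)),
    Rule(target="T", position="final", alternatives=("D",)),
]
for slot in build_lattice(["R", "AE", "B", "AH", "T"], rules):
    print(slot)
```

Because each slot carries only the correct phone, a few rule-derived alternatives and one garbage arc, the decoder searches far fewer paths than it would with a free phone loop, which is the speed and accuracy argument made in the Method.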
