Lattice Based Mispronunciation Detection For The Assessment Of The Childhood Apraxia Of Speech

Mostafa Ali Shahin; Beena Ahmed; Kirrie Ballard

doi:10.5339/qfarc.2014.HBPP0441

Authors: Mostafa Ali Shahin¹, Beena Ahmed¹ and Kirrie Ballard¹

¹ Texas A&M University at Qatar, Doha, Qatar
Publisher: Hamad bin Khalifa University Press (HBKU Press)
Source: Qatar Foundation Annual Research Conference Proceedings, Volume 2014, Issue 1, November 2014, HBPP0441
DOI https://doi.org/10.5339/qfarc.2014.HBPP0441

Background and Objectives: Childhood Apraxia of Speech (CAS) is a speech disorder characterized by articulation errors, i.e. the replacement of certain phonemes with alternatives. In previous work we proposed a simple method that evaluates the child's speech as correct or incorrect with an overall accuracy of 88.2%. In this work we present an enhanced method that increases the accuracy of the correct/incorrect evaluation to 92.7% and, in addition, identifies the incorrect phonemes with an accuracy of 60%.

Method: The goal of the mispronunciation detection system is to compare each phoneme in the child's production with the given prompt and identify mispronunciations. Figure 1 shows the block diagram of the system, which uses a search lattice for each prompt in the child's speech therapy treatment protocol to identify the errors made. Each prompt is transcribed into its corresponding phoneme sequence using the CMU pronunciation dictionary and then passed to the lattice generator, along with the expected mispronunciation rules, to generate the search lattice. Mel Frequency Cepstral Coefficients (MFCCs) are extracted from the speech signal together with their delta and acceleration coefficients to produce a 39-dimensional feature vector per frame. The extracted features are then fed to the speech recognizer, along with the generated lattice and the Hidden Markov Model (HMM) acoustic models, to generate a sequence of phones from the child's utterance. An evaluation report is then generated by matching the recognized phoneme sequence against the correct phoneme sequence and specifying the errors made by the child. We use a search lattice with a specific number of alternative pronunciations for each phoneme; this limits the decoder search, making it faster and more accurate.
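The report-generation step, in which the recognized phoneme sequence is matched against the correct one, can be illustrated with a minimal sketch. The abstract does not say which alignment procedure is used, so the use of Python's `difflib.SequenceMatcher` here is an assumption, and the function name `evaluation_report`, the report layout, and the example phoneme sequences are hypothetical.

```python
from difflib import SequenceMatcher

# Map difflib opcode tags to the error labels used in this sketch.
ERROR_TYPES = {
    "replace": "substitution",
    "delete": "deletion",
    "insert": "insertion",
}


def evaluation_report(expected, recognized):
    """Align two phoneme sequences and list where they differ.

    `expected` is the prompt's correct phoneme sequence and
    `recognized` is the decoder output; both are lists of strings.
    """
    matcher = SequenceMatcher(a=expected, b=recognized, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # this stretch was pronounced as expected
        errors.append({
            "type": ERROR_TYPES[tag],
            "position": i1,  # index into the expected sequence
            "expected": expected[i1:i2],
            "recognized": recognized[j1:j2],
        })
    # The utterance counts as correct only if no differences were found.
    return {"correct": not errors, "errors": errors}


if __name__ == "__main__":
    # Illustrative sequences only; not taken from the study's data.
    prompt = ["R", "AE", "B", "AH", "T"]
    decoded = ["W", "AE", "B", "AH", "T"]
    print(evaluation_report(prompt, decoded))
```

The `correct` flag corresponds to the correct/incorrect decision for the whole utterance, while the `errors` list corresponds to the phoneme-level identification of the errors made.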
Each phoneme in the correct phoneme sequence is compared with the expected mispronunciation rules, which were developed by a therapist after an assessment of 20 children with CAS; if a rule is matched, its pronunciation variants are added to the lattice as alternative arcs for the current phoneme. The mispronunciation rules depend on the type of the phoneme (consonant/vowel), the position of the phoneme in the word (initial/medial/final) and the context of the phoneme. The lattice is then created using the matched rules as shown in Figure 2, where the garbage model absorbs any mispronounced phoneme not in the lattice. PA and PG are insertion penalties added to the alternative and the garbage arcs respectively, so that the decoder does not align the speech to the alternative error phonemes or the garbage node unless it is sufficiently confident.

Results: The overall system accuracy is 92.7%, with a Correct Acceptance (CA) of 97.6% and a Correct Rejection (CR) of 83.1%. The system also detects the phoneme errors made by the child with 60% accuracy.

Conclusion: In this paper we proposed a mispronunciation detection tool that can detect phonemes mispronounced by children with CAS and specify the errors made.
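The lattice construction described in the Method can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Rule` and `Arc` types, the `build_lattice` function, the example rules, and the penalty values are all hypothetical; the prompt is treated as a single word; phoneme context is ignored; and placing one garbage arc in parallel with every phoneme is a guess at the topology, since Figure 2 is not reproduced here.

```python
from dataclasses import dataclass

GARBAGE = "<garbage>"  # label for the garbage-model arc (hypothetical name)


@dataclass(frozen=True)
class Rule:
    target: str          # phoneme the rule applies to
    position: str        # "initial", "medial", "final", or "any"
    alternatives: tuple  # expected mispronunciations of the target


@dataclass(frozen=True)
class Arc:
    label: str
    penalty: float       # score added when the decoder takes this arc


def position_of(index, length):
    """Classify a phoneme's position within a single-word prompt."""
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


def build_lattice(phonemes, rules, p_alt, p_garbage):
    """Return one list of parallel arcs per phoneme of the prompt.

    Each slot holds the correct phoneme (no penalty), any alternatives
    from matching rules (penalty p_alt, standing in for PA), and a
    garbage arc (penalty p_garbage, standing in for PG).
    """
    lattice = []
    for i, phoneme in enumerate(phonemes):
        pos = position_of(i, len(phonemes))
        arcs = [Arc(phoneme, 0.0)]
        for rule in rules:
            if rule.target == phoneme and rule.position in (pos, "any"):
                for alt in rule.alternatives:
                    # Skip variants already present in this slot.
                    if all(arc.label != alt for arc in arcs):
                        arcs.append(Arc(alt, p_alt))
        arcs.append(Arc(GARBAGE, p_garbage))
        lattice.append(arcs)
    return lattice


if __name__ == "__main__":
    # Made-up rules and penalties, assumed to be log-domain scores
    # where a more negative value discourages the arc more strongly.
    rules = [Rule("R", "initial", ("W",)), Rule("T", "final", ("D",))]
    for slot in build_lattice(["R", "AE", "B", "AH", "T"], rules, -2.0, -4.0):
        print([(arc.label, arc.penalty) for arc in slot])
```

Because each slot contains only the correct phoneme, a few rule-derived alternatives, and one garbage arc, the decoder's search space stays small, which matches the stated motivation for using a constrained lattice.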


