Citation
Gonzalez Torre, Ivan and Lacasa, Lucas and Kello, Christopher T. and Luque Serrano, Bartolomé and Hernández-Fernández, Antoni
(2019).
Log-normal distribution in acoustic linguistic units.
In: "International Conference on Interdisciplinary Advances in Statistical Learning", 27-29 June, San Sebastian.
Abstract
Following a previous study on linguistic laws at the pre-phonemic level, in this work we verify that the acoustically transcribed durations of linguistic units at several scales (phonemes, words and breath groups, BG) follow a log-normal distribution. To do this we use a well-known corpus (the Buckeye Corpus), which contains conversational speech by native American English speakers and gathers approximately 3·10^5 words with time-aligned phonetic labels. We explain these log-normal distributions with a new model: the Non-interacting Cascade Approach (NICA). This model can explain the emergence of log-normal distributions across linguistic levels (words, breath groups) based solely on the assumption that phoneme durations are also log-normal. We find extremely good quantitative agreement between NICA and the experimental data for phonemes and words, and also for BG after adding a Gaussian term to account for segmentation issues and phenomena such as Voice Onset Time (VOT). Finally, we discuss our results and justify our recommendation to work with medians instead of mean values (which implicitly assume a Gaussian distribution), in order to avoid biases and erroneous conclusions in statistical learning studies based on acoustic elements with long-tailed distributions.
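The recommendation to prefer medians over means can be illustrated with a small simulation. The following is a minimal sketch, not the authors' NICA implementation: it assumes that a word duration is simply the sum of independently drawn log-normal phoneme durations, and the parameters mu, sigma and max_phonemes, as well as the uniform choice of phonemes per word, are hypothetical values chosen for illustration rather than estimates from the Buckeye Corpus.

```python
import random
import statistics


def simulate_word_durations(n_words, mu=-2.5, sigma=0.6, max_phonemes=6, seed=0):
    """Return n_words synthetic word durations in seconds.

    Each word is the sum of k independent log-normal phoneme durations,
    with k drawn uniformly from 1..max_phonemes. All parameter values
    are hypothetical and are not fitted to any corpus.
    """
    rng = random.Random(seed)
    durations = []
    for _ in range(n_words):
        k = rng.randint(1, max_phonemes)  # assumed number of phonemes in the word
        durations.append(sum(rng.lognormvariate(mu, sigma) for _ in range(k)))
    return durations


words = simulate_word_durations(20000)
mean_duration = statistics.mean(words)
median_duration = statistics.median(words)

# In a right-skewed sample the mean is pulled toward the long tail,
# so it tends to sit above the median.
print(f"mean   = {mean_duration:.3f} s")
print(f"median = {median_duration:.3f} s")
```

In this synthetic sample the two summaries should disagree, with the mean sitting above the median. This is the kind of discrepancy that summarising long-tailed durations by their mean can hide.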