Log-normal distribution in acoustic linguistic units

Gonzalez Torre, Ivan and Lacasa, Lucas and Kello, Christopher T. and Luque Serrano, Bartolomé and Hernández-Fernández, Antoni (2019). Log-normal distribution in acoustic linguistic units. In: "International Conference on Interdisciplinary Advances in Statistical Learning", 27-29 June, San Sebastian.

Description

Title: Log-normal distribution in acoustic linguistic units
Author/s:
  • Gonzalez Torre, Ivan
  • Lacasa, Lucas
  • Kello, Christopher T.
  • Luque Serrano, Bartolomé
  • Hernández-Fernández, Antoni
Item Type: Presentation at Congress or Conference (Poster)
Event Title: International Conference on Interdisciplinary Advances in Statistical Learning
Event Dates: 27-29 June
Event Location: San Sebastian
Title of Book: International Conference on Interdisciplinary Advances in Statistical Learning
Date: June 2019
Subjects:
Faculty: E.T.S. de Ingeniería Aeronáutica y del Espacio (UPM)
Department: Matemática Aplicada a la Ingeniería Aeroespacial
Creative Commons Licenses: Attribution - No derivative works - Non commercial

Abstract

Following a previous study on linguistic laws at the pre-phonemic level, in this work we verify with high accuracy that the durations of acoustically transcribed linguistic units at several scales (phonemes, words and breath groups, BG) follow a log-normal distribution. To do this we used a well-known corpus, the Buckeye Corpus, which contains conversational speech by native American English speakers and gathers approximately 3·10^5 words with time-aligned phonetic labels. We explain these log-normal distributions using a new model, the Non-interacting Cascade Approach (NICA). This model can explain the emergence of log-normal distributions across linguistic levels (words, breath groups) based solely on the assumption that phoneme durations are also log-normal. We find extremely good quantitative agreement between NICA and the experimental data for phonemes and words, and also for BG after adding a Gaussian term to account for segmentation issues and phenomena such as Voice Onset Time (VOT). Finally, we discuss our results and justify our recommendation to work with medians instead of mean values (whose use implicitly assumes a Gaussian distribution) to avoid biases and erroneous conclusions in statistical learning studies based on acoustic elements with long-tailed distributions.
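
The recommendation to prefer medians over means can be illustrated with a small simulation. The following is a minimal sketch, not the authors' NICA implementation: the abstract does not give the model's equations, so the sketch simply assumes that a word duration is the sum of independent log-normal phoneme durations. The parameter values `MU` and `SIGMA`, the uniform phoneme count per word, and all function names are hypothetical choices made for illustration and are not fitted to the Buckeye Corpus.

```python
import math
import random
import statistics

# Illustrative parameters only (assumed, not fitted to the Buckeye Corpus):
# log-durations of phonemes are Normal(MU, SIGMA), durations in seconds.
MU = -2.5
SIGMA = 0.6


def sample_phonemes(n, rng):
    """Draw n log-normally distributed phoneme durations."""
    return [rng.lognormvariate(MU, SIGMA) for _ in range(n)]


def sample_words(n, rng, max_phonemes=6):
    """Draw n word durations as sums of independent phoneme durations.

    The number of phonemes per word is uniform on 1..max_phonemes,
    which is a simplifying assumption made for this sketch.
    """
    words = []
    for _ in range(n):
        k = rng.randint(1, max_phonemes)
        words.append(sum(sample_phonemes(k, rng)))
    return words


def summarize(durations):
    """Return mean, median, and mean and standard deviation of log-durations."""
    logs = [math.log(d) for d in durations]
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "log_mean": statistics.mean(logs),
        "log_sd": statistics.stdev(logs),
    }


rng = random.Random(0)  # fixed seed so the run is reproducible
phoneme_stats = summarize(sample_phonemes(20000, rng))
word_stats = summarize(sample_words(20000, rng))
print("phonemes:", phoneme_stats)
print("words:   ", word_stats)
```

For a log-normal sample the median estimates exp(MU), whereas the mean estimates exp(MU + SIGMA^2/2) and is therefore pulled upward by the long right tail. Comparing the two values in `phoneme_stats` shows the size of that gap for the assumed parameters.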

Funding Projects

Type | Code | Acronym | Leader | Title
Government of Spain | FIS2017-84151-P | Unspecified | Unspecified | Unspecified
Government of Spain | TIN2017-89244-R | Unspecified | Unspecified | Unspecified

More information

Item ID: 67881
DC Identifier: https://oa.upm.es/67881/
OAI Identifier: oai:oa.upm.es:67881
Deposited by: Memoria Investigacion
Deposited on: 19 Nov 2021 12:05
Last Modified: 19 Nov 2021 12:05