Combining the power of large language models with finetuning based on strategically collected human ratings: A case study about age-of-acquisition estimates of Spanish words

Sendín, Eneko, Conde Díaz, Javier ORCID: https://orcid.org/0000-0002-5304-0626, Reviriego Vasallo, Pedro ORCID: https://orcid.org/0000-0003-2540-5234, Haro Rodríguez, Juan ORCID: https://orcid.org/0000-0002-3456-4731, Ferré Romeu, Pilar ORCID: https://orcid.org/0000-0002-3192-0040, Hinojosa Poveda, José Antonio ORCID: https://orcid.org/0000-0002-7482-9503 and Brysbaert, Marc ORCID: https://orcid.org/0000-0002-3645-3189 (2025). Combining the power of large language models with finetuning based on strategically collected human ratings: A case study about age-of-acquisition estimates of Spanish words. "Psicologica", v. 46 (n. 2); https://doi.org/10.20350/digitalCSIC/17563.

Description

Title: Combining the power of large language models with finetuning based on strategically collected human ratings: A case study about age-of-acquisition estimates of Spanish words
Author(s):
Document Type: Article
Journal/Publication Title: Psicologica
Date: 2025
Volume: 46
Issue: 2
Subjects:
School: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería de Sistemas Telemáticos
UPM Research Group: Internet de Nueva Generación
Creative Commons License: Attribution

Abstract

This study examined the ability of a large language model, GPT-4o mini, to predict age of acquisition (AoA) for Spanish words, as compared to human ratings. We found a strong correlation (ρ=.75) between the model's AoA estimates and mean human ratings. This correlation was lower than the level of agreement observed between individual human raters (ρ=.85), but we found that finetuning the model on a relatively small dataset of 2,000 human AoA ratings has the potential to enhance the model's performance to a level comparable to human consensus. Consistent with theoretical expectations, our analyses confirmed that AoA estimates are meaningful only for words within an individual's vocabulary. Finally, we present a novel dataset of AoA estimates for 28,453 Spanish words likely known by adult speakers.

More information

Record ID: 91144
DC Identifier: https://oa.upm.es/91144/
OAI Identifier: oai:oa.upm.es:91144
DOI: 10.20350/digitalCSIC/17563
Official URL: https://psicologicajournal.com/combining-the-power...
Deposited by: Javier Conde Díaz
Deposited on: 28 Sep 2025 07:36
Last Modified: 28 Sep 2025 07:36