Full text

PDF (Portable Document Format)
- A PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader is required
Download (723kB)
ORCID: https://orcid.org/0000-0002-9125-6225; Conde Díaz, Javier (ORCID: https://orcid.org/0000-0002-5304-0626); Merino Gómez, Elena (ORCID: https://orcid.org/0000-0003-4129-4626); Bermúdez Margaretto, Beatriz; Hernández Gutiérrez, José Alberto (ORCID: https://orcid.org/0000-0002-9551-4308); Reviriego Vasallo, Pedro (ORCID: https://orcid.org/0000-0003-2540-5234) and Brysbaert, Marc (ORCID: https://orcid.org/0000-0002-3645-3189) (2024). Establishing vocabulary tests as a benchmark for evaluating large language models. "PLOS ONE", v. 19 (n. 12); pp. 1-17. https://doi.org/10.1371/journal.pone.0308259.
| Title: | Establishing vocabulary tests as a benchmark for evaluating large language models |
|---|---|
| Author(s): | |
| Document type: | Article |
| Journal/Publication title: | PLOS ONE |
| Date: | December 2024 |
| Volume: | 19 |
| Issue: | 12 |
| Subjects: | |
| Informal keywords: | AI, LLMs, Evaluation |
| School: | E.T.S.I. Telecomunicación (UPM) |
| Department: | Ingeniería de Sistemas Telemáticos |
| UPM research group: | Internet de Nueva Generación |
| Creative Commons license: | Attribution |
Vocabulary tests, once a cornerstone of language modeling evaluation, have been largely overlooked in the current landscape of Large Language Models (LLMs) like Llama 2, Mistral, and GPT. While most LLM evaluation benchmarks focus on specific tasks or domain-specific knowledge, they often neglect the fundamental linguistic aspects of language understanding. In this paper, we advocate for the revival of vocabulary tests as a valuable tool for assessing LLM performance. We evaluate seven LLMs using two vocabulary test formats across two languages and uncover surprising gaps in their lexical knowledge. These findings shed light on the intricacies of LLM word representations, their learning mechanisms, and performance variations across models and languages. Moreover, the ability to automatically generate and perform vocabulary tests offers new opportunities to expand the approach and provide a more complete picture of LLMs’ language skills.
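One common vocabulary test format is the yes/no test, in which real words are mixed with pseudowords and "yes" responses on pseudowords are used to correct for guessing. A minimal scoring sketch, assuming that format; the items, function name, and the hit-rate-minus-false-alarm-rate correction are illustrative assumptions, not the paper's actual implementation:

```python
# Illustrative scorer for a yes/no vocabulary test (hypothetical, not the
# paper's code). The test mixes real words with pseudowords; claiming to
# know a pseudoword lowers the score, correcting for guessing.

def score_yes_no_test(answers: dict[str, bool], key: dict[str, bool]) -> float:
    """Return a guessing-corrected score in [-100, 100].

    answers: item -> True if the test-taker claimed to know the word
    key:     item -> True if the item is a real word, False for a pseudoword
    """
    real = [w for w, is_real in key.items() if is_real]
    pseudo = [w for w, is_real in key.items() if not is_real]
    # Proportion of real words accepted ("hits") minus proportion of
    # pseudowords accepted ("false alarms").
    hit_rate = sum(answers.get(w, False) for w in real) / len(real)
    false_alarm_rate = sum(answers.get(w, False) for w in pseudo) / len(pseudo)
    return 100 * (hit_rate - false_alarm_rate)


key = {"cat": True, "house": True, "florp": False, "dreble": False}
answers = {"cat": True, "house": True, "florp": True, "dreble": False}
print(score_yes_no_test(answers, key))  # all real words accepted, one pseudoword accepted
```

An LLM's responses to such a test can be collected by prompting it item by item and mapping its replies to the boolean `answers` dictionary, which makes the scoring itself model-agnostic.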
| Record ID: | 85330 |
|---|---|
| DC identifier: | https://oa.upm.es/85330/ |
| OAI identifier: | oai:oa.upm.es:85330 |
| Scientific Portal URL: | https://portalcientifico.upm.es/es/ipublic/item/10333324 |
| DOI: | 10.1371/journal.pone.0308259 |
| Official URL: | https://journals.plos.org/plosone/article?id=10.13... |
| Deposited by: | Javier Conde Díaz |
| Deposited on: | 15 Dec 2024 18:37 |
| Last modified: | 15 Oct 2025 01:01 |