GPT for medical entity recognition in Spanish

García Barragán, Álvaro ORCID: https://orcid.org/0009-0007-6377-8150, González Calatayud, Alberto ORCID: https://orcid.org/0009-0006-6439-4581, Solarte Pabón, Oswaldo ORCID: https://orcid.org/0000-0003-0315-2838, Provencio Pulla, Mariano ORCID: https://orcid.org/0000-0001-6315-7919, Menasalvas Ruiz, Ernestina ORCID: https://orcid.org/0000-0002-5615-6798 and Robles Forcada, Víctor ORCID: https://orcid.org/0000-0003-3937-2269 (2024). GPT for medical entity recognition in Spanish. "Multimedia Tools and Applications"; ISSN 1380-7501. https://doi.org/10.1007/s11042-024-19209-5.

Description

Title: GPT for medical entity recognition in Spanish
Author(s): García Barragán, Álvaro; González Calatayud, Alberto; Solarte Pabón, Oswaldo; Provencio Pulla, Mariano; Menasalvas Ruiz, Ernestina; Robles Forcada, Víctor
Document type: Article
Journal/Publication title: Multimedia Tools and Applications
Date: 2024
ISSN: 1380-7501
Informal keywords: BERT; Breast Cancer; EHR; GPT; Information Extraction; LLM; NER
School: E.T.S. de Ingenieros Informáticos (UPM)
Department: Lenguajes y Sistemas Informáticos e Ingeniería del Software
Creative Commons license: Attribution - NoDerivatives - NonCommercial (CC BY-NC-ND)

Full text: 10209285.pdf (PDF, 1 MB)

Abstract

In recent years, there has been a remarkable surge in the development of Natural Language Processing (NLP) models, particularly for Named Entity Recognition (NER). Models such as BERT have demonstrated excellent performance by leveraging annotated corpora for accurate entity identification. This raises a question: can newer Large Language Models (LLMs) such as GPT be used to extract entities directly, without the need for extensive annotation? In this study, we explore this question by comparing fine-tuning techniques with prompting methods to assess the potential of GPT for identifying medical entities in Spanish electronic health records (EHRs). The study used a dataset of Spanish EHRs related to breast cancer and implemented two approaches to structure the data: a traditional NER method based on BERT, and a contemporary LLM-driven approach using GPT that combines few-shot learning with the integration of external knowledge. Both methods were embedded in a comprehensive analysis pipeline. Precision, recall, and F-score were used to evaluate the effectiveness of each method, with the aim of highlighting the strengths and limitations of each for structuring Spanish EHRs efficiently and accurately. The comparative analysis shows that the traditional BERT-based NER method and the few-shot LLM-driven approach augmented with external knowledge achieve comparable precision, recall, and F-score on Spanish EHRs. Contrary to expectations, the LLM-driven approach, which requires minimal data annotation, matches BERT's ability to discern complex medical terminology and contextual nuances within the EHRs.
The results of this study highlight a notable advance in NER for Spanish EHRs: the LLM-driven few-shot approach, enhanced with external knowledge, slightly outperforms the traditional BERT-based method in overall effectiveness. GPT's slightly higher F-score, together with its minimal reliance on extensive data annotation, underscores its potential for medical data processing.
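To make the evaluation metrics concrete, the following is a minimal sketch of micro-averaged, entity-level precision, recall, and F-score. It assumes each entity is a `(start_offset, end_offset, label)` tuple and counts a prediction as correct only on an exact span-and-label match; this matching criterion, the function name `entity_prf`, and the example labels are illustrative assumptions, not the paper's documented evaluation protocol.

```python
from typing import Iterable, Tuple

# Hypothetical entity representation: (start_offset, end_offset, label).
# Exact span-and-label matching is an assumption for illustration; the
# paper may use a different (e.g. partial-overlap) matching criterion.
Entity = Tuple[int, int, str]


def entity_prf(
    gold: Iterable[Entity], predicted: Iterable[Entity]
) -> Tuple[float, float, float]:
    """Return micro-averaged (precision, recall, F-score) over entities."""
    gold_set = set(gold)
    pred_set = set(predicted)
    # A true positive is a predicted entity that exactly matches a gold entity.
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Hypothetical annotations for a single EHR snippet.
    gold = {(0, 20, "DIAGNOSIS"), (25, 35, "DRUG"), (40, 48, "DATE")}
    pred = {(0, 20, "DIAGNOSIS"), (25, 35, "DRUG"), (50, 55, "DATE")}
    print(entity_prf(gold, pred))
```

In this example two of the three predictions match the gold annotations exactly, so precision, recall, and F-score are each 2/3. The same function can score a BERT tagger and an LLM extractor alike, as long as both outputs are converted to the same span representation.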

Associated projects

Type: Horizon 2020
Code: 875160
Acronym: CLARIFY project
Principal investigator: Not specified
Title: Not specified

More information

Record ID: 88005
DC identifier: https://oa.upm.es/88005/
OAI identifier: oai:oa.upm.es:88005
Portal Científico URL: https://portalcientifico.upm.es/es/ipublic/item/10209285
DOI: 10.1007/s11042-024-19209-5
Official URL: https://link.springer.com/article/10.1007/s11042-0...
Deposited by: iMarina Portal Científico
Deposited on: 25 Feb 2025 09:15
Last modified: 25 Feb 2025 09:38