GPT for medical entity recognition in Spanish

García Barragán, Álvaro, González Calatayud, Alberto, Solarte Pabón, Oswaldo, Provencio Pulla, Mariano, Menasalvas Ruiz, Ernestina and Robles Forcada, Víctor (2024). GPT for medical entity recognition in Spanish. "Multimedia Tools and Applications"; ISSN 1380-7501. https://doi.org/10.1007/s11042-024-19209-5.

Description

Title:	GPT for medical entity recognition in Spanish
Author(s):	García Barragán, Álvaro https://orcid.org/0009-0007-6377-8150; González Calatayud, Alberto https://orcid.org/0009-0006-6439-4581; Solarte Pabón, Oswaldo https://orcid.org/0000-0003-0315-2838; Provencio Pulla, Mariano https://orcid.org/0000-0001-6315-7919; Menasalvas Ruiz, Ernestina https://orcid.org/0000-0002-5615-6798; Robles Forcada, Víctor https://orcid.org/0000-0003-3937-2269
Document type:	Article
Journal/Publication title:	Multimedia Tools and Applications
Date:	2024
ISSN:	1380-7501
Subjects:	Computer Science; Medicine
SDGs:	03. Health and well-being; 09. Industry, innovation and infrastructure
Informal keywords:	BERT; Breast Cancer; EHR; GPT; Information Extraction; LLM; NER
School:	E.T.S. de Ingenieros Informáticos (UPM)
Department:	Lenguajes y Sistemas Informáticos e Ingeniería del Software
Creative Commons license:	Attribution - NoDerivatives - NonCommercial

Full text

PDF (1 MB)

Abstract

In recent years, there has been a remarkable surge in the development of Natural Language Processing (NLP) models, particularly for Named Entity Recognition (NER). Models such as BERT have demonstrated exceptional performance, leveraging annotated corpora for accurate entity identification. However, the question arises: can newer Large Language Models (LLMs) such as GPT be used without extensive annotation, enabling direct entity extraction? In this study, we explore this question. We compare fine-tuning techniques with prompting methods to elucidate the potential of GPT for identifying medical entities in Spanish electronic health records (EHRs).

This study used a dataset of Spanish EHRs related to breast cancer and implemented two approaches to structuring the data. The first is a traditional NER method using BERT. The second is a contemporary LLM-driven approach using GPT that combines few-shot learning with the integration of external knowledge. Both methods were embedded in a comprehensive analysis pipeline. Key performance metrics, namely precision, recall and F-score, were used to evaluate the effectiveness of each method. This comparative approach aimed to highlight the strengths and limitations of each method for structuring Spanish EHRs efficiently and accurately.

The comparative analysis undertaken in this article shows that the traditional BERT-based NER method and the few-shot LLM-driven approach augmented with external knowledge achieve comparable precision, recall and F-score when applied to Spanish EHRs. Contrary to expectations, the LLM-driven approach, which requires minimal data annotation, performs on par with BERT in discerning complex medical terminology and contextual nuances within the EHRs.
The results of this study highlight a notable advance in NER for Spanish EHRs. The LLM-driven few-shot approach, enhanced by external knowledge, slightly edges out the traditional BERT-based method in overall effectiveness. GPT's advantage in F-score and its minimal reliance on extensive data annotation underscore its potential for medical data processing.
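The abstract describes a prompting approach that combines few-shot examples with external knowledge. The sketch below is a minimal illustration of how such a prompt might be assembled and how a JSON reply might be parsed. It is not the authors' pipeline. The entity labels (`ENTITY_TYPES`), the example sentences, the `knowledge` snippets, the JSON answer format and the helpers `build_prompt` / `parse_entities` are all hypothetical. The call to the GPT model itself is omitted.

```python
import json
from dataclasses import dataclass

# Hypothetical label set for illustration; the paper's annotation schema is not given here.
ENTITY_TYPES = ["DRUG", "TUMOR_TYPE", "BIOMARKER"]


@dataclass
class Example:
    text: str
    entities: list  # list of (surface_text, label) pairs


# Hypothetical few-shot examples written for illustration, not taken from the study's corpus.
FEW_SHOT = [
    Example("Paciente con carcinoma ductal infiltrante, HER2 positivo.",
            [("carcinoma ductal infiltrante", "TUMOR_TYPE"), ("HER2", "BIOMARKER")]),
    Example("Inicia tratamiento con trastuzumab.",
            [("trastuzumab", "DRUG")]),
]


def build_prompt(text, knowledge, examples=FEW_SHOT):
    """Assemble a few-shot NER prompt; `knowledge` stands in for external knowledge snippets."""
    lines = [
        "Extract medical entities from the Spanish clinical text.",
        f"Allowed labels: {', '.join(ENTITY_TYPES)}.",
        'Answer with a JSON list of {"text": ..., "label": ...} objects.',
    ]
    if knowledge:
        lines.append("Reference knowledge:")
        lines.extend(f"- {k}" for k in knowledge)
    # Each demonstration pairs an input text with its expected JSON answer.
    for ex in examples:
        lines.append(f"Text: {ex.text}")
        answer = [{"text": t, "label": l} for t, l in ex.entities]
        lines.append(f"Entities: {json.dumps(answer, ensure_ascii=False)}")
    lines.append(f"Text: {text}")
    lines.append("Entities:")
    return "\n".join(lines)


def parse_entities(reply):
    """Parse the model's JSON reply, keeping only well-formed items with allowed labels."""
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    return [(it["text"], it["label"]) for it in items
            if isinstance(it, dict)
            and it.get("label") in ENTITY_TYPES
            and isinstance(it.get("text"), str)]


if __name__ == "__main__":
    prompt = build_prompt("Recibe letrozol; receptor de estrógeno positivo.",
                          knowledge=["letrozol: inhibidor de la aromatasa (fármaco)"])
    reply = '[{"text": "letrozol", "label": "DRUG"}]'  # stand-in for a model response
    print(parse_entities(reply))  # [('letrozol', 'DRUG')]
```

The parser treats malformed replies as "no entities" rather than raising an error, so one bad completion does not abort a batch. A real pipeline might instead retry or log these cases.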

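The abstract evaluates both methods with precision, recall and F-score. For illustration, the sketch below computes these metrics at the entity level: P = TP / (TP + FP), R = TP / (TP + FN) and F1 = 2PR / (P + R). It assumes strict matching, where a predicted entity counts only if its document, character offsets and label all match a gold entity, and it micro-averages over all entities. The paper's exact matching criterion is not stated here, so this choice is an assumption. The function name `prf` is hypothetical.

```python
def prf(gold, pred):
    """Micro-averaged entity-level precision, recall and F1.

    `gold` and `pred` are sets of (doc_id, start, end, label) tuples; a prediction
    is a true positive only if the whole tuple matches a gold entity (strict match).
    """
    tp = len(gold & pred)   # exact matches
    fp = len(pred - gold)   # predicted but not in gold
    fn = len(gold - pred)   # gold entities that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {(0, 0, 8, "DRUG"), (0, 20, 24, "BIOMARKER")}
    pred = {(0, 0, 8, "DRUG"), (0, 30, 35, "DRUG")}
    print(prf(gold, pred))  # (0.5, 0.5, 0.5)
```

A lenient variant that credits partial span overlaps would yield higher scores for the same predictions. For that reason, comparisons between a BERT tagger and an LLM are only meaningful under the same matching rule.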
Associated projects

Type:	Horizon 2020
Code:	875160
Acronym:	CLARIFY project
Lead:	Not specified

More information

Record ID:	88005
DC identifier:	https://oa.upm.es/88005/
OAI identifier:	oai:oa.upm.es:88005
Scientific Portal URL:	https://portalcientifico.upm.es/es/ipublic/item/10209285
DOI:	10.1007/s11042-024-19209-5
Official URL:	https://link.springer.com/article/10.1007/s11042-0...
Deposited by:	iMarina Portal Científico
Deposited on:	25 Feb 2025 09:15
Last modified:	25 Feb 2025 09:38
