Widaug. Data augmentation for named entity recognition using Wikidata

Calleja Ibáñez, Pablo, Sánchez Alberca, Alfredo and Corcho, Oscar (2023). Widaug. Data augmentation for named entity recognition using Wikidata. "Procesamiento de Lenguaje Natural" (n. 70); pp. 145-155. ISSN 1135-5948. https://doi.org/10.26342/2023-70-12.

Description

Title:	Widaug. Data augmentation for named entity recognition using Wikidata
Author(s):	Calleja Ibáñez, Pablo https://orcid.org/0000-0001-8423-8240; Sánchez Alberca, Alfredo; Corcho, Oscar https://orcid.org/0000-0002-9260-0753
Document Type:	Article
Journal/Publication Title:	Procesamiento de Lenguaje Natural
Date:	1 March 2023
ISSN:	1135-5948
Issue:	70
Subjects:	Computer Science
Informal Keywords:	Data augmentation, Wikidata, Named entity recognition
School:	E.T.S. de Ingenieros Informáticos (UPM)
Department:	Artificial Intelligence
Creative Commons Licenses:	Attribution - NoDerivatives - NonCommercial

Abstract

Current state-of-the-art Natural Language Processing models depend on large amounts of training data: the more, the better. This is a serious limitation when creating datasets for specific tasks such as Named Entity Recognition, which requires one or more annotators to read, understand and annotate the required named entities throughout a corpus. Many good general-domain corpora exist for English. However, particular domains or scenarios, and languages other than English, are still underrepresented in the research community. Data augmentation techniques are therefore explored to create synthetic data similar to the original data and enrich model training. Meanwhile, knowledge graphs contain a great deal of valuable information that is not being used in the data augmentation process. This work proposes a data augmentation method based on the Wikidata knowledge graph, which is tested on a Spanish corpus for a Named Entity Recognition challenge.
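
The abstract does not spell out the augmentation procedure, so the following is only an illustrative sketch of one common approach, mention replacement, and not necessarily the authors' method. It assumes BIO-tagged sentences and a small hand-written dictionary, `LABELS_BY_TYPE`, standing in for entity labels that would in practice be retrieved from Wikidata. The function name `augment` and the example labels are hypothetical.

```python
import random

# Hypothetical stand-in for labels retrieved from Wikidata, keyed by entity type.
LABELS_BY_TYPE = {
    "LOC": ["Madrid", "Sevilla", "Buenos Aires"],
    "PER": ["Ana García", "Luis Pérez"],
}


def augment(tokens, tags, labels_by_type, rng):
    """Return a copy of a BIO-tagged sentence in which every entity mention
    is replaced by a randomly chosen label of the same entity type."""
    new_tokens, new_tags = [], []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag.startswith("B-"):
            etype = tag[2:]
            # Find the end of the current mention (B- followed by I- tags).
            j = i + 1
            while j < len(tokens) and tags[j] == "I-" + etype:
                j += 1
            candidates = labels_by_type.get(etype)
            if candidates:
                replacement = rng.choice(candidates).split()
            else:
                # No labels known for this type: keep the original mention.
                replacement = tokens[i:j]
            new_tokens.extend(replacement)
            new_tags.extend(["B-" + etype] + ["I-" + etype] * (len(replacement) - 1))
            i = j
        else:
            # Non-entity tokens are copied unchanged.
            new_tokens.append(tokens[i])
            new_tags.append(tag)
            i += 1
    return new_tokens, new_tags


tokens = ["Ana", "vive", "en", "Buenos", "Aires"]
tags = ["B-PER", "O", "O", "B-LOC", "I-LOC"]
aug_tokens, aug_tags = augment(tokens, tags, LABELS_BY_TYPE, random.Random(0))
```

Each call yields a synthetic sentence with the same context words and the same number of entities, but with different surface forms, so repeated calls with different seeds can enlarge a small annotated corpus.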

More information

Record ID:	86404
DC Identifier:	https://oa.upm.es/86404/
OAI Identifier:	oai:oa.upm.es:86404
Portal Científico URL:	https://portalcientifico.upm.es/es/ipublic/item/10041164
DOI:	10.26342/2023-70-12
Official URL:	http://journal.sepln.org/sepln/ojs/ojs/index.php/p...
Deposited by:	iMarina Portal Científico
Deposited on:	21 Jan 2025 14:33
Last Modified:	21 Jan 2025 14:33
