Widaug. Data augmentation for named entity recognition using Wikidata

Calleja Ibáñez, Pablo ORCID: https://orcid.org/0000-0001-8423-8240, Sánchez Alberca, Alfredo and Corcho, Oscar ORCID: https://orcid.org/0000-0002-9260-0753 (2023). Widaug. Data augmentation for named entity recognition using Wikidata. "Procesamiento de Lenguaje Natural" (n. 70); pp. 145-155. ISSN 1135-5948. https://doi.org/10.26342/2023-70-12.

Description

Title: Widaug. Data augmentation for named entity recognition using Wikidata
Author(s): Calleja Ibáñez, Pablo; Sánchez Alberca, Alfredo; Corcho, Oscar
Document Type: Article
Journal/Publication Title: Procesamiento de Lenguaje Natural
Date: 1 March 2023
ISSN: 1135-5948
Issue: 70
Subjects:
Informal Keywords: Data augmentation, Wikidata, Named entity recognition
School: E.T.S. de Ingenieros Informáticos (UPM)
Department: Artificial Intelligence
Creative Commons License: Attribution - NoDerivatives - NonCommercial

Abstract

State-of-the-art Natural Language Processing models rely on large amounts of training data: the more, the better. This requirement is a serious limitation when creating datasets for specific tasks such as Named Entity Recognition, which requires one or more annotators to read, understand, and annotate the required named entities throughout a corpus. Many good general-domain corpora currently exist for English. However, particular domains or scenarios, and languages other than English, remain underrepresented in the research community. Data augmentation techniques are therefore explored to create synthetic data similar to the originals and so enrich model training. Meanwhile, knowledge graphs contain a great deal of valuable information that is not yet being used in the data augmentation process. This work proposes a data augmentation method based on the Wikidata knowledge graph, which is tested on a Spanish corpus for a Named Entity Recognition challenge.

More information

Record ID: 86404
DC Identifier: https://oa.upm.es/86404/
OAI Identifier: oai:oa.upm.es:86404
Scientific Portal URL: https://portalcientifico.upm.es/es/ipublic/item/10041164
DOI: 10.26342/2023-70-12
Official URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/p...
Deposited by: iMarina Portal Científico
Deposited on: 21 Jan 2025 14:33
Last Modified: 21 Jan 2025 14:33