Full text
PDF (Portable Document Format), 1 MB
- A PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader is required
ORCID: https://orcid.org/0000-0001-8423-8240, Sánchez Alberca, Alfredo and Corcho, Oscar (ORCID: https://orcid.org/0000-0002-9260-0753) (2023). Widaug. Data augmentation for named entity recognition using Wikidata. "Procesamiento de Lenguaje Natural" (n. 70); pp. 145-155. ISSN 1135-5948. https://doi.org/10.26342/2023-70-12.
| Title: | Widaug. Data augmentation for named entity recognition using Wikidata |
|---|---|
| Author(s): | |
| Document Type: | Article |
| Journal/Publication Title: | Procesamiento de Lenguaje Natural |
| Date: | 1 March 2023 |
| ISSN: | 1135-5948 |
| Issue: | 70 |
| Subjects: | |
| Informal Keywords: | Data augmentation, Wikidata, Named entity recognition |
| School: | E.T.S. de Ingenieros Informáticos (UPM) |
| Department: | Artificial Intelligence |
| Creative Commons License: | Attribution - No Derivative Works - Non-Commercial |
State-of-the-art Natural Language Processing models depend on large amounts of training data: the more, the better. This dependence is a serious limitation when creating datasets for specific tasks such as Named Entity Recognition, which requires one or more annotators to read, understand and annotate the required named entities throughout a corpus. Many good general-domain corpora currently exist for English. However, particular domains or scenarios and non-English languages remain underrepresented in the research community. Data augmentation techniques are therefore explored to create synthetic data similar to the original data and so enrich model training. Knowledge graphs, meanwhile, contain a great deal of valuable information that is not yet being used in the data augmentation process. This work proposes a data augmentation method based on the Wikidata knowledge graph, which is tested on a Spanish corpus for a Named Entity Recognition challenge.
| Record ID: | 86404 |
|---|---|
| DC Identifier: | https://oa.upm.es/86404/ |
| OAI Identifier: | oai:oa.upm.es:86404 |
| Portal Científico URL: | https://portalcientifico.upm.es/es/ipublic/item/10041164 |
| DOI: | 10.26342/2023-70-12 |
| Official URL: | http://journal.sepln.org/sepln/ojs/ojs/index.php/p... |
| Deposited by: | iMarina Portal Científico |
| Deposited on: | 21 Jan 2025 14:33 |
| Last Modified: | 21 Jan 2025 14:33 |