[…] (ORCID: https://orcid.org/0000-0001-7801-8815), Badenes Olmedo, Carlos (ORCID: https://orcid.org/0000-0002-2753-9917) and Corcho, Oscar (ORCID: https://orcid.org/0000-0002-9260-0753) (2024). Dynamic topic modelling for exploring the scientific literature on coronavirus: an unsupervised labelling technique. "International Journal of Data Science and Analytics"; ISSN 2364-415X. https://doi.org/10.1007/s41060-024-00610-0.
| Title: | Dynamic topic modelling for exploring the scientific literature on coronavirus: an unsupervised labelling technique |
|---|---|
| Author(s): | |
| Document type: | Article |
| Journal/Publication title: | International Journal of Data Science and Analytics |
| Date: | 13 August 2024 |
| ISSN: | 2364-415X |
| Subjects: | |
| Sustainable Development Goals (SDGs): | |
| Informal keywords: | CORD-1; CORD-19; Coronavirus; Coronaviruses; COVID-19; Dynamic topic model; Dynamic topic models; Interpretability; Labeling techniques; Labelings; Scientific Literature; Stem-Cell Transplantation; Tim; Topic interpretability; Topic labeling; Topic labelling; topic modeling |
| School: | E.T.S. de Ingenieros Informáticos (UPM) |
| Department: | Inteligencia Artificial (Artificial Intelligence) |
| Creative Commons licence: | Attribution - NoDerivatives - NonCommercial |
The work presented in this article focuses on improving the interpretability of probabilistic topic models built from a large collection of scientific documents that evolves over time. Three time-dependent approaches based on topic models (the Dynamic Topic Model, the Dynamic Embedded Topic Model, and BERTopic) were compared to analyse the annual evolution of latent concepts in the CORD-19 corpus. The COVID-19 period (December 2019 to present) was then analysed in greater depth, month by month, to explore how what is written about the disease has evolved. The evaluations suggest that the Dynamic Topic Model is the best choice for analysing the CORD-19 corpus. A novel topic labelling strategy for dynamic topic models is proposed to analyse the evolution of latent concepts. It incorporates content changes in both the annual evolution of the corpus and the monthly evolution of the COVID-19 disease. The generated labels are validated manually using two approaches: through the most relevant documents on each topic, and through the documents that share the most semantically similar topic labels. The labels make the topics interpretable: the proposed dynamic topic labelling method fits the content of each topic and is consistent with the topics' semantics.
| Record ID: | 88044 |
|---|---|
| DC identifier: | https://oa.upm.es/88044/ |
| OAI identifier: | oai:oa.upm.es:88044 |
| Portal Científico URL: | https://portalcientifico.upm.es/es/ipublic/item/10243148 |
| DOI: | 10.1007/s41060-024-00610-0 |
| Official URL: | https://link.springer.com/article/10.1007/s41060-0... |
| Deposited by: | iMarina Portal Científico |
| Deposited on: | 26 Feb 2025 08:41 |
| Last modified: | 26 Feb 2025 08:57 |