Applying deep learning techniques to terminology extraction in specific domains

Barrufet Ribes, Júlia (2021). Applying deep learning techniques to terminology extraction in specific domains. Thesis (Master thesis), E.T.S. de Ingenieros Informáticos (UPM).

Description

Title: Applying deep learning techniques to terminology extraction in specific domains
Author/s:
  • Barrufet Ribes, Júlia
Contributor/s:
  • Rico Almodóvar, Mariano
Item Type: Thesis (Master thesis)
Master's title: Ciencia de Datos (Data Science)
Date: 12 July 2021
Subjects:
Freetext Keywords: Natural Language Processing, Deep learning, Neural networks, Keyphrase extraction
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Inteligencia Artificial
Creative Commons Licenses: Attribution - NoDerivatives - NonCommercial

Full text

PDF (3MB)

Abstract

Keyphrase extraction is one of the most complex research fields in Natural Language Processing. This project presents a deep learning model for automatic keyphrase extraction that outperforms state-of-the-art approaches. The method builds on a BiLSTM neural network [1] combined with a CRF layer, forming the so-called BiLSTM-CRF model [2]. In this configuration, the BiLSTM layer captures the semantics of the document context, while the CRF layer models the dependencies among the labels of neighboring words, with the goal of overcoming the limitations of previous approaches. A key element that has significantly improved performance on this task is the use of contextual word embeddings instead of fixed embeddings [3]. Our model combines the most successful contributions of state-of-the-art publications and obtains a higher F1 score than any of them achieves separately. Specifically, we modify the BiLSTM-CRF model on contextual word embeddings described above by adding further BiLSTM layers to the network, an approach that [4] introduced for the BiLSTM on fixed embeddings. We show that a BiLSTM-CRF model with two BiLSTM layers, applied to contextual embeddings, achieves the best results to date for keyphrase extraction using deep learning. The developed model is available in a GitHub repository¹, together with the preprocessed dataset used in all the tests presented in this work.
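To illustrate what the CRF layer contributes on top of the BiLSTM, the following is a minimal pure-Python sketch of CRF-style Viterbi decoding, not the thesis's implementation (that lives in the author's repository). It assumes keyphrases are encoded with a hypothetical BIO tag set (B = beginning of a keyphrase, I = inside, O = outside), that per-token emission scores would come from the BiLSTM over contextual embeddings, and that transition scores between neighboring labels are learned; here both are hand-set toy numbers, and start/end transitions are omitted for brevity. The names `viterbi_decode`, `bio_to_phrases` and `TAGS` are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical BIO tag set: B = beginning of a keyphrase, I = inside, O = outside.
TAGS = ["B", "I", "O"]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Dict[Tuple[str, str], float],
    tags: Sequence[str] = TAGS,
) -> List[str]:
    """Return the tag sequence maximising emission + transition scores.

    emissions[t][k] is the score of tags[k] for token t (in a BiLSTM-CRF this
    would be the BiLSTM output); transitions[(a, b)] is the score of label a
    being followed by label b. Start/end transitions are omitted here.
    """
    n = len(emissions)
    if n == 0:
        return []
    # score[k]: best total score of any path ending in tags[k] at step t.
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, n):
        new_score, pointers = [], []
        for k, cur in enumerate(tags):
            # Best previous tag given that the current tag is `cur`.
            best_prev = max(
                range(len(tags)),
                key=lambda j: score[j] + transitions[(tags[j], cur)],
            )
            pointers.append(best_prev)
            new_score.append(
                score[best_prev] + transitions[(tags[best_prev], cur)] + emissions[t][k]
            )
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    best = max(range(len(tags)), key=lambda k: score[k])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return [tags[k] for k in reversed(path)]


def bio_to_phrases(tokens: Sequence[str], labels: Sequence[str]) -> List[str]:
    """Group B/I runs into keyphrase strings; a stray I is treated as O."""
    phrases: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B":
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif label == "I" and current:
            current.append(token)
        else:
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    # Toy transitions: every move is neutral except O -> I, which the BIO
    # scheme treats as invalid, so it is heavily penalised.
    trans = {(a, b): 0.0 for a in TAGS for b in TAGS}
    trans[("O", "I")] = -10.0
    # Taken token by token, these emissions would pick O then I (invalid).
    labels = viterbi_decode([[0.0, 0.0, 1.0], [0.5, 1.0, 0.0]], trans)
    print(labels)  # ['O', 'B']
    print(bio_to_phrases(["the", "model"], labels))  # ['model']
```

The toy run shows the point the abstract makes about label dependencies: choosing each token's label independently would produce the invalid pair O, I, whereas decoding the whole sequence with transition scores yields the valid O, B. In a trained BiLSTM-CRF the transition matrix is learned jointly with the network rather than set by hand.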

More information

Item ID: 68765
DC Identifier: https://oa.upm.es/68765/
OAI Identifier: oai:oa.upm.es:68765
Deposited by: Biblioteca Facultad de Informatica
Deposited on: 07 Oct 2021 11:48
Last Modified: 07 Oct 2021 11:48