Full text
PDF, Download (1MB)
González López, Ángel Luis (2021). Transformer-based multistage architectures for code search. Thesis (Master thesis), E.T.S. de Ingenieros Informáticos (UPM).
Title: | Transformer-based multistage architectures for code search |
---|---|
Author/s: | González López, Ángel Luis |
Contributor/s: | |
Item Type: | Thesis (Master thesis) |
Master's title: | Digital Innovation: Data Science |
Date: | September 2021 |
Subjects: | |
Freetext Keywords: | Code search, Natural Language Processing, BERT, Information retrieval |
Faculty: | E.T.S. de Ingenieros Informáticos (UPM) |
Department: | Other |
Creative Commons Licenses: | Attribution - NonCommercial - NoDerivatives |
Code search is one of the most common tasks for developers. The open-source software movement and the rise of social media have made this process easier, thanks to the vast public software repositories available to everyone and the Q&A sites where individuals can resolve their doubts. However, when code is poorly documented and hard to find in a repository, or when it belongs to a private enterprise framework that is not publicly available and therefore has no community on Q&A sites to answer questions, searching for code snippets to resolve doubts or learn how to use an API becomes very complicated. To address this problem, this thesis studies the use of natural language in code retrieval. In particular, it studies transformer-based models, such as Bidirectional Encoder Representations from Transformers (BERT), which are currently the state of the art in natural language processing but exhibit high latency in information retrieval tasks. This project therefore proposes a multi-stage architecture that seeks to maintain the performance of standard BERT-based models while reducing the high latency usually associated with this type of framework. Experiments show that this architecture outperforms previous non-BERT-based models by +0.17 on the Top 1 (or Recall@1) metric and reduces latency, with inference times that are 5% of those of standard BERT models.
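As an illustration of the multi-stage retrieval pattern the abstract refers to (a generic sketch, not necessarily the thesis's exact architecture), the snippet below pairs a cheap lexical first stage with a BERT-style cross-encoder re-ranker. The corpus, query, and model name are placeholder assumptions, and the libraries used (`rank_bm25`, Hugging Face `transformers`) are one possible choice rather than the ones used in the thesis.

```python
# Illustrative two-stage retrieval pipeline: a cheap lexical first stage narrows the
# candidate set, then a transformer cross-encoder re-ranks only the survivors.
# Corpus, query, and model name are placeholders, not the thesis's actual setup.
import torch
from rank_bm25 import BM25Okapi
from transformers import AutoTokenizer, AutoModelForSequenceClassification

corpus = [
    "def read_json(path): ...",        # hypothetical code snippets
    "def write_csv(rows, path): ...",
    "def parse_json_string(s): ...",
]
query = "how to load a json file"

# Stage 1: BM25 over whitespace tokens retrieves the top-k candidates cheaply.
bm25 = BM25Okapi([doc.split() for doc in corpus])
scores = bm25.get_scores(query.split())
top_k = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:2]

# Stage 2: a cross-encoder scores each (query, candidate) pair jointly.
model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # placeholder re-ranker
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

pairs = [(query, corpus[i]) for i in top_k]
inputs = tokenizer([q for q, _ in pairs], [d for _, d in pairs],
                   padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    rerank_scores = model(**inputs).logits.squeeze(-1)

best = top_k[int(rerank_scores.argmax())]
print("Best match:", corpus[best])
```

The latency saving in this kind of pipeline comes from running the expensive transformer only over the handful of candidates that survive the first stage, rather than over the whole corpus.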
Item ID: | 70462 |
---|---|
DC Identifier: | https://oa.upm.es/70462/ |
OAI Identifier: | oai:oa.upm.es:70462 |
Deposited by: | Biblioteca Facultad de Informatica |
Deposited on: | 11 May 2022 10:52 |
Last Modified: | 11 May 2022 10:52 |