Full text
Preview |
PDF
- Requires a PDF viewer, such as GSview, Xpdf or Adobe Acrobat Reader
Download (1MB) | Preview |
Buz, Tolga (2018). Comparative analysis of neural NLP models for information extraction from accounting documents. Thesis (Master thesis), E.T.S. de Ingenieros Informáticos (UPM).
Title: | Comparative analysis of neural NLP models for information extraction from accounting documents |
---|---|
Author/s: | Buz, Tolga |
Contributor/s: |
|
Item Type: | Thesis (Master thesis) |
Masters title: | Data Science |
Date: | 18 October 2018 |
Subjects: | |
Faculty: | E.T.S. de Ingenieros Informáticos (UPM) |
Department: | Other |
Creative Commons Licenses: | Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND) |
Natural Language Processing has become highly important in research and business applications. State-of-the-art techniques are used to automate tasks such as extracting relevant entities from documents or translating texts from one language to another. This thesis selects models that have performed well on standard benchmarks for those tasks and adapts them to a new, specialised problem: the labelling of entities in invoice documents. For this purpose, five state-of-the-art neural network models are presented, applied and evaluated. The results show that four of the five selected models, based on recurrent and convolutional architectures, can be implemented successfully and perform similarly well on test documents, with average F1 scores of 68-71% at word level and 67-69% at entity level. A detailed error analysis reveals that low data quality and a suboptimal choice of labels due to the dataset’s origins are the main factors influencing the models’ performance. To answer the main research question, the thesis proposes a ranking of the five models with regard to their prediction performance as well as their cost and difficulty of implementation. Possible improvements are proposed for future work, and the limitations of the project’s setting are explored and discussed. This project aims to contribute a different perspective to NER research by analysing and discussing errors and poor design choices in order to propose future improvements.
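The difference between the word-level and entity-level F1 scores quoted in the abstract can be illustrated with a minimal sketch. This is not the thesis's evaluation code: it assumes BIO-style tags, counts word-level matches on the full tag, and requires exact span and label agreement at entity level. The function names and the `ORG` and `DATE` labels are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from true positive, false positive and false negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def word_level_f1(gold: List[str], pred: List[str]) -> float:
    """Score each word on its own; the 'O' (outside) tag is not a positive."""
    pairs = list(zip(gold, pred))
    tp = sum(g == p and g != "O" for g, p in pairs)
    fp = sum(p != "O" and g != p for g, p in pairs)
    fn = sum(g != "O" and g != p for g, p in pairs)
    return f1(tp, fp, fn)


def extract_entities(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end, label) spans from BIO tags (assumed scheme)."""
    entities: Set[Tuple[int, int, str]] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                entities.add((start, i, label))
            if tag.startswith(("B-", "I-")):
                start, label = i, tag[2:]
            else:
                start, label = None, None
    return entities


def entity_level_f1(gold: List[str], pred: List[str]) -> float:
    """An entity counts only if its span and label both match exactly."""
    gold_entities = extract_entities(gold)
    pred_entities = extract_entities(pred)
    tp = len(gold_entities & pred_entities)
    fp = len(pred_entities - gold_entities)
    fn = len(gold_entities - pred_entities)
    return f1(tp, fp, fn)


# Hypothetical invoice snippet: a two-word organisation name and a date.
gold = ["B-ORG", "I-ORG", "O", "B-DATE", "O"]
pred = ["B-ORG", "O", "O", "B-DATE", "O"]
print(round(word_level_f1(gold, pred), 2))    # 0.8
print(round(entity_level_f1(gold, pred), 2))  # 0.5
```

In this toy case the model misses one word of a two-word entity. Word-level scoring still credits the correctly tagged first word, while entity-level scoring rejects the whole truncated span, which is one reason entity-level figures tend to be the lower of the two.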
Item ID: | 57522 |
---|---|
DC Identifier: | https://oa.upm.es/57522/ |
OAI Identifier: | oai:oa.upm.es:57522 |
Deposited by: | Biblioteca Facultad de Informatica |
Deposited on: | 16 Dec 2019 08:59 |
Last Modified: | 16 Dec 2019 08:59 |