Romeo, Paolo (2020). Twitter sentiment analysis: a comparison of available techniques and services. Thesis (Master thesis), E.T.S. de Ingenieros Informáticos (UPM).
Title: | Twitter sentiment analysis: a comparison of available techniques and services |
---|---|
Author/s: | Romeo, Paolo |
Contributor/s: |
|
Item Type: | Thesis (Master thesis) |
Masters title: | Data Science |
Date: | July 2020 |
Subjects: | |
Faculty: | E.T.S. de Ingenieros Informáticos (UPM) |
Department: | Lenguajes y Sistemas Informáticos e Ingeniería del Software |
Creative Commons Licenses: | Attribution - No derivative works - Non commercial |
Many models and services for sentiment analysis are available, and it is often difficult to choose the right one for a given use case. This thesis analyses relevant techniques that have been successfully applied to classify sentiment polarity and compares their performance in experiments run on the Sentiment140 dataset. It also analyses when the models agree on the correct classification, to highlight the margin of improvement that is achievable in theory. Three broad categories of models are considered: traditional models based on mathematical theorems or intuitions (Naive Bayes, Support Vector Machine, Logistic Regression and Random Forest), neural models (ANN, CNN, Bi-LSTM and a hybrid approach) and classification services offered by major technology companies (AWS Comprehend, Google Natural Language API and Meaning Cloud). The tested models achieved very similar performance, with Logistic Regression performing best. Despite the potential of neural models and the convenience of ready-to-use services, traditional models proved to be the best trade-off and delivered the best performance. The agreement analysis showed that a subset of the dataset is not correctly classified by any model, although in theory it is possible to achieve much better performance than that obtained by any individual model.
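The agreement analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's code: the function name `agreement_summary`, the model names, and the toy labels are all hypothetical, chosen only to show how one might measure the share of examples that no model classifies correctly and the theoretical upper bound reached if at least one model is right (an "oracle" combination).

```python
from typing import Dict, List


def agreement_summary(
    y_true: List[int], predictions: Dict[str, List[int]]
) -> Dict[str, float]:
    """Summarise per-model accuracy and cross-model agreement on the true labels.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    n = len(y_true)
    if n == 0 or not predictions:
        raise ValueError("need at least one example and one model")
    for name, preds in predictions.items():
        if len(preds) != n:
            raise ValueError(f"model {name!r}: expected {n} predictions")

    # For each example, count how many models predicted the true label.
    correct_counts = [
        sum(preds[i] == y_true[i] for preds in predictions.values())
        for i in range(n)
    ]
    n_models = len(predictions)

    # Accuracy of each individual model.
    summary = {
        f"accuracy[{name}]": sum(p == t for p, t in zip(preds, y_true)) / n
        for name, preds in predictions.items()
    }
    # Share of examples every model gets right.
    summary["all_correct"] = sum(c == n_models for c in correct_counts) / n
    # Share of examples no model gets right.
    summary["none_correct"] = sum(c == 0 for c in correct_counts) / n
    # Upper bound: an ideal selector that always picks a correct model if one exists.
    summary["oracle_accuracy"] = sum(c > 0 for c in correct_counts) / n
    return summary


# Toy binary polarity labels (invented data, not Sentiment140 results).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 0, 1, 0, 1, 0, 0],
    "model_b": [1, 1, 1, 1, 0, 1, 0, 0],
    "model_c": [1, 0, 1, 0, 0, 1, 0, 1],
}
summary = agreement_summary(y_true, predictions)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

In this toy data the oracle accuracy exceeds every individual accuracy, while a quarter of the examples are missed by all three models. That is the same qualitative pattern the abstract describes, here produced by invented numbers only.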
Item ID: | 63070 |
---|---|
DC Identifier: | https://oa.upm.es/63070/ |
OAI Identifier: | oai:oa.upm.es:63070 |
Deposited by: | Biblioteca Facultad de Informatica |
Deposited on: | 24 Jul 2020 13:51 |
Last Modified: | 31 May 2022 17:55 |