Twitter sentiment analysis: a comparison of available techniques and services

Romeo, Paolo (2020). Twitter sentiment analysis: a comparison of available techniques and services. Thesis (Master thesis), E.T.S. de Ingenieros Informáticos (UPM).

Description

Title: Twitter sentiment analysis: a comparison of available techniques and services
Author/s:
  • Romeo, Paolo
Contributor/s:
  • Menasalvas Ruíz, Ernestina
Item Type: Thesis (Master thesis)
Master's title: Data Science
Date: July 2020
Subjects:
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Lenguajes y Sistemas Informáticos e Ingeniería del Software
Creative Commons Licenses: Attribution - No Derivative Works - Non-Commercial

Abstract

Many models and services for sentiment analysis are available, and it is often difficult to choose the right one for a given use case. This thesis analyses techniques that have been successfully applied to sentiment polarity classification and compares their performance through experiments on the Sentiment140 dataset. It also analyses when the models agree on the correct classification, in order to estimate the margin of improvement that is theoretically achievable. Three broad categories of models are considered: traditional models based on mathematical theorems or intuitions (Naive Bayes, Support Vector Machine, Logistic Regression and Random Forest), neural models (ANN, CNN, Bi-LSTM and a hybrid approach) and classification services offered by major technology companies (AWS Comprehend, Google Natural Language API and Meaning Cloud). The tested models performed very similarly, with Logistic Regression performing best. Despite the potential of neural models and the convenience of ready-to-use services, traditional models offered the best trade-off and the best performance. The agreement analysis showed that a subset of the dataset is not classified correctly by any model, although in theory it is possible to achieve much better performance than that obtained by any individual model.
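
The agreement analysis described in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the function name `agreement_report`, the result keys and the toy labels are all assumptions made for illustration, and no figures from the thesis are reproduced. Given the true labels and each model's predictions, it computes per-model accuracy, the share of examples that every model or no model classifies correctly, and an "oracle" accuracy, meaning the accuracy an ideal selector would reach if it always picked a correct model whenever one exists.

```python
from typing import Dict, List


def agreement_report(y_true: List[int], predictions: Dict[str, List[int]]) -> Dict[str, float]:
    """Summarise how often a set of classifiers is right, individually and jointly.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    n = len(y_true)
    if n == 0:
        raise ValueError("y_true must not be empty")
    if not predictions:
        raise ValueError("at least one model is required")
    for name, preds in predictions.items():
        if len(preds) != n:
            raise ValueError(f"{name} has {len(preds)} predictions, expected {n}")

    # Per-model accuracy: fraction of examples each model labels correctly.
    report = {
        f"accuracy[{name}]": sum(p == t for p, t in zip(preds, y_true)) / n
        for name, preds in predictions.items()
    }

    # For every example, count how many models predicted the true label.
    correct_counts = [
        sum(preds[i] == y_true[i] for preds in predictions.values())
        for i in range(n)
    ]
    n_models = len(predictions)
    report["all_correct"] = sum(c == n_models for c in correct_counts) / n
    report["none_correct"] = sum(c == 0 for c in correct_counts) / n
    # Oracle bound: an example counts as solved if at least one model is right.
    report["oracle_accuracy"] = 1.0 - report["none_correct"]
    return report


# Toy data invented for illustration: 0 = negative, 1 = positive.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_predictions = {
    "naive_bayes":         [1, 0, 0, 1, 0, 1, 0, 0],
    "logistic_regression": [1, 0, 1, 0, 0, 1, 0, 0],
    "bi_lstm":             [1, 1, 1, 1, 0, 1, 0, 1],
}
report = agreement_report(y_true, toy_predictions)
for key, value in report.items():
    print(f"{key}: {value:.3f}")
```

On this toy input the best single model reaches 0.625 accuracy while the oracle bound is 0.75, and a quarter of the examples are missed by every model. That mirrors the kind of gap the abstract refers to, though the numbers here say nothing about the actual results.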

More information

Item ID: 63070
DC Identifier: http://oa.upm.es/63070/
OAI Identifier: oai:oa.upm.es:63070
Deposited by: Biblioteca Facultad de Informatica
Deposited on: 24 Jul 2020 13:51
Last Modified: 24 Jul 2020 13:51