Multi-view versus single-view machine learning for disease diagnosis in primary healthcare

Labroski, Aleksandar (2018). Multi-view versus single-view machine learning for disease diagnosis in primary healthcare. Thesis (Master thesis), E.T.S. de Ingenieros Informáticos (UPM).

Description

Title: Multi-view versus single-view machine learning for disease diagnosis in primary healthcare
Author/s:
  • Labroski, Aleksandar
Contributor/s:
  • Håkansson, Anne
Item Type: Thesis (Master thesis)
Masters title: Data Science
Date: 16 September 2018
Subjects:
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Otro
Creative Commons Licenses: Recognition - No derivative works - Non commercial

Full text

[img]
Preview
PDF - Requires a PDF viewer, such as GSview, Xpdf or Adobe Acrobat Reader
Download (1MB) | Preview

Abstract

The work presented in this report considers and compares two different approaches of machine learning towards solving the problem of disease diagnosis prediction in primary healthcare: single-view and multi-view machine learning. In particular, the problem of disease diagnosis prediction refers to the issue of predicting a (possible) diagnosis for a given patient based on her past medical history. The problem area is extensive, especially considering the fact that there are over 14,400 unique possible diagnoses (grouped into 22 high level categories) that can be considered as prediction targets. The approach taken in this work considers the high-level categories as prediction targets and attempts to use the two different machine learning techniques towards getting close to an optimal solution of the issue. The multi-view machine learning paradigm was chosen as an approach that can improve predictive performance of classifiers in settings where we have multiple heterogeneous data sources (different views of the same data), which is exactly the case here. In order to compare the single-view and multi-view machine learning paradigms (based on the concept of supervised learning), several different experiments are devised which explore the possible solution space under each paradigm. The work closely touches on other machine learning concepts such as ensemble learning, stacked generalization and dimensionality reduction-based learning. As we shall see, the results show that multiview stacked generalization is a powerful paradigm that can significantly improve the predictive performance in a supervised learning setting. The different models performance was evaluated using F1 scores and we have been able to observe an average increase of performance of 0.04 and a maximum increase of 0.114 F1 score points. The findings also show that approach of multi-view stacked ensemble learning is particularly well suited as a noise reduction technique and works well in cases where the feature data is expected to contain a notable amount of noise. This can be very beneficial and of interest to projects where the features are not manually chosen by domain experts.

More information

Item ID: 56719
DC Identifier: http://oa.upm.es/56719/
OAI Identifier: oai:oa.upm.es:56719
Deposited by: Biblioteca Facultad de Informatica
Deposited on: 07 Oct 2019 08:34
Last Modified: 07 Oct 2019 08:34
  • Logo InvestigaM (UPM)
  • Logo GEOUP4
  • Logo Open Access
  • Open Access
  • Logo Sherpa/Romeo
    Check whether the anglo-saxon journal in which you have published an article allows you to also publish it under open access.
  • Logo Dulcinea
    Check whether the spanish journal in which you have published an article allows you to also publish it under open access.
  • Logo de Recolecta
  • Logo del Observatorio I+D+i UPM
  • Logo de OpenCourseWare UPM