Detection and classification of anomalies in road traffic using spark streaming

Consuegra Rengifo, Nathan Adolfo (2018). Detection and classification of anomalies in road traffic using spark streaming. Thesis (Master thesis), E.T.S. de Ingenieros Informáticos (UPM).

Description

Title: Detection and classification of anomalies in road traffic using spark streaming
Author/s:
  • Consuegra Rengifo, Nathan Adolfo
Contributor/s:
  • Abbas, Zainab
  • Al-Shishtawy, Ahmad
  • Vlassov, Vladimir
Item Type: Thesis (Master thesis)
Masters title: Data Science
Date: 2018
Subjects:
Freetext Keywords: Anomaly detection; Traffic flow; Accidents; Weather; Decision tree; Random forest; Logistic regression; Streaming
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Other
Creative Commons Licenses: Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)

Full text

PDF (2 MB)

Abstract

Road traffic control has long been used to guarantee the safety of vehicles and pedestrians. However, anomalies such as accidents or natural disasters cannot be avoided, so it is important to be prepared to respond as soon as possible and prevent a higher number of human losses. Nevertheless, no existing system is accurate enough to detect and classify road traffic anomalies in real time. To address this issue, this study proposes training a machine learning model to detect and classify anomalies on the highways of Stockholm. Because no labeled dataset is available, the first phase of the work detects the different kinds of outliers that occur and labels them manually based on the results of a data exploration study. Datasets containing accident and weather information are also included to increase the number of anomalies. All experiments use real-world datasets, coming either from the sensors located on the highways of Stockholm or from official accident and weather reports. Three models (Decision Tree, Random Forest and Logistic Regression) are then trained to detect and classify the outliers, and the design of an Apache Spark Streaming application that uses the best-performing model is also provided. The results indicate that Logistic Regression outperforms the other two models but still suffers from the imbalanced nature of the dataset. In the future, this project can contribute to research on similar topics and can also be used to monitor the highways of Stockholm.
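The abstract attributes Logistic Regression's remaining weakness to class imbalance. The sketch below, in plain Python with made-up labels (not data from the thesis), illustrates why overall accuracy can hide this problem: a classifier that always predicts the majority class reaches high accuracy yet never detects an anomaly, which per-class recall and the macro-averaged F1 score expose. The class names `normal`, `accident` and `weather`, the 95/3/2 split and the helper functions are hypothetical; the thesis's actual label set and evaluation metrics may differ.

```python
def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    metrics = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores, so rare classes count equally."""
    m = per_class_metrics(y_true, y_pred)
    return sum(f1 for _, _, f1 in m.values()) / len(m)


# Hypothetical, illustrative labels: 95 normal intervals, 3 accidents, 2 weather anomalies.
y_true = ["normal"] * 95 + ["accident"] * 3 + ["weather"] * 2

# A degenerate classifier that always predicts the majority class.
y_pred = ["normal"] * len(y_true)

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")
for label, (p, r, f1) in per_class_metrics(y_true, y_pred).items():
    print(f"{label:>8}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Running it reports an accuracy of 0.95 but a macro F1 of about 0.32, with zero recall for both anomaly classes, which is why imbalance-aware metrics are a better guide when comparing models on data like this.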

More information

Item ID: 56722
DC Identifier: http://oa.upm.es/56722/
OAI Identifier: oai:oa.upm.es:56722
Deposited by: Biblioteca Facultad de Informatica
Deposited on: 07 Oct 2019 08:14
Last Modified: 07 Oct 2019 08:14