Two strategies for large-scale multi-label classification on the YouTube-8M dataset

Li, Dalei (2017). Two strategies for large-scale multi-label classification on the YouTube-8M dataset. Thesis (Master thesis), E.T.S. de Ingenieros Informáticos (UPM).

Description

Title: Two strategies for large-scale multi-label classification on the YouTube-8M dataset
Author/s:
  • Li, Dalei
Contributor/s:
  • Baumela Molina, Luis
Item Type: Thesis (Master thesis)
Masters title: Ingeniería Informática
Date: July 2017
Subjects:
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Inteligencia Artificial
Creative Commons Licenses: Attribution - NonCommercial - NoDerivatives

Full text

PDF (1MB), available for download; requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader.

Abstract

This project is part of the YouTube-8M Video Understanding Challenge. The challenge asks participants to build prediction models on the YouTube-8M dataset, which comprises 7 million videos; a model should accurately predict a set of labels for an unseen video. The challenge is essentially a multi-label classification problem, and solving it poses two main challenges. The first comes from the huge number of videos in the dataset combined with a relatively large number of features, which makes implementations of traditional machine learning algorithms designed to run on a single machine infeasible. To tackle this, we adopt two strategies: streaming instances and incremental learning. Both split the whole dataset into a sequence of small batches. The streaming strategy builds a model from the accumulated information extracted from each batch, while the incremental learning strategy improves a model iteratively on each batch. The second challenge is the huge number of not mutually exclusive labels, which we address with the computationally efficient one-vs-rest strategy. For the streaming instances strategy, we adapt and implement the multi-label k-nearest neighbor (ML-kNN) and multi-label radial basis function (RBF) network algorithms. For the incremental learning strategy, we implement one-vs-rest logistic regression and multi-layer neural network algorithms. All algorithms are implemented in TensorFlow on the video-level dataset rather than the frame-level dataset. We further experiment with these algorithms, identify the settings that make them work well, and try an ensemble technique to improve the result. Among single models, the multi-layer neural network performs best in terms of global average precision at 20 (GAP) on the private test set (around 0.78); the best RBF network model achieves 0.77, the best ML-kNN reaches 0.72, and the best logistic regression model obtains 0.70. Finally, the best GAP, 0.80106 (top 18% of 650 teams), is achieved by a bag of 6 multi-layer neural network models. Given that we did not use the more informative frame-level dataset, the result fully meets our expectations.
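The incremental learning strategy combined with one-vs-rest can be sketched in a few lines: each label gets an independent binary logistic model, and a single weight matrix is updated with one gradient step per mini-batch. This is a hypothetical NumPy illustration of the general technique, not the thesis's TensorFlow code; the class and method names (`OneVsRestLogistic`, `partial_fit`) are invented for the sketch.

```python
import numpy as np

class OneVsRestLogistic:
    """Minimal incremental one-vs-rest logistic regression via mini-batch SGD.

    Hypothetical sketch: each of the n_labels columns of W is an independent
    binary logistic classifier, all trained jointly on each small batch.
    """

    def __init__(self, n_features, n_labels, lr=0.1):
        self.W = np.zeros((n_features, n_labels))
        self.b = np.zeros(n_labels)
        self.lr = lr

    def partial_fit(self, X, Y):
        # One gradient step on the current mini-batch (incremental learning):
        # the gradient of the log-loss w.r.t. the logits is (probs - Y).
        probs = 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))
        err = probs - Y
        self.W -= self.lr * (X.T @ err) / len(X)
        self.b -= self.lr * err.mean(axis=0)

    def predict_proba(self, X):
        # Independent per-label probabilities (labels need not be exclusive).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))
```

Because `partial_fit` only ever sees one small batch at a time, the full 7-million-video dataset never needs to fit in memory, which is the point of the incremental strategy.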
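The GAP metric reported above pools each video's top-20 predictions, sorts all of them globally by confidence, and computes average precision over that single ranked list. A minimal NumPy sketch of this computation follows; the function name `gap_at_k` is ours, and this is an illustration of the metric's definition rather than the official evaluation code.

```python
import numpy as np

def gap_at_k(predictions, labels, k=20):
    """Global average precision at k (GAP), as used in YouTube-8M.

    predictions: (n_videos, n_classes) array of confidence scores.
    labels:      (n_videos, n_classes) binary ground-truth matrix.
    """
    n_videos, _ = predictions.shape
    confidences, relevances = [], []
    for i in range(n_videos):
        # Keep only each video's top-k most confident predictions.
        top = np.argsort(predictions[i])[::-1][:k]
        confidences.extend(predictions[i, top])
        relevances.extend(labels[i, top])
    # Sort the pooled predictions of all videos by confidence, descending.
    order = np.argsort(confidences)[::-1]
    rel = np.asarray(relevances, dtype=float)[order]
    # Precision at each rank, then average precision over all positives.
    precisions = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float(np.sum(precisions * rel) / labels.sum())
```

A single confidently wrong prediction ranked near the top of the global list drags down the precision of every positive below it, which is why well-calibrated confidences matter as much as correct top-20 sets for this metric.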

More information

Item ID: 55867
DC Identifier: http://oa.upm.es/55867/
OAI Identifier: oai:oa.upm.es:55867
Deposited by: Biblioteca Facultad de Informatica
Deposited on: 18 Jul 2019 07:09
Last Modified: 18 Jul 2019 07:09