The Possibilistic Reward Method and a Dynamic Extension for the Multi-armed Bandit Problem: A Numerical Study

Martín Stickle, Miguel and Jiménez Martín, Antonio and Mateos Caballero, Alfonso (2017). The Possibilistic Reward Method and a Dynamic Extension for the Multi-armed Bandit Problem: A Numerical Study. In: "ICORES 2017 6th International Conference on Operations Research and Enterprise Systems", 23-25 February 2017, Porto, Portugal. ISBN 978-989-758-218-9. pp. 75-84. https://doi.org/10.5220/0006189900960107.

Description

Title: The Possibilistic Reward Method and a Dynamic Extension for the Multi-armed Bandit Problem: A Numerical Study
Author/s:
  • Martín Stickle, Miguel
  • Jiménez Martín, Antonio
  • Mateos Caballero, Alfonso
Item Type: Presentation at Congress or Conference (Article)
Event Title: ICORES 2017 6th International Conference on Operations Research and Enterprise Systems
Event Dates: 23-25 February 2017
Event Location: Porto, Portugal
Title of Book: Proceedings of the 6th International Conference on Operations Research and Enterprise Systems
Date: 2017
ISBN: 978-989-758-218-9
Volume: 1
Subjects:
Freetext Keywords: Multi-armed Bandit Problem, Possibilistic Reward, Numerical Study
Faculty: E.T.S.I. Caminos, Canales y Puertos (UPM)
Department: Matemática e Informática Aplicadas a la Ingenierías Civil y Naval
Creative Commons Licenses: Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)

Abstract

Different allocation strategies can be found in the literature to deal with the multi-armed bandit problem, from either a frequentist or a Bayesian perspective. In this paper, we propose a novel allocation strategy, the possibilistic reward method. First, possibilistic reward distributions are used to model the uncertainty about the expected reward of each arm; these are then converted into probability distributions using a pignistic probability transformation. Finally, a simulation experiment is carried out to identify the arm with the highest expected reward, which is then pulled. A parametric probability transformation for the proposed method is then introduced together with a dynamic optimization, which implies that neither previous knowledge nor a simulation of the arm distributions is required. A numerical study shows that the proposed method outperforms other policies in the literature in five scenarios: a Bernoulli distribution with very low success probabilities, with success probabilities close to 0.5, and with success probabilities close to 0.5 and Gaussian rewards; and Poisson and exponential distributions truncated to [0,10].
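
The three steps described in the abstract (possibility distribution, pignistic transformation, simulation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The grid `GRID`, the Gaussian-shaped `possibility` function, and the helper names `pignistic`, `select_arm` and `run` are all assumptions introduced here for illustration. Only the pignistic step follows a standard formula, the one for a discrete possibility distribution, in which each drop between consecutive sorted possibility levels is shared equally among the elements at or above that level. The simulation step is read here as drawing one value per arm and pulling the arm with the largest draw.

```python
import numpy as np

GRID = np.linspace(0.0, 1.0, 101)  # assumed candidate values for an arm's expected reward


def possibility(mean, pulls):
    """Hypothetical possibility function over GRID (an assumption, not the paper's formula)."""
    if pulls == 0:
        return np.ones_like(GRID)  # no data yet: every value is fully possible
    pi = np.exp(-2.0 * pulls * (GRID - mean) ** 2)  # narrows as pulls accumulate
    return pi / pi.max()  # rescale so the most plausible grid point has possibility 1


def pignistic(pi):
    """Pignistic transformation of a discrete possibility distribution with max(pi) == 1."""
    order = np.argsort(-pi, kind="stable")  # indices by decreasing possibility
    sorted_pi = pi[order]
    drops = sorted_pi - np.append(sorted_pi[1:], 0.0)  # pi_(j) - pi_(j+1)
    shares = drops / np.arange(1, len(pi) + 1)  # each drop is split among the top j elements
    bet_sorted = np.cumsum(shares[::-1])[::-1]  # BetP_(i) = sum of shares_j for j >= i
    bet = np.empty_like(pi)
    bet[order] = bet_sorted  # undo the sort
    return bet


def select_arm(means, pulls, rng):
    """Draw one value per arm from its pignistic distribution and pick the largest draw."""
    draws = []
    for m, n in zip(means, pulls):
        p = pignistic(possibility(m, n))
        draws.append(rng.choice(GRID, p=p / p.sum()))  # renormalise against rounding
    return int(np.argmax(draws))


def run(probs, horizon, seed=0):
    """Play Bernoulli arms with success probabilities `probs`; return pulls per arm."""
    rng = np.random.default_rng(seed)
    k = len(probs)
    pulls = np.zeros(k, dtype=int)
    totals = np.zeros(k)
    for _ in range(horizon):
        # Sample mean per arm, defaulting to 0 for arms not yet pulled.
        means = np.divide(totals, pulls, out=np.zeros(k), where=pulls > 0)
        arm = select_arm(means, pulls, rng)
        reward = float(rng.random() < probs[arm])
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


if __name__ == "__main__":
    print(run([0.2, 0.8], horizon=500))
```

Calling `run([0.2, 0.8], horizon=500)` returns the number of pulls per arm. The parametric transformation and the dynamic optimization mentioned in the abstract are not covered by this sketch.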

Funding Projects

Type: Government of Spain
Code: MTM2014-56949-C3-2-R
Acronym: Unspecified
Leader: Jiménez Martín, Antonio
Title: Apoyo a decisiones en análisis de riesgos. Seguridad operacional aérea

More information

Item ID: 53405
DC Identifier: http://oa.upm.es/53405/
OAI Identifier: oai:oa.upm.es:53405
DOI: 10.5220/0006189900960107
Official URL: http://www.icores.org/Home.aspx?y=2017
Deposited by: Memoria Investigacion
Deposited on: 09 Jan 2019 11:58
Last Modified: 09 Jan 2019 11:58