Possibilistic reward methods for the multi-armed bandit problem

Martín Blanco, Miguel Carlos and Jiménez Martín, Antonio and Mateos Caballero, Alfonso (2018). Possibilistic reward methods for the multi-armed bandit problem. "Neurocomputing", v. 310; pp. 201-212. ISSN 0925-2312. https://doi.org/10.1016/j.neucom.2018.04.078.

Description

Title: Possibilistic reward methods for the multi-armed bandit problem
Author/s:
  • Martín Blanco, Miguel Carlos
  • Jiménez Martín, Antonio
  • Mateos Caballero, Alfonso
Item Type: Article
Journal/Publication Title: Neurocomputing
Date: 2018
ISSN: 0925-2312
Volume: 310
Subjects:
Freetext Keywords: Multi-armed bandit problem; Possibilistic reward; Numerical study
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Inteligencia Artificial
Creative Commons Licenses: Attribution - No derivative works - Non commercial

Abstract

In this paper, we propose a set of allocation strategies to deal with the multi-armed bandit problem, the possibilistic reward (PR) methods. First, we use possibilistic reward distributions to model the uncertainty about the expected reward of each arm, derived from an infinite set of confidence intervals nested around the expected value. Depending on the inequality used to compute the confidence intervals, there are three possible PR methods with different features. Next, we use a pignistic probability transformation to convert these possibilistic functions into probability distributions following the insufficient reason principle. Finally, Thompson sampling techniques are used to identify the arm with the highest expected reward and play that arm. A numerical study analyses the performance of the proposed methods with respect to other policies in the literature. Two PR methods perform well in all representative scenarios under consideration, and are the best allocation strategies if truncated Poisson or exponential distributions in [0,10] are considered for the arms.
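
The three steps named in the abstract (possibility distribution from nested confidence intervals, pignistic transformation, Thompson-style sampling) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes rewards in [0,1], uses Hoeffding's inequality as one possible choice for the confidence intervals, discretizes the expected reward on a grid, and applies the standard discrete pignistic transformation. All function names (`hoeffding_possibility`, `pignistic`, `choose_arm`, `run_bandit`) are hypothetical.

```python
import numpy as np


def hoeffding_possibility(grid, mean, n):
    """Possibility of each candidate expected reward (assumption: Hoeffding bound).

    For rewards in [0, 1], P(|mean - mu| >= eps) <= 2 * exp(-2 * n * eps**2);
    the bound, capped at 1, is read as the possibility of mu = x.
    """
    pi = np.minimum(1.0, 2.0 * np.exp(-2.0 * n * (grid - mean) ** 2))
    return pi / pi.max()  # ensure the most plausible grid point has possibility 1


def pignistic(pi):
    """Discrete pignistic transformation of a normalized possibility vector.

    With values sorted in descending order, pi_(1) >= ... >= pi_(K), pi_(K+1) = 0:
    p_(i) = sum over j >= i of (pi_(j) - pi_(j+1)) / j.
    """
    order = np.argsort(-pi)                      # indices in descending order of pi
    sorted_pi = pi[order]
    diffs = sorted_pi - np.append(sorted_pi[1:], 0.0)
    ranks = np.arange(1, len(pi) + 1)
    tail = np.cumsum((diffs / ranks)[::-1])[::-1]  # reversed cumulative sum
    prob = np.empty_like(pi)
    prob[order] = tail
    return prob / prob.sum()                     # guard against rounding error


def choose_arm(means, counts, rng, grid):
    """Thompson-style step: draw one value per arm, play the largest draw."""
    draws = [
        rng.choice(grid, p=pignistic(hoeffding_possibility(grid, m, n)))
        for m, n in zip(means, counts)
    ]
    return int(np.argmax(draws))


def run_bandit(true_means, horizon=500, seed=0):
    """Simulate Bernoulli arms (an illustrative assumption) and return pull counts."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)
    k = len(true_means)
    counts = np.zeros(k, dtype=int)
    sums = np.zeros(k)
    for _ in range(horizon):
        # Sample mean per arm; arms never played get a neutral placeholder of 0.5.
        means = np.divide(sums, counts, out=np.full(k, 0.5), where=counts > 0)
        arm = choose_arm(means, counts, rng, grid)
        reward = float(rng.random() < true_means[arm])
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

Calling `run_bandit([0.2, 0.8])` returns the number of times each arm was played. An arm with no observations has a flat possibility distribution, so its pignistic distribution is uniform over the grid and the arm is explored freely; as observations accumulate, the distribution concentrates around the sample mean.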

Funding Projects

Type | Code | Acronym | Leader | Title
Government of Spain | MTM2017-86875-C3-3-R | Unspecified | Universidad Politécnica de Madrid | Toma de decisiones multicriterio y modelos de interdependencia para la gestión de riesgos. Seguridad en ATM
Government of Spain | MTM2014-56949-C3-2-R | Unspecified | Universidad Politécnica de Madrid | Apoyo a decisiones en análisis de riesgos. Seguridad operacional aérea

More information

Item ID: 54720
DC Identifier: http://oa.upm.es/54720/
OAI Identifier: oai:oa.upm.es:54720
DOI: 10.1016/j.neucom.2018.04.078
Official URL: https://reader.elsevier.com/reader/sd/pii/S0925231218305630?token=760193A2A0A2590D3DDFD3C81999DF2564C6F7D8392A212D0C7566ACE75A29B92554BE3BB6987F3E3EB943C0D84F0B5C
Deposited by: Memoria Investigacion
Deposited on: 09 May 2019 06:58
Last Modified: 09 May 2019 06:58