The Possibilistic Reward Method and a Dynamic Extension for the Multi-armed Bandit Problem: A Numerical Study

Martín Stickle, Miguel; Jiménez Martín, Antonio and Mateos Caballero, Alfonso (2017). The Possibilistic Reward Method and a Dynamic Extension for the Multi-armed Bandit Problem: A Numerical Study. In: "ICORES 2017 6th International Conference on Operations Research and Enterprise Systems", 23-25 February 2017, Porto, Portugal. ISBN 978-989-758-218-9. pp. 75-84. https://doi.org/10.5220/0006189900960107.

Description

Title: The Possibilistic Reward Method and a Dynamic Extension for the Multi-armed Bandit Problem: A Numerical Study
Author(s):
  • Martín Stickle, Miguel
  • Jiménez Martín, Antonio
  • Mateos Caballero, Alfonso
Document type: Conference or workshop paper (Article)
Event title: ICORES 2017 6th International Conference on Operations Research and Enterprise Systems
Event dates: 23-25 February 2017
Event location: Porto, Portugal
Book title: Proceedings of the 6th International Conference on Operations Research and Enterprise Systems
Date: 2017
ISBN: 978-989-758-218-9
Volume: 1
Subjects:
Informal keywords: Multi-armed Bandit Problem, Possibilistic Reward, Numerical Study
School: E.T.S.I. Caminos, Canales y Puertos (UPM)
Department: Matemática e Informática Aplicadas a las Ingenierías Civil y Naval
Creative Commons license: Attribution - NoDerivatives - NonCommercial

Full text

PDF (1 MB), preview and download available.

Abstract

Different allocation strategies can be found in the literature to deal with the multi-armed bandit problem, under either a frequentist or a Bayesian perspective. In this paper, we propose a novel allocation strategy, the possibilistic reward method. First, possibilistic reward distributions are used to model the uncertainty about the expected reward of each arm; these are then converted into probability distributions using a pignistic probability transformation. Finally, a simulation experiment is carried out to identify the arm with the highest expected reward, which is then pulled. A parametric probability transformation of the proposed method is then introduced, together with a dynamic optimization, which implies that neither previous knowledge nor a simulation of the arm distributions is required. A numerical study shows that the proposed method outperforms other policies in the literature in five scenarios: a Bernoulli distribution with very low success probabilities; a Bernoulli distribution with success probabilities close to 0.5; a Bernoulli distribution with success probabilities close to 0.5 and Gaussian rewards; and Poisson and exponential distributions truncated to [0,10].
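The abstract describes the core loop of the method: model each arm's unknown expected reward with a possibilistic distribution, turn that distribution into a probability distribution via the pignistic transformation, sample one candidate expected reward per arm, and pull the arm whose sample is largest. The Python sketch below only illustrates this loop for Bernoulli arms and is not taken from the paper: the triangular possibility shape, its shrinking width, the reward grid and the function names are assumptions made here for concreteness; the pignistic step follows the standard BetP construction for a discretised consonant possibility distribution.

import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility of the sketch


def pignistic_from_possibility(grid_poss):
    """Convert a discretised possibility distribution into a pignistic
    probability distribution (BetP for a consonant body of evidence)."""
    n = len(grid_poss)
    order = np.argsort(grid_poss)[::-1]      # grid indices by decreasing possibility
    pi_sorted = grid_poss[order]
    pi_next = np.append(pi_sorted[1:], 0.0)  # pi(x_{j+1}), with pi(x_{n+1}) = 0
    masses = pi_sorted - pi_next             # mass of each nested focal set {x_1..x_j}
    shares = masses / np.arange(1, n + 1)    # share each focal set's mass uniformly
    betp_sorted = np.cumsum(shares[::-1])[::-1]   # BetP(x_i) = sum_{j >= i} m_j / j
    betp = np.empty(n)
    betp[order] = betp_sorted
    return betp / betp.sum()


def possibilistic_reward_pull(successes, pulls, grid=np.linspace(0.0, 1.0, 101)):
    """One step of a possibilistic-reward-style policy for Bernoulli arms
    (illustrative only; the possibility shape and width are assumptions)."""
    samples = []
    for s, n in zip(successes, pulls):
        mean = (s + 1.0) / (n + 2.0)                    # smoothed empirical mean
        width = max(0.05, 1.0 / np.sqrt(n + 1.0))       # assumed uncertainty width
        poss = np.clip(1.0 - np.abs(grid - mean) / width, 0.0, None)  # triangular possibility
        prob = pignistic_from_possibility(poss)
        samples.append(rng.choice(grid, p=prob))        # one sampled expected reward per arm
    return int(np.argmax(samples))                      # pull the arm with the largest sample

For example, possibilistic_reward_pull([3, 1], [10, 4]) returns the index of the arm to pull next given 3 successes in 10 pulls for arm 0 and 1 success in 4 pulls for arm 1; inside a bandit loop the counters would be updated after each pull and the procedure repeated. The paper's dynamic extension additionally removes the need for this per-step simulation, which the sketch does not attempt to reproduce.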

Associated projects

Type: Gobierno de España
Code: MTM2014-56949-C3-2-R
Acronym: Not specified
Principal investigator: Jiménez Martín, Antonio
Title: Apoyo a decisiones en análisis de riesgos. Seguridad operacional aérea (Decision support in risk analysis. Aviation operational safety)

More information

Record ID: 53405
DC identifier: http://oa.upm.es/53405/
OAI identifier: oai:oa.upm.es:53405
DOI: 10.5220/0006189900960107
Official URL: http://www.icores.org/Home.aspx?y=2017
Deposited by: Memoria Investigacion
Deposited on: 09 Jan 2019 11:58
Last modified: 09 Jan 2019 11:58