Martín Blanco, Miguel Carlos and Jiménez Martín, Antonio and Mateos Caballero, Alfonso (2018). Possibilistic reward methods for the multi-armed bandit problem. "Neurocomputing", v. 310; pp. 201-212. ISSN 0925-2312. https://doi.org/10.1016/j.neucom.2018.04.078.
Title: | Possibilistic reward methods for the multi-armed bandit problem |
---|---|
Author/s: | Martín Blanco, Miguel Carlos; Jiménez Martín, Antonio; Mateos Caballero, Alfonso |
Item Type: | Article |
Journal/Publication Title: | Neurocomputing |
Date: | 2018 |
ISSN: | 0925-2312 |
Volume: | 310 |
Subjects: | |
Freetext Keywords: | Multi-armed bandit problem; Possibilistic reward; Numerical study |
Faculty: | E.T.S. de Ingenieros Informáticos (UPM) |
Department: | Inteligencia Artificial |
Creative Commons Licenses: | Attribution - NonCommercial - NoDerivatives |
In this paper, we propose a set of allocation strategies to deal with the multi-armed bandit problem: the possibilistic reward (PR) methods. First, we use possibilistic reward distributions to model the uncertainty about the expected reward of each arm, derived from an infinite set of confidence intervals nested around the expected value. Depending on the inequality used to compute the confidence intervals, there are three possible PR methods with different features. Next, we use a pignistic probability transformation to convert these possibilistic functions into probability distributions following the principle of insufficient reason. Finally, Thompson sampling techniques are used to identify the arm with the highest expected reward and play that arm. A numerical study analyses the performance of the proposed methods with respect to other policies in the literature. Two PR methods perform well in all representative scenarios under consideration, and are the best allocation strategies if truncated Poisson or exponential distributions in [0,10] are considered for the arms.
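The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes rewards in [0, 1], Bernoulli arms, and Hoeffding's inequality for the nested confidence intervals; the abstract does not say which inequalities the three PR methods use. The names `sample_pignistic_mean` and `run_bandit` are hypothetical. The pignistic step is done by sampling: draw a level alpha uniformly, take the confidence interval at that level as the alpha-cut of the possibility distribution, then draw uniformly inside it.

```python
import math
import random


def sample_pignistic_mean(sample_mean, pulls, rng):
    """Draw one plausible expected reward for an arm with rewards in [0, 1]."""
    if pulls == 0:
        return rng.random()  # no data yet: every value in [0, 1] is fully possible
    alpha = 1.0 - rng.random()  # uniform on (0, 1], so log(2 / alpha) is defined
    # Assumed inequality (Hoeffding): P(|mu - mean| >= eps) <= 2 * exp(-2 * n * eps^2).
    # Solving 2 * exp(-2 * n * eps^2) = alpha gives the half-width of the alpha-cut.
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * pulls))
    lo = max(0.0, sample_mean - eps)
    hi = min(1.0, sample_mean + eps)
    # Insufficient reason: spread the mass of this cut uniformly over the interval.
    return rng.uniform(lo, hi)


def run_bandit(true_means, horizon, seed=0):
    """Thompson-style loop: sample one value per arm, play the arm with the largest."""
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k
    totals = [0.0] * k
    for _ in range(horizon):
        draws = [
            sample_pignistic_mean(totals[i] / pulls[i] if pulls[i] else 0.0, pulls[i], rng)
            for i in range(k)
        ]
        arm = max(range(k), key=draws.__getitem__)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0  # Bernoulli reward (assumed)
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


if __name__ == "__main__":
    # Number of times each arm was played; the better arm should dominate.
    print(run_bandit([0.2, 0.8], horizon=2000))
```

As an arm accumulates pulls, its intervals shrink around the sample mean, so its draws concentrate and exploration of clearly worse arms fades. A tighter inequality would give narrower intervals and a different exploration profile, which is presumably how the choice of inequality leads to methods with different features.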
Type | Code | Acronym | Leader | Title |
---|---|---|---|---|
Government of Spain | MTM2017-86875-C3-3-R | Unspecified | Universidad Politécnica de Madrid | Toma de decisiones multicriterio y modelos de interdependencia para la gestión de riesgos. Seguridad en ATM |
Government of Spain | MTM2014-56949-C3-2-R | Unspecified | Universidad Politécnica de Madrid | Apoyo a decisiones en análisis de riesgos. Seguridad operacional aérea |
Item ID: | 54720 |
---|---|
DC Identifier: | http://oa.upm.es/54720/ |
OAI Identifier: | oai:oa.upm.es:54720 |
DOI: | 10.1016/j.neucom.2018.04.078 |
Official URL: | https://reader.elsevier.com/reader/sd/pii/S0925231218305630?token=760193A2A0A2590D3DDFD3C81999DF2564C6F7D8392A212D0C7566ACE75A29B92554BE3BB6987F3E3EB943C0D84F0B5C |
Deposited by: | Memoria Investigacion |
Deposited on: | 09 May 2019 06:58 |
Last Modified: | 09 May 2019 06:58 |