Full text
|
PDF
- Requires a PDF viewer, such as GSview, Xpdf or Adobe Acrobat Reader
Download (1MB) | Preview |
Martín Stickle, Miguel and Jiménez Martín, Antonio and Mateos Caballero, Alfonso (2017). The Possibilistic Reward Method and a Dynamic Extension for the Multi-armed Bandit Problem: A Numerical Study. In: "ICORES 2017 6th International Conference on Operations Research and Enterprise Systems", 23-25, February 2017, Porto, Portugal. ISBN 978-989-758-218-9. pp. 75-84. https://doi.org/10.5220/0006189900960107.
Title: | The Possibilistic Reward Method and a Dynamic Extension for the Multi-armed Bandit Problem: A Numerical Study |
---|---|
Author/s: | Martín Stickle, Miguel; Jiménez Martín, Antonio; Mateos Caballero, Alfonso |
Item Type: | Presentation at Congress or Conference (Article) |
Event Title: | ICORES 2017 6th International Conference on Operations Research and Enterprise Systems |
Event Dates: | 23-25 February 2017 |
Event Location: | Porto, Portugal |
Title of Book: | Proceedings of the 6th International Conference on Operations Research and Enterprise Systems |
Date: | 2017 |
ISBN: | 978-989-758-218-9 |
Volume: | 1 |
Subjects: | |
Freetext Keywords: | Multi-armed Bandit Problem, Possibilistic Reward, Numerical Study |
Faculty: | E.T.S.I. Caminos, Canales y Puertos (UPM) |
Department: | Matemática e Informática Aplicadas a las Ingenierías Civil y Naval |
Creative Commons Licenses: | Attribution - Non commercial - No derivative works |
|
Different allocation strategies can be found in the literature to deal with the multi-armed bandit problem, either under a frequentist view or from a Bayesian perspective. In this paper, we propose a novel allocation strategy, the possibilistic reward method. First, possibilistic reward distributions are used to model the uncertainty about the expected reward of each arm; these are then converted into probability distributions using a pignistic probability transformation. Finally, a simulation experiment is carried out to identify the arm with the highest expected reward, which is then pulled. A parametric probability transformation of the proposed method is then introduced, together with a dynamic optimization, which implies that neither previous knowledge nor a simulation of the arm distributions is required. A numerical study shows that the proposed method outperforms other policies in the literature in five scenarios: a Bernoulli distribution with very low success probabilities, with success probabilities close to 0.5, and with success probabilities close to 0.5 combined with Gaussian rewards; and Poisson and exponential distributions truncated to [0, 10].
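
The selection step outlined in the abstract (possibility distribution, pignistic transformation, simulated draw, pull) can be illustrated with a short sketch. This is not the authors' implementation. The Hoeffding-style possibility function, the discretisation grid, the Bernoulli test arms and all function names are assumptions made for illustration only. The pignistic transformation itself follows the standard formula for a discrete possibility distribution sorted in decreasing order, p_(i) = sum over j >= i of (pi_(j) - pi_(j+1)) / j.

```python
import math
import random


def pignistic(possibility):
    """Turn a discrete possibility distribution (maximum 1) into probabilities."""
    n = len(possibility)
    order = sorted(range(n), key=lambda i: possibility[i], reverse=True)
    ranked = [possibility[i] for i in order] + [0.0]
    probs = [0.0] * n
    tail = 0.0
    # Walk from the least possible value upward, accumulating (pi_j - pi_{j+1}) / j.
    for rank in range(n, 0, -1):
        tail += (ranked[rank - 1] - ranked[rank]) / rank
        probs[order[rank - 1]] = tail
    return probs


def possibility_of_mean(sample_mean, pulls, grid):
    """Assumed possibility function: a Hoeffding-style bump around the sample mean."""
    if pulls == 0:
        return [1.0] * len(grid)  # total ignorance: every value fully possible
    raw = [math.exp(-2.0 * pulls * (mu - sample_mean) ** 2) for mu in grid]
    peak = max(raw)
    return [v / peak for v in raw]  # rescale so the maximum is exactly 1


def choose_arm(sums, pulls, grid, rng):
    """Draw one candidate expected reward per arm and pick the largest draw."""
    draws = []
    for total, n in zip(sums, pulls):
        mean = total / n if n else 0.0
        probs = pignistic(possibility_of_mean(mean, n, grid))
        draws.append(rng.choices(grid, weights=probs, k=1)[0])
    return max(range(len(draws)), key=draws.__getitem__)


def run(success_probs, horizon, seed=0):
    """Play hypothetical Bernoulli arms and return how often each was pulled."""
    rng = random.Random(seed)
    grid = [i / 100 for i in range(101)]  # candidate expected rewards in [0, 1]
    sums = [0.0] * len(success_probs)
    pulls = [0] * len(success_probs)
    for _ in range(horizon):
        arm = choose_arm(sums, pulls, grid, rng)
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        sums[arm] += reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(run([0.2, 0.8], horizon=300))
```

The sketch covers only the basic sample-and-pull loop with rewards in [0, 1]. The parametric transformation and the dynamic optimization mentioned in the abstract are not modelled here.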
Type | Code | Acronym | Leader | Title |
---|---|---|---|---|
Government of Spain | MTM2014-56949-C3-2-R | Unspecified | Jiménez Martín, Antonio | Apoyo a decisiones en análisis de riesgos. Seguridad operacional aérea |
Item ID: | 53405 |
---|---|
DC Identifier: | https://oa.upm.es/53405/ |
OAI Identifier: | oai:oa.upm.es:53405 |
DOI: | 10.5220/0006189900960107 |
Official URL: | http://www.icores.org/Home.aspx?y=2017 |
Deposited by: | Memoria Investigacion |
Deposited on: | 09 Jan 2019 11:58 |
Last Modified: | 09 Jan 2019 11:58 |