Cooperative off-policy prediction of Markov decision processes in adaptive networks

Valcarcel Macua, Sergio; Chen, Jianshu; Zazo Bello, Santiago and Sayed, Ali H. (2013). Cooperative off-policy prediction of Markov decision processes in adaptive networks. In: "IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)", 26/05/2013 - 31/05/2013, Vancouver, Canada. pp. 4539-4543. https://doi.org/10.1109/ICASSP.2013.6638519.

Description

Title:	Cooperative off-policy prediction of Markov decision processes in adaptive networks
Author(s):	Valcarcel Macua, Sergio; Chen, Jianshu; Zazo Bello, Santiago (https://orcid.org/0000-0001-9073-7927); Sayed, Ali H.
Document Type:	Conference or Workshop Presentation (Article)
Event Title:	IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Event Dates:	26/05/2013 - 31/05/2013
Event Location:	Vancouver, Canada
Book Title:	IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Date:	2013
Subjects:	Telecommunications
SDG:	09. Industry, innovation and infrastructure
School:	E.T.S.I. Telecomunicación (UPM)
Department:	Señales, Sistemas y Radiocomunicaciones
Creative Commons Licenses:	Attribution - No Derivatives - Non-Commercial

Full text

PDF (Portable Document Format). A PDF viewer such as GSview, Xpdf, or Adobe Acrobat Reader is required.
Download (851kB)

Abstract

We apply diffusion strategies to propose a cooperative reinforcement learning algorithm in which agents in a network communicate with their neighbors to improve predictions about their environment. The algorithm is suitable for off-policy learning, even in large state spaces. We provide a mean-square-error performance analysis under constant step-sizes. The gain from cooperation, in the form of greater stability and lower bias and variance in the prediction error, is illustrated in the context of a classical model. We show that the improvement in performance is especially significant when the behavior policy of the agents differs from the target policy under evaluation.

More information

Record ID:	28941
DC Identifier:	https://oa.upm.es/28941/
OAI Identifier:	oai:oa.upm.es:28941
DOI Identifier:	10.1109/ICASSP.2013.6638519
Deposited by:	Memoria Investigacion
Deposited on:	29 Jun 2014 11:38
Last Modified:	22 Sep 2014 11:43