Random forests for regression as a weighted sum of k-Potential Nearest Neighbors

Fernández González, Pablo, Bielza Lozoya, María Concepción ORCID: https://orcid.org/0000-0001-7109-2668 and Larrañaga Múgica, Pedro María ORCID: https://orcid.org/0000-0002-1885-4501 (2019). Random forests for regression as a weighted sum of k-Potential Nearest Neighbors. "IEEE Access", v. 7; pp. 25660-25672. ISSN 2169-3536. https://doi.org/10.1109/ACCESS.2019.2900755.

Description

Title: Random forests for regression as a weighted sum of k-Potential Nearest Neighbors
Author/s: Fernández González, Pablo; Bielza Lozoya, María Concepción; Larrañaga Múgica, Pedro María
Item Type: Article
Journal/Publication Title: IEEE Access
Date: 2019
ISSN: 2169-3536
Volume: 7
Subjects:
Freetext Keywords: Random forests; Regression; Bagging; Bootstrap; Nearest neighbors; K-Potential Nearest Neighbors
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Inteligencia Artificial
Creative Commons Licenses: Attribution - NoDerivatives - NonCommercial

Abstract

In this paper, we tackle the problem of random forests for regression expressed as weighted sums of datapoints. We study the theoretical behavior of k-potential nearest neighbors (k-PNNs) under bagging and obtain an upper bound on the weights of a datapoint for random forests with any type of splitting criterion, provided that we use unpruned trees that stop growing only when there are k or fewer datapoints at their leaves. Moreover, we use the previous bound together with the concept of b-terms (i.e., bootstrap terms) introduced in this paper, to derive the explicit expression of weights for datapoints in a random (k-PNNs) selection setting, a datapoint selection strategy that we also introduce, and to build a framework to derive other bagged estimators using a similar procedure. Finally, we derive from our framework the explicit expression of weights of a regression estimate equivalent to a random forest regression estimate with the random splitting criterion and demonstrate its equivalence both theoretically and practically.
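
To make the "weighted sum of datapoints" view concrete, the following Python sketch builds a small bagged forest and checks numerically that its prediction equals a weighted sum of the training responses. It is an illustration under stated assumptions, not the paper's derivation: the helper names (leaf_of_query, forest_weights), the toy data, and the particular random split rule (uniformly chosen feature and threshold) are all hypothetical choices made here. Each tree is grown on a bootstrap sample and stops splitting once k or fewer datapoints remain in a node.

```python
import random


def leaf_of_query(points, x, k, rng):
    """Follow the query x down one randomly split tree.

    points: list of (index, features) pairs drawn as a bootstrap sample.
    Splitting stops once k or fewer points remain (or no split is possible).
    Returns the original indices of the points sharing a leaf with x.
    """
    while len(points) > k:
        # Features on which the remaining points still differ.
        splittable = [j for j in range(len(x))
                      if len({feats[j] for _, feats in points}) > 1]
        if not splittable:
            break  # only duplicates left, so this leaf keeps more than k points
        j = rng.choice(splittable)
        values = [feats[j] for _, feats in points]
        t = rng.uniform(min(values), max(values))  # assumed random split rule
        left = [p for p in points if p[1][j] <= t]
        right = [p for p in points if p[1][j] > t]
        if not right:
            continue  # threshold landed on the maximum; redraw
        points = left if x[j] <= t else right
    return [i for i, _ in points]


def forest_weights(X, x, k=2, n_trees=200, seed=0):
    """Return (weights, leaves): per-datapoint weights and the leaf per tree."""
    rng = random.Random(seed)
    n = len(X)
    weights = [0.0] * n
    leaves = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap indices
        leaf = leaf_of_query([(i, X[i]) for i in sample], x, k, rng)
        leaves.append(leaf)
        for i in leaf:
            # Each copy of datapoint i in the leaf adds 1 / (leaf size * trees).
            weights[i] += 1.0 / (len(leaf) * n_trees)
    return weights, leaves


# Toy data (hypothetical): 30 points in the unit square, linear response.
data_rng = random.Random(1)
X = [(data_rng.random(), data_rng.random()) for _ in range(30)]
y = [2 * a - b for a, b in X]
query = (0.5, 0.5)

weights, leaves = forest_weights(X, query)

# Usual forest prediction: average of the per-tree leaf means.
direct = sum(sum(y[i] for i in leaf) / len(leaf) for leaf in leaves) / len(leaves)
# Same prediction written as a weighted sum of the training responses.
weighted = sum(w * yi for w, yi in zip(weights, y))

print(direct, weighted, sum(weights))
```

The two printed predictions agree up to floating-point error and the weights sum to one, because averaging leaf means over trees is just a regrouping of the same terms by datapoint. Only the leaf containing the query is grown here, which is enough for a single prediction when splits do not depend on the responses.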

Funding Projects

Type: Government of Spain
Code: C080020-09
Acronym: Unspecified
Leader: Unspecified
Title: Cajal Blue Brain Project

Type: Government of Spain
Code: TIN2016-79684-P
Acronym: Unspecified
Leader: Universidad Politécnica de Madrid
Title: Avances en clasificación multidimensional y detección de anomalías con redes bayesianas

More information

Item ID: 63472
DC Identifier: https://oa.upm.es/63472/
OAI Identifier: oai:oa.upm.es:63472
DOI: 10.1109/ACCESS.2019.2900755
Official URL: https://ieeexplore.ieee.org/document/8648334
Deposited by: Memoria Investigacion
Deposited on: 05 Nov 2020 12:23
Last Modified: 05 Nov 2020 12:23