eprintid: 64356
rev_number: 19
eprint_status: archive
userid: 1903
dir: disk0/00/06/43/56
datestamp: 2021-01-18 12:47:42
lastmod: 2021-01-18 14:52:34
status_changed: 2021-01-18 14:52:34
type: article
metadata_visibility: show
creators_name: Ordozgoiti Rubio, Bruno
creators_name: Mozo Velasco, Bonifacio Alberto
creators_name: Garcia Lopez De Lacalle, Jesus
title: Regularized greedy column subset selection
publisher: Elsevier
rights: by-nc-nd
ispublished: pub
subjects: informatica
full_text_status: public
keywords: Feature selection; Column subset selection; Unsupervised learning
abstract: The Column Subset Selection Problem is a hard combinatorial optimization problem that provides a natural framework for unsupervised feature selection, and there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient for practical use.
date_type: published
date: 2019-06
publication: Information Sciences
volume: 486
pagerange: 393-418
id_number: 10.1016/j.ins.2019.02.039
institution: ETSI_Sistemas_Infor
department: Sistemas_informaticos_2014
refereed: TRUE
issn: 0020-0255
official_url: https://www.sciencedirect.com/science/article/abs/pii/S0020025519301495
citation: Ordozgoiti Rubio, Bruno and Mozo Velasco, Bonifacio Alberto and Garcia Lopez De Lacalle, Jesus (2019). Regularized greedy column subset selection. "Information Sciences", v. 486; pp. 393-418. ISSN 0020-0255. https://doi.org/10.1016/j.ins.2019.02.039
document_url: https://oa.upm.es/64356/2/INVE_MEM_2019_325814.pdf