Citation
Luengo Sánchez, Sergio and Bielza Lozoya, María Concepción and Larrañaga Múgica, Pedro María
(2017).
Directional-linear data clustering using structural expectation-maximization algorithm.
In: "ADISTA 2017: International Directional Statistics Workshop", 8-9 Jun 2017, Roma, Italia. p. 1.
Abstract
The study of a plethora of phenomena requires measuring both their magnitude
and their direction, as in meteorology (Carta et al. 2009), rhythmometry,
medicine or demography (Batschelet 1981, Batschelet et al. 1973). Probabilistic
clustering of such data is typically tackled by means of mixtures of Gaussians
(Fraley and Raftery 2002, McLachlan and Basford 1988, Melnykov and Maitra 2010),
although these tend to underperform because they cannot handle the periodicity
of directional data. To address this problem, several distributions have been
proposed to cluster bivariate cylindrical data (Carta et al. 2009, Gatto and
Jammalamadaka 2007, Mardia and Sutton 1978, Qin et al. 2010) and multivariate
data having one circular variable (Roy et al. 2014).
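The periodicity problem mentioned above can be illustrated with a minimal sketch. The sample directions are hypothetical and the helper name circular_mean is an assumption made for this illustration; it is not taken from the cited works. A linear summary such as the arithmetic mean treats 350 and 5 degrees as far apart, whereas the circular mean respects the wrap-around at 360 degrees.

```python
import math

def circular_mean(angles):
    """Mean direction of angles given in radians, returned in [0, 2*pi)."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c) % (2 * math.pi)

# Hypothetical directions in degrees, all concentrated around 0 degrees.
degrees = [350.0, 355.0, 5.0, 20.0]
angles = [math.radians(d) for d in degrees]

linear_mean = math.degrees(sum(angles) / len(angles))
circ_mean = math.degrees(circular_mean(angles))

# The arithmetic mean lands near 180 degrees, opposite to the data,
# while the circular mean stays within a few degrees of 0.
print(f"arithmetic mean: {linear_mean:.1f} degrees")
print(f"circular mean:   {circ_mean:.1f} degrees")
```

A Gaussian fitted to such angles inherits the same defect through its mean and variance, which is one way to see why mixtures of Gaussians struggle with directional variables.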
Recently, an approach (Luengo-Sanchez et al. 2016) that exploits the conditional
independence assumptions encoded by a Bayesian network has enabled efficient
clustering of multivariate directional-linear data, distributed as von Mises
and Gaussian respectively, even when there is more than one directional
variable, by means of the structural expectation-maximization algorithm
(Friedman 1997). However, strong constraints on the structure of the Bayesian
network must be imposed.
Here we propose measures of divergence and distance among clusters, such as the
Kullback-Leibler divergence and the Bhattacharyya distance, for the previous
model to evaluate the quality of the clustering outcomes, and we extend the
model by relaxing the structural constraints to include dependence relations
between directional variables and Gaussians. We present an application in
neuroscience that clusters dendritic spines according to a set of morphological
features combining directional and linear variables.
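As a rough illustration of the kind of measures named above, the sketch below computes the Kullback-Leibler divergence and the Bhattacharyya distance between two univariate von Mises densities and between two univariate Gaussians, using their standard closed forms. It assumes, purely for illustration, that all variables are independent within a cluster, so that per-variable values add up; this is a simplification and not the Bayesian network model of the work itself. All function names, the series-based Bessel helper and the cluster parameters are hypothetical.

```python
import cmath
import math

def bessel_i(order, x, terms=60):
    """Modified Bessel function of the first kind, via its power series.
    Adequate for moderate x (roughly x < 20); an assumption of this sketch."""
    half = x / 2.0
    return sum(half ** (2 * k + order)
               / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))

def kl_von_mises(mu1, k1, mu2, k2):
    """KL(vM(mu1, k1) || vM(mu2, k2)) for concentrations k1, k2 >= 0."""
    a1 = bessel_i(1, k1) / bessel_i(0, k1)  # mean resultant length of the first density
    return (math.log(bessel_i(0, k2) / bessel_i(0, k1))
            + a1 * (k1 - k2 * math.cos(mu1 - mu2)))

def kl_gaussian(m1, s1, m2, s2):
    """KL(N(m1, s1^2) || N(m2, s2^2)) for standard deviations s1, s2 > 0."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

def bhattacharyya_von_mises(mu1, k1, mu2, k2):
    """Bhattacharyya distance between two von Mises densities."""
    # Length of the sum of the two natural-parameter vectors.
    r = abs(cmath.rect(k1, mu1) + cmath.rect(k2, mu2))
    coeff = bessel_i(0, r / 2.0) / math.sqrt(bessel_i(0, k1) * bessel_i(0, k2))
    return -math.log(coeff)

def bhattacharyya_gaussian(m1, s1, m2, s2):
    """Bhattacharyya distance between two univariate Gaussians."""
    v = s1 ** 2 + s2 ** 2
    return 0.25 * (m1 - m2) ** 2 / v + 0.5 * math.log(v / (2 * s1 * s2))

# Two hypothetical clusters, each with one directional variable (mu, kappa)
# and one linear variable (mean, standard deviation).
cluster_a = {"dir": (0.2, 4.0), "lin": (1.0, 0.5)}
cluster_b = {"dir": (1.5, 2.0), "lin": (2.0, 0.8)}

# Under the independence assumption, both measures add across variables.
kl_ab = (kl_von_mises(*cluster_a["dir"], *cluster_b["dir"])
         + kl_gaussian(*cluster_a["lin"], *cluster_b["lin"]))
bd_ab = (bhattacharyya_von_mises(*cluster_a["dir"], *cluster_b["dir"])
         + bhattacharyya_gaussian(*cluster_a["lin"], *cluster_b["lin"]))
print(f"KL(a || b) = {kl_ab:.3f}, Bhattacharyya(a, b) = {bd_ab:.3f}")
```

Both quantities are zero for identical clusters and grow as clusters separate. The Kullback-Leibler divergence is asymmetric, while the Bhattacharyya distance is symmetric, which may matter when comparing every pair of clusters.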