Citation
Rahman, Foyzur and Posnett, Daryl and Herraiz Tabernero, Israel and Devanbu, Premkumar
(2013).
Sample Size vs. Bias in Defect Prediction.
In: "9th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering", August 2013. ISBN 978-1-4503-2237-9.
Abstract
Most empirical disciplines promote the reuse and sharing of datasets,
as this increases the possibility of replication.
While this is increasingly the case in Empirical Software Engineering,
some of the most popular bug-fix datasets are now known to be biased.
This raises two significant concerns: first, that sample bias
may lead to underperforming prediction models, and second, that
the external validity of the studies based on biased datasets may be suspect.
This issue has raised considerable consternation in the ESE literature in recent
years. However, there is a confounding factor
in these datasets that has not been examined carefully: size.
Biased datasets sample only some of the data
that could be sampled, and do so in a biased fashion; but biased
samples can be smaller or larger. Smaller datasets in general
provide a less reliable basis for estimating models, and thus can
lead to inferior model performance.
In this setting, we ask the question:
what affects performance more, bias or size?
We conduct a detailed, large-scale meta-analysis, using simulated
datasets sampled with bias from a high-quality dataset which is
relatively free of bias.
Our results suggest that size always matters just as much as bias direction, and
in fact much more than bias direction
when considering information-retrieval measures such as
AUC and F-score.
This indicates that at least for prediction models, even when dealing with sampling bias,
simply finding larger samples can sometimes be sufficient. Our analysis also exposes
the complexity of the bias issue, and raises further issues to be explored in the future.
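
The sketch below is not the paper's actual pipeline; it is a minimal illustration, under stated assumptions, of the kind of experiment the abstract describes: draw subsamples of varying size and bias from a relatively clean labeled dataset, fit a simple defect-prediction model, and compare AUC and F-score across size and bias. The synthetic data, the defect-weighted sampling scheme, and the logistic-regression classifier are all illustrative assumptions, not the authors' setup.

    # Illustrative sketch only (assumed data, bias scheme, and classifier;
    # NOT the paper's actual meta-analysis pipeline).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, f1_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Synthetic "high-quality" dataset: 10 metrics per module, ~20% defective.
    n, d = 20000, 10
    X = rng.normal(size=(n, d))
    logits = X @ rng.normal(size=d) - 1.5
    y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y)

    def biased_sample(X, y, size, bias):
        """Sample `size` rows, over-weighting defective modules by `bias`.

        bias = 1.0 is unbiased sampling; bias > 1 over-represents defective
        modules, bias < 1 under-represents them.
        """
        w = np.where(y == 1, bias, 1.0)
        idx = rng.choice(len(y), size=size, replace=False, p=w / w.sum())
        return X[idx], y[idx]

    # Vary sample size and bias, then evaluate on the held-out test set.
    for size in (200, 1000, 5000):
        for bias in (0.25, 1.0, 4.0):
            Xs, ys = biased_sample(X_pool, y_pool, size, bias)
            model = LogisticRegression(max_iter=1000).fit(Xs, ys)
            scores = model.predict_proba(X_test)[:, 1]
            auc = roc_auc_score(y_test, scores)
            f1 = f1_score(y_test, scores > 0.5)
            print(f"size={size:5d} bias={bias:4.2f}  AUC={auc:.3f}  F1={f1:.3f}")

Running this prints AUC and F-score for each (size, bias) cell, which is the kind of comparison that lets size effects be weighed against bias-direction effects.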