Large language models: a new approach for privacy policy analysis at scale

Rodriguez Torrado, David, Yang, Ian, Álamo Ramiro, José María del and Sadeh, Norman (2024). Large language models: a new approach for privacy policy analysis at scale. "Computing", v. 106 (n. 12); pp. 3879-3903. ISSN 1436-5057. https://doi.org/10.1007/s00607-024-01331-9.

Description

Title:	Large language models: a new approach for privacy policy analysis at scale
Author(s):	Rodriguez Torrado, David (https://orcid.org/0000-0002-0911-4608); Yang, Ian; Álamo Ramiro, José María del (https://orcid.org/0000-0002-6513-0303); Sadeh, Norman (https://orcid.org/0000-0003-4829-5533)
Document Type:	Article
Journal/Publication Title:	Computing
Date:	2024
ISSN:	1436-5057
Volume:	106
Issue:	12
Subjects:	Computer Science
Informal Keywords:	Large language models; Natural language processing; Privacy policies; Data protection; Privacy; Feature extraction
School:	E.T.S.I. y Sistemas de Telecomunicación (UPM)
Department:	Ingeniería de Sistemas Telemáticos
Creative Commons Licenses:	None

Full text

PDF (Portable Document Format) - A PDF viewer such as GSview, Xpdf, or Adobe Acrobat Reader is required
Download (1MB)

Abstract

The number and dynamic nature of web sites and mobile applications present regulators and app store operators with significant challenges when it comes to enforcing compliance with applicable privacy and data protection laws. Over the past several years, people have turned to Natural Language Processing (NLP) techniques to automate privacy compliance analysis (e.g., comparing statements in privacy policies with analysis of the code and behavior of mobile apps) and to answer people’s privacy questions. Traditionally, these NLP techniques have relied on labor-intensive and potentially error-prone manual annotation processes to build the corpora necessary to train them. This article explores and evaluates the use of Large Language Models (LLMs) as an alternative for effectively and efficiently identifying and categorizing a variety of data practice disclosures found in the text of privacy policies. Specifically, we report on the performance of ChatGPT and Llama 2, two particularly popular LLM-based tools. This includes engineering prompts and evaluating different configurations of these LLM techniques. Evaluation of the resulting techniques on well-known corpora of privacy policy annotations yields an F1 score exceeding 93%. This score is higher than scores reported earlier in the literature on these benchmarks. This performance is obtained at minimal marginal cost (excluding the cost required to train the foundational models themselves). These results, which are consistent with those reported in other domains, suggest that LLMs offer a particularly promising approach to automated privacy policy analysis at scale.

Associated projects

Type	Code	Acronym	Lead	Title
Gobierno de España	TED2021-130455A-I00	Unspecified

More information

Record ID:	87559
DC Identifier:	https://oa.upm.es/87559/
OAI Identifier:	oai:oa.upm.es:87559
Scientific Portal URL:	https://portalcientifico.upm.es/es/ipublic/item/10247169
DOI:	10.1007/s00607-024-01331-9
Official URL:	https://link.springer.com/article/10.1007/s00607-0...
Deposited by:	Mr. David Rodríguez Torrado
Deposited on:	31 Jan 2025 19:22
Last Modified:	31 Jan 2025 19:22
