Evaluating emotional and subjective responses in synthetic art-related dialogues: A multi-stage framework with large language models

Luna Jiménez, Cristina, Gil Martín, Manuel, D'Haro Enríquez, Luis Fernando, Fernández Martínez, Fernando and San Segundo Hernández, Rubén (2024). Evaluating emotional and subjective responses in synthetic art-related dialogues: A multi-stage framework with large language models. "Expert Systems with Applications", v. 255; p. 124524. ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2024.124524.

Description

Title:	Evaluating emotional and subjective responses in synthetic art-related dialogues: A multi-stage framework with large language models
Author(s):	Luna Jiménez, Cristina https://orcid.org/0000-0001-5369-856X; Gil Martín, Manuel https://orcid.org/0000-0002-4285-6224; D'Haro Enríquez, Luis Fernando https://orcid.org/0000-0002-3411-7384; Fernández Martínez, Fernando https://orcid.org/0000-0003-3877-0089; San Segundo Hernández, Rubén https://orcid.org/0000-0001-9659-5464
Document Type:	Article
Journal/Publication Title:	Expert Systems with Applications
Date:	December 2024
ISSN:	0957-4174
Volume:	255
Subjects:	Telecommunications
SDG:	09. Industry, innovation and infrastructure
Freetext Keywords:	Data and text mining, Dialogues generation, Dialogues evaluation, Affective-computing
School:	E.T.S.I. Telecomunicación (UPM)
Department:	Electronic Engineering
Creative Commons Licenses:	Attribution

Full text

PDF (Portable Document Format), 3MB

Abstract

The emergence of Large Language Models (LLMs) has brought a qualitative step forward in the performance of conversational agents, and even in the generation of creative texts. However, previous applications of these models to dialogue generation neglected the impact of ‘hallucinations’ on synthetic dialogues, omitting this central aspect from their evaluations. For this reason, we propose GenEvalGPT, an open-source and flexible framework that implements a comprehensive multi-stage evaluation strategy utilizing diverse metrics. The objective is two-fold: first, to assess the extent to which synthetic dialogues between a chatbot and a human align with the specified commands, determining whether the dialogues were successfully created according to the provided specifications; and second, to evaluate various aspects of emotional and subjective responses. Assuming that the dialogues to be evaluated were synthetically produced from specific profiles, the first evaluation stage uses LLMs to reconstruct the original templates employed in dialogue creation. The success of this reconstruction is then assessed in a second stage using objective lexical and semantic metrics. In addition, crafting a chatbot’s behaviors demands careful consideration to encompass the diverse range of interactions it is meant to engage in. Synthetic dialogues play a pivotal role in this context, as they can be deliberately synthesized to emulate various behaviors. This is precisely the objective of the third stage: evaluating whether the generated dialogues adhere to the required aspects concerning emotional and subjective responses. To validate the capabilities of the proposed framework, we applied it to recognize which of two distinct behaviors the chatbot exhibited in the synthetically generated dialogues: being emotional and providing subjective responses, or remaining neutral.
This evaluation encompasses traditional metrics and automatic metrics generated by the LLM. In our use case of art-related dialogues, our findings reveal that templates or profiles are recovered more effectively for profile items that are objective and factual than for those related to mental states or subjective facts. For the emotional and subjective behavior assessment, rule-based metrics achieved 79% accuracy in detecting emotions or subjectivity (anthropic), and the LLM automatic metrics achieved 82%. The combination of these metrics and stages could help to decide which of the generated dialogues should be kept depending on the applied policy, which could preserve between 57% and 93% of the initial dialogues.
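
The second-stage idea described in the abstract (scoring how well a reconstructed template matches the original profile, then applying a retention policy) can be illustrated with a minimal sketch. This is not the GenEvalGPT implementation: the token-level F1 metric, the function names `token_f1` and `keep_dialogue`, the 0.5 threshold, and the example profile fields are all assumptions chosen for illustration only.

```python
from collections import Counter


def token_f1(reference: str, reconstruction: str) -> float:
    """Token-level F1 overlap between a reference value and its reconstruction.

    Illustrative lexical metric only; the paper's actual metrics may differ.
    """
    ref_tokens = reference.lower().split()
    rec_tokens = reconstruction.lower().split()
    if not ref_tokens or not rec_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(ref_tokens) & Counter(rec_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(rec_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def keep_dialogue(original: dict, reconstructed: dict, threshold: float = 0.5) -> bool:
    """Hypothetical strict policy: keep a dialogue only if every profile
    field is recovered with a score at or above the threshold."""
    return all(
        token_f1(value, reconstructed.get(field, "")) >= threshold
        for field, value in original.items()
    )


# Hypothetical profile: one factual field and one subjective field.
original = {"artwork": "The Starry Night", "emotion": "awe"}
reconstructed = {"artwork": "Starry Night", "emotion": "calm"}

scores = {
    field: token_f1(value, reconstructed.get(field, ""))
    for field, value in original.items()
}
print(scores)
print(keep_dialogue(original, reconstructed))
```

In this toy case the factual field scores roughly 0.8 while the subjective field scores 0, so the strict policy discards the dialogue. A more lenient policy (for example, averaging the field scores) would retain more dialogues, which mirrors the policy-dependent retention range mentioned in the abstract.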

Associated projects

Type	Code	Acronym	Principal investigator	Title
Horizon 2020	101071191	Unspecified
Horizon 2020	PID2020-118112RB-C22	Unspecified
Government of Spain	PID2020-118112RB-C21	Unspecified
Government of Spain	PID2021-126061OB-C43	Unspecified
Government of Spain	PDC2021-120846-C42	Unspecified

More information

Record ID:	82496
DC Identifier:	https://oa.upm.es/82496/
OAI Identifier:	oai:oa.upm.es:82496
Scientific Portal URL:	https://portalcientifico.upm.es/es/ipublic/item/10236157
DOI Identifier:	10.1016/j.eswa.2024.124524
Official URL:	https://www.sciencedirect.com/science/article/pii/...
Deposited by:	Dr. Manuel Gil-Martín
Deposited on:	10 Jul 2024 08:09
Last Modified:	12 Mar 2025 18:49
