Measuring and improving the energy efficiency of large language models inference

Argerich, Mauricio Fadel ORCID: https://orcid.org/0009-0008-9348-8426 and Patiño Martínez, Marta ORCID: https://orcid.org/0000-0003-2997-3722 (2024). Measuring and improving the energy efficiency of large language models inference. "IEEE Access", v. 12 ; pp. 80194-80207. ISSN 2169-3536. https://doi.org/10.1109/ACCESS.2024.3409745.

Description

Title: Measuring and improving the energy efficiency of large language models inference
Author(s):
Document Type: Article
Journal/Publication Title: IEEE Access
Date: 5 June 2024
ISSN: 2169-3536
Volume: 12
Subjects:
Informal Keywords: Computational modeling, Deep learning, Energy consumption, Energy efficiency, Energy measurement, Graphics processing units, Large language models, Machine learning, Software, Software measurement, Training
School: E.T.S. de Ingenieros Informáticos (UPM)
Department: Lenguajes y Sistemas Informáticos e Ingeniería del Software
Creative Commons License: Attribution - NonCommercial - NoDerivatives

Full text

PDF: 10226248.pdf (5 MB)

Abstract

Recent improvements in the accuracy of machine learning (ML) models in the language domain have propelled their use in a multitude of products and services, touching millions of lives daily. These new levels of accuracy have been attained mainly through exponential growth in model size, creating a new category of models known as Large Language Models (LLMs) and leading to a substantial increase in computing and energy demands. While recent studies have focused on measuring and improving the energy consumption of LLMs during training, inference has received little attention. In this article, we present an approach to profile the energy consumption of LLMs during inference and leverage it to improve energy efficiency. For this, we deploy several state-of-the-art LLMs and observe how model size, number of layers, parallelized attention, and even vocabulary size affect their energy consumption. In addition, we leverage input batch size and different quantization levels to optimize their inference energy efficiency and latency.
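The core of the profiling approach sketched in the abstract is to sample instantaneous power draw while the model generates tokens, integrate those samples into energy, and normalize by tokens produced. A minimal illustration of that bookkeeping is below; it is a sketch under assumptions, not the paper's implementation. The helper names `integrate_energy` and `energy_per_token` are hypothetical, and the power samples would in practice come from a source such as NVIDIA's NVML power readings (here they are supplied as plain numbers).

```python
def integrate_energy(samples):
    """Integrate (timestamp_s, power_w) samples into joules.

    Uses the trapezoidal rule between consecutive samples, so
    non-uniform sampling intervals are handled correctly.
    """
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        joules += 0.5 * (p0 + p1) * (t1 - t0)
    return joules


def energy_per_token(joules, tokens_generated):
    """Normalize total inference energy by the number of tokens produced."""
    return joules / tokens_generated


# Hypothetical trace: a GPU drawing a steady 100 W for 2 s while
# generating 100 tokens -> 200 J total, 2 J per token.
samples = [(0.0, 100.0), (1.0, 100.0), (2.0, 100.0)]
total_j = integrate_energy(samples)          # 200.0 J
per_token_j = energy_per_token(total_j, 100)  # 2.0 J/token
```

Comparing `energy_per_token` across batch sizes and quantization levels is what lets the trade-off between energy efficiency and latency, as studied in the article, be quantified.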

Associated projects

Type: Horizon 2020
Code: 101004480
Acronym: Not specified
Lead: Not specified
Title: Not specified

More information

Record ID: 86674
DC Identifier: https://oa.upm.es/86674/
OAI Identifier: oai:oa.upm.es:86674
Scientific Portal URL: https://portalcientifico.upm.es/es/ipublic/item/10226248
DOI: 10.1109/ACCESS.2024.3409745
Official URL: https://ieeexplore.ieee.org/document/10549890
Deposited by: iMarina Portal Científico
Deposited on: 23 Jan 2025 10:59
Last Modified: 23 Jan 2025 10:59