Full text
|
PDF (Portable Document Format)
- A PDF viewer such as GSview, Xpdf, or Adobe Acrobat Reader is required
Download (5MB) |
ORCID: https://orcid.org/0009-0008-9348-8426 and Patiño Martínez, Marta ORCID: https://orcid.org/0000-0003-2997-3722 (2024). Measuring and improving the energy efficiency of large language models inference. "IEEE Access", v. 12; pp. 80194-80207. ISSN 2169-3536. https://doi.org/10.1109/ACCESS.2024.3409745.
| Title: | Measuring and improving the energy efficiency of large language models inference |
|---|---|
| Author(s): | |
| Document Type: | Article |
| Journal/Publication Title: | IEEE Access |
| Date: | 5 June 2024 |
| ISSN: | 2169-3536 |
| Volume: | 12 |
| Subjects: | |
| Informal Keywords: | Computational modeling, Deep learning, Energy consumption, Energy efficiency, Energy measurement, Graphics processing units, Large language model, Large language models, Machine learning, Software, Software measurement, Training |
| School: | E.T.S. de Ingenieros Informáticos (UPM) |
| Department: | Lenguajes y Sistemas Informáticos e Ingeniería del Software |
| Creative Commons License: | Attribution - NoDerivatives - NonCommercial |
Recent improvements in the accuracy of machine learning (ML) models in the language domain have propelled their use in a multitude of products and services, touching millions of lives daily. These new levels of accuracy have been attained mainly through exponential growth in model size, creating a new category of models known as Large Language Models (LLMs) and leading to a substantial increase in computing and energy demands. While recent studies have focused on measuring and improving the energy consumption of LLMs during training, inference has received little attention. In this article, we present an approach to profile the energy consumption of LLMs during inference and leverage it to improve energy efficiency. For this, we deploy several state-of-the-art LLMs and observe how model size, number of layers, parallelized attention, and even vocabulary size affect their energy consumption. In addition, we leverage input batch size and different quantization levels to optimize their inference energy efficiency and latency.
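Profiling inference energy as described above amounts to sampling device power at a fixed interval while the model runs, then integrating the samples into joules. The sketch below shows only that integration step; the power trace here is a made-up stand-in for readings you would obtain from a real GPU power query (e.g. via NVML), and the function name and sample values are illustrative, not taken from the paper.

```python
def energy_joules(power_samples_w, interval_s):
    """Integrate power samples (watts), taken at a fixed sampling
    interval (seconds), into energy (joules) via the trapezoidal rule."""
    if len(power_samples_w) < 2:
        return 0.0  # need at least two samples to span an interval
    total = 0.0
    for a, b in zip(power_samples_w, power_samples_w[1:]):
        total += (a + b) / 2.0 * interval_s
    return total

# Hypothetical 0.1 s power trace covering one inference batch:
# idle, ramp-up during prefill, sustained decode, ramp-down.
samples = [95.0, 180.0, 210.0, 205.0, 120.0]  # watts
print(energy_joules(samples, 0.1))  # total energy for the trace, in joules
```

With real hardware, the trace would be collected by a background thread polling the GPU driver while the batch runs; dividing the resulting joules by the number of generated tokens gives the energy-per-token figure that batch size and quantization are then tuned against.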
| Record ID: | 86674 |
|---|---|
| DC Identifier: | https://oa.upm.es/86674/ |
| OAI Identifier: | oai:oa.upm.es:86674 |
| Scientific Portal URL: | https://portalcientifico.upm.es/es/ipublic/item/10226248 |
| DOI: | 10.1109/ACCESS.2024.3409745 |
| Official URL: | https://ieeexplore.ieee.org/document/10549890 |
| Deposited by: | iMarina Portal Científico |
| Deposited on: | 23 Jan 2025 10:59 |
| Last Modified: | 23 Jan 2025 10:59 |