A real-time retail analytics pipeline

El Abbassi, Widad (2020). A real-time retail analytics pipeline. Thesis (Master thesis), E.T.S. de Ingenieros Informáticos (UPM).


Title: A real-time retail analytics pipeline
  • El Abbassi, Widad
  • Patiño-Martínez, Marta
Item Type: Thesis (Master thesis)
Masters title: Ciencia de Datos
Date: July 2020
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Lenguajes y Sistemas Informáticos e Ingeniería del Software
Creative Commons Licenses: Attribution - No derivative works - Non commercial



Stream processing technologies are becoming increasingly popular in the retail industry, in both physical stores and e-commerce. Retailers and mall managers compete intensely to provide solutions that meet customer expectations. One way to achieve this is to study customer shopping behavior inside the shopping center and its stores, which yields information about patterns of shopping activity and consumer movement: which routes inside the mall are used most often, when occupancy peaks, and which brands or products attract more shoppers than others (to determine the shopper profile). This data then supports more effective decisions about store layout, product positioning, marketing, congestion control and more. The purpose of this project is to examine consumer behavior inside shopping centers and stores. In particular, we want to generate insights for two types of analytics: Mall Analytics, which measures foot traffic, in-mall proximity traffic and location marketing to help mall managers improve security and advertising; and In-Store Analytics, which helps retailers identify underperforming product categories, compare sales potential and improve inventory management. However, the real challenge lies not only in storing and managing this huge amount of data but also in accessing the results and providing reports in real time. With this in mind, our work proposes a real-time data processing architecture able to ingest, analyze and generate visualization reports almost immediately.
In detail, the first component of the proposed pipeline is the Kafka Connect framework, which is responsible for generating continuous flows of sensor and POS (point of sale) data. These flows are sent to the second component, Apache Kafka, a distributed messaging system that stores the incoming messages in multiple Kafka topics (for instance, data from sensor1 in zone1, area1 of the mall is stored in a particular topic, topic1). The third component is the processing unit, Apache Flink, a streaming dataflow engine and scalable data analytics framework that delivers data analytics in real time; one of its most interesting features is its use of event timestamps to build time windows for computations. At this stage, several Flink queries are developed to measure the predefined metrics (Mall Foot Traffic, Location Marketing…). The fourth component is a real-time search and analytics engine, Elasticsearch, in which the results of these queries are stored in indexes and then used by the final component, Kibana, a powerful visualization tool that delivers insights and dynamic visualization reports. Our work consists of implementing this streaming analytics pipeline, which will help mall managers and retailers investigate the whole shopping process and thus design more effective development plans and marketing strategies.
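
To illustrate how event timestamps can drive time windows for a metric such as Mall Foot Traffic, the following is a minimal sketch in plain Python. It does not use the Flink API and is not the thesis implementation: the `SensorEvent` record, the zone names and the 60-second window size are assumptions chosen for illustration, and it ignores watermarks and state management, which a real streaming engine handles.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class SensorEvent(NamedTuple):
    """Hypothetical sensor reading; field names are illustrative only."""
    zone: str          # e.g. "zone1"
    timestamp_ms: int  # event time recorded by the sensor, in milliseconds


def window_start(timestamp_ms: int, size_ms: int) -> int:
    """Return the start of the tumbling window that contains the timestamp."""
    return timestamp_ms - (timestamp_ms % size_ms)


def foot_traffic(
    events: Iterable[SensorEvent], size_ms: int
) -> Dict[Tuple[str, int], int]:
    """Count events per (zone, window start), keyed by event time.

    Because the window is derived from the event's own timestamp rather
    than its arrival order, an out-of-order event is still counted in
    the window it belongs to.
    """
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for event in events:
        key = (event.zone, window_start(event.timestamp_ms, size_ms))
        counts[key] += 1
    return dict(counts)


if __name__ == "__main__":
    events = [
        SensorEvent("zone1", 1_000),
        SensorEvent("zone1", 59_000),
        SensorEvent("zone2", 30_000),
        SensorEvent("zone1", 61_000),
        SensorEvent("zone1", 5_000),  # arrives out of order, first window
    ]
    for (zone, start), count in sorted(foot_traffic(events, 60_000).items()):
        print(zone, start, count)
```

In this sketch the out-of-order reading at 5,000 ms is still assigned to the first one-minute window of zone1, which is the practical benefit of windowing on event time instead of arrival time.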

More information

Item ID: 63622
DC Identifier: https://oa.upm.es/63622/
OAI Identifier: oai:oa.upm.es:63622
Deposited by: Biblioteca Facultad de Informatica
Deposited on: 07 Sep 2020 10:24
Last Modified: 07 Sep 2020 10:24