eprintid: 55806
rev_number: 14
eprint_status: archive
userid: 2101
dir: disk0/00/05/58/06
datestamp: 2019-07-15 07:50:14
lastmod: 2019-07-15 07:51:30
status_changed: 2019-07-15 07:51:30
type: other
metadata_visibility: show
creators_name: López García, Luis Cristobal
contributors_name: Iglesias Fernández, Carlos Ángel
title: Development of an event detector in Twitter streams based on mention-anomaly detection for the city of Madrid
rights: by-nc-nd
ispublished: unpub
subjects: telecomunicaciones
full_text_status: public
keywords: event, detection, detector, cluster, clustering, mentions, redundancy, filter, visualization, Machine Learning, Python, Twitter
abstract: Event detection was a field of research long before social networks reached the impact they have today. Events were tracked from traditional news websites, blogs, or other information channels. However, when microblogging emerged as a form of social media, this landscape changed. In this project we have developed a system capable of detecting the most important events that occurred in a city by analyzing data published on social networks. To this end, we have adapted and improved an existing clustering approach named MABED, which relies on the number of interactions between users to measure impact. Our main contributions to this model have been to improve the accuracy of that impact algorithm and to provide a new definition of redundancy, leading to better performance on duplicated events. The social network our detector reads is Twitter, considered a valuable source of what is known as Social Data. Information is provided by short documents posted by users, called tweets. These posts are collected by our Streamer, which gathers tweets that have just been published in the city of Madrid. In addition to the clustering algorithm, we have also developed an architecture that turns our project into a system. The Streamer is in charge of collecting the data that we feed to our detector.
However, the data first needs to pass through a preprocessing module, which filters out spam and lemmatizes the text to achieve better performance. Once the detection task is finished, results are saved in a persistence subsystem. Finally, these results are visualized in a dashboard that interacts with the user and facilitates interpretation of the analysis performed. This entire data flow is supervised by an orchestrator, which ensures the correct interaction between modules. The process just described is repeated every half hour, showing the three highest-impact events that took place in the city of Madrid in the last 24 hours.
date_type: completed
date: 2019
place_of_pub: Madrid
institution: Telecomunicacion
department: Ingenieria_Sistemas
refereed: TRUE
grado: Grado en Ingeniería de Tecnologías y Servicios de Telecomunicación
geolocation_latitudenorth: 40.4167047
geolocation_longitudeeast: -3.7035825
geolocation_name: Madrid, Comunidad de Madrid, España
citation: López García, Luis Cristobal (2019). Development of an event detector in Twitter streams based on mention-anomaly detection for the city of Madrid. Proyecto Fin de Carrera / Trabajo Fin de Grado, E.T.S.I. Telecomunicación (UPM), Madrid.
document_url: https://oa.upm.es/55806/1/PFC_LUIS_LOPEZ_GARCIA_2019.pdf