Design and implementation of a Scraping system for Sport News

Ochoa Serna, Javier (2017). Design and implementation of a Scraping system for Sport News. Proyecto Fin de Carrera / Trabajo Fin de Grado, E.T.S.I. Telecomunicación (UPM), Madrid.

Description

Title: Design and implementation of a Scraping system for Sport News
Author/s:
  • Ochoa Serna, Javier
Contributor/s:
  • Iglesias Fernández, Carlos Ángel
Item Type: Final Project
Degree: Grado en Ingeniería de Tecnologías y Servicios de Telecomunicación
Date: January 2017
Subjects:
Freetext Keywords: Sefarad, Scrapy, Senpy, Elasticsearch, Luigi, Polymer, Web Scraping, Sport, News, Football
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería de Sistemas Telemáticos [until 2014]
Creative Commons Licenses: Attribution - No Derivative Works - Non-commercial

Full text

PDF (4MB)

Abstract

The way people read the news has changed. Sales of printed newspapers have fallen by around 60% in the last decade, a result of the rise of the internet and the ability to access content from thousands of sites through the World Wide Web. Against this background, this project covers the design and implementation of a system that extracts news from the Web for further analysis. The topic could have been diverse, but the project focuses on sports news, in particular news about the most popular football teams in Spain. The project is divided into several phases related to the processing of information. The first three phases are chained into a workflow using Luigi. The first phase is the extraction of data from the Web, implemented with several web spiders developed using Scrapy. Subsequently, this information is analyzed using Senpy to extract the sentiments and emotions in each article. The last step in the workflow stores the obtained data in Elasticsearch. The final phase of the project is a graphical interface that displays the data obtained in the analysis and allows them to be compared. To do this, Polymer Web Components have been used in conjunction with the D3.js library. Using these technologies, different widgets have been created to compare results between different teams and newspapers. This project allows the user to perform a complete analysis of the different newspapers and football teams, based on the emotions and sentiments expressed in the written press over time.

More information

Item ID: 44707
DC Identifier: http://oa.upm.es/44707/
OAI Identifier: oai:oa.upm.es:44707
Deposited by: Biblioteca ETSI Telecomunicación
Deposited on: 16 Feb 2017 15:09
Last Modified: 16 Feb 2017 15:12