Design and implementation of a Scraping system for Sport News

Ochoa Serna, Javier (2017). Design and implementation of a Scraping system for Sport News. Final Degree Project / Bachelor's Thesis, E.T.S.I. Telecomunicación (UPM), Madrid.

Description

Title: Design and implementation of a Scraping system for Sport News
Author(s):
  • Ochoa Serna, Javier
Supervisor(s):
  • Iglesias Fernández, Carlos Ángel
Document Type: Final Degree Project
Degree: Bachelor's Degree in Telecommunication Technologies and Services Engineering
Date: January 2017
Subjects:
Free Keywords: Sefarad, Scrapy, Senpy, Elasticsearch, Luigi, Polymer, Web Scraping, Sport, News, Football
School: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería de Sistemas Telemáticos [until 2014]
Creative Commons Licenses: Attribution - No Derivative Works - Non-Commercial

Full text

PDF (Portable Document Format), 4MB. A PDF viewer such as GSview, Xpdf, or Adobe Acrobat Reader is required.

Abstract

The way people read the news has changed. Sales of printed newspapers have fallen by around 60% in the last decade, a consequence of the internet and the ability to access content from thousands of sites through the World Wide Web. Motivated by this shift, this project designs and implements a system that extracts news from the Web for further analysis. The subject matter could have been diverse, but the project focuses on sports news, in particular news about the most popular football teams in Spain. The project is divided into several phases related to the processing of information. The first three phases are connected in a workflow using Luigi. The first phase is the extraction of data from the Web, for which several web spiders have been developed using Scrapy. Subsequently, this information is analyzed using Senpy to extract the sentiment and emotions in each article. The last step in the workflow is to store the resulting data in Elasticsearch. The final phase of the project is a graphical interface that displays the data obtained in the analysis and allows it to be compared. For this purpose, Polymer Web Components have been used in conjunction with the D3.js library. Using these technologies, different widgets have been created to compare results between different teams and newspapers. The project allows the user to perform a complete analysis of the different newspapers and football teams, based on the sentiment and emotions expressed in the written press over time.

More information

Record ID: 44707
DC Identifier: http://oa.upm.es/44707/
OAI Identifier: oai:oa.upm.es:44707
Deposited by: Biblioteca ETSI Telecomunicación
Deposited on: 16 Feb 2017 15:09
Last Modified: 16 Feb 2017 15:12