A Semantic Scraping Model for Web Resources - Applying Linked Data to Web Page Screen Scraping

Fernández Villamor, José Ignacio and Blasco Garcia, Jacobo and Iglesias Fernandez, Carlos Angel and Garijo Ayestaran, Mercedes (2011). A Semantic Scraping Model for Web Resources - Applying Linked Data to Web Page Screen Scraping. In: "ICAART 2011 3rd International Conference on Agents and Artificial Intelligence", 28/01/2011 - 30/01/2011, Rome, Italy. pp. 451-456.

Description

Title: A Semantic Scraping Model for Web Resources - Applying Linked Data to Web Page Screen Scraping
Authors:
  • Fernández Villamor, José Ignacio
  • Blasco Garcia, Jacobo
  • Iglesias Fernandez, Carlos Angel
  • Garijo Ayestaran, Mercedes
Item Type: Conference presentation (article)
Event Title: ICAART 2011 3rd International Conference on Agents and Artificial Intelligence
Event Dates: 28/01/2011 - 30/01/2011
Event Location: Rome, Italy
Title of Book: Proceedings of ICAART 2011 - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence
Date: 2011
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería de Sistemas Telemáticos [until 2014]
Creative Commons License: Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)

Full text

PDF (318 kB)

Abstract

In spite of the increasing presence of Semantic Web facilities, only a limited fraction of the resources available on the Internet provide semantic access. Recent initiatives such as the emerging Linked Data Web are providing semantic access to available data by porting existing resources to the Semantic Web using different technologies, such as database-semantic mapping and scraping. Nevertheless, existing scraping solutions are ad hoc, complemented with graphical interfaces that speed up scraper development. This article proposes a generic framework for web scraping based on semantic technologies. The framework is structured in three levels: scraping services, a semantic scraping model, and syntactic scraping. The first level provides an interface through which generic applications or intelligent agents can gather information from the web at a high level. The second level defines a semantic RDF model of the scraping process, providing a declarative approach to the scraping task. Finally, the third level provides an implementation of the RDF scraping model for specific technologies. The work has been validated in a scenario that illustrates its application to mashup technologies.
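The three levels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `sc:`/`ex:` vocabulary, the function names (`scraping_service`, `run_fragment`), and the use of plain Python tuples as stand-ins for RDF triples (instead of a real RDF library) are all assumptions made for illustration. The point it shows is the separation of concerns: the scraper is declared as data (level 2), executed by a technology-specific binding (level 3), and exposed through a single high-level call (level 1).

```python
"""A minimal sketch of a three-level scraping framework, under stated assumptions.

Hypothetical (not taken from the paper): the sc:/ex: vocabulary, the property
names, and the use of plain Python tuples as stand-ins for RDF triples.
"""
import xml.etree.ElementTree as ET

# Level 2 (semantic scraping model): a declarative scraper described as triples.
# All sc: and ex: terms are hypothetical names chosen for this illustration.
SCRAPER_MODEL = [
    ("ex:postScraper", "sc:selector", ".//div[@class='post']"),
    ("ex:postScraper", "sc:mapsTo", "sioc:Post"),
    ("ex:postScraper", "sc:subfragment", "ex:titleScraper"),
    ("ex:titleScraper", "sc:selector", "./h2"),
    ("ex:titleScraper", "sc:mapsTo", "dc:title"),
]


def objects(model, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the model."""
    return [o for s, p, o in model if s == subject and p == predicate]


def first(model, subject, predicate):
    """Return the first object for (subject, predicate); assumes one exists."""
    return objects(model, subject, predicate)[0]


# Level 3 (syntactic scraping): executes the model with one concrete technology,
# here ElementTree's limited XPath subset, so the input must be well-formed XHTML.
def run_fragment(root, model, fragment):
    triples = []
    rdf_class = first(model, fragment, "sc:mapsTo")
    for i, node in enumerate(root.findall(first(model, fragment, "sc:selector"))):
        resource = f"_:r{i}"  # blank node identifying one scraped resource
        triples.append((resource, "rdf:type", rdf_class))
        for sub in objects(model, fragment, "sc:subfragment"):
            prop = first(model, sub, "sc:mapsTo")
            for child in node.findall(first(model, sub, "sc:selector")):
                triples.append((resource, prop, (child.text or "").strip()))
    return triples


# Level 1 (scraping services): the high-level entry point an application or
# agent calls without knowing which selectors or parser are used underneath.
def scraping_service(xhtml, fragment, model=SCRAPER_MODEL):
    return run_fragment(ET.fromstring(xhtml), model, fragment)


if __name__ == "__main__":
    page = """<html><body>
      <div class="post"><h2>First</h2></div>
      <div class="post"><h2>Second</h2></div>
      <div class="ad"><h2>Buy now</h2></div>
    </body></html>"""
    for triple in scraping_service(page, "ex:postScraper"):
        print(triple)
```

Running the example yields two `sioc:Post` resources titled "First" and "Second"; the `ad` block is skipped because its class does not match the declared selector. Because the scraper is pure data, replacing ElementTree with another parser or selector language would only change the level-3 function, which mirrors the motivation for a declarative model given in the abstract.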

More information

Item ID: 13159
DC Identifier: http://oa.upm.es/13159/
OAI Identifier: oai:oa.upm.es:13159
Official URL: http://www.icaart.org/ICAART2011/
Deposited by: Memoria Investigacion
Deposited on: 29 Nov 2012 11:41
Last Modified: 21 Apr 2016 12:28