eprintid: 13159
rev_number: 14
eprint_status: archive
userid: 1903
dir: disk0/00/01/31/59
datestamp: 2012-11-29 11:41:50
lastmod: 2016-04-21 12:28:21
status_changed: 2012-11-29 11:41:50
type: conference_item
metadata_visibility: show
item_issues_count: 0
creators_name: Fernández Villamor, José Ignacio
creators_name: Blasco Garcia, Jacobo
creators_name: Iglesias Fernandez, Carlos Angel
creators_name: Garijo Ayestaran, Mercedes
title: A Semantic Scraping Model for Web Resources - Applying Linked Data to Web Page Screen Scraping
rights: by-nc-nd
ispublished: unpub
subjects: informatica
full_text_status: public
pres_type: paper
abstract: In spite of the increasing presence of Semantic Web facilities, only a limited share of the resources available on the Internet provides semantic access. Recent initiatives such as the emerging Linked Data Web are providing semantic access to available data by porting existing resources to the Semantic Web using different technologies, such as database-semantic mapping and scraping. Nevertheless, existing scraping solutions are ad hoc, complemented with graphical interfaces for speeding up scraper development. This article proposes a generic framework for web scraping based on semantic technologies. This framework is structured in three levels: scraping services, semantic scraping model, and syntactic scraping. The first level provides an interface through which generic applications or intelligent agents can gather information from the web at a high level. The second level defines a semantic RDF model of the scraping process, in order to provide a declarative approach to the scraping task. Finally, the third level provides an implementation of the RDF scraping model for specific technologies.
The work has been validated in a scenario that illustrates its application to mashup technologies.
date_type: published
date: 2011
pagerange: 451-456
event_title: ICAART 2011 3rd International Conference on Agents and Artificial Intelligence
event_location: Roma, Italia
event_dates: 28/01/2011 - 30/01/2011
event_type: conference
institution: Telecomunicacion
department: Ingenieria_Sistemas
refereed: TRUE
book_title: Proceedings of ICAART 2011 - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence
official_url: http://www.icaart.org/ICAART2011/
citation: Fernández Villamor, José Ignacio and Blasco Garcia, Jacobo and Iglesias Fernandez, Carlos Angel and Garijo Ayestaran, Mercedes (2011). A Semantic Scraping Model for Web Resources - Applying Linked Data to Web Page Screen Scraping. In: "ICAART 2011 3rd International Conference on Agents and Artificial Intelligence", 28/01/2011 - 30/01/2011, Roma, Italia. pp. 451-456.
document_url: https://oa.upm.es/13159/1/INVE_MEM_2011_109693.pdf
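
The three levels named in the abstract (scraping services, semantic scraping model, syntactic scraping) can be illustrated with a minimal sketch. It is not the authors' implementation: the paper defines the second level as an RDF model, whereas the plain dictionary `SCRAPING_MODEL` below is only a stand-in for it, and the vocabulary terms (`ex:Post`, `dc:title`, `dc:creator`), the tag and class selectors, and the function name `scrape` are all hypothetical choices made for illustration. Only the Python standard library is used.

```python
from html.parser import HTMLParser

# Level 2 (stand-in): a declarative description of what to extract.
# The paper uses an RDF model; this dict is an illustrative substitute,
# and every term and selector in it is hypothetical.
SCRAPING_MODEL = {
    "resource_type": "ex:Post",
    "fragments": [
        {"property": "dc:title", "tag": "h1", "css_class": "title"},
        {"property": "dc:creator", "tag": "span", "css_class": "author"},
    ],
}


class _FragmentExtractor(HTMLParser):
    """Level 3: syntactic scraping for one technology (HTML tag/class matching)."""

    def __init__(self, fragments):
        super().__init__()
        self._fragments = fragments
        self._active = None      # property currently being captured
        self._active_tag = None  # tag whose end closes the capture
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if self._active is not None:
            return  # already inside a matched element
        classes = (dict(attrs).get("class") or "").split()
        for frag in self._fragments:
            if frag["tag"] == tag and frag["css_class"] in classes:
                self._active = frag["property"]
                self._active_tag = tag
                self.values.setdefault(self._active, "")
                break

    def handle_data(self, data):
        if self._active is not None:
            self.values[self._active] += data

    def handle_endtag(self, tag):
        # Simplification: nested elements with the same tag end the capture early.
        if self._active is not None and tag == self._active_tag:
            self.values[self._active] = self.values[self._active].strip()
            self._active = None
            self._active_tag = None


def scrape(html, model=SCRAPING_MODEL):
    """Level 1: a high-level service returning (subject, property, value) triples."""
    extractor = _FragmentExtractor(model["fragments"])
    extractor.feed(html)
    extractor.close()
    subject = "_:item"  # blank-node style placeholder for the scraped resource
    triples = [(subject, "rdf:type", model["resource_type"])]
    triples += [(subject, prop, value) for prop, value in extractor.values.items()]
    return triples
```

A caller would pass an HTML string to `scrape` and receive a list of triples; changing what is extracted means editing the declarative model, not the extractor, which is the separation the abstract attributes to the framework.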