A Focused Crawler in order to Get Semantic Web Resources (CSR)

Barbosa Santillán, Liliana Ibeth; Campos Quirarte, Juana Elizabeth and Castro Munguía, Aldo (2013). A Focused Crawler in order to Get Semantic Web Resources (CSR). In: "Workshop on Semantic Web, ENC 2013", 30 Oct - 1 Nov 2013, Michoacán, Mexico. ISBN 978-607-9343-23-1. pp. 114-120.

Description

Title: A Focused Crawler in order to Get Semantic Web Resources (CSR)
Author(s):
  • Barbosa Santillán, Liliana Ibeth
  • Campos Quirarte, Juana Elizabeth
  • Castro Munguía, Aldo
Document Type: Conference or Workshop Paper (Other)
Event Title: Workshop on Semantic Web, ENC 2013
Event Dates: 30 Oct - 1 Nov 2013
Event Location: Michoacán, Mexico.
Book Title: Workshops Proceedings in the Mexican International Conference on Computer Science (ENC 2013)
Date: 2013
ISBN: 978-607-9343-23-1
Subjects:
School: E.T.S. de Ingenieros Informáticos (UPM)
Department: Artificial Intelligence
Creative Commons Licenses: Attribution - No Derivative Works - Non-commercial

Abstract

This paper presents a Focused Crawler in order to Get Semantic Web Resources (CSR). Structured web data is available in formats such as Extensible Markup Language (XML), Resource Description Framework (RDF) and Web Ontology Language (OWL) that can be used for processing. One of the main challenges of searching for and downloading semantic web resources manually is that the task consumes a lot of time. Our research work proposes a focused crawler that downloads these resources automatically and stores them on disk, building a collection to be used for data processing. CSR consists of three layers: (a) the User Interface Layer, (b) the Focus Crawler Layer and (c) the Base Crawler Layer. CSR uses the Shark-Search method as its selection policy. CSR was evaluated in two experiments. The first started on December 15, 2012 at 7:11 am and ended on December 16, 2012 at 4:01, obtaining 448,123,537 bytes of data. CSR stopped by itself after analyzing 80,4375 seeds with unlimited depth. It retrieved 16,576 semantic resource files, of which 89% were RDF, 10% XML and 1% OWL. The second experiment was based on the Web Data Commons work of the Research Group Data and Web Science at the University of Mannheim and the Institute AIFB at the Karlsruhe Institute of Technology. It began at 4:46 am on June 2, 2013 and ended at 1:37 am on June 9, 2013. After 162.51 hours of execution the result was 285,279 semantic resources, in which XML resources predominated with 99%, and OWL and RDF accounted for 1% each.
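
The abstract names Shark-Search as the selection policy but gives no implementation details, so the following is only a simplified sketch of the general idea, not the authors' code. It assumes a hypothetical in-memory link graph (`TOY_WEB`) instead of real HTTP requests, a term-overlap `similarity` function standing in for a proper text similarity measure, and detection of semantic resources by file extension. The `decay` and `gamma` parameters and all names are illustrative assumptions.

```python
import heapq

# Hypothetical in-memory "web": URL -> (page text, [(anchor text, target URL), ...]).
TOY_WEB = {
    "http://example.org/": ("semantic web portal", [
        ("rdf data dump", "http://example.org/data.rdf"),
        ("ontology in owl", "http://example.org/onto.owl"),
        ("cooking recipes", "http://example.org/recipes.html"),
    ]),
    "http://example.org/recipes.html": ("pasta and soup", [
        ("more soup", "http://example.org/soup.html"),
    ]),
    "http://example.org/soup.html": ("soup", []),
}

# Assumed detection rule: a URL ending in one of these is a semantic resource.
SEMANTIC_EXTENSIONS = (".rdf", ".owl", ".xml")


def similarity(query, text):
    """Fraction of query terms found in text (a stand-in for cosine similarity)."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def crawl(seed, query, decay=0.5, gamma=0.5, max_pages=100):
    """Best-first crawl; returns semantic resource URLs in the order found."""
    frontier = [(-1.0, seed, 0.0)]  # (negated priority, url, inherited score)
    seen = {seed}
    found = []
    visited = 0
    while frontier and visited < max_pages:
        _, url, inherited = heapq.heappop(frontier)
        if url.lower().endswith(SEMANTIC_EXTENSIONS):
            found.append(url)  # a real crawler would download and store the file
            continue
        visited += 1
        text, links = TOY_WEB.get(url, ("", []))
        relevance = similarity(query, text)
        # Shark-Search-style inheritance: a relevant page passes on its own
        # decayed relevance; an irrelevant page passes on its decayed inheritance.
        child_inherited = decay * (relevance if relevance > 0 else inherited)
        for anchor, target in links:
            if target in seen:
                continue
            seen.add(target)
            # Combine the inherited score with the anchor-text score.
            potential = gamma * child_inherited + (1 - gamma) * similarity(query, anchor)
            heapq.heappush(frontier, (-potential, target, child_inherited))
    return found


if __name__ == "__main__":
    print(crawl("http://example.org/", "semantic web rdf owl"))
```

In this sketch, links whose anchor text matches the query are dequeued before links reached only through irrelevant pages, whose inherited score fades with each hop. That fading is what lets a crawl of this kind give less priority to unpromising branches.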

More information

Record ID: 36867
DC Identifier: http://oa.upm.es/36867/
OAI Identifier: oai:oa.upm.es:36867
Official URL: http://computo.fismat.umich.mx/enc2013/new/index.php/accepted-papers/accepted-papers-workshop-web
Deposited by: Memoria Investigacion
Deposited on: 10 Sep 2015 09:52
Last Modified: 10 Sep 2015 09:52