Fuzzy Semantic Labeling of Semi-structured Numerical Datasets

Alobaid, Ahmad and Corcho, Oscar (2018). Fuzzy Semantic Labeling of Semi-structured Numerical Datasets. In: "21st International Conference on Knowledge Engineering and Knowledge Management", 12-16 Nov 2018, Nancy, France. ISBN 978-3-030-03667-6. pp. 19-33. https://doi.org/10.1007/978-3-030-03667-6_2.

Description

Title: Fuzzy Semantic Labeling of Semi-structured Numerical Datasets
Author/s:
  • Alobaid, Ahmad
  • Corcho, Oscar
Item Type: Presentation at Congress or Conference (Article)
Event Title: 21st International Conference on Knowledge Engineering and Knowledge Management
Event Dates: 12-16 Nov 2018
Event Location: Nancy, France
Title of Book: Knowledge Engineering and Knowledge Management
Date: 2018
ISBN: 978-3-030-03667-6
Volume: 11313
Freetext Keywords: Fuzzy clustering; Semantic labeling; Semantic web
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Inteligencia Artificial
UPM's Research Group: Ontology Engineering Group OEG
Creative Commons Licenses: None

Abstract

SPARQL endpoints provide access to rich sources of data (e.g. knowledge graphs), which can be used to classify other less structured datasets (e.g. CSV files or HTML tables on the Web). We propose an approach to suggest types for the numerical columns of a collection of input files available as CSVs. Our approach is based on the application of the fuzzy c-means clustering technique to numerical data in the input files, using existing SPARQL endpoints to generate training datasets. Our approach has three major advantages: it works directly with live knowledge graphs, it does not require knowledge-graph profiling beforehand, and it avoids tedious and costly manual training to match values with types. We evaluate our approach against manually annotated datasets. The results show that the proposed approach classifies most of the types correctly for our test sets.
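
The core idea in the abstract, scoring a numerical column against candidate types with fuzzy c-means, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each column is summarized by two features (mean and standard deviation), that each candidate type has one cluster center built from training values, and that the fuzzifier is m = 2. The type names and numbers are hypothetical toy values standing in for data that would be retrieved from a SPARQL endpoint.

```python
import math
from statistics import mean, pstdev


def features(values):
    """Summarize a numeric column as a (mean, population std dev) point.

    Assumption: this two-feature summary is for illustration only.
    """
    return (mean(values), pstdev(values))


def fuzzy_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of `point` in each labeled center.

    u_j = d_j^(-2/(m-1)) / sum_k d_k^(-2/(m-1)), where d_j is the Euclidean
    distance from the point to center j. Memberships sum to 1.
    """
    dists = {label: math.dist(point, c) for label, c in centers.items()}
    # A point lying exactly on one or more centers belongs fully to them.
    zero = [label for label, d in dists.items() if d == 0.0]
    if zero:
        return {label: (1.0 / len(zero) if label in zero else 0.0)
                for label in dists}
    exponent = 2.0 / (m - 1.0)
    inverse = {label: d ** -exponent for label, d in dists.items()}
    total = sum(inverse.values())
    return {label: w / total for label, w in inverse.items()}


# Hypothetical training values per candidate type (toy numbers, not real data).
training = {
    "height_cm": [160.0, 170.0, 180.0],
    "weight_kg": [60.0, 70.0, 80.0],
}
centers = {label: features(vals) for label, vals in training.items()}

# Hypothetical unlabeled CSV column to be typed.
column = [165.0, 172.0, 178.0]
scores = fuzzy_memberships(features(column), centers)
best = max(scores, key=scores.get)
```

Because the memberships are fuzzy, `scores` ranks every candidate type rather than committing to a single label, so a ranked list of suggestions can be offered for each column.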

More information

Item ID: 56289
DC Identifier: http://oa.upm.es/56289/
OAI Identifier: oai:oa.upm.es:56289
DOI: 10.1007/978-3-030-03667-6_2
Official URL: https://link.springer.com/chapter/10.1007/978-3-030-03667-6_2
Deposited by: Ahmad Alobaid
Deposited on: 05 Sep 2019 08:26
Last Modified: 12 Sep 2019 09:12