Defying Wikidata: validation of terminological relations in the web of data

Martín Chozas, Patricia and Ahmadi, Sina and Montiel-Ponsoda, Elena (2020). Defying Wikidata: validation of terminological relations in the web of data. In: "12th Conference on Language Resources and Evaluation (LREC 2020)", 11-16 May 2020, Marseille, France. ISBN 979-10-95546-34-4. pp. 5654-5659.

Description

Title: Defying Wikidata: validation of terminological relations in the web of data
Author/s:
  • Martín Chozas, Patricia
  • Ahmadi, Sina
  • Montiel-Ponsoda, Elena
Item Type: Presentation at Congress or Conference (Article)
Event Title: 12th Conference on Language Resources and Evaluation (LREC 2020)
Event Dates: 11-16 May 2020
Event Location: Marseille, France
Title of Book: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020)
Date: 2020
ISBN: 979-10-95546-34-4
Subjects:
Freetext Keywords: Linguistic linked open data, Knowledge representation, Terminological relations, Resource population, Multilingualism
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Lingüística Aplicada a la Ciencia y a la Tecnología
Creative Commons Licenses: Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)

Abstract

In this paper we present an approach to validating terminological data retrieved from open encyclopaedic knowledge bases. The need arises from enriching automatically extracted terms with information from existing resources in the Linguistic Linked Open Data cloud. Specifically, the resource used for this enrichment is Wikidata, one of the largest knowledge bases freely available on the Semantic Web. During the experiment, we noticed that certain RDF properties in the knowledge base did not contain the data they are intended to represent, but a different type of information. We therefore propose an approach to validate the retrieved data based on four axioms that rely on two linguistic theories: X-bar theory and the multidimensional theory of terminology. The validation process is supported by a second knowledge base specialised in linguistic data, in this case ConceptNet. In our experiment, we validate terms from the legal domain in four languages: Dutch, English, German and Spanish. The final aim is to generate a set of sound and reliable terminological resources in RDF that contribute to the population of the Linguistic Linked Open Data cloud.
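The abstract does not list the four axioms, so the following Python sketch only illustrates the general pattern it describes: keeping a relation retrieved from one knowledge base when a second, linguistically oriented knowledge base supports it. Everything here is an assumption made for illustration: the triple format, the relation labels, the function names `head_of`, `is_valid` and `validate`, and the head-final rule for English multiword terms (a simplification loosely inspired by X-bar theory). It is not the authors' method, and the toy data are not taken from Wikidata or ConceptNet.

```python
from typing import Dict, List, Set, Tuple

# A retrieved relation as (term, relation label, related term).
# Hypothetical format chosen for this sketch only.
Triple = Tuple[str, str, str]


def head_of(term: str) -> str:
    """Return the last token of an English multiword term, lower-cased.

    Assumption: English noun phrases are treated as head-final
    (e.g. "criminal law" -> "law"). This is a simplification, not one
    of the paper's axioms.
    """
    return term.lower().split()[-1]


def is_valid(triple: Triple, reference: Set[Triple]) -> bool:
    """Apply two hypothetical validation rules to one retrieved triple."""
    term, rel, other = triple
    # Rule 1 (hypothetical): the second knowledge base states the same relation.
    if triple in reference:
        return True
    # Rule 2 (hypothetical): a "broader" term that equals the syntactic head
    # of the narrower term is accepted even without external support.
    if rel == "broader" and head_of(term) == other.lower():
        return True
    return False


def validate(candidates: List[Triple], reference: Set[Triple]) -> Dict[str, List[Triple]]:
    """Split retrieved triples into accepted and rejected lists, preserving order."""
    result: Dict[str, List[Triple]] = {"accepted": [], "rejected": []}
    for triple in candidates:
        key = "accepted" if is_valid(triple, reference) else "rejected"
        result[key].append(triple)
    return result


if __name__ == "__main__":
    # Toy data standing in for relations retrieved from the first knowledge base.
    retrieved = [
        ("contract", "synonym", "agreement"),
        ("criminal law", "broader", "law"),
        ("contract", "synonym", "Contract (film)"),
    ]
    # Toy data standing in for the second, linguistic knowledge base.
    reference_kb = {("contract", "synonym", "agreement")}
    outcome = validate(retrieved, reference_kb)
    print("accepted:", outcome["accepted"])
    print("rejected:", outcome["rejected"])
```

Running the script accepts the first two triples and rejects the film title listed as a "synonym", mimicking the kind of RDF property value that holds a different type of information than intended.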

Funding Projects

Type | Code | Acronym | Leader | Title
Horizon 2020 | 780602 | Lynx | Universidad Politécnica de Madrid | Building the legal knowledge graph for smart compliance services in multilingual Europe
Horizon 2020 | 825182 | Pret-a-LLOD | National University of Ireland Galway | Ready-to-use multilingual linked language data for knowledge services across sectors
Horizon 2020 | 731015 | ELEXIS | Institut Jozef Stefan | European Lexicographic Infrastructure

More information

Item ID: 67255
DC Identifier: http://oa.upm.es/67255/
OAI Identifier: oai:oa.upm.es:67255
Official URL: https://www.aclweb.org/anthology/2020.lrec-1.694
Deposited by: Memoria Investigacion
Deposited on: 26 May 2021 09:10
Last Modified: 26 May 2021 09:10