Legal entity extraction with NER Systems

Badji, Inés (2018). Legal entity extraction with NER Systems. Thesis (Master thesis), E.T.S. de Ingenieros Informáticos (UPM).

Description

Title: Legal entity extraction with NER Systems
Author/s:
  • Badji, Inés
Contributor/s:
  • Corcho, Oscar
  • Rodríguez Doncel, Víctor
Item Type: Thesis (Master thesis)
Masters title: Inteligencia Artificial
Date: June 2018
Subjects:
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Inteligencia Artificial
Creative Commons Licenses: Attribution - No derivative works - Non commercial

Abstract

Named Entity Recognition (NER) over texts in the legal domain focuses on categories (legal entities) such as references to specific laws, judgments, names of courts or stages in a legal process. Although there is a rich choice of libraries for implementing NER systems, they are not domain specific and do not work well on legal text. Similarly, little attention is given to Spanish, since most research is done on English. The objective of the work presented in this thesis is the identification of legal entities in Spanish and English texts, with a main focus on informal references to legislative documents found in news, Twitter, contracts or journal articles. The work is framed in the H2020 Lynx project, which aims to create a Legal Knowledge Graph enabling the provision of compliance-related services. A rule-based approach, applied on top of a combination of Natural Language Processing tools, can be used to recognize references to norms in Spanish and English legal documents. To recognize mentions in documents of a less formal nature, a number of colloquial variants of the names of public acts or judgments are necessary. A table of synonyms is produced by querying Wikidata, DBpedia and the BOE. These resources have been published along with a small annotated data set taken as a gold standard.
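
The combination of rules and a synonym table described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the regular expression, the `SYNONYMS` table and the `find_references` function are hypothetical names chosen for illustration, and the single synonym entry is a hand-written example rather than data from the published resources.

```python
import re

# Hypothetical rule for formal Spanish references such as "Ley 39/2015"
# or "Real Decreto-ley 8/2020". An assumption, not a rule from the thesis.
FORMAL_REF = re.compile(
    r"\b(Ley Orgánica|Real Decreto-ley|Real Decreto|Ley)\s+(\d+)/(\d{4})\b"
)

# Hypothetical synonym table: informal name -> formal reference.
# The thesis builds such a table from Wikidata, DBpedia and the BOE;
# this single entry is written by hand for illustration.
SYNONYMS = {
    "ley mordaza": "Ley Orgánica 4/2015",
}


def find_references(text):
    """Return sorted (start, end, surface form, normalized reference) tuples."""
    hits = []
    # Rule-based pass: formal references matched by the pattern.
    for m in FORMAL_REF.finditer(text):
        hits.append((m.start(), m.end(), m.group(0), m.group(0)))
    # Synonym pass: informal names mapped to their formal reference.
    for alias, formal in SYNONYMS.items():
        pattern = r"\b" + re.escape(alias) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), m.end(), m.group(0), formal))
    return sorted(hits)
```

Calling `find_references("La ley mordaza y la Ley 39/2015 siguen vigentes.")` returns one tuple per mention, each carrying the character offsets, the text as written and the normalized reference. A real system would run such rules on top of tokenization and other NLP tooling, as the abstract indicates.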

More information

Item ID: 51740
DC Identifier: http://oa.upm.es/51740/
OAI Identifier: oai:oa.upm.es:51740
Deposited by: Biblioteca Facultad de Informatica
Deposited on: 25 Jul 2018 13:00
Last Modified: 25 Jul 2018 13:00