Legal entity extraction with NER Systems

Badji, Inés (2018). Legal entity extraction with NER Systems. Thesis (Master thesis), E.T.S. de Ingenieros Informáticos (UPM).

Description

Title: Legal entity extraction with NER Systems
Author/s:
  • Badji, Inés
Item Type: Thesis (Master thesis)
Master's title: Inteligencia Artificial (Artificial Intelligence)
Date: June 2018
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Inteligencia Artificial
Creative Commons Licenses: Attribution - No derivative works - Non commercial

Abstract

Named Entity Recognition (NER) over texts belonging to the legal domain focuses on
categories (legal entities) such as references to specific laws, judgments, names of
courts or stages in a legal process. Although there is a rich choice of libraries for
implementing NER systems, these libraries are not domain specific and do not work
well on text from the legal domain. Similarly, little attention is given to Spanish,
since most research is done on the English language.

The objective of the work presented in this thesis is the identification of legal
entities in Spanish and English texts, with a main focus on informal references to
legislative documents found in news, Twitter, contracts or journal articles. The
work is carried out within the H2020 Lynx project, which aims to create a Legal
Knowledge Graph that enables the provision of compliance-related services.

A rule-based approach, applied on top of a combination of Natural Language
Processing tools, can be used to recognize references to norms in Spanish and
English documents belonging to the legal domain. Recognizing mentions in documents
of a less formal nature requires a set of colloquial variants for the names of
public acts and judgments. A table of such synonyms is produced by querying
Wikidata, DBpedia and the BOE (Boletín Oficial del Estado). These resources have
been published along with a small annotated data set that serves as a gold standard.
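
The combination of rules for formal references and a synonym table for informal ones could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the regular expressions, the `SYNONYMS` entries and the function name `find_norm_references` are hypothetical, and the table is hard-coded here rather than built from Wikidata, DBpedia or the BOE.

```python
import re

# Hypothetical rules for formal references (not taken from the thesis).
FORMAL_PATTERNS = [
    # Spanish: norm type followed by number/year, e.g. "Ley Orgánica 15/1999"
    re.compile(r"\b(?:Ley Orgánica|Real Decreto|Ley)\s+\d+/\d{4}\b"),
    # English (EU acts): e.g. "Regulation (EU) 2016/679", "Directive 95/46/EC"
    re.compile(r"\b(?:Regulation|Directive)\s+(?:\((?:EU|EC)\)\s+)?\d+/\d+(?:/EC)?\b"),
]

# Illustrative synonym table: informal name -> canonical reference.
# In the thesis such a table is produced by querying external resources;
# these two entries are hand-written examples.
SYNONYMS = {
    "LOPD": "Ley Orgánica 15/1999",
    "GDPR": "Regulation (EU) 2016/679",
}


def find_norm_references(text):
    """Return (start, end, surface, canonical) tuples sorted by position."""
    found = []
    # Formal mentions: the matched text is used as its own canonical form.
    for pattern in FORMAL_PATTERNS:
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), m.group(0), m.group(0)))
    # Informal mentions: whole-word lookup of each alias in the table.
    for alias, canonical in SYNONYMS.items():
        for m in re.finditer(r"\b" + re.escape(alias) + r"\b", text):
            found.append((m.start(), m.end(), m.group(0), canonical))
    return sorted(found)


if __name__ == "__main__":
    sample = "Según la LOPD y la Ley 39/2015, el GDPR es Regulation (EU) 2016/679."
    for start, end, surface, canonical in find_norm_references(sample):
        print(f"{start:3d}-{end:3d}  {surface!r} -> {canonical!r}")
```

Keeping the synonym table separate from the patterns means informal variants can be added without touching the rules. A realistic system would also need tokenization, overlap resolution and a far larger table.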

More information

Item ID: 51740
DC Identifier: https://oa.upm.es/51740/
OAI Identifier: oai:oa.upm.es:51740
Deposited by: Biblioteca Facultad de Informatica
Deposited on: 25 Jul 2018 13:00
Last Modified: 25 Jul 2018 13:00