Legal entity extraction with NER Systems

Badji, Inés (2018). Legal entity extraction with NER Systems. Thesis (Master's), E.T.S. de Ingenieros Informáticos (UPM).


Title: Legal entity extraction with NER Systems
  • Badji, Inés
  • Corcho, Oscar
  • Rodríguez Doncel, Víctor
Document Type: Thesis (Master's)
Master's programme: Artificial Intelligence
Date: June 2018
School: E.T.S. de Ingenieros Informáticos (UPM)
Department: Artificial Intelligence
Creative Commons License: Attribution - NoDerivatives - NonCommercial



Named Entity Recognition (NER) over legal texts focuses on categories (legal entities) such as references to specific laws, judgments, names of courts, or stages in a legal process. Although many libraries exist for implementing NER systems, they are not domain specific and do not work well on legal text. Similarly, Spanish receives little attention, since most research is done on English. The objective of this thesis is the identification of legal entities in Spanish and English texts, with a main focus on informal references to legislative documents found in news, Twitter, contracts, or journal articles. The work is framed in the H2020 Lynx project, which aims to create a Legal Knowledge Graph enabling the provision of compliance-related services. A rule-based approach, applied on top of a combination of Natural Language Processing tools, can be used to recognize references to norms in Spanish and English legal documents. Recognizing mentions in less formal documents requires a number of colloquial variants of the names of public acts or judgments. A table of synonyms is produced by querying Wikidata, DBpedia, and the BOE (Boletín Oficial del Estado). These resources have been published along with a small annotated data set taken as a gold standard.
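To make the two ingredients in the abstract more concrete (rules for formal references and a synonym table for informal ones), here is a minimal sketch in Python. It is not the thesis's implementation: the regular expressions, the `Mention` type, the `find_mentions` function, and the tiny hand-written `SYNONYMS` table are all illustrative assumptions, and a real system would build the table from Wikidata, DBpedia, and the BOE rather than hard-coding it.

```python
import re
from typing import List, NamedTuple


class Mention(NamedTuple):
    text: str      # surface form found in the document
    start: int     # character offset where the mention begins
    end: int       # character offset where the mention ends
    norm_id: str   # canonical name the mention resolves to


# Hypothetical rules for formal references (illustrative, far from complete).
FORMAL_RULES = [
    # Spanish: "Ley 39/2015", "Ley Orgánica 3/2018", "Real Decreto 1720/2007"
    re.compile(r"\b(?:Ley Orgánica|Real Decreto|Ley)\s+\d+/\d{4}\b"),
    # English (EU style): "Regulation (EU) 2016/679", "Directive 95/46/EC"
    re.compile(r"\b(?:Regulation|Directive)\s+(?:\((?:EU|EC)\)\s+)?\d+/\d+(?:/EC)?"),
]

# Hand-written stand-in for the synonym table (assumption: the published
# table maps informal names to a canonical reference in a similar way).
SYNONYMS = {
    "rgpd": "Regulation (EU) 2016/679",
    "gdpr": "Regulation (EU) 2016/679",
    "general data protection regulation": "Regulation (EU) 2016/679",
    "reglamento general de protección de datos": "Regulation (EU) 2016/679",
}


def find_mentions(text: str) -> List[Mention]:
    """Return formal and informal references to norms, ordered by position."""
    mentions = []
    # Formal references: the matched string itself serves as the canonical name.
    for rule in FORMAL_RULES:
        for m in rule.finditer(text):
            canonical = " ".join(m.group().split())  # normalize whitespace
            mentions.append(Mention(m.group(), m.start(), m.end(), canonical))
    # Informal references: case-insensitive lookup of each known alias.
    for alias, norm_id in SYNONYMS.items():
        pattern = re.compile(r"\b" + re.escape(alias) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            mentions.append(Mention(m.group(), m.start(), m.end(), norm_id))
    return sorted(mentions, key=lambda mention: mention.start)


if __name__ == "__main__":
    sample = "El RGPD y la Ley Orgánica 3/2018 regulan la protección de datos."
    for mention in find_mentions(sample):
        print(mention)
```

The split mirrors the abstract: pattern rules cover references that follow a predictable citation format, while informal names such as acronyms only become recognizable once a synonym table links them to the norm they denote.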

More information

Record ID: 51740
Deposited by: Biblioteca Facultad de Informatica
Deposited on: 25 Jul 2018 13:00
Last Modified: 25 Jul 2018 13:00