Full text
PDF (1MB)
Badji, Inés (2018). Legal entity extraction with NER Systems. Thesis (Master thesis), E.T.S. de Ingenieros Informáticos (UPM).
Title: | Legal entity extraction with NER Systems |
---|---|
Author/s: | Badji, Inés |
Contributor/s: | |
Item Type: | Thesis (Master thesis) |
Master's title: | Artificial Intelligence |
Date: | June 2018 |
Subjects: | |
Faculty: | E.T.S. de Ingenieros Informáticos (UPM) |
Department: | Artificial Intelligence |
Creative Commons Licenses: | Attribution - NonCommercial - NoDerivatives |
Named Entity Recognition (NER) over texts from the legal domain focuses on categories
(legal entities) such as references to specific laws, judgments, names of courts or
stages in a legal process. Although there is a rich choice of libraries for implementing
NER systems, these libraries are not domain specific and do not work well on legal
texts. Moreover, Spanish receives little attention, since most research is done on
English.
The objective of the work presented in this thesis is the identification of legal
entities in Spanish and English texts, with a main focus on informal references to
legislative documents found in news, Twitter, contracts or journal articles. The
work is carried out within the H2020 Lynx project, which aims to create a Legal
Knowledge Graph that enables the provision of compliance-related services.
A rule-based approach, applied on top of a combination of Natural Language Processing
tools, can be used to recognize references to norms in Spanish and English legal
documents. To recognize mentions in less formal documents, a set of colloquial variants
of the names of public acts and judgments is needed. This table of synonyms is produced
by querying Wikidata, DBpedia and the BOE (the Spanish Official State Gazette). These
resources have been published together with a small annotated dataset used as a gold
standard.
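To illustrate how formal citation rules and a synonym table can work together, here is a minimal sketch in plain Python. It is not the thesis's implementation: the regular expressions, the three-entry `SYNONYMS` table and the names `find_legal_references` and `Mention` are hypothetical, and the sketch uses bare regular expressions instead of the NLP tool pipeline the rules were applied on. Formal citations such as "Ley Orgánica 15/1999" are matched by patterns, while informal names such as "RGPD" are resolved to a canonical reference through the table.

```python
import re
from dataclasses import dataclass

# Hypothetical synonym table: colloquial name -> canonical reference.
# The thesis builds such a table from Wikidata, DBpedia and the BOE;
# these entries are illustrative only.
SYNONYMS = {
    "GDPR": "Regulation (EU) 2016/679",
    "RGPD": "Regulation (EU) 2016/679",
    "LOPD": "Ley Orgánica 15/1999",
}

# Hypothetical patterns for formal citations of norms (Spanish and English).
FORMAL_PATTERNS = [
    re.compile(r"\bLey(?: Orgánica)? \d{1,3}/\d{4}"),
    re.compile(r"\b(?:Regulation|Directive) \(EU\) \d{4}/\d{1,4}"),
]


@dataclass(frozen=True)
class Mention:
    start: int      # character offset where the mention begins
    end: int        # character offset just past the mention
    text: str       # surface form found in the text
    canonical: str  # normalized reference to the norm


def find_legal_references(text: str) -> list[Mention]:
    mentions = []
    # Rule 1: a formal citation is its own canonical form.
    for pattern in FORMAL_PATTERNS:
        for m in pattern.finditer(text):
            mentions.append(Mention(m.start(), m.end(), m.group(), m.group()))
    # Rule 2: informal names are looked up in the synonym table.
    # Longest names first so the alternation prefers longer matches.
    names = sorted(SYNONYMS, key=len, reverse=True)
    gazetteer = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    for m in gazetteer.finditer(text):
        mentions.append(Mention(m.start(), m.end(), m.group(), SYNONYMS[m.group()]))
    # Keep leftmost-longest mentions and drop any that overlap a kept one.
    mentions.sort(key=lambda x: (x.start, -(x.end - x.start)))
    result: list[Mention] = []
    for m in mentions:
        if not result or m.start >= result[-1].end:
            result.append(m)
    return result


if __name__ == "__main__":
    sentence = "El RGPD y la Ley Orgánica 15/1999 regulan los datos personales."
    for mention in find_legal_references(sentence):
        print(mention.text, "->", mention.canonical)
```

On the example sentence this reports "RGPD" mapped to "Regulation (EU) 2016/679" and "Ley Orgánica 15/1999" mapped to itself. Keeping the synonym table as data, separate from the patterns, means new colloquial variants can be added without touching the rules.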
Item ID: | 51740 |
---|---|
DC Identifier: | https://oa.upm.es/51740/ |
OAI Identifier: | oai:oa.upm.es:51740 |
Deposited by: | Biblioteca Facultad de Informatica |
Deposited on: | 25 Jul 2018 13:00 |
Last Modified: | 25 Jul 2018 13:00 |