Lana Serrano, Sara, Villena Román, Julio and Goñi Menoyo, José Miguel (2007). MIRACLE at GeoCLEF Query Parsing 2007: Extraction and Classification of Geographical Information. In: "8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007", 19/09/2007-21/09/2007, Budapest, Hungary.
Title: | MIRACLE at GeoCLEF Query Parsing 2007: Extraction and Classification of Geographical Information |
---|---|
Author/s: | Lana Serrano, Sara; Villena Román, Julio; Goñi Menoyo, José Miguel |
Item Type: | Presentation at Congress or Conference (Article) |
Event Title: | 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007 |
Event Dates: | 19/09/2007-21/09/2007 |
Event Location: | Budapest, Hungary |
Title of Book: | Working Notes for the CLEF 2007 Workshop |
Date: | 2007 |
Subjects: | |
Freetext Keywords: | Linguistic Engineering, classification, geographical IR, geographic entity recognition, gazetteer, semantic expansion, WordNet. |
Faculty: | E.T.S.I. Telecomunicación (UPM) |
Department: | Matemática Aplicada a las Tecnologías de la Información [until 2014] |
UPM's Research Group: | Grupo de Sistemas Inteligentes |
Creative Commons Licenses: | Attribution - No derivative works - Non commercial |
This paper describes the participation of the MIRACLE research consortium in the Query Parsing task of GeoCLEF 2007. Our system is composed of three main modules. First, the Named Geo-entity Identifier performs geo-entity identification and tagging, i.e., it extracts the “where” component of the geographical query, if there is one. This module is based on a gazetteer built from the Geonames geographical database and carries out a sequential process in three steps: geo-entity recognition, geo-entity selection and query tagging. Then, the Query Analyzer parses this tagged query to identify the “what” and “geo-relation” components by means of a rule-based grammar. Finally, a two-level multiclassifier first decides whether the query is indeed a geographical query and, if so, then determines the query type according to the type of information that the user is presumably looking for: map, yellow page or information. Under a strict evaluation criterion in which a match must have all fields correct, our system reaches a precision of 42.8% and a recall of 56.6%, and our submission is ranked 1st out of 6 participants in the task. A detailed evaluation of the confusion matrices reveals that some extra effort must be invested in “user-oriented” disambiguation techniques to improve the first-level binary classifier for detecting geographical queries, as it is a key component for eliminating many false positives.
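The three-module flow described in the abstract can be pictured with a minimal Python sketch. This is only an illustration under stated assumptions, not the MIRACLE implementation: the toy gazetteer, the relation cue words, the keyword lists and every function name below are hypothetical, and simple keyword rules stand in for the paper's Geonames gazetteer, rule-based grammar and trained classifiers.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical toy gazetteer; the paper builds its gazetteer from Geonames.
GAZETTEER = {"budapest", "madrid", "spain"}
# Hypothetical geo-relation cue words; the paper's grammar is not reproduced.
GEO_RELATIONS = ("north of", "south of", "near", "in")
# Hypothetical keywords standing in for the second-level classifier.
YELLOW_PAGE_WORDS = {"hotels", "restaurants", "plumbers"}


@dataclass
class ParsedQuery:
    what: str
    geo_relation: Optional[str]
    where: Optional[str]
    is_geographical: bool
    query_type: Optional[str]


def identify_geo_entity(query: str) -> Optional[str]:
    """Module 1: recognise gazetteer matches, then select one (here: the last)."""
    tokens = re.findall(r"[a-z]+", query.lower())
    candidates = [tok for tok in tokens if tok in GAZETTEER]  # recognition
    return candidates[-1] if candidates else None             # selection


def analyze_query(query: str, where: Optional[str]) -> Tuple[str, Optional[str]]:
    """Module 2: split the text before the "where" into "what" and geo-relation."""
    lowered = query.lower().strip()
    if where is None:
        return lowered, None
    before = lowered[: lowered.rfind(where)].strip()
    for rel in GEO_RELATIONS:
        if before == rel or before.endswith(" " + rel):
            return before[: -len(rel)].strip(), rel
    return before, None


def classify(what: str, where: Optional[str]) -> Tuple[bool, Optional[str]]:
    """Module 3: level 1 decides geographical or not; level 2 picks the type."""
    if where is None:  # level 1 (toy rule: no "where" means non-geographical)
        return False, None
    words = set(what.split())
    if "map" in words:
        return True, "map"
    if words & YELLOW_PAGE_WORDS:
        return True, "yellow page"
    return True, "information"


def parse_query(query: str) -> ParsedQuery:
    where = identify_geo_entity(query)
    what, relation = analyze_query(query, where)
    is_geo, query_type = classify(what, where)
    return ParsedQuery(what, relation, where, is_geo, query_type)
```

For example, `parse_query("hotels in Madrid")` yields the "what" `hotels`, the geo-relation `in`, the "where" `madrid` and the type `yellow page`, while a query with no gazetteer match is rejected at the first level. A real first-level classifier would need more than a gazetteer lookup, which is the false-positive problem the abstract points to.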
Item ID: | 4684 |
---|---|
DC Identifier: | https://oa.upm.es/4684/ |
OAI Identifier: | oai:oa.upm.es:4684 |
Official URL: | http://ims-sites.dei.unipd.it/documents/71612/8636... |
Deposited by: | Memoria Investigacion |
Deposited on: | 22 Oct 2010 10:12 |
Last Modified: | 20 Apr 2016 13:48 |