MIRACLE at GeoCLEF Query Parsing 2007: Extraction and Classification of Geographical Information

Lana Serrano, Sara and Villena Román, Julio and Goñi Menoyo, José Miguel (2007). MIRACLE at GeoCLEF Query Parsing 2007: Extraction and Classification of Geographical Information. In: "8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007", 19/09/2007-21/09/2007, Budapest, Hungary.

Description

Title: MIRACLE at GeoCLEF Query Parsing 2007: Extraction and Classification of Geographical Information
Author/s:
  • Lana Serrano, Sara
  • Villena Román, Julio
  • Goñi Menoyo, José Miguel
Item Type: Presentation at Congress or Conference (Article)
Event Title: 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007
Event Dates: 19/09/2007-21/09/2007
Event Location: Budapest, Hungary
Title of Book: Working Notes for the CLEF 2007 Workshop
Date: 2007
Subjects:
Freetext Keywords: Linguistic Engineering, classification, geographical IR, geographic entity recognition, gazetteer, semantic expansion, Wordnet.
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Matemática Aplicada a las Tecnologías de la Información [until 2014]
UPM's Research Group: Grupo de Sistemas Inteligentes
Creative Commons Licenses: Attribution - No derivative works - Non commercial

Abstract

This paper describes the participation of the MIRACLE research consortium in the Query Parsing task of GeoCLEF 2007. Our system is composed of three main modules. The first, the Named Geo-entity Identifier, performs geo-entity identification and tagging, i.e., it extracts the “where” component of the geographical query, if there is one. This module is based on a gazetteer built from the Geonames geographical database and carries out a sequential process in three steps: geo-entity recognition, geo-entity selection and query tagging. Next, the Query Analyzer parses the tagged query to identify the “what” and “geo-relation” components by means of a rule-based grammar. Finally, a two-level multiclassifier first decides whether the query is indeed a geographical query and, if so, determines the query type according to the kind of information the user is presumably looking for: map, yellow page or information. Under a strict evaluation criterion, in which a match must have all fields correct, our system reaches a precision of 42.8% and a recall of 56.6%, and our submission is ranked 1st out of 6 participants in the task. A detailed evaluation of the confusion matrices reveals that extra effort must be invested in “user-oriented” disambiguation techniques to improve the first-level binary classifier that detects geographical queries, as it is a key component for eliminating many false positives.
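
The three-module flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny gazetteer, the list of geo-relation words, the cue-word sets, and every function and field name (`find_geo_entity`, `split_what_and_relation`, `classify_type`, `ParsedQuery`) are hypothetical stand-ins chosen for illustration. In particular, the first classification level is reduced here to "a gazetteer entry was found", which is exactly the kind of shortcut that produces the false positives mentioned above.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Toy gazetteer standing in for the Geonames-derived one (illustrative entries).
GAZETTEER = {"budapest", "madrid", "new york"}

# Hypothetical geo-relation words; the paper's rule-based grammar is not reproduced.
GEO_RELATIONS = ("near", "in", "north of", "south of")

# Hypothetical cue words for the second-level (query type) decision.
MAP_CUES = {"map", "maps", "directions"}
YELLOW_PAGE_CUES = {"hotel", "hotels", "restaurant", "restaurants", "pizza"}


@dataclass
class ParsedQuery:
    is_geographical: bool
    what: str = ""
    geo_relation: Optional[str] = None
    where: Optional[str] = None
    query_type: Optional[str] = None


def find_geo_entity(query: str) -> Optional[str]:
    """Module 1 (simplified): longest gazetteer entry found in the query."""
    text = query.lower()
    matches = [name for name in GAZETTEER
               if re.search(rf"\b{re.escape(name)}\b", text)]
    return max(matches, key=len) if matches else None


def split_what_and_relation(query: str, where: str) -> Tuple[str, Optional[str]]:
    """Module 2 (simplified): drop the place name, then peel off a trailing relation."""
    rest = re.sub(rf"\b{re.escape(where)}\b", "", query.lower())
    rest = re.sub(r"\s+", " ", rest).strip()
    # Try longer relations first so "north of" is preferred over a shorter match.
    for rel in sorted(GEO_RELATIONS, key=len, reverse=True):
        if rest == rel or rest.endswith(" " + rel):
            return rest[: -len(rel)].strip(), rel
    return rest, None


def classify_type(what: str) -> str:
    """Module 3, second level (simplified): keyword cues instead of a trained classifier."""
    tokens = set(what.split())
    if tokens & MAP_CUES:
        return "map"
    if tokens & YELLOW_PAGE_CUES:
        return "yellow page"
    return "information"


def parse_query(query: str) -> ParsedQuery:
    where = find_geo_entity(query)
    if where is None:  # first level: treated as a non-geographical query
        return ParsedQuery(is_geographical=False, what=query.lower())
    what, rel = split_what_and_relation(query, where)
    return ParsedQuery(True, what, rel, where, classify_type(what))


if __name__ == "__main__":
    for q in ("hotels near Budapest", "map of New York", "cheap flights"):
        print(q, "->", parse_query(q))
```

A real system would replace the keyword heuristics with trained classifiers and would need a geo-entity selection step to choose among ambiguous gazetteer matches, which this sketch approximates by simply taking the longest match.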

More information

Item ID: 4684
DC Identifier: http://oa.upm.es/4684/
OAI Identifier: oai:oa.upm.es:4684
Official URL: http://ims-sites.dei.unipd.it/documents/71612/86368/CLEF2007wn-GeoCLEF-LanaSerranoEt2007.pdf
Deposited by: Memoria Investigacion
Deposited on: 22 Oct 2010 10:12
Last Modified: 20 Apr 2016 13:48