MIRACLE at GeoCLEF Query Parsing 2007: Extraction and Classification of Geographical Information

Lana Serrano, Sara; Villena Román, Julio and Goñi Menoyo, José Miguel (2007). MIRACLE at GeoCLEF Query Parsing 2007: Extraction and Classification of Geographical Information. In: "8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007", 19/09/2007-21/09/2007, Budapest, Hungary.

Description

Title: MIRACLE at GeoCLEF Query Parsing 2007: Extraction and Classification of Geographical Information
Author(s):
  • Lana Serrano, Sara
  • Villena Román, Julio
  • Goñi Menoyo, José Miguel
Document Type: Conference or Workshop Paper (Article)
Event Title: 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007
Event Dates: 19/09/2007-21/09/2007
Event Location: Budapest, Hungary
Book Title: Working Notes for the CLEF 2007 Workshop
Date: 2007
Subjects:
Informal Keywords: Linguistic Engineering, classification, geographical IR, geographic entity recognition, gazetteer, semantic expansion, WordNet.
School: E.T.S.I. Telecomunicación (UPM)
Department: Matemática Aplicada a las Tecnologías de la Información [until 2014]
UPM Research Group: Grupo de Sistemas Inteligentes
Creative Commons Licenses: Attribution - No Derivative Works - Non-Commercial

Abstract

This paper describes the participation of the MIRACLE research consortium in the Query Parsing task of GeoCLEF 2007. Our system is composed of three main modules. First, the Named Geo-entity Identifier performs geo-entity identification and tagging, i.e., it extracts the “where” component of the geographical query, should there be any. This module is based on a gazetteer built from the Geonames geographical database and carries out a sequential process in three steps: geo-entity recognition, geo-entity selection and query tagging. Then, the Query Analyzer parses this tagged query to identify the “what” and “geo-relation” components by means of a rule-based grammar. Finally, a two-level multiclassifier first decides whether the query is indeed a geographical query and, if so, then determines the query type according to the type of information that the user is presumably looking for: map, yellow page or information. According to a strict evaluation criterion, under which a match must have all fields correct, our system reaches a precision of 42.8% and a recall of 56.6%, and our submission is ranked 1st out of 6 participants in the task. A detailed evaluation of the confusion matrices reveals that some extra effort must be invested in “user-oriented” disambiguation techniques to improve the first-level binary classifier for detecting geographical queries, as it is a key component for eliminating many false positives.
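
The three-module flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny `GAZETTEER` set, the `GEO_RELATIONS` list, the keyword cue sets and every function name are hypothetical stand-ins chosen for illustration. In particular, the real gazetteer is built from Geonames, the real grammar is not reproduced in this record, and the keyword rules in `classify` merely stand in for the two-level multiclassifier.

```python
import re

# Hypothetical toy gazetteer; the actual system derives its gazetteer from Geonames.
GAZETTEER = {"paris", "new york", "budapest"}

# Hypothetical geo-relation cues; the paper's rule-based grammar is not shown here.
GEO_RELATIONS = ["north of", "south of", "near", "in", "at"]

# Hypothetical keyword cues standing in for the second-level classifier.
MAP_CUES = {"map", "maps", "directions"}
YELLOW_PAGE_CUES = {"hotels", "restaurants", "plumbers"}


def recognize(query):
    """Geo-entity recognition: locate every gazetteer name present in the query."""
    text = query.lower()
    hits = []
    for name in GAZETTEER:
        match = re.search(r"\b" + re.escape(name) + r"\b", text)
        if match:
            hits.append((match.start(), match.end(), name))
    return hits


def select(hits):
    """Geo-entity selection: keep the longest match (an assumed heuristic)."""
    if not hits:
        return None
    return max(hits, key=lambda hit: hit[1] - hit[0])


def parse(query):
    """Split a query into its "what", "geo-relation" and "where" components."""
    hit = select(recognize(query))
    if hit is None:
        return {"what": query.strip(), "geo_relation": None, "where": None}
    start, _end, name = hit
    before = query[:start].strip()
    relation = None
    for candidate in GEO_RELATIONS:
        # A relation is accepted only if it ends the text preceding the place name.
        if re.search(r"\b" + re.escape(candidate) + r"$", before.lower()):
            relation = candidate
            before = before[: len(before) - len(candidate)].strip()
            break
    return {"what": before, "geo_relation": relation, "where": name}


def classify(parsed):
    """Level 1: is the query geographical? Level 2: which type is it?"""
    if parsed["where"] is None:
        return {"is_geo": False, "type": None}
    words = set(parsed["what"].lower().split())
    if words & MAP_CUES:
        query_type = "map"
    elif words & YELLOW_PAGE_CUES:
        query_type = "yellow page"
    else:
        query_type = "information"
    return {"is_geo": True, "type": query_type}
```

Under these assumptions, `parse("hotels near Paris")` yields `"hotels"` as the "what", `"near"` as the geo-relation and `"paris"` as the "where", and `classify` then labels the query as a yellow page query. A query with no gazetteer match, such as `"python tutorial"`, is rejected at the first level.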

More information

Record ID: 4684
DC Identifier: http://oa.upm.es/4684/
OAI Identifier: oai:oa.upm.es:4684
Official URL: http://ims-sites.dei.unipd.it/documents/71612/86368/CLEF2007wn-GeoCLEF-LanaSerranoEt2007.pdf
Deposited by: Memoria Investigacion
Deposited on: 22 Oct 2010 10:12
Last Modified: 20 Apr 2016 13:48