DAEDALUS at PAN 2014: Guessing tweet author's gender and age

Villena Román, Julio and González Cristóbal, José Carlos (2014). DAEDALUS at PAN 2014: Guessing tweet author's gender and age. In: "5th Conference and Labs of the Evaluation Forum (CLEF 2014) Information Access Evaluation meets Multilinguality, Multimodality, and Interaction", 15/09/2014 - 18/09/2014, Sheffield, UK. pp. 1157-1163.

Description

Title: DAEDALUS at PAN 2014: Guessing tweet author's gender and age
Author/s:
  • Villena Román, Julio
  • González Cristóbal, José Carlos
Item Type: Conference presentation (Article)
Event Title: 5th Conference and Labs of the Evaluation Forum (CLEF 2014) Information Access Evaluation meets Multilinguality, Multimodality, and Interaction
Event Dates: 15/09/2014 - 18/09/2014
Event Location: Sheffield, UK
Title of Book: 5th Conference and Labs of the Evaluation Forum (CLEF 2014) Information Access Evaluation meets Multilinguality, Multimodality, and Interaction
Date: 2014
Freetext Keywords: PAN, CLEF, author profiling, gender, age, user demographics, machine learning classifier, Naive Bayes Multinomial, term vector model
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería de Sistemas Telemáticos [until 2014]
Creative Commons License: Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)


Abstract

This paper describes our participation in the PAN 2014 author profiling task. Our idea was to define, develop and evaluate a simple machine learning classifier able to guess the gender and age of a given user from his/her texts, one that could become part of the company's solution portfolio. We were not interested in finding the best possible classifier, the one that achieves the highest accuracy, but in finding the optimum balance between performance and throughput, using the simplest possible strategy with the least dependence on external systems. Results show that our software, which uses Naive Bayes Multinomial with a term vector model representation of the text, ranks quite well among the other participants in terms of accuracy.
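To make the approach in the abstract more concrete, the following is a minimal sketch, in plain Python, of a multinomial Naive Bayes classifier over term-frequency vectors. It is not the authors' implementation. The tokenizer, the additive smoothing parameter `alpha`, the class name `MultinomialNB`, the helpers `tokenize` and `term_vector`, and the toy training tweets with their age-group labels are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase word-like tokens, keeping #hashtags and @mentions.
    # The paper's actual preprocessing may differ.
    return re.findall(r"[a-z0-9#@']+", text.lower())


def term_vector(text):
    # Term vector model: represent a text by the frequency of each term.
    return Counter(tokenize(text))


class MultinomialNB:
    """Hypothetical multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)          # documents per class (priors)
        self.term_counts = defaultdict(Counter)    # term frequencies per class
        for text, label in zip(texts, labels):
            self.term_counts[label].update(term_vector(text))
        self.vocab = set()
        for counts in self.term_counts.values():
            self.vocab.update(counts)
        self.total_terms = {c: sum(cnt.values()) for c, cnt in self.term_counts.items()}
        return self

    def log_posterior(self, text):
        # Unnormalized log P(class | text) = log P(class) + sum_t f_t * log P(t | class)
        vec = term_vector(text)
        n_docs = sum(self.class_docs.values())
        v = len(self.vocab)
        scores = {}
        for c in self.class_docs:
            score = math.log(self.class_docs[c] / n_docs)
            denom = self.total_terms[c] + self.alpha * v
            for term, freq in vec.items():
                if term not in self.vocab:
                    continue  # terms never seen in training are ignored
                score += freq * math.log((self.term_counts[c][term] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_posterior(text)
        return max(scores, key=scores.get)


# Toy, invented training data: one classifier per profiling dimension.
train_texts = [
    "great match today goal",
    "goal in the last minute great match",
    "new recipe for dinner tonight",
    "baking a cake for dinner",
]
age_labels = ["18-24", "18-24", "35-49", "35-49"]

age_clf = MultinomialNB(alpha=1.0).fit(train_texts, age_labels)
print(age_clf.predict("what a goal"))     # 18-24
print(age_clf.predict("dinner for two"))  # 35-49
```

Gender would be handled by a second, independent classifier of the same kind trained on gender labels. Scoring a text costs time linear in its number of distinct terms times the number of classes, which is in the spirit of the abstract's emphasis on throughput and on avoiding external systems.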

More information

Item ID: 35363
DC Identifier: http://oa.upm.es/35363/
OAI Identifier: oai:oa.upm.es:35363
Deposited by: Memoria Investigacion
Deposited on: 27 May 2015 17:10
Last Modified: 27 May 2015 17:10