Robust Speech Detection for Noisy Environments

Varela Serrano, Oscar and San Segundo Hernández, Rubén and Hernández, Luis A. (2011). Robust Speech Detection for Noisy Environments. "IEEE Aerospace and Electronic Systems Magazine", v. 26 (n. 11); pp. 16-23. ISSN 0885-8985.


Title: Robust Speech Detection for Noisy Environments
  • Varela Serrano, Oscar
  • San Segundo Hernández, Rubén
  • Hernández, Luis A.
Item Type: Article
Journal/Publication Title: IEEE Aerospace and Electronic Systems Magazine
Date: September 2011
ISSN: 0885-8985
Volume: 26
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons Licenses: Attribution - No derivative works - Non commercial



This paper presents a robust voice activity detector (VAD) based on hidden Markov models (HMMs) to improve speech recognition systems in stationary and non-stationary noise environments: inside motor vehicles (such as cars or planes) or inside buildings close to high-traffic places (such as an air traffic control (ATC) tower). In these environments, vehicle engines produce a high level of stationary noise; in addition, people speaking at some distance from the main speaker can produce non-stationary noise. The VAD presented in this paper is characterized by a new front-end and a noise level adaptation process that significantly increase its robustness across different signal-to-noise ratios (SNRs). The feature vector used by the VAD includes the most relevant Mel Frequency Cepstral Coefficients (MFCCs), normalized log energy, and delta log energy. The proposed VAD has been evaluated and compared with other well-known VADs using three databases covering different noise conditions: speech in clean environments (SNRs greater than 20 dB), speech recorded in stationary noise environments (inside or close to motor vehicles), and speech in non-stationary environments (including noise from bars, television, and far-field speakers). In all three cases, the proposed VAD obtains the lowest detection error at every SNR, compared with Acero's VAD (the baseline for this work) and other well-known VADs such as AMR, AURORA, and G.729 Annex B.
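To make the ingredients named in the abstract more concrete, here is a minimal sketch of an HMM-based speech/non-speech decision over two of the listed features: normalized log energy and delta log energy. It is not the authors' front-end or model. Assumptions to keep in mind: the MFCCs are omitted entirely; the noise floor is estimated as the mean log energy of the first few frames, a hypothetical stand-in for the paper's noise level adaptation; the two-state Gaussian HMM uses hand-picked means, variances, and self-transition probability rather than trained ones; and all function names (`frame_log_energy`, `normalize_log_energy`, `delta`, `viterbi_vad`) are illustrative.

```python
import math
from typing import List, Sequence, Tuple

def frame_log_energy(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Log energy per frame; a tiny floor avoids log(0) on all-zero frames."""
    out = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        out.append(math.log(sum(s * s for s in frame) + 1e-10))
    return out

def normalize_log_energy(log_e: Sequence[float], noise_frames: int = 10) -> List[float]:
    """Subtract a noise-floor estimate (ASSUMPTION: leading frames are noise only).
    Hypothetical stand-in for the paper's noise level adaptation."""
    n = min(noise_frames, len(log_e))
    floor = sum(log_e[:n]) / n
    return [e - floor for e in log_e]

def delta(x: Sequence[float], width: int = 2) -> List[float]:
    """Regression-based delta coefficients, replicating edge frames."""
    denom = 2 * sum(k * k for k in range(1, width + 1))
    T = len(x)
    out = []
    for t in range(T):
        acc = 0.0
        for k in range(1, width + 1):
            acc += k * (x[min(t + k, T - 1)] - x[max(t - k, 0)])
        out.append(acc / denom)
    return out

def log_gauss(v: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-likelihood of v under a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * s) + (a - m) ** 2 / s)
               for a, m, s in zip(v, mean, var))

def viterbi_vad(feats: Sequence[Tuple[float, float]], means, variances,
                stay: float = 0.9) -> List[int]:
    """Viterbi decoding of a two-state HMM: 0 = non-speech, 1 = speech.
    HYPOTHETICAL parameters; a real VAD would train them on labeled data."""
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    score = [math.log(0.5) + log_gauss(feats[0], means[s], variances[s]) for s in (0, 1)]
    back = []  # back[i][s]: best previous state for state s at frame i + 1
    for v in feats[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + log_gauss(v, means[s], variances[s]))
        score = new_score
        back.append(ptr)
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):  # trace the best path backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Synthetic input: quiet tone, loud tone, quiet tone (20 frames of 160 samples each).
signal = ([0.01 * math.sin(0.3 * n) for n in range(3200)]
          + [0.5 * math.sin(0.05 * n) for n in range(3200)]
          + [0.01 * math.sin(0.3 * n) for n in range(3200)])
log_e = normalize_log_energy(frame_log_energy(signal, 160, 160))
feats = list(zip(log_e, delta(log_e)))
labels = viterbi_vad(feats, means=[(0.0, 0.0), (6.0, 0.0)],
                     variances=[(1.0, 1.0), (4.0, 2.0)])
print(labels)
```

The Viterbi decoding is what distinguishes an HMM-based VAD from a per-frame energy threshold: the self-transition probability (`stay`) penalizes rapid toggling between speech and non-speech, so isolated frames that look marginally like speech are absorbed into the surrounding segment.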

More information

Item ID: 8864
DOI: 10.1109/MAES.2011.6070277
Deposited by: Memoria Investigacion
Deposited on: 17 Nov 2011 09:17
Last Modified: 20 Apr 2016 17:30