A Review on Speaker Diarization for Whispered Speech Audio

Authors

  • Mr. Chaitanya Pampana, Research Scholar, Gandhi Institute of Engineering and Technology, Gunupur, Odisha, India
  • Dr. M. Vijay Reddy, Professor, Gandhi Institute of Engineering and Technology, Gunupur, Odisha, India
  • Dr. K. Jhansi Rani, Assistant Professor, Jawaharlal Nehru Technological University, Kakinada, AP, India

DOI:

https://doi.org/10.47392/IRJAEM.2025.0279

Keywords:

Speaker diarization, Feature extraction, Voice activity detection, Deep neural network, Speaker clustering, Diarization error rate

Abstract

Speaker diarization, the process of partitioning an audio stream into segments according to speaker identity, is crucial for many speech processing and analysis applications. Whispered speech, characterized by its low amplitude and altered spectral properties, presents unique challenges for conventional diarization algorithms designed for clear, normal speech. In this study, I propose a novel supervised speaker diarization approach tailored specifically to whispered speech audio streams. The approach uses supervised learning on annotated data to train models capable of accurately distinguishing between speakers in whispered speech recordings. The design incorporates feature extraction techniques that capture the faint spectral cues present in whispered speech, thereby improving the discriminative ability of the diarization system. Furthermore, I investigate combining acoustic modeling with domain-specific knowledge to enhance diarization performance in whispered speech scenarios. The proposed strategy is evaluated on a variety of whispered speech datasets, and its effectiveness is compared with state-of-the-art diarization techniques. I assess the precision with which whispered speech can be divided into speaker-specific intervals using a supervised technique, and analyze the effects of various factors on diarization performance, including feature representations and dataset properties. The findings contribute to advancing speaker diarization technology, particularly in challenging acoustic conditions characterized by whispered speech. The proposed supervised approach holds promise for practical applications in surveillance, forensic analysis, and human-computer interaction, where accurate speaker segmentation of whispered speech recordings is essential.
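The abstract judges systems by how precisely whispered speech is divided into speaker-specific intervals, and lists Diarization Error Rate (DER) among its keywords. DER is conventionally the sum of missed speech, false-alarm speech, and speaker confusion, divided by the total reference speech. The sketch below is a simplified, hypothetical illustration of that formula and not the authors' evaluation code, whose scoring setup the abstract does not describe. It assumes fixed-length frames, at most one speaker per frame, no forgiveness collar, and a brute-force speaker mapping; the function name `frame_der` is made up for this example.

```python
from itertools import permutations
from typing import Optional, Sequence


def frame_der(reference: Sequence[Optional[str]],
              hypothesis: Sequence[Optional[str]]) -> float:
    """Simplified frame-level Diarization Error Rate (DER).

    Each element is the speaker label of one fixed-length frame, or None for
    non-speech. Assumes at most one speaker per frame, equal-length inputs,
    and no forgiveness collar around segment boundaries.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    pairs = list(zip(reference, hypothesis))
    total_ref = sum(r is not None for r, _ in pairs)
    if total_ref == 0:
        raise ValueError("reference contains no speech frames")

    # Reference speech that the system labelled as non-speech.
    missed = sum(r is not None and h is None for r, h in pairs)
    # Reference non-speech that the system labelled as speech.
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    # Frames where both sides contain speech; only these can be confused.
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_spk = sorted({r for r, _ in both})
    hyp_spk = sorted({h for _, h in both})
    # Unique sentinel objects let surplus hypothesis speakers map to
    # "no reference speaker" (they never compare equal to a label string).
    candidates = ref_spk + [object() for _ in hyp_spk]

    # Brute-force the one-to-one hypothesis -> reference mapping that maximizes
    # correctly attributed frames (only practical for a handful of speakers).
    best_correct = 0
    for assignment in permutations(candidates, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, assignment))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / total_ref


if __name__ == "__main__":
    reference = ["A", "A", "A", "B", "B", None, None, "B"]
    hypothesis = ["s1", "s1", "s2", "s2", "s2", "s2", None, None]
    # 1 missed + 1 false alarm + 1 confused frame over 6 reference speech frames
    print(frame_der(reference, hypothesis))  # 0.5
```

Standard scoring tools such as NIST md-eval or the `pyannote.metrics` package also handle overlapping speech, forgiveness collars, and optimal mapping via the Hungarian algorithm. They are the appropriate choice for reporting results, while the brute-force version above only makes the arithmetic explicit.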


Published

2025-05-13