Comparison of feature extraction and normalization methods for speaker recognition using grid-audiovisual database

Musab T. S. Al-Kaltakchi, Haithem Abd Al-Raheem Taha, Mohanad Abd Shehab, Mohamed A.M. Abdullah


In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. To give a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate linear channel effects, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The paper investigates a text-independent speaker identification system using 16 coefficients from each of the MFCC and PNCC feature sets. Eight speakers, two female and six male, are selected from the GRID audiovisual database. The speakers are modeled by coupling a Universal Background Model with Gaussian Mixture Models (GMM-UBM) to obtain fast scoring and better performance. The system achieves a speaker identification accuracy of 100%. The results show that PNCC features outperform MFCC features when identifying female speakers as compared with male speakers. Furthermore, feature warping performs better than the CMVN method.
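
The two normalization methods named above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cmvn` and `feature_warp`, the per-utterance statistics, the 301-frame sliding window, and the rank-to-quantile mapping are all assumptions chosen for illustration; only the use of 16 coefficients per frame comes from the abstract.

```python
import numpy as np
from statistics import NormalDist


def cmvn(features, eps=1e-8):
    """Per-utterance CMVN (assumed variant).

    features: array of shape (num_frames, num_coeffs).
    Each coefficient is shifted to zero mean and scaled to unit variance.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)


def feature_warp(features, window=301):
    """Feature warping over a sliding window (assumed window length).

    For every frame, each coefficient is ranked among the frames in the
    surrounding window and the rank is mapped to the matching quantile of
    a standard normal distribution.
    """
    num_frames, _ = features.shape
    half = window // 2
    inv_cdf = NormalDist().inv_cdf
    warped = np.empty_like(features, dtype=float)
    for t in range(num_frames):
        start = max(0, t - half)
        stop = min(num_frames, t + half + 1)
        block = features[start:stop]
        n = block.shape[0]
        # Rank of the current frame inside the window (1 = smallest value).
        rank = (block < features[t]).sum(axis=0) + 1
        # Centre each rank inside its bin so the quantile stays in (0, 1).
        quantile = (rank - 0.5) / n
        warped[t] = [inv_cdf(float(q)) for q in quantile]
    return warped


# Hypothetical input: 500 frames of 16 coefficients with a constant offset,
# standing in for a channel-induced shift in the cepstral domain.
rng = np.random.default_rng(0)
demo_features = rng.normal(size=(500, 16)) + 3.0
demo_cmvn = cmvn(demo_features)
demo_warped = feature_warp(demo_features)
```

In this sketch CMVN matches only the first two moments of each coefficient, whereas warping reshapes the whole short-term distribution. Because warping depends only on ranks within the window, a constant offset or positive rescaling of a coefficient leaves its output unchanged.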


Cepstral mean variance normalization (CMVN); Gaussian mixture model (GMM); Mel frequency cepstral coefficients (MFCCs); Power normalized cepstral coefficients (PNCCs); Speaker recognition



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
