Voice portraits: building faces through voice analysis

Anandhu T. G., John K. Joseph, Navneeth Krishnan J., Richu Shibu, Elizabeth Isaac

Abstract


Generating a person’s appearance from their voice alone is an intriguing challenge. The proposed framework reconstructs a person’s facial image from only a short audio recording of that person speaking. A deep neural network trained on millions of YouTube recordings in which faces and voices appear together learns voice-face relationships, enabling it to generate images that capture physical traits such as age, gender, and ethnicity. The method is self-supervised: it exploits the natural pairing of faces and voices in online videos, eliminating the need to model these properties explicitly. From voice inputs, the model achieved classification accuracies of 95% for gender, 83% for age, and 65% for race, indicating that demographic traits, most reliably gender and age, can be identified from speech. The generated images are evaluated against real photographs of the speakers to assess how closely the reconstructions resemble their actual appearance. The framework has practical applications in forensic analysis, security systems, and privacy-conscious biometric identification, offering a non-invasive alternative to traditional facial recognition methods.
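
The per-trait accuracies quoted above are fractions of test speakers whose predicted trait matches the ground-truth label. The following is a minimal sketch of that computation, not the authors' evaluation code: the function name `per_attribute_accuracy`, the attribute keys, and the toy labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def per_attribute_accuracy(predictions, labels):
    """Return {attribute: fraction of samples predicted correctly}.

    Both arguments are lists of dicts, one dict per speaker, mapping an
    attribute name (hypothetical keys such as "gender") to a class label.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, truth in zip(predictions, labels):
        for attr, true_value in truth.items():
            total[attr] += 1
            # A missing prediction counts as an error.
            if pred.get(attr) == true_value:
                correct[attr] += 1
    return {attr: correct[attr] / total[attr] for attr in total}


# Toy data (invented for illustration; not results from the paper).
labels = [
    {"gender": "female", "age_group": "20-39"},
    {"gender": "male", "age_group": "40-59"},
    {"gender": "female", "age_group": "40-59"},
    {"gender": "male", "age_group": "20-39"},
]
predictions = [
    {"gender": "female", "age_group": "20-39"},
    {"gender": "male", "age_group": "40-59"},
    {"gender": "female", "age_group": "20-39"},  # age group is wrong here
    {"gender": "male", "age_group": "20-39"},
]

accuracy = per_attribute_accuracy(predictions, labels)
for attr, value in accuracy.items():
    print(f"{attr}: {value:.0%}")
```

Reporting each trait separately, as the abstract does, keeps a strong result on one attribute from masking a weaker result on another.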

Keywords


Biometric identification; Cross-modal learning; Deep learning; Facial reconstruction; Generative adversarial networks; Speech analysis; Voice-to-face generation


DOI: http://doi.org/10.11591/ijeecs.v42.i3.pp902-912


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).
