Synthetic Voice Detection Using a Quantum Convolutional Neural Network with an Attention Mechanism for Authentication Applications
Abstract
This research focuses on the real-time detection of synthetic (deep fake) voices, framing the task as a binary classification between genuine human speech and digitally synthesized speech. The methodology combines feature extraction via Mel-Frequency Cepstral Coefficients (MFCC) with a quantum convolutional neural network (QCNN) enhanced by an attention mechanism (AM). This integration significantly improves the model's ability to identify synthetic voices with high precision and accuracy. The study used a Kaggle dataset of 320 real and synthetic audio samples, which were pre-processed to remove duplicates and zero-byte files and then normalized for consistent analysis. The QCNN with the attention mechanism achieved strong results: 95% accuracy, 96% precision, 98% recall, and an F1 score of 97%. These metrics represent a substantial improvement over the baseline models, namely CNN-LSTM, CNN-BiLSTM, ResNet18 + KNN, and custom CNN architectures. This work demonstrates the effectiveness of quantum-enhanced deep learning techniques and positions the QCNN with an attention mechanism as a robust and scalable solution for deep fake voice (DFV) detection that delivers reliable performance across diverse scenarios, opening the door to further advances in synthetic media detection and authentication systems.
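As a concrete illustration of the feature-extraction step named above, the sketch below shows one common way to compute MFCC features from an audio file using the librosa library. This is a minimal sketch only: the file name "sample.wav", the n_mfcc=13 setting, and the mean-pooling step are illustrative assumptions, since the abstract does not report the authors' exact extraction parameters.

    import librosa
    import numpy as np

    def extract_mfcc(path, n_mfcc=13):
        # Load the audio at its native sample rate (sr=None avoids resampling)
        signal, sr = librosa.load(path, sr=None)
        # Compute the MFCC matrix of shape (n_mfcc, time_frames)
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
        # Mean-pool over time to obtain a fixed-length feature vector
        return np.mean(mfcc, axis=1)

    # "sample.wav" is a hypothetical file name for illustration
    features = extract_mfcc("sample.wav")
    print(features.shape)  # -> (13,)

Mean-pooling over time is just one way to obtain a fixed-length representation; a model such as the QCNN with attention described here would more plausibly consume the full time-frequency MFCC matrix so that the attention mechanism can weight informative frames.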