True Vision AI: Deepfake Video Detection Using a Hybrid Ensemble of Xception and Video Vision Transformer (ViViT)

Authors

  • Gautham Rishab S, UG - Department of Information Technology, B.S.A. Crescent Institute of Science & Technology, Chennai, India.
  • Gowtham S, UG - Department of Information Technology, B.S.A. Crescent Institute of Science & Technology, Chennai, India.
  • Mrs. Sakthi P, Assistant Professor, Information Technology, B.S.A. Crescent Institute of Science & Technology, Chennai, India.

DOI:

https://doi.org/10.47392/IRJAEM.2026.0317

Keywords:

Deepfake Detection, Xception, Video Vision Transformer (ViViT), Ensemble Learning.

Abstract

Deepfakes have become one of the most pressing issues of our time: what once took a team of visual effects experts weeks can now be done in minutes with freely available software, and the results are increasingly indistinguishable from reality. We present True Vision AI, a deepfake video detection system built on a two-stream ensemble that combines spatial and temporal understanding. The system pairs a fine-tuned Xception network (pre-trained on ImageNet), which detects subtle visual inconsistencies in individual frames, with a Video Vision Transformer (ViViT-B/16x2, pre-trained on Kinetics-400), which detects motion-level anomalies across frames. Features from both networks are concatenated into a single 2,816-dimensional vector and fed into a compact classifier that determines whether a video is real or fake. Trained and tested on the Celeb-DF dataset (890 genuine videos and 808 deepfakes), the Xception model achieves 88.5% validation accuracy, ViViT achieves 87.0%, and the ensemble achieves 88.3%. The final system is deployed as a lightweight Flask API that returns a real-or-fake determination, a confidence score, and a frame-level breakdown of where manipulation is likely to occur.
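
The feature-merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract states only the fused size of 2,816; the split into a 2,048-dimensional Xception feature and a 768-dimensional ViViT-B feature is an assumption based on the usual output sizes of those backbones. The logistic head, its random weights, and the names `fuse` and `classify` are hypothetical stand-ins for the paper's compact classifier, and random vectors stand in for real backbone outputs.

```python
import math
import random

XCEPTION_DIM = 2048  # assumed: size of Xception's pooled feature vector
VIVIT_DIM = 768      # assumed: ViViT-B hidden size
FUSED_DIM = XCEPTION_DIM + VIVIT_DIM  # 2,816, the size given in the abstract


def fuse(spatial, temporal):
    """Concatenate the spatial (frame) and temporal (clip) feature vectors."""
    if len(spatial) != XCEPTION_DIM or len(temporal) != VIVIT_DIM:
        raise ValueError("unexpected feature size")
    return list(spatial) + list(temporal)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def classify(fused, weights, bias):
    """Hypothetical single-layer logistic head returning P(fake)."""
    logit = sum(w * f for w, f in zip(weights, fused)) + bias
    return sigmoid(logit)


# Random placeholders for backbone outputs and untrained classifier weights.
rng = random.Random(0)
spatial = [rng.gauss(0, 1) for _ in range(XCEPTION_DIM)]
temporal = [rng.gauss(0, 1) for _ in range(VIVIT_DIM)]
weights = [rng.gauss(0, 0.01) for _ in range(FUSED_DIM)]

fused = fuse(spatial, temporal)
p_fake = classify(fused, weights, bias=0.0)
label = "fake" if p_fake >= 0.5 else "real"
print(len(fused), label, round(p_fake, 3))
```

Concatenation keeps both feature sets intact and leaves it to the classifier to weigh spatial against temporal evidence. In a trained system the weights would come from fitting on labelled videos, and the 0.5 threshold used here is only a conventional default.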

Published

2026-05-12