Cervical Cancer Detection Using a Hybrid CNN-Vision Transformer Model: A Comparative Study with EfficientNetB, DenseNet, Xception, and ResNet50
DOI: https://doi.org/10.47392/IRJAEM.2025.0220
Keywords: Vision Transformers, Hybrid Model, Feature Extraction, Deep Learning, Convolutional Neural Networks
Abstract
Cervical cancer remains one of the leading causes of cancer-related deaths among women worldwide, particularly in low-resource settings. Early detection is crucial for improving survival rates, and advances in deep learning have shown promise in automating this process. This paper proposes a novel hybrid model that combines Convolutional Neural Networks (CNNs) with Vision Transformers (ViTs) for cervical cancer detection. We integrate EfficientNetB, DenseNet, Xception, and ResNet50 as backbone CNN architectures to extract hierarchical features, followed by a Vision Transformer that captures long-range dependencies and global context. The proposed model is evaluated on a publicly available cervical cancer dataset and achieves state-of-the-art performance in terms of accuracy, sensitivity, and specificity. Our results demonstrate the effectiveness of combining CNNs and ViTs for medical image analysis, providing a robust framework for cervical cancer detection.
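The abstract describes a pipeline in which a CNN backbone extracts hierarchical feature maps and a transformer encoder then models long-range spatial dependencies before classification. The sketch below illustrates one plausible way to wire such a hybrid in PyTorch; it is a minimal illustration, not the authors' implementation. The choice of ResNet50 as the backbone, the embedding dimension, encoder depth, head count, and the two-class output are all assumptions made for the example, and it assumes 224x224 input images.

```python
# Minimal sketch of a hybrid CNN-ViT classifier (assumed PyTorch/torchvision setup).
# Backbone, token dimensions, and head sizes are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class HybridCNNViT(nn.Module):
    def __init__(self, num_classes: int = 2, embed_dim: int = 256,
                 depth: int = 4, num_heads: int = 8, num_tokens: int = 50):
        super().__init__()
        # CNN backbone: keep the convolutional stages, drop avgpool and fc.
        backbone = resnet50(weights="IMAGENET1K_V2")
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, H/32, W/32)
        # 1x1 conv projects CNN feature maps to transformer token embeddings.
        self.proj = nn.Conv2d(2048, embed_dim, kernel_size=1)
        # Learnable [CLS] token and positional embeddings
        # (num_tokens = 7*7 spatial tokens + 1 CLS for 224x224 inputs).
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens, embed_dim))
        # Transformer encoder captures long-range dependencies across tokens.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.proj(self.cnn(x))             # (B, D, h, w)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, h*w, D) spatial tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        encoded = self.encoder(tokens)
        return self.head(encoded[:, 0])            # classify from the [CLS] token


# Usage: a single 224x224 RGB image tensor (e.g. a preprocessed cervical cytology image).
model = HybridCNNViT(num_classes=2)
logits = model(torch.randn(1, 3, 224, 224))
```

Swapping the backbone for DenseNet, Xception, or EfficientNet variants would follow the same pattern: take the convolutional feature extractor, project its output channels to the token dimension, and feed the resulting token sequence to the transformer encoder.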
Copyright (c) 2025 International Research Journal on Advanced Engineering and Management (IRJAEM). This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.