Study of SMS Spam Detection Using Machine Learning Based Algorithms
DOI:
https://doi.org/10.47392/IRJAEM.2025.0054
Keywords:
SMS Spam Detection, Machine Learning, Classification Models, Text Processing, Data Analysis
Abstract
SMS spam detection is an important text classification task, as unsolicited messages continue to pose security risks and inconvenience users. This study examines the effectiveness of machine learning algorithms, particularly the Naive Bayes classifier, in identifying and filtering spam messages. The primary objective is to classify SMS messages as spam or ham by analysing the occurrence of words and patterns within the text. The proposed approach involves a pre-processing stage comprising tokenization, stop-word removal, and stemming, followed by feature extraction using techniques such as Term Frequency-Inverse Document Frequency (TF-IDF). The Naive Bayes algorithm is then trained on a labelled dataset to learn the probability distributions of words in spam and ham messages. We also compare Naive Bayes with other machine learning models, such as Support Vector Machines (SVM), Decision Trees, and Random Forest, to assess their effectiveness in spam detection. The experimental analysis indicates that the Naive Bayes classifier achieves high accuracy at low computational cost, which the study attributes to its probabilistic nature. Precision, recall, F1-score, and overall classification accuracy are used to determine the best-performing algorithm. The results suggest that machine learning-based approaches significantly enhance SMS spam detection, reducing false positives and improving message filtering. Future work aims to integrate deep learning techniques and real-time detection mechanisms to further improve accuracy and adaptability in dynamic environments.
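The pipeline described in the abstract (tokenization, stop-word removal, stemming, TF-IDF weighting, then Naive Bayes) can be illustrated with a minimal sketch that uses only the Python standard library. It is not the authors' implementation. The stop-word list, the `crude_stem` suffix stripper, the smoothed IDF formula, the six toy messages, and all function and class names are assumptions made for illustration. Feeding TF-IDF weights into a multinomial Naive Bayes as fractional counts is a common heuristic, not something the abstract confirms.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption; real lists are far longer).
STOP_WORDS = {"a", "an", "the", "to", "is", "you", "your", "at",
              "for", "in", "on", "of", "and", "are", "we", "i"}

def crude_stem(token):
    """Rough suffix stripping; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def fit_idf(docs):
    """Smoothed inverse document frequency for every training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def tfidf(doc, idf):
    """Term count times IDF; terms unseen in training are ignored."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}

class NaiveBayes:
    """Multinomial Naive Bayes over weighted term vectors, Laplace-smoothed."""

    def fit(self, vectors, labels, alpha=1.0):
        self.classes = sorted(set(labels))
        self.vocab = {t for v in vectors for t in v}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(vectors))
            totals = defaultdict(float)
            for v in rows:
                for t, w in v.items():
                    totals[t] += w
            denom = sum(totals.values()) + alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((totals[t] + alpha) / denom)
                               for t in self.vocab}
        return self

    def predict(self, vector):
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in vector.items()
                if t in self.vocab)
        return max(self.classes, key=score)

def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Metrics for the positive class; returns zeros when undefined."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up toy messages, for illustration only (not the study's dataset).
TRAIN = [
    ("Win a free prize now, claim your cash reward", "spam"),
    ("Free entry: text WIN to claim your prize", "spam"),
    ("Urgent! Your account won cash, call now", "spam"),
    ("Are we still meeting for lunch today?", "ham"),
    ("I will call you when I get home", "ham"),
    ("Can you send me the notes from class", "ham"),
]

docs = [preprocess(text) for text, _ in TRAIN]
idf = fit_idf(docs)
model = NaiveBayes().fit([tfidf(d, idf) for d in docs],
                         [label for _, label in TRAIN])

def classify(text):
    """Run one raw message through the full pipeline."""
    return model.predict(tfidf(preprocess(text), idf))

print(classify("Claim your free cash prize now"))
print(classify("See you at lunch, call me later"))
```

On a real corpus the same steps would normally be carried out with an established stemmer and a tested TF-IDF and classifier implementation, with the metrics computed on a held-out test split and not on the training messages.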
License
Copyright (c) 2025 International Research Journal on Advanced Engineering and Management (IRJAEM)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.