CatDBSCAN for Outlier Detection in Categorical Datasets
DOI:
https://doi.org/10.47392/IRJAEM.2026.0027
Keywords:
Outlier, DBSCAN, CatDBSCAN, Categorical Outlier, Outlier Detection, mushroom and breast
Abstract
Outlier detection is a critical task, and in categorical data it presents unique challenges because categorical attributes lack a natural ordering or distance measure. Numerous methods have been devised for outlier detection. DBSCAN has proven useful in numerical domains, where it identifies noise points as outliers. This paper proposes a modified approach, named CatDBSCAN, for identifying outlier instances in categorical datasets. CatDBSCAN is adapted to categorical data by incorporating a distance measure such as the Hamming distance. CatDBSCAN also detects outliers in small clusters, since data instances lying in low-density regions are likely to be outliers. It employs a static parameter and treats both noise points and minor clusters as outliers. Additionally, an outlier scoring mechanism is used to label noise points and cluster-based outliers. Experiments are conducted on the purely categorical mushroom dataset and the breast cancer dataset from the UCI ML repository.
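The idea summarized in the abstract can be illustrated with a minimal sketch: a plain DBSCAN run over categorical records using the Hamming distance, followed by a step that flags both noise points and members of small clusters as outliers. This is not the authors' implementation. The function names, the parameters `eps`, `min_pts`, and `min_cluster_size`, and the toy records are all assumptions made for illustration, and the outlier scoring mechanism mentioned in the abstract is not reproduced here because the abstract does not describe it.

```python
from collections import Counter, deque


def hamming(a, b):
    """Count the attribute positions at which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def dbscan_hamming(records, eps, min_pts):
    """Plain DBSCAN on categorical records; label -1 marks noise.

    eps and min_pts are the usual DBSCAN parameters (assumed names).
    """
    n = len(records)
    # Neighborhood of each record, including the record itself.
    neighbors = [
        [j for j in range(n) if hamming(records[i], records[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point, keep expanding
        cluster += 1
    return labels


def flag_outliers(labels, min_cluster_size):
    """Flag noise points and members of clusters below min_cluster_size.

    min_cluster_size is a hypothetical threshold for a "minor" cluster.
    """
    sizes = Counter(label for label in labels if label != -1)
    return [label == -1 or sizes[label] < min_cluster_size for label in labels]


# Toy categorical records (made up for illustration only).
records = [
    ("a", "a", "a", "a"), ("a", "a", "a", "b"), ("a", "a", "b", "a"),
    ("a", "b", "a", "a"), ("b", "a", "a", "a"),                        # larger group
    ("x", "y", "z", "w"), ("x", "y", "z", "v"), ("x", "y", "u", "w"),  # small group
    ("q", "r", "s", "t"),                                              # isolated record
]
labels = dbscan_hamming(records, eps=1, min_pts=3)
flags = flag_outliers(labels, min_cluster_size=4)
print(labels)
print(flags)
```

In this sketch the isolated record is flagged because DBSCAN labels it as noise, while the three-record group is flagged only because it falls below the assumed size threshold. How CatDBSCAN actually sets its thresholds and scores outliers should be taken from the full paper.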
License
Copyright (c) 2026 International Research Journal on Advanced Engineering and Management (IRJAEM)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.