CatDBSCAN for Outlier Detection in Categorical Datasets

Authors

  • Aditi Badhan, Department of Computer Science, Himachal Pradesh University, Shimla, India
  • Anita Ganpati, Department of Computer Science, Himachal Pradesh University, Shimla, India

DOI:

https://doi.org/10.47392/IRJAEM.2026.0027

Keywords:

Outlier, DBSCAN, CatDBSCAN, Categorical Outlier, Outlier Detection, mushroom and breast

Abstract

Outlier detection in categorical data is a critical task that presents unique challenges because categorical attributes lack a natural ordering or distance measure. Numerous methods have been devised for outlier detection. DBSCAN has proven useful in numerical domains, where it identifies noise points as outliers. This paper proposes CatDBSCAN, a modified approach for identifying outlier instances in categorical datasets. CatDBSCAN adapts DBSCAN to categorical data by incorporating a distance measure such as the Hamming distance. CatDBSCAN also detects outliers in small clusters, since data instances lying in low-density regions are likely to be outliers. It employs a static parameter and treats both noise points and minor clusters as outliers. Additionally, an outlier scoring mechanism is used to label noise points and cluster-based outliers. Experiments are conducted on the purely categorical mushroom dataset and the breast cancer dataset from the UCI ML repository.
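
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it runs a plain DBSCAN over categorical records using a normalized Hamming distance, then flags noise points and members of small clusters as outliers. The function names, the parameters eps, min_pts and min_cluster_size, the normalization of the Hamming distance, and the toy records are all assumptions made for illustration. The outlier scoring mechanism is not sketched because the abstract does not specify it.

```python
from collections import Counter

def hamming(a, b):
    """Fraction of attribute positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def dbscan_hamming(rows, eps, min_pts):
    """Plain DBSCAN with Hamming distance; returns one label per row, -1 = noise."""
    n = len(rows)
    # Neighborhoods include the point itself, as in standard DBSCAN.
    neighbors = [[j for j in range(n) if hamming(rows[i], rows[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1          # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        stack = list(neighbors[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                stack.extend(neighbors[j])  # expand only through core points
    return labels

def flag_outliers(labels, min_cluster_size):
    """Mark noise points and members of clusters below min_cluster_size (assumed rule)."""
    sizes = Counter(label for label in labels if label != -1)
    return [label == -1 or sizes[label] < min_cluster_size for label in labels]

# Hypothetical toy records with three categorical attributes each.
rows = [
    ("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),  # dense group
    ("b", "z", "r"), ("b", "z", "r"),                                    # small cluster
    ("c", "w", "s"),                                                     # isolated record
]
labels = dbscan_hamming(rows, eps=0.34, min_pts=2)
flags = flag_outliers(labels, min_cluster_size=3)
print(labels)  # [0, 0, 0, 0, 1, 1, -1]
print(flags)   # [False, False, False, False, True, True, True]
```

In this toy run the two identical records form a cluster of their own, but it falls below the assumed size threshold, so they are flagged together with the isolated record. The quadratic neighborhood computation is acceptable for a sketch; larger datasets would need an indexed or vectorized distance computation.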

Published

2026-02-27