A simple, effective distance and density based outlier detection algorithm
Abstract
Outliers are eccentric data points of an anomalous nature. Clustering in the presence of outliers has received considerable attention in the data processing community. However, outliers inordinately affect the quality of the results that popular clustering algorithms obtain while searching for an optimal solution. In this work, we propose a novel method that classifies each data point with grouping characteristics as either an outlier or not. The method uses both the distance and the density of a given data point with respect to the rest of the data points. Distances are used to find the points at the extremities, while densities are used to identify the data points in the sparsest regions. Further, every data model has to account for generalization in order to work robustly even in out-of-the-box situations; hence, our approach adds a generalization aspect to the model. The accuracy of the proposed method is measured using the area under the curve (AUC). The highest AUC value, 0.90, was obtained for the cardioto data set, and the second highest, 0.52, for the Spambase data set; several other data sets are used to demonstrate the usage of the proposed model.
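The idea of combining a distance term with a density term can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper, whose details the abstract does not give. The function names `outlier_scores` and `auc`, the neighbour count `k`, the use of the mean distance to the `k` nearest neighbours as an inverse-density proxy, and the product used to combine the two terms are all assumptions made for illustration only.

```python
import math

def outlier_scores(points, k=2):
    """Hypothetical combined score; a higher value means more outlying.

    Assumes k <= len(points) - 1. Not the paper's scoring rule.
    """
    n = len(points)
    scores = []
    for i in range(n):
        # Sorted Euclidean distances from point i to every other point.
        others = sorted(math.dist(points[i], points[j]) for j in range(n) if j != i)
        # Distance term: mean distance to all other points (large at the extremities).
        extremity = sum(others) / len(others)
        # Density term: mean distance to the k nearest neighbours
        # (large in sparse regions, i.e. an inverse-density proxy).
        sparsity = sum(others[:k]) / k
        # Assumed combination rule: simple product of the two terms.
        scores.append(extremity * sparsity)
    return scores

def auc(scores, labels):
    """AUC as the fraction of (outlier, inlier) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy data (invented for this sketch): four clustered points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
labels = [0, 0, 0, 0, 1]  # 1 marks the point treated as the true outlier
scores = outlier_scores(points, k=2)
print(auc(scores, labels))
```

On this toy data the isolated point receives the largest score, so the ranking separates it from the cluster. The AUC figures reported in the abstract come from the authors' own method and data sets and cannot be reproduced from this sketch.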
Keywords
Anti-neighbours; Area under curve; Density; Distances; Outliers
DOI: http://doi.org/10.11591/ijeecs.v24.i2.pp1141-1148

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).
