Automated detection of outlier in WSN by entropy features based machine learning methods

Manmohan Singh Yadav, Shish Ahamad


Environmental disasters such as floods and earthquakes cause catastrophic effects all over the world. WSN-based techniques have become popular in susceptibility modelling of such disasters because of their strength and efficiency in predicting these threats. This paper demonstrates a machine learning-based approach to predicting outliers in WSN sensor data using bagging, boosting, random subspace, SVM and KNN frameworks. First, a database collected from 14 sensor motes, which contains outliers caused by intrusion, is preprocessed. Subsequently, a segmented database is created from sensor pairs. Finally, the data entropy is calculated and used as a feature to determine the presence of outliers with each of the different approaches. Results show that the KNN model has the highest prediction capability for outlier assessment.


Outlier; K-nearest neighbour; Ensemble; Sensor; Entropy; SVM.
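
The entropy-feature and KNN steps summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the histogram-based Shannon entropy, the bin count, the segment length, the synthetic readings and the helper names `shannon_entropy` and `knn_predict` are all assumptions made for illustration, and the way the paper builds segments from sensor pairs is not reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(values, bins=8):
    """Shannon entropy (in bits) of a fixed-bin histogram of one segment.

    Assumption: the histogram range is the segment's own min..max.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant segment carries no information
    width = (hi - lo) / bins
    # Map each reading to a bin index, clamping the maximum into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training features closest to x (1-D features)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training segments: smooth readings vs. readings with one spike.
segments = [
    ([20.0 + 0.1 * i for i in range(8)], "normal"),
    ([21.0 + 0.2 * i for i in range(8)], "normal"),
    ([19.5 + 0.05 * i for i in range(8)], "normal"),
    ([20.0, 20.1, 20.0, 20.1, 20.0, 95.0, 20.1, 20.0], "outlier"),
    ([21.0, 21.0, -40.0, 21.1, 21.0, 21.1, 21.0, 21.1], "outlier"),
    ([19.5, 19.6, 19.5, 80.0, 19.6, 19.5, 19.6, 19.5], "outlier"),
]

# One entropy value per segment is the only feature given to the classifier.
train_x = [shannon_entropy(readings) for readings, _ in segments]
train_y = [label for _, label in segments]

new_segment = [22.0] * 7 + [150.0]
print(knn_predict(train_x, train_y, shannon_entropy(new_segment)))
```

In this toy data a single spike stretches the histogram range, so nearly all readings fall into one bin and the segment's entropy drops. That gap between the two classes is what lets a nearest-neighbour vote on a single feature separate them. Real sensor data may behave differently.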




Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
