Spark: A Statistical Comparison and Evaluation of Classification Algorithms for Fault Prediction in Electrical Secondary Distribution Network

David Makota; Naiman Shililiandumi; Hashim Iddi

doi:10.56279/jicts.v2i2.91


Published: Oct 30, 2024

DOI: https://doi.org/10.56279/jicts.v2i2.91

Keywords:

Apache Spark; Classification Algorithms; Electrical Secondary Distribution; Fault Prediction; Statistical Methods

David Makota

Institute of Finance Management, Tanzania, United Republic of

Naiman Shililiandumi

University of Dar es Salaam, Tanzania, United Republic of

https://orcid.org/0000-0002-8499-7543

Hashim Iddi

University of Dar es Salaam, Tanzania, United Republic of

https://orcid.org/0000-0002-4025-9653

Abstract

Managing faults in the electrical secondary distribution network is challenging because of the network's nature, size, and complexity. Predicting faults before they occur helps increase the safety and reliability of the power distribution system. Various statistical and machine learning techniques are used to predict different types of faults. This study applies classification algorithms available in the big data framework Apache Spark, through its Python interface PySpark, to predict faults in the electrical secondary distribution network. The study evaluates and compares nine algorithms: Decision tree, Gradient-boosted tree, Logistic regression, Naïve Bayes, Multilayer perceptron, Random forest, Linear Support Vector Machine, One-versus-rest, and Factorization machines. Friedman's test, followed by the Nemenyi post hoc test, is used to determine whether the performance differences among the algorithms are statistically significant. The results show significant differences among the algorithms. Gradient-boosted tree and One-versus-rest with Gradient-boosted tree had the best performance for binary and multiclass classification, respectively, while Naïve Bayes had the worst performance.
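The statistical comparison described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the score matrix is made up purely for illustration, the function names are hypothetical, and the three columns stand in for any three classifiers. The sketch computes average ranks, the Friedman chi-square statistic (without tie correction), and the Nemenyi critical difference, taking the critical value q = 2.343 for three algorithms at the 0.05 level as tabulated by Demšar (2006).

```python
from math import sqrt

def average_ranks(scores):
    """Average rank per algorithm; rank 1 is the best (highest) score.

    scores: list of rows, one row per dataset or fold,
            one score per algorithm in each row. Ties share the mean rank.
    """
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            # Extend j over a run of tied scores.
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            mean_rank = (i + j) / 2 + 1
            for t in range(i, j + 1):
                ranks[order[t]] = mean_rank
            i = j + 1
        for a in range(k):
            totals[a] += ranks[a]
    return [t / len(scores) for t in totals]

def friedman_statistic(avg_ranks, n):
    """Friedman chi-square statistic (no tie correction), df = k - 1."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks)
    return 12 * n / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)

def nemenyi_cd(k, n, q_alpha):
    """Nemenyi critical difference between two average ranks."""
    return q_alpha * sqrt(k * (k + 1) / (6 * n))

# Illustrative scores only (NOT results from the study):
# 4 folds (rows) x 3 hypothetical algorithms (columns).
scores = [
    [0.90, 0.85, 0.70],
    [0.92, 0.88, 0.75],
    [0.89, 0.91, 0.72],
    [0.93, 0.86, 0.71],
]
ranks = average_ranks(scores)
stat = friedman_statistic(ranks, n=len(scores))
cd = nemenyi_cd(k=3, n=len(scores), q_alpha=2.343)  # q for k=3, alpha=0.05
print(ranks, stat, cd)
```

On this toy input the average ranks are 1.25, 1.75, and 3.0, and the Friedman statistic is 6.5, which exceeds the chi-square critical value of 5.991 for two degrees of freedom at the 0.05 level. The critical difference is about 1.66, so only the first and third columns differ significantly under the Nemenyi test. With so few folds the chi-square approximation is rough; a real analysis would typically use a statistics library and more measurements.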

How to Cite

Makota, D., Shililiandumi, N., & Iddi, H. (2024). Spark: A Statistical Comparison and Evaluation of Classification Algorithms for Fault Prediction in Electrical Secondary Distribution Network. Journal of ICT Systems, 2(2), 42–54. https://doi.org/10.56279/jicts.v2i2.91