Communication induced checkpointing based fault tolerance mechanism using deep-learning in IoT applications
Abstract
The Internet of Things (IoT) is increasingly used in diverse environments such as healthcare, industry, and agriculture. IoT systems carry a risk of adverse effects if they make decisions based on faulty information. Software faults, especially transient faults, are a primary contributor to deficient decision-making. Existing fault-tolerance mechanisms often suffer from checkpoint overhead because checkpoints are placed on all nodes. This paper describes a novel communication-induced checkpointing based fault tolerance mechanism (CIC-FTM) designed to recover efficiently from transient faults while minimizing useless and forced checkpoints. A long short-term memory (LSTM) based deep learning algorithm is used in our approach to predict fault occurrences and place checkpoints strategically. The proposed method in turn improves system reliability and performance. Experimental results demonstrate the effectiveness of the proposed CIC-FTM in an IoT environment by minimizing the operating time for checkpointing and back propagation, compared to traditional fault-tolerance mechanisms.
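The idea of placing checkpoints only where faults are predicted, rather than on every node, can be illustrated with a minimal sketch. This is not the paper's CIC-FTM implementation: the function name `select_checkpoint_nodes`, the node names, the probability values, and the 0.5 threshold are all hypothetical, and the dictionary of fault probabilities stands in for the output of an LSTM predictor.

```python
from typing import Dict, List


def select_checkpoint_nodes(fault_prob: Dict[str, float],
                            threshold: float = 0.5) -> List[str]:
    """Return, in sorted order, the nodes whose predicted fault
    probability is at or above the threshold.

    Assumption: fault_prob maps a node name to a probability in [0, 1]
    produced by some predictor (here, a stand-in for an LSTM model).
    """
    return sorted(node for node, p in fault_prob.items() if p >= threshold)


# Hypothetical predictor output for four IoT nodes.
predicted = {
    "sensor-a": 0.82,
    "gateway-1": 0.10,
    "sensor-b": 0.55,
    "actuator-c": 0.03,
}

# Only the nodes judged likely to fail receive a checkpoint.
chosen = select_checkpoint_nodes(predicted, threshold=0.5)
print(chosen)  # ['sensor-a', 'sensor-b']
```

In this sketch two of the four nodes are checkpointed, which is the kind of reduction in checkpoint overhead the abstract refers to; the actual selection criteria used by CIC-FTM are not given in the abstract.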
Keywords
Checkpoint at intermediate node; Communication induced checkpointing based fault tolerance mechanism; Long short-term memory algorithm; Transient faults
DOI: http://doi.org/10.11591/ijeecs.v37.i3.pp1785-1796
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).