Authors: Nafisa S, Dr. Balaji. K
Abstract: The convergence of artificial intelligence (AI) and the Internet of Things (IoT), often termed AIoT, has produced large-scale distributed systems that are prone to hardware, network, and software faults. In this paper, we design a self-healing framework for AIoT systems that combines automated fault diagnosis, root cause identification, and dynamic recovery. The framework couples an efficient transformer-based anomaly detection model with graph neural networks for fault localization and reinforcement learning (RL) agents for selecting recovery policies. We evaluate the framework in simulations of a smart factory with 1,200 sensor and actuator nodes, where it achieves an average accuracy of 94.2% and a latency of 2.3 seconds. Compared with reactive approaches, the framework reduces system downtime by 76.4% and increases the task success rate by 15.8 percentage points; it also outperforms adaptive recovery approaches by a wide margin.
International Journal of Science, Engineering and Technology
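The three stages named in the abstract (detect, localize, recover) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Each stage uses a deliberately simple stand-in: a z-score test in place of the transformer-based detector, a dependency-graph check in place of the graph neural network, and a one-step, bandit-style value update in place of a full RL agent. All names here (`detect_anomalies`, `localize_fault`, `RecoveryAgent`, the node names, and the action set) are hypothetical and chosen only for illustration.

```python
import random
import statistics


def detect_anomalies(readings, threshold=3.0):
    """Flag nodes whose latest reading deviates strongly from their history.

    Stand-in (assumption): a z-score test, not a transformer model.
    """
    flagged = set()
    for node, series in readings.items():
        history, latest = series[:-1], series[-1]
        mean = statistics.fmean(history)
        std = statistics.pstdev(history)
        if std == 0:
            if latest != mean:
                flagged.add(node)
        elif abs(latest - mean) / std > threshold:
            flagged.add(node)
    return flagged


def localize_fault(dependencies, anomalous):
    """Return anomalous nodes with no anomalous direct upstream dependency.

    Stand-in (assumption): a plain graph check, not a graph neural network.
    `dependencies` maps each node to the nodes it depends on.
    """
    return {
        node
        for node in anomalous
        if not any(up in anomalous for up in dependencies.get(node, []))
    }


class RecoveryAgent:
    """Epsilon-greedy action selection with a one-step value update.

    Stand-in (assumption): a bandit-style learner, not a full RL agent.
    """

    ACTIONS = ("restart", "reroute", "failover")  # hypothetical action set

    def __init__(self, epsilon=0.1, alpha=0.5, seed=0):
        self.q = {}  # fault type -> {action: estimated value}
        self.epsilon = epsilon
        self.alpha = alpha
        self.rng = random.Random(seed)

    def _values(self, fault_type):
        return self.q.setdefault(fault_type, {a: 0.0 for a in self.ACTIONS})

    def choose(self, fault_type):
        values = self._values(fault_type)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.ACTIONS)  # explore
        return max(values, key=values.get)  # exploit

    def update(self, fault_type, action, reward):
        values = self._values(fault_type)
        values[action] += self.alpha * (reward - values[action])


if __name__ == "__main__":
    # Made-up readings: the last value of each series is the newest sample.
    readings = {
        "sensor_1": [20.0, 20.1, 19.9, 20.0, 20.2, 35.0],
        "actuator_1": [1.0, 1.1, 0.9, 1.0, 1.0, 3.0],
        "sensor_2": [5.0, 5.1, 4.9, 5.0, 5.1, 5.0],
    }
    dependencies = {"actuator_1": ["sensor_1"]}
    anomalous = detect_anomalies(readings)
    roots = localize_fault(dependencies, anomalous)
    agent = RecoveryAgent()
    for node in sorted(roots):
        print(node, agent.choose("sensor_fault"))
```

Keeping the stages as separate functions mirrors the modular structure the abstract describes, so any stand-in could be swapped for a learned model without changing the others.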