Fault-tolerance methods are required to ensure high availability and high reliability in cloud computing environments. In this survey, we address fault-tolerance in the scope of cloud computing. Recently, cloud computing-based environments have presented new challenges to support fault-tolerance and opened new paths to develop novel strategies, architectures, and standards. We provide a detailed background of cloud computing to establish a comprehensive understanding of the subject, from basic to advanced. We then highlight fault-tolerance components and system-level metrics and identify the needs and applications of fault-tolerance in cloud computing. Furthermore, we discuss state-of-the-art proactive and reactive approaches to cloud computing fault-tolerance. We further structure and discuss current research efforts on cloud computing fault-tolerance architectures and frameworks. Finally, we conclude by enumerating future research directions specific to cloud computing fault-tolerance development. © 2022 IEEE.
Add the full text or supplementary notes for the publication here using Markdown formatting.