A DATA RE-REPLICATION SCHEME AND ITS IMPROVEMENT TOWARD PROACTIVE APPROACH

Authors

Thanda Shwe, Department of Computer Science and Electrical Engineering, Graduate School of Science and Technology, Kumamoto University
Masayoshi Aritsugi, Department of Computer Science and Electrical Engineering, Graduate School of Science and Technology, Kumamoto University

DOI:

https://doi.org/10.11113/aej.v8.15497

Abstract

With increasing demand for cloud computing technology, cloud infrastructures are utilized to their maximum limits. There is a high possibility that the commodity servers used in Hadoop Distributed File System (HDFS) based cloud data centers will fail often. However, the selection of source and destination data nodes for re-replication of data has so far not been adequately addressed. To balance the workload among nodes during the re-replication phase and to reduce the impact on the performance of normal cluster jobs, we develop a re-replication scheme that considers both performance and reliability. The appropriate nodes for re-replication are selected using the Analytic Hierarchy Process (AHP), taking into account the current resource utilization of normal cluster jobs. Toward effective data re-replication, we investigate the feasibility of using linear regression and local regression methods to predict resource utilization. Simulation results show that our proposed approach reduces re-replication time, total job execution time, and top-of-rack network traffic compared to baseline HDFS, and consequently increases the reliability of the system and reduces the performance impact on users' jobs. Regarding the feasibility study of prediction methods, both regression methods are good enough to predict short-term future resource utilization for re-replication.
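The AHP-based node selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract names neither the criteria nor the pairwise judgments the authors use, so the criteria names, the comparison matrix, the node names, and the utilization figures below are all hypothetical. The priority vector is approximated with the row geometric mean method, a common shortcut for the principal eigenvector, which may differ from the computation in the paper.

```python
import math

# Hypothetical criteria; the abstract does not list the ones actually used.
CRITERIA = ["cpu_util", "disk_io_util", "net_util"]

# Hypothetical pairwise comparison matrix on Saaty's 1-9 scale:
# PAIRWISE[i][j] states how much more important criterion i is than j,
# and PAIRWISE[j][i] is its reciprocal.
PAIRWISE = [
    [1.0, 1 / 2, 1 / 3],
    [2.0, 1.0, 1 / 2],
    [3.0, 2.0, 1.0],
]


def ahp_weights(matrix):
    """Approximate the AHP priority vector via row geometric means."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def rank_nodes(utilization, weights):
    """Order nodes from least to most loaded by weighted utilization."""
    scores = {
        node: sum(w * u for w, u in zip(weights, utils))
        for node, utils in utilization.items()
    }
    return sorted(scores, key=scores.get)


weights = ahp_weights(PAIRWISE)

# Hypothetical current utilization per data node, one value in [0, 1]
# per criterion, in the same order as CRITERIA.
utilization = {
    "dn1": [0.90, 0.80, 0.70],
    "dn2": [0.20, 0.30, 0.10],
    "dn3": [0.50, 0.40, 0.60],
}

ranking = rank_nodes(utilization, weights)
print(dict(zip(CRITERIA, weights)))
print(ranking)
```

In this sketch the least-loaded node appears first in `ranking`, so it would be the preferred re-replication candidate under these assumed weights. A fuller treatment would also check the consistency ratio of the comparison matrix before trusting the weights.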