<strong>Paper Title</strong><br>
Load Balancing For Distributed File Systems<br>
<br>

<strong>Abstract</strong><br>
In distributed file systems, nodes simultaneously serve computing and storage functions; a file is partitioned
into a number of chunks allocated in distinct nodes so that MapReduce tasks can be performed in parallel over the nodes.
However, in a distributed computing environment, failure is the norm, and nodes may be upgraded, replaced, and added to the
system. Files can also be dynamically created, deleted, and appended. This results in load imbalance in a distributed file
system; that is, the file chunks are not distributed as uniformly as possible among the nodes. Emerging distributed file systems
in production systems strongly depend on a central node for chunk reallocation. This dependence is clearly inadequate in a
large-scale, failure-prone environment because the central load balancer carries a workload that grows linearly with
system size and may thus become both a performance bottleneck and a single point of failure. In this paper, a
fully distributed load rebalancing algorithm is presented to cope with the load imbalance problem. Our algorithm is compared
against a centralized approach in a production system and a competing distributed solution presented in the literature. The
simulation results indicate that our proposal is comparable with the existing centralized approach and considerably
outperforms the prior distributed algorithm in terms of load imbalance factor, movement cost, and algorithmic overhead. The
performance of our proposal implemented in the Hadoop distributed file system is further investigated in a cluster
environment.
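
To make the notion of load imbalance concrete, the following minimal sketch measures how far a chunk placement deviates from a uniform one. It is an illustration only: the function name `imbalance_report`, the node names, and the metrics (ratio of the heaviest node to the ideal load, and standard deviation of per-node chunk counts) are assumptions chosen for clarity, and they are not necessarily the load imbalance factor used in the paper's evaluation.

```python
from statistics import mean, pstdev


def imbalance_report(chunks_per_node):
    """Summarize how far a chunk placement is from a uniform one.

    chunks_per_node maps a (hypothetical) node name to the number of
    file chunks it currently stores.
    """
    if not chunks_per_node:
        raise ValueError("at least one node is required")

    loads = list(chunks_per_node.values())
    ideal = mean(loads)  # chunks per node under a perfectly uniform placement

    return {
        "ideal_load": ideal,
        # Assumed illustrative metric: heaviest node relative to the ideal load.
        "max_over_ideal": max(loads) / ideal if ideal else 0.0,
        # Spread of per-node loads; 0 means perfectly balanced.
        "std_dev": pstdev(loads),
        # Nodes above the ideal are candidates to shed chunks,
        # nodes below it are candidates to receive them.
        "overloaded": sorted(n for n, c in chunks_per_node.items() if c > ideal),
        "underloaded": sorted(n for n, c in chunks_per_node.items() if c < ideal),
    }


# Example placement with made-up numbers: 20 chunks over 4 nodes.
placement = {"node-a": 12, "node-b": 3, "node-c": 0, "node-d": 5}
report = imbalance_report(placement)
print(report)
```

In this made-up placement, the ideal load is 5 chunks per node, so `node-a` is overloaded while `node-b` and `node-c` are underloaded. A rebalancing algorithm, whether centralized or distributed, would aim to move chunks from the former to the latter at low movement cost.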