Date of Degree
PhD (Doctor of Philosophy)
Self-stabilizing system is a concept of fault-tolerance in distributed computing. A distributed algorithm is self-stabilizing if, starting from an arbitrary state, it is guaranteed to converge to a legal state in a finite number of states and remains in a legal set of states thereafter. The property of self-stabilization enables a distributed algorithm to recover from a transient fault regardless of its objective. Moreover, a self-stabilizing algorithm does not have to be initialized as it eventually starts to behave correctly.
In this thesis, we focus on extensions and refinements of self-stabilization by studying two non-traditional aspects of self-stabilization.
In traditional self-stabilizing distributed systems , the inherent assumption is that all processes run predefined programs mandated by an external agency which is the owner or the administrator of the entire system. The model works fine for solving problems when processes cooperate with one another, with a global goal. In modern times it is quite common to have a distributed system spanning over multiple administrative domains, and processes have selfish motives to optimize their own pay- off. Maximizing individual payoffs under the umbrella of stabilization characterizes the notion of selfish stabilization .
We investigate the impact of selfishness on the existence and the complexity of stabilizing solutions to specific problems in this thesis. Our model of selfishness centers on a graph where the set of nodes is divided into subsets of distinct colors, each having their own unique perception of the edge costs. We study the problems of constructing a rooted shortest path tree and a maximum flow tree on this model, and demonstrate that when processes are selfish, there is no guarantee that a solution will exist. We demonstrate that the complexity of determining the existence of a stabilizing solution is NP-complete, carefully characterize a fraction of such cases, and propose the construction of stabilizing solutions wherever such solutions are feasible.
Fault containment and system availability are important issues in today's distributed systems. In this thesis, we show how fault-containment can be added to weakly stabilizing distributed systems. We present solutions using a randomized scheduler, and illustrate techniques to bias the random schedules so that the system recovers from all single faults in a time independent of the size of the system, and the effect of the failure is contained within constant distance from the faulty node with high probability (this probability can be controlled by a user defined tuning parameter). Using this technique, we solve two problems: one is the persistent-bit problem, and the other is the leader election problem.
Copyright 2009 Anurag Dasgupta