#### DOI

10.17077/etd.082zx5b2

#### Document Type

Dissertation

#### Date of Degree

Fall 2016

#### Degree Name

PhD (Doctor of Philosophy)

#### Degree In

Computer Science

#### First Advisor

Pemmaraju, Sriram V.

#### First Committee Member

Varadarajan, Kasturi

#### Second Committee Member

Ghosh, Sukumar

#### Third Committee Member

Stump, Aaron

#### Fourth Committee Member

Burer, Samuel

#### Abstract

In this report, we initiate a study of a theoretical model for distributed computing called the Congested Clique. The report presents constant-time and near-constant-time distributed algorithms for a variety of problems in this model.

We start by showing how to compute a 3-ruling set in expected O(log log log n) rounds and, using this, obtain a constant-factor approximation to metric facility location, also in expected O(log log log n) rounds. In addition, assuming an input metric space of constant doubling dimension, we obtain constant-round algorithms to compute a maximal independent set on distance-threshold graphs and a constant-factor approximation to the metric facility location problem. These results significantly improve on the running time of the fastest known algorithms for these problems in the Congested Clique setting.
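As an illustration of the object being computed, the sketch below checks whether a vertex set is a t-ruling set. It assumes the common definition (an independent set such that every vertex lies within t hops of some member, so a maximal independent set is a 1-ruling set). The function name `is_ruling_set` and the adjacency-dictionary representation are hypothetical choices for this example. It is a sequential checker, not the distributed algorithm from the dissertation.

```python
from collections import deque


def is_ruling_set(adj, candidates, t):
    """Return True if `candidates` is a t-ruling set of the graph `adj`.

    `adj` maps each vertex to an iterable of its neighbours (undirected).
    Assumed definition: the set is independent, and every vertex is within
    t hops of some member of the set.
    """
    chosen = set(candidates)

    # Independence: no two chosen vertices may be adjacent.
    for u in chosen:
        if any(v in chosen for v in adj[u]):
            return False

    # Coverage: multi-source BFS from the chosen vertices, cut off at depth t.
    dist = {u: 0 for u in chosen}
    queue = deque(chosen)
    while queue:
        u = queue.popleft()
        if dist[u] == t:
            continue  # do not explore beyond t hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    return all(v in dist for v in adj)


# Example graph: the path 0 - 1 - 2 - 3 - 4 - 5 - 6.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
```

On this path, the single middle vertex `{3}` is a 3-ruling set but not a 2-ruling set, which shows how a larger t allows a much sparser set than a maximal independent set.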

Then, we study two fundamental graph problems, Graph Connectivity (GC) and Minimum Spanning Tree (MST), in the Congested Clique model, and present several new bounds on the time and message complexities of randomized algorithms for these problems. No non-trivial (i.e., super-constant) time lower bounds are known for either problem; in particular, an important open question is whether constant-round algorithms exist for them. We make progress toward answering this question by presenting randomized Monte Carlo algorithms for both problems that run in O(log log log n) rounds (where n is the size of the clique). In addition, assuming an input metric space of constant doubling dimension, we obtain a constant-round algorithm for the MST problem. Our results improve by an exponential factor on the long-standing (deterministic) time bound of O(log log n) rounds for these problems due to Lotker et al. (SICOMP 2005). Our algorithms make use of several algorithmic tools, including graph sketching, random sampling, and fast sorting.

Thus far there has been little work on understanding the message complexity of problems in the Congested Clique. In this report, we initiate a study of the message complexity of Congested Clique algorithms. We study two graph problems, Graph Connectivity (GC) and Minimum Spanning Tree (MST), in the Congested Clique model, focusing on the design of fast algorithms with low message complexity. Our motivation comes from recently established connections between the Congested Clique model and models of large-scale distributed computing such as MapReduce (Hegeman et al., SIROCCO 2014) and the “big data” model (Klauck et al., SODA 2015). For these connections to be fruitful, Congested Clique algorithms must not only be fast but also have low message complexity. While the aforementioned algorithms are fast, they have Ω(n^2) message complexity, which makes them impractical in the context of the MapReduce and “big data” models.

This motivates our goal of achieving low message complexity without sacrificing speed. We start with the simpler GC problem and show that it can be solved in O(log log log n) rounds using only O(n polylog n) messages. We then derive subroutines that allow our earlier MST algorithm to run in O(log log log n) rounds using O(m polylog n) messages on an m-edge input graph. Next, we present an algorithm running in O(log* n) rounds with message complexity O(sqrt(m · n)), and build on it to derive a family of algorithms containing, for any ε with 0 < ε ≤ 1, an algorithm that runs in O(log* n / ε) rounds using O(n^(1+ε) / ε) messages. Setting ε = log log n / log n leads to the first sub-logarithmic-round Congested Clique MST algorithm that uses only Õ(n) messages, where Õ hides polylogarithmic factors.
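The effect of the final parameter choice can be worked out directly. The derivation below is a sketch under two assumptions: that the message bound of the family is read as n^(1+ε)/ε, and that logarithms are base 2. It shows why the resulting message count is near-linear (linear up to polylogarithmic factors) and why the round count is sub-logarithmic.

```latex
% Assumption: message bound n^{1+\varepsilon}/\varepsilon, logs base 2.
\[
  \varepsilon = \frac{\log\log n}{\log n}
  \quad\Longrightarrow\quad
  n^{\varepsilon} = 2^{\varepsilon \log n} = 2^{\log\log n} = \log n .
\]
% Messages: one factor of n, times polylogarithmic terms.
\[
  \frac{n^{1+\varepsilon}}{\varepsilon}
  = n \cdot \log n \cdot \frac{\log n}{\log\log n}
  = \frac{n \log^{2} n}{\log\log n} .
\]
% Rounds: sub-logarithmic because \log^{*} n = o(\log\log n).
\[
  \frac{\log^{*} n}{\varepsilon}
  = \frac{\log^{*} n \cdot \log n}{\log\log n}
  = o(\log n) .
\]
```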

Our results are a step toward understanding the power of randomization in the Congested Clique with respect to both time and message complexity.

#### Keywords

Congested Clique, Distributed Computing, Graph Algorithms, Linear Sketches, Minimum Spanning Tree

#### Pages

xvi, 163 pages

#### Bibliography

Includes bibliographical references (pages 158-163).

#### Copyright

Copyright © 2016 Vivek Sardeshmukh

#### Recommended Citation

Sardeshmukh, Vivek. "Efficient graph computing on the congested clique." PhD (Doctor of Philosophy) thesis, University of Iowa, 2016.

https://doi.org/10.17077/etd.082zx5b2