## Theses and Dissertations

#### DOI

10.17077/etd.0cx5-cr9b

Dissertation

Spring 2019

#### Degree Name

PhD (Doctor of Philosophy)

Computer Science

Ghosh, Sukumar

Oliveira, Suely

#### Third Committee Member

Pemmaraju, Sriram

Wu, Xiaodong

#### Abstract

Clustering problems often arise in fields such as data mining, machine learning, and computational biology, where the task is to group a collection of objects into groups of similar objects with respect to a similarity measure. For example, clustering can be used to group genes with related expression patterns. Covering problems are another important class of problems, where the task is to select a subset of objects from a larger set such that the objects in the subset "cover" (or contain) a given set of elements. Covering problems have found applications in various fields, including wireless and sensor networks, VLSI, and image processing. For example, covering can be used to find locations for the minimum number of mobile towers needed to serve all the customers in a region. In this dissertation, we consider an interesting collection of geometric clustering and covering problems, modeled as optimization problems. These problems are known to be $\mathsf{NP}$-hard, that is, no efficient algorithm that returns optimal solutions is expected to exist for them. Thus, we focus on designing efficient approximation algorithms that yield near-optimal solutions. We study three clustering problems, $k$-means, $k$-clustering, and Non-Uniform-$k$-center, and one covering problem, Metric Capacitated Covering.

$k$-means is one of the most studied clustering problems and probably the one most frequently used in practical applications. In this problem, we are given a set of points in a Euclidean space, and we want to choose $k$ center points from the same space. Each input point is assigned to its nearest chosen center, and the points assigned to a center form a cluster. The cost of an input point is the square of its distance to its nearest center, and the total cost is the sum of the costs of all the points. The goal is to choose $k$ centers so that the total cost is minimized. We give a local search based algorithm for this problem that, for any $\epsilon > 0$, always returns a solution whose cost is within a $(1+\epsilon)$ factor of the optimal cost. However, our algorithm uses $(1+\epsilon)k$ centers. Before our work, the best known approximation factor using exactly $k$ centers was about 9. The result appears in Chapter \ref{sec:kmeanschap}.
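
To make the objective and the idea of local search concrete, here is a small Python sketch. It is a hypothetical single-swap variant, assumes (for illustration only) that centers are restricted to a finite candidate set such as the input points themselves, and uses a tolerance `eps` to guarantee termination. It is not the algorithm from the dissertation, which is more involved and opens $(1+\epsilon)k$ centers to reach its $(1+\epsilon)$ guarantee; the function names are assumptions.

```python
import itertools
import math


def kmeans_cost(points, centers):
    """Sum over all points of the squared Euclidean distance to the nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def single_swap_local_search(points, candidates, k, eps=1e-9):
    """Illustrative single-swap local search over a finite candidate set.

    Starts from the first k candidates and applies any swap (one center out,
    one candidate in) that lowers the cost below (1 - eps) times the current cost.
    """
    centers = list(candidates[:k])
    cost = kmeans_cost(points, centers)
    improved = True
    while improved:
        improved = False
        for i, cand in itertools.product(range(k), candidates):
            if cand in centers:
                continue
            trial = centers[:i] + [cand] + centers[i + 1:]
            trial_cost = kmeans_cost(points, trial)
            if trial_cost < (1 - eps) * cost:
                centers, cost = trial, trial_cost
                improved = True
                break  # restart the scan from the improved solution
    return centers, cost


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
centers, cost = single_swap_local_search(points, candidates=points, k=2)
print(centers, cost)
```

On this toy input the search moves one center from the left pair to the right pair, ending with one center in each natural cluster and total cost 2.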

$k$-clustering is another popular clustering problem, studied mainly by the theory community. In this problem, each cluster is represented by a ball in the input metric space, and we want to choose $k$ balls whose union contains all the input points. The cost of a ball of radius $r$ is $r^{\alpha}$ for a given parameter $\alpha \ge 1$, and the total cost is the sum of the costs of the $k$ chosen balls. The goal is to find $k$ such balls so that the total cost is minimized. We give an algorithm based on probabilistic metric partitioning that, for any $\epsilon > 0$, always returns a solution whose cost is within a $(1+\epsilon)$ factor of the optimal cost. However, our algorithm uses $(1+\epsilon)k$ balls, and its running time is quasi-polynomial. The best known polynomial-time approximation factor using exactly $k$ balls is $c^{\alpha}$, where $c$ is a constant. The result appears in Chapter \ref{sec:kcluster}.
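
The $k$-clustering objective can be explored on tiny inputs with the exhaustive-search sketch below. It assumes, for illustration only, that balls are centered at input points; under that assumption radii can be restricted to distances between input points, because shrinking a ball to the distance of its farthest covered point keeps everything covered and never increases $r^{\alpha}$. The function names are hypothetical, and the search takes exponential time, unlike the quasi-polynomial algorithm described above.

```python
import itertools
import math


def covers(balls, points, tol=1e-12):
    """True if every point lies in at least one ball given as (center, radius)."""
    return all(any(math.dist(p, c) <= r + tol for c, r in balls) for p in points)


def kclustering_cost(balls, alpha):
    """Sum of radius ** alpha over the chosen balls."""
    return sum(r ** alpha for _, r in balls)


def brute_force_kclustering(points, k, alpha):
    """Exact optimum for tiny inputs, with balls centered at input points."""
    # Candidate balls: center at an input point, radius equal to a distance to an input point.
    candidates = sorted({(c, math.dist(c, q)) for c in points for q in points})
    best_cost, best_balls = math.inf, None
    for combo in itertools.combinations(candidates, k):
        if covers(combo, points):
            cost = kclustering_cost(combo, alpha)
            if cost < best_cost:
                best_cost, best_balls = cost, list(combo)
    return best_cost, best_balls


points = [(0.0,), (1.0,), (5.0,)]
cost, balls = brute_force_kclustering(points, k=2, alpha=2)
print(cost, balls)
```

Here two balls, one of radius 1 covering the two nearby points and one of radius 0 on the far point, cost $1^2 + 0^2 = 1$, whereas any single ball covering all three points would need radius at least 2.5.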

Non-Uniform-$k$-center is another clustering problem, which was posed only recently. As in $k$-clustering, each cluster is represented by a ball. Additionally, we are given $k$ integers $r_1,\ldots,r_k$, and we want to find the minimum dilation $\alpha$ such that there exist $k$ balls with radii $\alpha\cdot r_i$, for $1\le i\le k$, whose union contains all the input points. This problem is known to be notoriously hard: no approximation is known even in the special case where the $r_i$'s take at most three distinct values. For this special case, we give an LP rounding based algorithm that always returns a solution whose dilation is within a constant factor of the optimal dilation. However, our algorithm uses $(2+\epsilon)k$ balls for some constant $\epsilon > 0$. We also show that this special case can be solved in polynomial time under a practical assumption. Moreover, we prove that the Euclidean version of the problem is as hard as the general version. These results appear in Chapter \ref{sec:nukc}.
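
Once the centers $c_1,\ldots,c_k$ are fixed, the optimal dilation has a closed form: a point $p$ is covered by ball $i$ exactly when $d(p, c_i) \le \alpha\, r_i$, so the smallest feasible dilation is $\alpha = \max_{p} \min_{i} d(p, c_i)/r_i$. The Python sketch below uses this formula to solve tiny instances by trying every assignment of distinct input points to the radius classes. Restricting centers to input points and the function names are illustrative assumptions; this exhaustive search is not the LP rounding algorithm from the dissertation.

```python
import itertools
import math


def min_dilation(points, centers, radii):
    """Smallest alpha such that the balls B(centers[i], alpha * radii[i]) cover all points."""
    return max(min(math.dist(p, c) / r for c, r in zip(centers, radii)) for p in points)


def brute_force_nukc(points, radii):
    """Exact optimum for tiny inputs, with distinct centers chosen among the input points."""
    best_alpha, best_centers = math.inf, None
    # A permutation assigns one distinct input point to each radius class r_i.
    for centers in itertools.permutations(points, len(radii)):
        alpha = min_dilation(points, centers, radii)
        if alpha < best_alpha:
            best_alpha, best_centers = alpha, centers
    return best_alpha, best_centers


points = [(0.0,), (2.0,), (10.0,)]
alpha, centers = brute_force_nukc(points, radii=[2, 1])
print(alpha, centers)
```

With radii $r_1 = 2$ and $r_2 = 1$, placing the larger ball at 0 and the smaller one at 10 covers the point at 2 with dilation $2/2 = 1$, which is optimal for this input.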

Capacitated Covering is a generalization of the classical set cover problem. In the Metric Capacitated Covering problem, we are given a set of balls and a set of points in a metric space, together with an integer referred to as the capacity. The goal is to find a minimum-size subset of the input balls such that each point can be assigned to a chosen ball containing it, with the number of points assigned to each ball bounded by the capacity. We give an LP rounding based algorithm for this problem that always returns a solution whose size is within a constant factor of the optimum. However, we assume that we are allowed to expand the balls by a fairly small constant factor. If no expansion is allowed, the problem is known not to admit any constant-factor approximation. We discuss our findings in Chapter \ref{sec:capa}.
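
For a fixed set of chosen balls, deciding whether the points can be assigned within the capacity is a bipartite matching question: give each ball `capacity` slots and match every point to a slot of a ball that contains it. The Python sketch below checks this with Kuhn's augmenting-path algorithm and then finds the minimum number of balls by exhaustive search on tiny inputs. The optional `expand` parameter, which multiplies every radius, is an illustrative stand-in for the ball expansion discussed above. All names are hypothetical, and this is not the LP rounding algorithm from the dissertation.

```python
import itertools
import math


def can_assign(points, balls, capacity):
    """True if every point can be assigned to a ball containing it,
    with at most `capacity` points per ball (bipartite matching)."""
    # Each ball j contributes `capacity` slots; slots[s] is the ball index of slot s.
    slots = [j for j in range(len(balls)) for _ in range(capacity)]
    # adj[i] lists the slots whose ball contains point i.
    adj = [
        [s for s, j in enumerate(slots) if math.dist(p, balls[j][0]) <= balls[j][1]]
        for p in points
    ]
    match = [-1] * len(slots)  # slot -> index of the assigned point, or -1 if free

    def try_assign(i, seen):
        # Kuhn's augmenting-path search starting from point i.
        for s in adj[i]:
            if s not in seen:
                seen.add(s)
                if match[s] == -1 or try_assign(match[s], seen):
                    match[s] = i
                    return True
        return False

    return all(try_assign(i, set()) for i in range(len(points)))


def min_capacitated_cover(points, balls, capacity, expand=1.0):
    """Size of a smallest feasible subset of balls (radii scaled by `expand`), or None."""
    scaled = [(c, expand * r) for c, r in balls]
    for m in range(1, len(scaled) + 1):
        for subset in itertools.combinations(scaled, m):
            if can_assign(points, list(subset), capacity):
                return m
    return None


points = [(0.0,), (1.0,), (2.0,), (3.0,)]
balls = [((0.0,), 1.0), ((3.0,), 1.0), ((1.5,), 1.5)]
print(min_capacitated_cover(points, balls, capacity=2))
print(min_capacitated_cover(points, balls, capacity=4))
```

In this example the large middle ball contains all four points, so with capacity 4 it suffices on its own; with capacity 2 it can serve only half of them, and two balls are needed.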

As mentioned above, for many of the problems we consider, we obtain results that improve the best known approximation bounds. Our findings make significant progress toward a better understanding of the structure of these problems, which have applications across many disciplines. During the course of this work, we have also designed tools and techniques that might be of independent interest for solving similar optimization problems. Finally, in Chapter \ref{sec:conclude}, we conclude our discussion and pose some open questions as directions for future work.

#### Keywords

Approximation, Clustering, Covering, Inapproximability, Optimization

xiv, 171 pages

#### Bibliography

Includes bibliographical references (pages 161-171).