Date of Degree
PhD (Doctor of Philosophy)
First Committee Member
Second Committee Member
Third Committee Member
Online learning receives increasing attention due to its efficiency in handling large-scale streaming data. However, imbalanced data raises a big challenge for traditional online algorithms which aim at minimizing the misclassification error rate. This thesis works on online learning algorithms that are suitable for classification on imbalanced data.
We first investigate fast converging algorithms directly optimizing asymmetric performance measures, which is challenging because most of the measures are non-decomposable so that standard online optimization methods are not applicable. As a preliminary step, we propose a fast converging adaptive stochastic approxima- tion (ASA) algorithm for the risk minimization with Lipschitz continuous random functions, which adapts to to structure conditions of the problem (a.k.a. error bound conditions, (EBC)). The proposed algorithm achieves a convergence rate of Õ(1/n^(2−θ)) 1 depending on θ ∈ (0, 2], the power constant in EBC, where n is the maximum number of iterations.
Based on the framework of ASA, we develop two fast converging online stochastic algorithms respectively optimizing AUC and F-measure. Specifically, we propose the Accelerated Stochastic Online AUC Maximization (ASOLAM). The problem is challenging because the loss involves pairs of examples from different classes which is non-decomposable. Our ASOLAM algorithm shares the advantage of low time and storage cost with the state-of-the-art online AUC maximization algorithm, and further improves the convergence rate from Õ(1/√n) to Õ(1/n). To the best of our knowledge, this is the fastest online AUC optimization algorithm in terms of convergence rate. We justify the proposed ASOLAM algorithm by both theoretical and experimental analysis.
Based on the ASA framework, we also develop a Fast Online F-measure Optimization (FOFO) algorithm accelerating the thresholding process for a probabilistic binary classifier. FOFO achieves an Õ(1/√n) convergence rate for the threshold learning. It is provably faster than its predecessor based on a heuristic for updating the threshold. The experiments verify the efficiency of the proposed algorithm in comparison with the state-of-the-art algorithms.
Lastly, we extend the research to an online active setting, and propose an asymmetric active querying strategy which queries examples according to a probability based on the predicted label and the uncertainty. We provide theoretical analysis showing the proposed method is equivalent to optimizing a cost-sensitive metric and is further equivalent to maximizing F-measure. We also justify the efficiency of the proposed method by experimental results and provide thoroughly discussion.
xii, 133 pages
Includes bibliographical references (pages 125-133).
Copyright © 2018 Xiaoxuan Zhang
Zhang, Xiaoxuan. "Online learning for imbalanced data: optimizing asymmetric measures." PhD (Doctor of Philosophy) thesis, University of Iowa, 2018.