Document Type

Dissertation

Date of Degree

Fall 2014

Degree Name

PhD (Doctor of Philosophy)

Degree In

Computer Science

First Advisor

Cesare Tinelli

Abstract

An increasing number of applications in verification and security rely on or could benefit from automatic solvers that can check the satisfiability of constraints over a rich set of data types that includes character strings. Unfortunately, most string solvers today are standalone tools that can reason only about some fragment of the theory of strings and regular expressions, sometimes with strong restrictions on the expressiveness of their input language (such as, length bounds on all string variables). These specialized solvers reduce string problems to satisfiability problems over specific data types, such as bit vectors, or to automata decision problems. On the other side, despite their power and success as back-end reasoning engines, general-purpose Satisfiability Modulo Theories (SMT) solvers so far have provided minimal or no native support for string reasoning.

This thesis presents a deductive calculus describing a new algebraic approach that allows solving constraints over the theory of unbounded strings and regular expressions natively, without reduction to other problems. We provide proofs of refutation soundness and solution soundness of our calculus, and solution completeness under a fair proof strategy. Moreover, we show that our calculus is a decision procedure for the theory of regular language membership with length constraints.

We have implemented our calculus as a string solver for the theory of (unbounded) strings with concatenation, length, and membership in regular languages, and incorporated it into the SMT solver CVC4 to expand its already large set of built-in theories. This work makes CVC4 the first SMT solver that is able to accept and process a rich set of mixed constraints over strings, integers, reals, arrays and other data types. In addition, our initial experimental results show that, over string problems, CVC4 is highly competitive with specialized string solvers with a comparable input language. We believe that the approach we described in this thesis provides a new idea for string-based formal methods.

Public Abstract

Privacy violation, or even impersonation, is one of the major concerns about online services, such as email systems or online shopping services. Techniques for vulnerability detection in such web-based applications provide a more secure environment for Internet services. A common approach for vulnerability detection formally describes a potential risk as a mathematical formula and reasons about this formula automatically. However, traditional mathematical methods focus more on numbers than strings, while the latter is considered as the main information carrier in web-based communication. This problem has been intensively studied, yet there is no existing method that can fully handle it.

We present a novel procedure for finding a solution to a formula containing strings, and prove correctness of our procedure. We have identified some patterns of formulas for which our procedure is guaranteed to terminate with a correct answer. In addition, our approach interacts with other procedures in a common framework, which allows it to find a solution to formulas containing not only strings but also multiple data structures.

We have implemented our approach in our solver CVC4 as an internal procedure, which makes it even more competitive as a general-purpose solver for formulas generated by vulnerability detection tools. We give a comparison with the state-of-the-art string solvers over benchmarks from real applications. Initial experimental results indicate that our approach is superior in terms of correctness and performance.

Keywords

publicabstract, Automated Reasoning, DPLL(T), Formal Method, Satisfiability Modulo Theories, Security, String Constraint Solver

Pages

xii, 190 pages

Bibliography

Includes bibliographical references (pages 181-190).

Copyright

Copyright 2014 Tianyi Liang

Share

COinS