# Structural Properties of Propositional Formulas Defining Combinatorial Search Problems

We are interested in understanding the power and limitations of families of algorithms for solving combinatorial search problems, in particular those whose decision versions are NP-complete. In some application areas involving such problems, the state of the art involves representing a problem instance as a formula of propositional logic (or a natural extension, typical examples being cardinality constraints and Boolean combinations of linear inequalities), and using a program designed to find satisfying assignments (e.g., a SAT solver, or SMT solver) to attempt to construct a solution. (SMT stands for Satisfiability Modulo Theories, which means satisfiability of a set of ground first order formulas involving atoms whose interpretation is determined by some decidable theory, e.g., Presburger arithmetic or the theory of arrays.)

An interesting observation about practical experience is that “industrial” SAT and SMT solvers commonly solve instances with millions of variables and clauses. In contrast, we know how to design families of formulas with a few hundreds of variables which no known method could decide satisfiability of in our lifetimes. While industrial sources produce a continual source of increasingly hard instances which are too hard for current technology to solve, these instances are nonetheless very easy relative to their number of variables when we take into account how small a hard formula can be. Such observations have lead to considerable speculation about the structure of comparatively hard or easy instances from various sources.

A frequent suggestion is that many industrial instances are relatively easy because they are satisfy some particular structural property, such as (a graph associated with the formula) having small width, by one of a number of related notions of with of graph width, and thus being solvable by efficient algorithms for instances with this property. Actual evidence is weak: it is not clear that the formulas satisfy any such a property, and in many cases we know they do not – or even cannot – satisfy the most obvious candidate properties. It is even less clear that the algorithms used in practice should be able to efficiently solve instances because they have properties of the sorts proposed. However, there are many intriguing possibilities.

The students role will be to help in answering some particular parts of the first question. Their work would proceed in the following stages: Reading of appropriate background material, and selection of a small number of families of formulas (e.g., from hardware verification, automated planning, bioinformatics, combinatorial designs, etc.) and structural properties to study (notions of width, scale-freeness, etc.); Study of algorithms and existing implementations for the properties chosen; preliminary experiments. It is anticipated that we will not be able to measure interesting properties precisely for large instances with existing implementations; Work, together with the supervisor and other students (and likely a post-doc) on refining the algorithms and data structures to improve speed and thus the size of instances that can be measured; Prepare a report on the progress made.