10 Mar 15

Simple Refinement Types for OCaml

For more than a year, vulnerabilities in software (especially pervasive C software) have been disclosed at an alarmingly high rate. I love OCaml, which is definitely safer, but it still leaves gaps open. I believe formal verification, albeit a very powerful tool, is not mature enough for most programmers (too difficult to use, requires too much effort), so I'm thinking about alternative solutions that would be more lightweight to define and use, but would still increase confidence in source code.

So here is a draft of a relatively simple extension to OCaml that allows specifying more invariants in types without requiring a full-fledged proof system. It's purely hypothetical and I have not implemented it (nor do I plan to, at least not before the end of my PhD). The purpose is to propose a conservative extension of OCaml with a type system that can represent more complex invariants without putting too much of a burden on the user, so that adding annotations to existing programs incrementally is feasible and even easy.

The idea relies on refinement types, written { ty | F } where ty is a regular OCaml type and F is a boolean formula (read "type ty such that F holds"). A regular type ty in a signature is short for { ty | true }.

Now, the boolean formula F is built inductively from conjunction and, disjunction or, negation not, and atoms. The atoms, for a type ty, are OCaml values of type ty -> bool (actually, of a type that can be specialized into ty -> bool, to handle polymorphism; e.g. if ty = int list and p : 'a list -> bool, then { ty | p } is valid). The predicates of type ty -> bool should be referentially transparent and terminating (although we don't try to enforce this).
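
To make the shape of F concrete, here is a minimal sketch in plain OCaml of how such formulas could be represented and evaluated on a value. The names formula, holds, non_empty and short are hypothetical and only serve the illustration; they are not part of the proposal.

```ocaml
(* Hypothetical representation of a refinement formula over values of type 'a. *)
type 'a formula =
  | True
  | Atom of string * ('a -> bool)   (* a named predicate of type ty -> bool *)
  | And of 'a formula * 'a formula
  | Or of 'a formula * 'a formula
  | Not of 'a formula

(* Evaluate a formula on a concrete value, as a runtime check would. *)
let rec holds (f : 'a formula) (x : 'a) : bool =
  match f with
  | True -> true
  | Atom (_, p) -> p x
  | And (a, b) -> holds a x && holds b x
  | Or (a, b) -> holds a x || holds b x
  | Not a -> not (holds a x)

(* Polymorphic atoms: they specialize to int list -> bool when needed. *)
let non_empty = Atom ("non_empty", fun l -> l <> [])
let short = Atom ("short", fun l -> List.length l < 3)
```

With these definitions, holds (And (non_empty, Not short)) [1; 2; 3] evaluates the formula "non_empty and not short" on an int list.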

Type Checking

Whenever f : { ty | p } -> foo is called on an expression e : { ty | q }, a boolean proof obligation q => p is issued to a SAT-solver. If the proof obligation fails, either an error is emitted, or a warning is issued and some runtime check of the form assert (p e); f e replaces f e.
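
As a rough illustration of the proof obligation q => p, the sketch below treats predicates as opaque propositional atoms and checks the implication by enumerating truth assignments. A real implementation would call a SAT-solver instead; the type prop and the function entails are assumptions made up for this example.

```ocaml
(* Propositional formulas whose atoms are predicate names only. *)
type prop =
  | T
  | P of string
  | And of prop * prop
  | Or of prop * prop
  | Not of prop

(* All atom names occurring in a formula (possibly with duplicates). *)
let rec atoms = function
  | T -> []
  | P s -> [s]
  | And (a, b) | Or (a, b) -> atoms a @ atoms b
  | Not a -> atoms a

(* Evaluate a formula under an assignment of atom names to booleans. *)
let rec eval env = function
  | T -> true
  | P s -> List.assoc s env
  | And (a, b) -> eval env a && eval env b
  | Or (a, b) -> eval env a || eval env b
  | Not a -> not (eval env a)

(* [entails q p] is true iff q => p holds under every truth assignment. *)
let entails q p =
  let names = List.sort_uniq compare (atoms q @ atoms p) in
  let rec go env = function
    | [] -> (not (eval env q)) || eval env p
    | n :: rest -> go ((n, true) :: env) rest && go ((n, false) :: env) rest
  in
  go [] names
```

For instance, entails (P "at_start") (Or (P "at_start", P "at_middle")) succeeds, whereas entails T (P "sorted") fails: a value with a bare type cannot be passed where a sorted one is required without a warning or a runtime check.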

Applying f to some "static" value c (e.g. the constant 2) that has a bare type should, ideally, check p c at compile-time. A runtime check can also be used if the compile-time check proves too complicated.

It would also be interesting to expose to the type checker some logical properties of predicates, in the form of rules: given two predicates p : ty -> bool and q : ty -> bool, a rule would say p => q (for any x : ty, p x implies q x). The type checker can use such rules to help solve the proof obligations generated by function calls. The rules themselves could be checked by adding runtime checks, by random testing, or by formal proof.
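
Checking a rule by random testing could look like the following sketch. The predicates sorted and strictly_sorted, the generator random_list and the function check_rule are all hypothetical names chosen for this example; the fixed seed is only there to make runs reproducible.

```ocaml
(* Two example predicates on lists. *)
let rec sorted = function
  | x :: (y :: _ as tl) -> x <= y && sorted tl
  | _ -> true

let rec strictly_sorted = function
  | x :: (y :: _ as tl) -> x < y && strictly_sorted tl
  | _ -> true

(* Random int list of length < 6 with elements in [0, 10). *)
let random_list st =
  List.init (Random.State.int st 6) (fun _ -> Random.State.int st 10)

(* Test the rule [p => q] on [n] random inputs.
   Returns [Some x] for the first counterexample found, [None] otherwise. *)
let check_rule ?(n = 1000) ~gen p q =
  let st = Random.State.make [| 42 |] in
  let rec loop i =
    if i = n then None
    else
      let x = gen st in
      if p x && not (q x) then Some x else loop (i + 1)
  in
  loop 0
```

Here check_rule ~gen:random_list strictly_sorted sorted finds no counterexample, since the rule strictly_sorted => sorted is valid. Of course, the absence of a counterexample only increases confidence in a rule; it does not prove it.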

Type Inference

Given a function let f x = <body>, the type inference infers the type tau of x normally, but it also collects every point where a function of type {tau | p} is called on x, so the final type of x is { tau | p1 and p2 and ... and pn} (in the absence of any annotation, we find the empty conjunction, that is, true, that is, x : tau; hence the conservative extension). The return value collects the annotations in a similar way, but joins them using a disjunction.

In the .mli, the user can express tighter constraints on the signatures of functions and values, as with phantom types or private aliases.

Runtime Checking

The assertions that refine a type can be checked at runtime, with the checks turned on or off by a compiler flag (similar to -noassert). Runtime checks for a function f : { ty1 | p1 } -> { ty2 | p2 } look like this:

let f x =
  assert (p1 x);
  (* body of f *)
  let res = ..... in
  assert (p2 res);
  res
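
In today's OCaml, the same instrumentation can be approximated by hand with a higher-order wrapper. This is only a sketch: checked, positive and pred_checked are hypothetical names, and the proposal would generate such checks automatically.

```ocaml
(* [checked ~pre ~post f] wraps [f] with an assertion on its argument
   and an assertion on its result. *)
let checked ~pre ~post f x =
  assert (pre x);
  let res = f x in
  assert (post res);
  res

let positive n = n > 0

(* Stands in for a function of type {int | positive} -> {int | non-negative}. *)
let pred_checked = checked ~pre:positive ~post:(fun n -> n >= 0) (fun n -> n - 1)
```

Calling pred_checked 3 returns 2, while pred_checked 0 raises Assert_failure because the precondition does not hold (unless assertions are disabled at compile time).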

Usage

The refinement types as defined here are very simple, yet they can serve to encode interesting invariants in the large (proving programs in the small is left to more ambitious tools: extraction from Coq, why3, CFML, etc.).

Examples:

a balanced predicate for AVL trees

module type AVL_SET = sig
    type elt

    type t = E | N of t * int * elt * t

    val balanced : t -> bool

    val empty : {t | balanced }
    (* empty tree *)

    val add : {t | balanced} -> elt -> {t | balanced}

    val as_balanced : t -> {t | balanced} option
    (* check the tree is balanced *)

    val random : (Random.State.t -> elt) -> Random.State.t -> t
    (* combine with as_balanced to generate balanced trees *)
end
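
For reference, the balanced predicate itself is an ordinary OCaml function. The sketch below assumes that the int field of N caches the height of the node (the signature above does not say so), and uses a polymorphic tree type instead of an abstract elt to stay self-contained.

```ocaml
(* Same shape as the type t above, with the element type as a parameter. *)
type 'a tree = E | N of 'a tree * int * 'a * 'a tree

(* Height recomputed from scratch, ignoring the cached field. *)
let rec height = function
  | E -> 0
  | N (l, _, _, r) -> 1 + max (height l) (height r)

(* A tree is balanced if, at every node, the cached height is correct
   and the heights of the two subtrees differ by at most one. *)
let rec balanced = function
  | E -> true
  | N (l, h, _, r) ->
    let hl = height l and hr = height r in
    h = 1 + max hl hr && abs (hl - hr) <= 1 && balanced l && balanced r
```

This version recomputes heights at each node, so it is quadratic in the worst case; that is acceptable for a check meant for testing or debugging builds.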

sorting arrays

val sorted : 'a array -> bool

val as_sorted : 'a array -> {'a array | sorted} option
(* check the array is sorted, and cast it *)

val sort : 'a array -> {'a array | sorted}

val binary_search : {'a array | sorted} -> 'a -> int option
(* Expects the array to be sorted *)
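
The closest approximation available today is a private alias, which is what the refinement type is meant to generalize. The following sketch implements the signature above with a hypothetical module Sorted whose type 'a t stands in for {'a array | sorted}. Note that arrays are mutable, so the invariant only holds as long as nobody mutates the array afterwards.

```ocaml
module Sorted : sig
  type 'a t = private 'a array   (* stands in for {'a array | sorted} *)
  val sorted : 'a array -> bool
  val as_sorted : 'a array -> 'a t option
  val sort : 'a array -> 'a t
  val binary_search : 'a t -> 'a -> int option
end = struct
  type 'a t = 'a array

  (* True iff the array is in non-decreasing order for [compare]. *)
  let sorted a =
    let ok = ref true in
    for i = 0 to Array.length a - 2 do
      if compare a.(i) a.(i + 1) > 0 then ok := false
    done;
    !ok

  (* Check the array is sorted, and cast it. *)
  let as_sorted a = if sorted a then Some a else None

  (* Sort a copy, leaving the argument untouched. *)
  let sort a =
    let b = Array.copy a in
    Array.sort compare b;
    b

  (* Search the half-open interval [lo, hi). *)
  let binary_search a x =
    let rec go lo hi =
      if lo >= hi then None
      else
        let mid = lo + (hi - lo) / 2 in
        let c = compare x a.(mid) in
        if c = 0 then Some mid
        else if c < 0 then go lo mid
        else go (mid + 1) hi
    in
    go 0 (Array.length a)
end
```

A caller cannot pass a plain array to Sorted.binary_search; it has to go through Sorted.sort or Sorted.as_sorted first. Unlike the refinement type, this requires a new type name, which is exactly the overhead the proposal tries to avoid.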

encoding basic state machines (à la phantom types, but without redefining the type)

type t

val at_start : t -> bool
val at_middle : t -> bool
val at_stop : t -> bool

val start : unit -> {t | at_start}

val loop : {t | at_start or at_middle} -> {t | at_middle}

val stop : {t | at_start or at_middle} -> {t | at_stop}
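
A runtime-checked approximation of this state machine, in plain OCaml, might look like the sketch below. The concrete representation of t (a record holding a state tag) is an assumption made for the example; the signature above keeps t abstract.

```ocaml
type state = Start | Middle | Stop

(* Hypothetical concrete representation of the abstract type t. *)
type t = { state : state }

let at_start m = m.state = Start
let at_middle m = m.state = Middle
let at_stop m = m.state = Stop

let start () = { state = Start }

(* Precondition: at_start or at_middle. *)
let loop m =
  assert (at_start m || at_middle m);
  { state = Middle }

(* Precondition: at_start or at_middle. *)
let stop m =
  assert (at_start m || at_middle m);
  { state = Stop }
```

Here stop (loop (start ())) is accepted, while loop (stop (start ())) fails its assertion at runtime. With refinement types, the second call would instead produce an unprovable obligation at_stop => at_start or at_middle at compile time.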

Combining Properties

It can work on third-party types without changing them: a function accepting int also accepts any {int | F}. Some form of row polymorphism is probably needed to combine properties. For instance, the function as_sorted : 'a array -> {'a array | sorted} option should really have the type as_sorted : {'a array | 'p} -> {'a array | 'p and sorted} option, so as not to lose any other properties the array already has.

Maybe assert (p x); e on a value x: {ty | q} should modify the type of x to {ty | p and q} in e, or require that x has type {ty | p} if x is an argument. This would make the idiom val as_foo : ty -> {ty | foo} option redundant.

Limitations

How can predicates be parametrized by values, even statically known ones? This would be extremely useful for invariants parametrized by a comparison function, for instance. Maybe the values can be existentially quantified in argument positions, and made opaque in return value position…

Conclusion

This is clearly just a rough idea, in dire need of refinement (no pun intended). However, I think it is both really simple on the type-checking/type inference side (compared to true formal verification) and easy to use, as a more powerful version of private aliases or phantom types to express simple invariants in type signatures. I would love to hear the opinion of people who have a good knowledge of OCaml's type-{checker, system}.

Edit: instead of a SAT-solver, a BDD library (binary decision diagrams) might be simpler to use in the context of a type-checker.

I got many interesting pointers from people:

(late) typestate in rust
blame, coercions
liquid haskell
contract ocaml which looks very interesting in this perspective
relation to abstract interpretation (the predicate domain)