# Simple Refinement Types for OCaml

For more than one year, vulnerabilies in software (especially pervasive C software) have been disclosed at an alarmingly high rate. I love OCaml, which is definitely safer, but still has gaps left open. I believe formal verification, albeit a very powerful tool, is not mature enough for most programmers (too difficult to use, requires too much efforts), so I'm thinking about alternative solutions that would be more lightweight to define and use, but would still increase the confidence in source code.

So here is a draft of a relatively simple extension to OCaml, that allows to specify more invariants in types without requiring a full-fledged proof system. It's purely hypothetical and I did not implement it (nor don't I plan to do it, not before the end of my PhD). The purpose is to propose a conservative extension of OCaml with a type system that can represent more complex invariants, without putting too much of a burden on the user, in a way so adding annotations to existing programs incrementally is feasible and even easy.

The idea relies on *refinement types*, noted `{ ty | F }`

where `ty`

is a regular OCaml type and `F`

is a boolean formula (read "type `ty`

such that `F`

holds). A regular type `ty`

in a signature is short for `{ ty | true }`

.

Now, the boolean formula `F`

is inductively built from conjunction `and`

, disjunction `or`

, negation `not`

, and *atoms*. The atoms, for a type `ty`

, are OCaml values of type `ty -> bool`

(actually, of a type that can be specialized into `ty -> bool`

, to handle polymorphism. e.g. if `ty = int list`

, and `p : 'a list -> bool`

, then `{ ty | p }`

is valid). The predicates of type `ty -> bool`

should be referentially transparent and terminating (although we don't try to enforce this).

# Type Checking

Whenever `f : { ty | p } -> foo`

is called on an expression `e : { ty | q }`

, a boolean proof obligation `q => p`

is issued to a SAT-solver. If the proof obligation fails, either an error is emitted, or a warning is issued and some runtime check of the form `assert (p e); f e`

replaces `f e`

.

Applying `f`

to some "static" value `c`

(e.g. the constant 2) that has a bare type should, ideally, check `p c`

at compile-time. A runtime check can also be used if the compile-time check proves too complicated.

It would also be interesting to expose to the type checker some logic properties about predicates, in the form of *rules*: given two predicates `p : ty -> bool`

and `q : ty -> bool`

, a rule would say `p => q`

(for any `x : ty`

, `p x`

implies `q x`

). The type checker can use such affirmations to help solving the proof obligations generated by functions calls. The rules could be checked either by adding runtime checks, or using random testing, or by formal proving.

# Type Inference

Given a function `let f x = <body>`

, the type inference infers the type `tau`

of `x`

normally, but it also collects every points where a function of type `{tau | p}`

is called on `x`

so the final type of x is `{ tau | p1 and p2 and ... and pn}`

(in the absence of any annotation, we find the empty conjunction, that is, `true`

, that is, `x : tau`

. Hence the conservative extension). The return value collects the annotations in a similar way, but joins them using a disjunction.

In the `.mli`

, the user can express more tight constraints on the signature of functions and values, same as phantom types or private aliases.

# Runtime Checking

The assertions that refine a type can be checked at runtime, if some compiler flag is enabled or disabled (similar to `-no-assert`

). Runtime checks for a function `f : { ty1 | p1 } -> { ty2 | p2 }`

look like this:

let f x = assert (p1 x); (* body of f *) let res = ..... in assert (p2 res); res

# Usage

The refinement types as defined here are very simple, yet they can serve to encode interesting invariants in the large (proving programs in the small is left to more ambitious tools: extraction from Coq, why3, CFML etc.)

Examples:

- a
*balanced*predicate for AVL trees

module type AVL_SET = sig type elt type t = E | N of t * int * elt * t val balanced : t -> bool val empty : {t | balanced } (* empty tree *) val add : {t | balanced} -> elt -> {t | balanced} val as_balanced : t -> {t | balanced} option (* check the tree is balanced *) val random : (Random.State.t -> elt) -> Random.State.t -> t (* combine with as_balanced to generate balanced trees *) end

- sorting arrays

val sorted : 'a array -> bool val as_sorted : 'a array -> {'a array | sorted} option (* check the array is sorted, and cast it *) val sort : 'a array -> {'a array | sorted} val binary_search : {'a array | sorted} -> 'a -> int option (* Expects the array to be sorted *)

- encoding basic state machines (à la phantom types, but without redefining the type)

type t val at_start : t -> bool val at_middle : t -> bool val at_stop : t -> bool val start : unit -> {t | at_start} val loop : {t | at_start or at_middle} -> {t | at_middle} val stop : {t | at_start or at_middle} -> {t | at_stop}

## Combining Properties

It can work on third-party types without changing them: a function accepting `int`

also accepts any `{int | F}`

. Some row polymorphism is probably needed to *combine* properties. For instance, the predicate `as_sorted: 'a array -> {'a array | sorted} option`

should really have the type `as_sorted : {'a array | 'p} -> {'a array | 'p and sorted} option`

not to lose any other properties the array already has.

Maybe `assert (p x); e`

on a value `x: {ty | q}`

should modify the type of `x`

to `{ty | p and q}`

in `e`

, or require that x has type `{ty | p}`

if `x`

is an argument. This would make the idiom `val as_foo : ty -> {ty | foo} option`

redundant.

# Limitations

How to parametrize predicates with values, even known statically? Would be extremely useful for invariants parametrized by comparison function, for instance. Maybe the values can be existentially quantified in argument positions, and made opaque in return value position…

# Conclusion

This is clearly just a rough idea, in dire need of refinement (no pun intended). However, I think it is both really simple on the type-checking/type inference side (compared to true formal verification) and easy to use, as a more powerful version of private aliases or phantom types to express simple invariants in type signatures. I would love to hear the opinion of people who have a good knowledge of OCaml's type-{checker, system}.

**Edit**: instead of a SAT-solver, a BDD library (binary decision diagrams) might be simpler to use in the context of a type-checker.

I got many interesting pointers from people:

- (late) typestate in rust
- blame, coercions
- liquid haskell
- contract ocaml which looks very interesting in this perspective
- relation to abstract interpretation (the
*predicate domain*)