[isabelle-dev] Notes on datatype_new list

Sun May 25 17:43:38 CEST 2014

Hi Makarius,

> For the new high-end BNF version of 'a list there were a few funny effects.
> 
> (1) Its definition looks terrific:
> 
> datatype_new (set: 'a) list (map: map rel: list_all2) =
>    =: Nil (defaults tl: "[]")  ("[]")
>  | Cons (hd: 'a) (tl: "'a list")  (infixr "#" 65)
> 
> It means it can no longer serve as example template for your own datatype definitions.  Note that I have already put the independent Seq.thy example in a prominent place some years ago, to decouple basic examples from the particularly difficult List.thy.

Andreas Lochbihler made similar observations to us privately, but about lazy lists (which are very similar). What makes lists so special is that they have lots of traditional names and syntaxes, and the very general "datatype_new" package tries to integrate such a special citizen as best it can.

More precisely, the syntactic complexity stems from these sources:

1. "Mixfix" syntax: [], #
2. BNF artifacts: set, map, list_all2
3. discriminators: (%xs. xs = []) and (%xs. xs ~= [])
4. selectors: hd, tl, and tl [] = []

My claims are:

A. We need to specify all those four things. Strictly speaking, we could pass the "no_discs_sels" option and not generate any discriminator-and-selector constants and properties. But since lists have "hd" and "tl" anyway, better generate them in a uniform way.

B. The concrete syntax tries to put names as close to where they belong. E.g. the set function "set" is put next to the type argument 'a, the "hd" and "tl" to the corresponding constructor value, and even the default "tl [] = []" next to the "Nil" constructor. The map and relator names are close to the type constructor name "list", which makes sense from a "functorial" point of view.

In sum, I claim the BNF package does the right thing, and that the syntax is quite convenient and mostly feels natural, but that lists are about the worst case imaginable.

The "=" as the name for Nil's discriminator deserves an explanation. Without it, the package would generate an "is_Nil" predicate. But looking in the existing list theory, I saw that "= []" was the thing that was used to check for emptiness. I made "= []" be the default for nullary constructors, but Andreas Lochbihler noticed that this caused some unexpected trouble (at least for codatatypes) because it mixes the constructor and the destructor views of a type. So I introduced this weird "=" syntax, which suggests that equality is used for discriminating. I am open to other suggestions.

The other funky syntax we have is "-:". E.g.

    datatype_new (-: 'a) list = ...

This says: Don't generate a set function for 'a -- and don't allow nesting through lists.

Andrei Popescu pointed to the documentation. The "=" syntax is explained at the end of Section 2.1 (p. 8 in aa7f051ba6ab). It is curiously lacking in the reference section Section 2.2.1. I'll add it.

> (2) Some users somehow used the "hd" and "tl" selector specifications above as suggestion to define primitive recursive functions, e.g. like this:
> 
> primref foo where
>  "foo [] = ..."
> | "foo (Cons hd tl) = ..."
> 
> Of course this fails, since these are constants, not variables.  It results in type-inference errors that beginners find hard to understand, and the Prover IDE has no position information about type errors yet.

In some sense, this is almost a compliment for the package. Instead of giving arbitrary names to arguments, users now realize that there is a connection between the arguments and the selectors.

In this particular instance, I think the root of the problem is that "hd" and "tl" are too short names to serve as good constant names. In Haskell, they call these "head" and "tail", and given the way parsing works in Isabelle, I would argue that they would be more suitable.

> (3) A genuine name space problem: "list" is concealed, and thus cannot be completed in semantic completion.

I presume you are referring to the fact that

    @{const_name Cons} = "List.list.Cons"
    @{const_name hd} = "List.list.hd"
    @{const_name map} = "List.list.map"
    @{const_name rec_list} = "List.list.rec_list"

where the "list" component is optional. This behavior is deliberate -- this is what the old package did with all the constants it defined. In other words,

    @{const_name Cons} = "List.list.Cons"
    @{const_name list_rec} = "List.list.list_rec"

holds with Isabelle2013-2. The only novelty is that we followed the same idiom for the other constants introduced by the package, such as "hd" and "map". This is easy to change: Basically, grepping for "Binding.qualify false" in "src/HOL/Tools/{Ctr_Sugar,BNF}/*.ML" locates the offenders.

The optional components appear to be a safe guard against duplicate names across datatypes (or other entities) defined in a single theory. I have no strong opinion there, except that my clear preference is for a scheme that treats constructors, discriminators, selectors, set functions, map functions, relators, and recursors uniformly.

Alternatively: Couldn't something be done on the semantic completion side?

Jasmin