[isabelle-dev] Word Libraries

Gerwin Klein kleing at unsw.edu.au
Tue Oct 12 07:00:59 CEST 2021



> On 9 Oct 2021, at 19:37, Florian Haftmann <florian.haftmann at informatik.tu-muenchen.de> wrote:
> 
> Signed PGP part
>> There is some relevant experience from Isabelle2021. As a rough order of magnitude, it took about 2 full person months to do the complete proof update for l4v, which finished a bit more than a week ago now. 2 person months is about 1 person month more than a usual Isabelle update. Not quite all, but most of the changes were due to Word_Lib.
>> 
>> You did an initial foray into this proof update for one architecture (ARM, 32 bit) after the first half of the Isabelle2021 changes. The result of that foray was that those changes were relatively benign, which is why I was surprised by the much higher effort later. I think the main fallout was from a few small changes in the second half of the Isabelle2021 update.
> 
> In retrospect I did underestimate the side conditions imposed by
> the life cycle of l4v wrt. Isabelle releases.
> 
> Do I understand correctly that there is a running version built on
> Isabelle2021 / AFP 2021?  If yes, this can be used as a point of reference
> when appropriate.

Yes, the current master branch of https://github.com/seL4/l4v is on Isabelle2021.


> It seems there are a couple of things to iron out properly.  There is
> still some time until the upcoming Isabelle release.  Beyond that, how
> strictly is l4v coupled to the Isabelle / AFP release cycle?

l4v is strongly coupled to the release cycle in the sense that it only uses Isabelle release versions. It may lag behind Isabelle releases sometimes, and it may take new versions of AFP entries as long as these still work with the corresponding Isabelle release (or may add its own changes).


> Maybe consolidations for Word_Lib could also happen on a dedicated AFP branch
> *after* the Isabelle release and later converge to the (then) next AFP
> release.

Yes, that would be a possibility.


>> - things started breaking immediately for 64-bit architectures, because the Sumo concept alone does not quite work. It is not enough to make all lemmas for all word lengths available. The point of the Word_SetupXX theories is to additionally establish a set of common names that refer to the default word type of the program (basically, whatever "unsigned int" is in C for that architecture, and what the register width is for the machine). Having these names in common means that the same proof text can work on different concrete types. This is different from actually generic lemmas that work for any word length or that need to resolve preconditions on the word length before they can be applied (the latter can be solved in theories like Word8 etc). It is not the prettiest form of genericity, but it is crucial for not having to duplicate thousands of lemmas that for other reasons do need to refer to a concrete word length (which is known in the proof state, but unknown to the proof text). Ultimately this comes from C, which, as painful as it is, works on exactly the principle that the same type name can refer to different representations, depending on architecture.
> 
> My proposal:
> * Theories Word_SetupXX re-appear offering the same name bindings as
> pre-Isabelle2021
> * Their purpose is documented in the Guide.thy

Sounds good.
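
To make the intent concrete, a minimal sketch of what the common-name setup provides (illustrative only; names and exact form are not necessarily the actual theory content, and per architecture only one of the setups would be imported):

  (* sketch of Word_Setup_32: *)
  type_synonym machine_word = "32 word"

  (* sketch of Word_Setup_64: *)
  type_synonym machine_word = "64 word"

  (* the same proof text then works under either setup: *)
  lemma ‹(x :: machine_word) AND 0 = 0›
    by simp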


>>>> The shift*1 operations have been auxiliary definitions in the
>>>> distribution to bootstrap the bit shift operations, similarly to
>>>> iszero, equipped only with a minimal set of lemmas. If one tool out
>>>> there relies on them as explicit constants, i.e. due to pattern matching,
>>>> I won’t shed a tear over keeping them as constants.
>>> 
>>> Great, let's keep them constants.
>>> 
>>>> (Aside: the primitive definitions might then be shiftl1 a = (a << 1), shiftr1 a = (a >> 1), sshiftr1 a = (a >>> 1).)
>>> 
>>> I'd be fine with these as definitions as long as the old forms are still available as lemmas, preferably under the previous names. Maybe we can make sure over time that these constants are not produced any more by any tools, and then they can finally be removed, but that is a number of release cycles away.
> 
> My proposal:
> * Proper constants for shiftl1 a = (a << 1), shiftr1 a = (a >> 1),
> sshiftr1 a = (a >>> 1)
> * Facts for those available under the same names as Isabelle2021 / AFP
> 2021 (point of reference)

Perfect.
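
I.e. roughly along these lines (sketch only, modulo the exact types and fact names):

  definition shiftl1 :: "'a::len word ⇒ 'a word"
    where ‹shiftl1 a = a << 1›

  definition shiftr1 :: "'a::len word ⇒ 'a word"
    where ‹shiftr1 a = a >> 1›

  definition sshiftr1 :: "'a::len word ⇒ 'a word"
    where ‹sshiftr1 a = a >>> 1›

  (* plus the facts about them under the Isabelle2021 / AFP 2021 names *)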



>>>>> For instance, who on earth wants to read something like "is_aligned (push_bit m k) m" instead of "is_aligned (k << m) m"? The syntax was introduced for good reasons. It is fine to make it a bundle so as not to pollute a global syntax space; it is not fine to change original material that was written with a purpose.
>>>> 
>>>> If your point is to apply the ‹_ << _› syntax etc. thoroughly and
>>>> consistently among Word_Lib theories, I am happy to look after these slips.
>>> 
>>> That would be appreciated. My complaint was not that a new lemma did not use the word syntax (although it would be nice to remain consistent), but that old lemmas were rewritten to not use it. Possibly as part of another change.
> 
> There is one issue which prevents a 100% satisfactory solution here: the
> syntax needs abbreviations, and these cannot be organized in bundles.
> 
> Taking into consideration that there seem to be enough rough edges at
> the moment, I would not object to postponing the reconciliation of the two
> different layers of bit shift operations to a future release, i.e.
> keeping << >> >>> as distinct constants for now.

I think that would be Ok, yes.
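
If I understand the constraint correctly, it is roughly this (sketch; names made up, precedences from memory, and the two pieces are alternatives rather than meant to coexist):

  (* notation fits into a bundle, as long as << etc. are constants of their own: *)
  bundle word_shift_syntax
  begin
  notation shiftl (infixl "<<" 55)
    and shiftr (infixl ">>" 55)
    and sshiftr (infixl ">>>" 55)
  end

  (* but an argument-swapping abbreviation like this cannot be bundled: *)
  abbreviation (input) shiftl_syntax (infixl "<<" 55)
    where "a << n ≡ push_bit n a"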


> (This would be one remaining item on my todo list, the other one being
> the final dismantling of Ancient_Numeral.thy sometime in the future)

That is probably a good idea, although some of these old lemmas are still used. It should be possible to find where they are used, given a list of names.


>>> - the unat_arith tactic solves fewer goals. I might be mistaken, but I think this is mostly due to simpset changes; I have not had the time to track down which ones exactly.
> 
> Tactic unat_arith has a technical deficiency: at the time of its
> declaration it takes a simpset »as it is« as a base, and hence its
> behavior is brittle wrt. movements in the surroundings.  Since there
> are many things to work out concerning the simp setup, I would postpone
> this issue until those have been resolved.

I'm fine with that, and yes, it would be much better for unat_arith to use a more controlled simpset (possibly augmentable with a named theorem set).
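
Something like the following, perhaps (sketch; the name unat_arith_simps is made up):

  named_theorems unat_arith_simps
    "rules unat_arith may use, independent of the ambient simpset"

  (* the tactic would then start from a fixed base simpset plus
     Named_Theorems.get ctxt @{named_theorems unat_arith_simps} *)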


>> - the default simpset setup changed; it no longer reliably normalises ground terms with numerals.
> 
> I.e. terms involving operations like AND, +, -, * and numerals 0, 1, 42?  If
> that breaks down, something is really weird.

I don't remember the concrete term; I think it was something with shifting combined with AND/OR, getting stuck on one of the usual suspects (0, 1, Suc 0 or -1).
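
A made-up example of the kind of goal I mean, which ought to reduce by computation alone (whether plain simp still closes it out of the box is exactly the question):

  lemma ‹((0x2A :: 32 word) >> 1) AND 0xF = 5›
    by simp   (* by hand: 0x2A >> 1 = 0x15, and 0x15 AND 0xF = 0x5 *)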


>> This caused a bunch of proof changes, hunting for which rules are needed etc. My reading of NEWS is that this is getting worse with NOT terms not being normalised any more. This potentially makes sense for int/nat, but I would suggest keeping them for Word (and making sure the rest of the numeral simplification actually covers all cases -- usually 0, 1, Suc 0 are the corner cases to look out for).
> 
> After Thomas Sewell pointed out some problems with simplifying bit
> expressions over natural numbers, I adopted the elementary numeral
> rewriting approach by Andreas Lochbihler originating in session
> Native_Word to overcome that;  in the first stage that relied on NOT not
> being simplified by default, but with everything now in place, it should
> not be difficult to recover that.
> 
> Simplification of bit expressions involving word numerals still works
> the same way as pre-Isabelle2021: rewriting to int first.  Please let me
> know if this by itself exhibits problems.

pre-Isabelle2021 it worked reliably enough, so I think that is Ok.


>> - the same issue caused C/assembly refinement to fail, as well as the generated bitfield proofs (for 64-bit architectures), because arithmetic leaves different terms. This might be unavoidable for arbitrary terms in a larger update, but Isabelle should be able to compute with at least numeral ground terms reliably.
> 
> Yes, that is definitely supposed to work.  Do you have examples at hand?
> Otherwise I will augment the Examples.thy to cover more combinations.

I can have a look if I can recover them from the update diff (it's over 480 changed files, so it might take a bit).


>>> - punning max_word with -1 conflicts with normalising numerals to the interval [0; 2^len). There were a few test lemmas in the repository that made sure this was not accidentally removed; I'm not sure where these went. Removing normalisation for -1 in turn leads to terms like unat (-1), which proofs then get stuck on. This can be fixed locally by finding the right rule to re-establish the old normalisation, but that will of course mean that abstract lemmas that refer to -1 won't apply any more. This means there is no principled way to deal with ground terms any more. I can see the appeal of using -1 for generic lemmas, because it has nice algebraic meaning (as opposed to max_word). Maybe the solution is to build a rule set for arithmetic that can be added on demand or as a bundle. It's a bit of a crutch, but at least it would provide a choice.
> 
> Concerning normalizing numerals to the interval [0; 2^len) – is that
> supposed to work universally?  There seems to be no such mechanism in
> either Isabelle2020 or Isabelle2021:
> 
> lemma ‹w = 2342342› for w :: ‹4 word›
>  apply simp

Yes, that was definitely supposed to work, also deeper in the term, e.g. P (9::3 word) used to normalise to P 1. I don't know exactly when that was lost; it might have been before Isabelle2021. It looks like it is enough for us to not actually normalise, but instead to be able to decide the usual operators (=, <, etc.), which did work in Isabelle2020.
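
For example (made-up instances; goals like these should be decidable by computation, as they were in Isabelle2020):

  lemma ‹(9 :: 3 word) = 1›   (* 9 mod 2^3 = 1 *)
    by simp
  lemma ‹(9 :: 3 word) < 2›
    by simp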


> (I think something like that would require a simproc).

I think the initial version of this did use a simproc, yes, but it was replaced a few years ago.


> Concerning max_word – would it help to let it abbreviate 2^len - 1
> rather than -1?  If it is used in terms involving concrete numerals, the
> abstract properties of -1 are not that relevant.  (I have pondered that
> issue in different directions, but will spare the details here for the moment.)

We struggled with that one as well. We ended up with a constant, because you can then control how it is rewritten separately from -1. I'm not really sure what the best option is.
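
For concreteness, the variants under discussion look roughly like this (sketch only; only one of them would be active, of course):

  (* punning with -1, as currently: *)
  abbreviation max_word :: "'a::len word" where ‹max_word ≡ - 1›

  (* abbreviating the unsigned bound instead: *)
  abbreviation max_word :: "'a::len word" where ‹max_word ≡ 2 ^ LENGTH('a) - 1›

  (* or a separate constant whose rewriting can be controlled independently of -1: *)
  definition max_word :: "'a::len word" where ‹max_word = - 1›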


> Coming back to word numeral normalization – the typical rules should
> take »- _« into consideration properly – »should« in the sense of what
> I would expect naively; I did not check this proactively.  If this fails,
> I should augment that.

It does increase the number of situations/combinations to account for, but it would probably fix at least the concrete numeral computations. If we also get all corner cases like "unat (-1)", things might actually work out.
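
The kind of corner cases I have in mind, as an illustration (these should go through by computation):

  lemma ‹unat (- 1 :: 8 word) = 255›
    by simp
  lemma ‹(- 1 :: 8 word) AND 0xF0 = 0xF0›
    by simp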


> I have been reluctant to enforce a particular normalization policy once
> and for all – who can tell whether e.g. [0; 2^len) or [- 2 ^ (len - 1);
> 2 ^ (len - 1) - 1] is the preferred interval?

It's true that this is a policy decision that may not be good for all applications. For us it worked well, but that doesn't make it universal, of course. Ideally there would be the option of no normalisation, the unsigned interval, and the signed interval. That would probably cover everything.
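
E.g. for 8-bit words the same value would be presented as 200 under the unsigned convention and as - 56 under the signed one; either way the equality itself should be provable:

  lemma ‹(200 :: 8 word) = - 56›   (* 200 = 256 - 56 *)
    by simp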


> If your application depends on normalizing »- _« away, this maybe should
> be done in the setup of the application itself.

Normalising was the policy we decided on, but I think the actual dependency is only on reliable computation. Normalising makes this a bit more controlled, but it can probably be achieved without it.


>> Overall, it looks like a number of aspects of the new simpset are convenient for reasoning abstractly about machine words when you are working within the library (esp. take_bit and 0/dvd), but counterproductive once you are working with specific word lengths, as you do in program verification.
> 
> The only instances I can spot at the moment are the conversions; are
> there examples beyond that?  dvd is not supposed to appear spontaneously
> after default simplification.

The main ones I was thinking about were conversions and dvd, and a few instances of -1, but I think the latter will go away with more complete arithmetic on numerals.

Cheers,
Gerwin
