[isabelle-dev] Safe approach to hypothesis substitution

Mon Aug 23 11:35:06 CEST 2010

Hi Thomas,

Thanks for digging into this. The concrete patch is exactly what was 
needed to advance this discussion. Here are a few comments. I am sure 
others will have more...

> I attach the resulting patch. After 
> several rounds of bugfixes and compatibility improvements, it requires 
> 23 lines of changes to theories to rebuild HOL, HOL-ex and HOL-Word.

So it is painful (as there are a much more theories out there), but it 
may work, and it may be worth the trouble in the long run.

> The code change in Provers/hypsubst.ML adds code for counting the free 
> and bound variables in a goal, and checking whether either side of an 
> equality hypothesis is a unique variable, occuring nowhere else. The 
> main substitution algorithms avoid using such equalities and also 
> preserve rather than delete the hypothesis they select.

Do I understand correctly that substituting the hypothesis is safe, but 
discarding them afterwards is unsafe? Then, why should they not be used?

More importantly, why is it safe to use and discard an equality when the 
variable occurs elsewhere in the goal? The equality itself could 
interact with a fact from the context? Consider the contrived example

locale l =
   fixes x :: nat
   assumes neq: "x ~= 0"
begin

lemma "x = 0 ==> x <= x + 1 ==> False"
apply safe

After "safe", the lemma becomes unprovable, since the hypothesis that 
contradicts the context is removed. It seems that your patch does not 
fix this... (I haven't tried... just guessing from reading the code)

[...]

A few concrete remarks concerning the patch (which seems to be relative 
to Isabelle2009-2):

> diff -r 2e1b6a8a0fdd src/HOL/Parity.thy
> --- a/src/HOL/Parity.thy	Wed Jul 28 06:38:17 2010 +1000
> +++ b/src/HOL/Parity.thy	Mon Aug 23 17:33:59 2010 +1000
> @@ -66,9 +66,10 @@
>    by presburger
>  
>  lemma even_times_anything: "even (x::int) ==> even (x * y)"
> -  by algebra
> +  by (simp add: int_even_iff_2_dvd)
>  
> -lemma anything_times_even: "even (y::int) ==> even (x * y)" by algebra
> +lemma anything_times_even: "even (y::int) ==> even (x * y)"
> +  by (simp add: int_even_iff_2_dvd)
>  
>  lemma odd_times_odd: "odd (x::int) ==> odd y ==> odd (x * y)" 
>    by (simp add: even_def zmod_zmult1_eq)

?!? Why is the behaviour of the "algebra" method changed?

> diff -r 2e1b6a8a0fdd src/Provers/hypsubst.ML
> --- a/src/Provers/hypsubst.ML	Wed Jul 28 06:38:17 2010 +1000
> +++ b/src/Provers/hypsubst.ML	Mon Aug 23 17:33:59 2010 +1000

> +(* Counts the occurences of Free and Bound variables in a term. *)
> +fun count_vars' _ (Free f) = Termtab.map_default (Free f, 0) (fn x => x + 1)
> +  | count_vars' _ (Var _) = I
> +  | count_vars' _ (Const _) = I
> +  | count_vars' n (Bound m) = Termtab.map_default (Bound (m - n), 0)
> +      (fn x => x + 1)
> +  | count_vars' n (Abs (_, _, t)) = count_vars' (n + 1) t
> +  | count_vars' n (f $ x) = count_vars' n f o count_vars' n x
> +
> +fun count_vars ts = fold (count_vars' 0) ts Termtab.empty

This is very strange, since it counts the bound variables in
"(%f. f) (%x. x)" as the same variable. It doesn't seem to produce a 
problem now, since the variables in a hypothesis are goal parameters, 
which are nested linearly, but it is confusing nevertheless.

> +
> +(* An equality hypothesis 'x = y' is trivial if substituting with it would have
> +   no useful effect. The principal test for this is if x or y is a unique
> +   variable, occuring nowhere else in the goal.  The exception to the rule is
> +   if x is bound and y is unique but free. *)

What is a "useful effect"? This is not a rhetorical question -- I really 
don't understand. Is the second sentence the actual definition?

A more general remark:
Larry's comments in hypsubst.ML:
> (*If novars then we forbid Vars in the equality.
>   If bnd then we only look for Bound variables to eliminate.
>   When can we safely delete the equality?
>     Not if it equates two constants; consider 0=1.
>     Not if it resembles x=t[x], since substitution does not eliminate x.
>     Not if it resembles ?x=0; consider ?x=0 ==> ?x=1 or even ?x=0 ==> P
>     Not if it involves a variable free in the premises,
>         but we can't check for this -- hence bnd and bound_hyp_subst_tac

The last point really indicates the main problem: "the premises" are 
actually facts in the surrounding context. In other words, it is almost 
never safe to eliminate Frees, unless we know for some reason that all 
relevant facts have already been inserted in the goal...

So it seems the only really conservative step is to eliminate bound 
variables only??? How many proofs would have to be fixed if "safe 
methods" were restricted to this? I gues it's a lot more, but it may 
still be better than trading one heuristic for a more complex one, which 
is still unsafe...

Alex