[isabelle-dev] AFP: "enable proof in session Lorenz_C1"^

Thu Feb 29 17:30:04 CET 2024

On 29/02/2024 16:23, Fabian Immler wrote:
> 
>     Of course it is better to check things formally, instead of having them
>     commented-out.
> 
> That is the motovation for the change: it is better to check things formally. 
> Talking to Fabian Huch, I got the impression that now there are more compute 
> resources than five or six years ago.

There are indeed a few more hardware resources, and hopefully more to come 
very soon, but right now the production test environment is rather underpowered.

I am speaking here of the official isabelle_cronjob nightly build with 
repeatable performance measurements. Although many people don't know about 
that, it is the main way to oversee Isabelle/AFP resource requirements. If 
that does not work properly, we are "flying blind" --- we used to do that over 
several years around 2015 +/-.

That build environment defines what can be active as "slow" or "very_slow". 
Starting with April 2024 there will be a new situation, hopefully with some 
new hardware somewhere. That is necessary, because the LRZ will shut down its 
experimental free-as-in-beer Linux cluster.

I do hope that within a few weeks there will be regular AFP tests on good 
hardware. Here is an example timing on the super high-end consumer hardware 
organized by Fabian Huch:

       AFP/ddf90847bfa5
       Host of1.proof.cit.tum.de

       ML_PLATFORM="x86_64-linux"
       ML_HOME="/u/home/wenzelm/.isabelle/contrib/polyml-5.9.1/x86_64-linux"
       ML_SYSTEM="polyml-5.9.1"
       ML_OPTIONS="--minheap 1500"

       threads=8

       Finished HOL (0:01:58 elapsed time, 0:06:19 cpu time, factor 3.21)
       Finished HOL-Analysis (0:04:03 elapsed time, 0:19:05 cpu time, factor 4.70)
       Finished Ordinary_Differential_Equations (0:01:02 elapsed time, 0:03:43 
cpu time, factor 3.59)

       Finished Lorenz_C0 (0:05:06 elapsed time, 0:29:38 cpu time, factor 5.80)
       Finished Lorenz_C1 (3:17:28 elapsed time, 19:25:10 cpu time, factor 5.90)
       3:34:09 elapsed time, 21:06:14 cpu time, factor 5.91

Not bad. But also note that ARM is much slower: the analogous test on Mac 
Studio M1 from some hours ago is still running, here are some results so far:

       Finished HOL (0:02:08 elapsed time, 0:07:09 cpu time, factor 3.33)
       Finished HOL-Analysis (0:04:34 elapsed time, 0:21:46 cpu time, factor 4.77)
       Finished Ordinary_Differential_Equations (0:01:06 elapsed time, 0:04:11 
cpu time, factor 3.81)
       Finished Lorenz_C0 (0:17:58 elapsed time, 1:45:23 cpu time, factor 5.86)

>     So far, I thought that Lorenz_C0 was there to be tested
>     properly, and Lorenz_C1 an untested add-on example that is unlikely to
>     fail on
>     its own account.
> 
> Mathematically, Lorenz_C0 is of not much value. Lorenz_C1 is a formal proof 
> for the computational part of Tucker‘s proof of Smale‘s 14th problem. So 
> Lorenz_C1 is of actual value and worth testing „regularly“.
> 
> „Regularly“ could also be weekly, monthly, or even just once for every 
> Isabelle release. It probably makes sense to reflect that in a new session 
> group „very_very_slow“ or the like.

OK, lets see what will emerge rather soon.

Right now Lorenz_C1 needs to remain inactive to avoid bombing the 
isabelle_cronjob AFP test: Today it actually failed due to file-system 
shortage, so it did not bump into the next day.

	Makarius