[isabelle-dev] Testing on smlnj

Alexander Krauss krauss at in.tum.de
Wed Dec 15 10:32:00 CET 2010


Dear all,

I want to report on the current state of smlnj testing.

Currently, the nightly isatest only covers the HOL image and
HOL-Library. More wasn't really feasible in a nightly-build setting on
a machine that is used interactively during the day.

Now we have a dedicated machine (lxlabbroy15, 3.06GHz Xeon, 2 CPUs,
2GB RAM) for running smlnj exclusively, and the new testing framework
is not restricted to a nightly-build mode. But we must decide on the
strategy that we use.

So I first ran it on everything to get a picture of the state of
affairs. The revision I tested is 97bf008b9cfe.

The big surprise is that HOL-Proofs can actually be built with smlnj
(in ten hours), but HOL-ex (which I would have expected is smaller)
cannot. I don't know why... the memory requirements of different
sessions are largely unknown anyway.


Sessions that work
------------------

(but look at the timing!)

Finished HOL (3:51:13 elapsed time, 3:51:04 cpu time, factor 0.99)
Finished HOL-Auth (5:17:32 elapsed time, 5:17:27 cpu time, factor 0.99)
Finished HOL-Boogie (0:03:51 elapsed time, 0:03:45 cpu time, factor 0.97)
Finished HOL-Boogie-Examples (0:05:54 elapsed time, 0:05:53 cpu time, 
factor 0.99)
Finished HOL-Hahn_Banach (0:11:26 elapsed time, 0:11:24 cpu time, factor 
0.99)
Finished HOL-Hoare (0:24:42 elapsed time, 0:24:41 cpu time, factor 0.99)
Finished HOL-Hoare_Parallel (2:45:23 elapsed time, 2:45:22 cpu time, 
factor 0.99)
Finished HOL-IMP (0:13:31 elapsed time, 0:13:29 cpu time, factor 0.99)
Finished HOL-IMPP (0:06:05 elapsed time, 0:06:03 cpu time, factor 0.99)
Finished HOL-IOA (0:01:41 elapsed time, 0:01:40 cpu time, factor 0.99)
Finished HOL-Imperative_HOL (2:03:01 elapsed time, 2:02:58 cpu time, 
factor 0.99)
Finished HOL-Import (0:16:31 elapsed time, 0:16:30 cpu time, factor 0.99)
Finished HOL-Induct (0:23:05 elapsed time, 0:23:04 cpu time, factor 0.99)
Finished HOL-Isar_Examples (0:11:37 elapsed time, 0:11:35 cpu time, 
factor 0.99)
Finished HOL-Lattice (0:02:24 elapsed time, 0:02:24 cpu time, factor 1.00)
Finished HOL-Library (3:31:17 elapsed time, 3:31:11 cpu time, factor 0.99)
Finished HOL-Matrix (0:32:20 elapsed time, 0:32:18 cpu time, factor 0.99)
Finished HOL-Metis_Examples (0:29:46 elapsed time, 0:29:44 cpu time, 
factor 0.99)
Finished HOL-NanoJava (0:12:46 elapsed time, 0:12:45 cpu time, factor 0.99)
Finished HOL-Number_Theory (1:07:40 elapsed time, 1:07:39 cpu time, 
factor 0.99)
Finished HOL-Old_Number_Theory (0:59:20 elapsed time, 0:59:19 cpu time, 
factor 0.99)
Finished HOL-Quotient_Examples (0:19:52 elapsed time, 0:19:51 cpu time, 
factor 0.99)
Finished HOL-Prolog (0:00:26 elapsed time, 0:00:25 cpu time, factor 0.96)
Finished HOL-Proofs (10:06:25 elapsed time, 10:06:14 cpu time, factor 0.99)
Finished HOL-Proofs-ex (0:00:24 elapsed time, 0:00:19 cpu time, factor 0.79)
Finished HOL-Proofs-Lambda (3:14:35 elapsed time, 3:14:33 cpu time, 
factor 0.99)
Finished HOL-SET_Protocol (1:06:13 elapsed time, 1:06:10 cpu time, 
factor 0.99)
Finished HOL-Word-SMT_Examples (0:08:18 elapsed time, 0:08:14 cpu time, 
factor 0.99)
Finished HOL-Statespace (0:04:51 elapsed time, 0:04:50 cpu time, factor 
0.99)
Finished HOL-Subst (0:02:37 elapsed time, 0:02:36 cpu time, factor 0.99)
Finished TLA (0:04:00 elapsed time, 0:03:56 cpu time, factor 0.98)
Finished TLA-Buffer (0:00:34 elapsed time, 0:00:33 cpu time, factor 0.97)
Finished TLA-Inc (0:01:36 elapsed time, 0:01:34 cpu time, factor 0.97)
Finished TLA-Memory (0:13:11 elapsed time, 0:13:10 cpu time, factor 0.99)
Finished HOL-UNITY (1:39:58 elapsed time, 1:39:55 cpu time, factor 0.99)
Finished HOL-Unix (0:09:11 elapsed time, 0:09:10 cpu time, factor 0.99)
Finished HOL-Word-Examples (0:00:26 elapsed time, 0:00:24 cpu time, 
factor 0.92)
Finished HOL-ZF (0:12:44 elapsed time, 0:12:42 cpu time, factor 0.99)
Finished HOL-Algebra (2:26:16 elapsed time, 2:26:10 cpu time, factor 0.99)
Finished HOL-Multivariate_Analysis (3:19:36 elapsed time, 3:19:28 cpu 
time, factor 0.99)
Finished HOL-NSA (0:27:29 elapsed time, 0:27:24 cpu time, factor 0.99)
Finished HOL4 (0:51:33 elapsed time, 0:51:27 cpu time, factor 0.99)
Finished HOL-Main (2:54:51 elapsed time, 2:54:45 cpu time, factor 0.99)
Finished HOL-Plain (0:29:25 elapsed time, 0:29:22 cpu time, factor 0.99)
Finished Sequents (0:00:38 elapsed time, 0:00:37 cpu time, factor 0.97)
Finished Sequents-LK (0:00:11 elapsed time, 0:00:10 cpu time, factor 0.90)
Finished Pure-WWW_Find (0:00:00 elapsed time, 0:00:00 cpu time)
Finished HOL-Nominal (0:39:28 elapsed time, 0:39:23 cpu time, factor 0.99)
Finished HOL-Word (0:34:12 elapsed time, 0:34:07 cpu time, factor 0.99)
Finished HOL-Bali (4:20:45 elapsed time, 4:20:43 cpu time, factor 0.99)

(not sure why FOL/ZF is not in my logs, probably it built even before,
and I didn't do make clean. So we can expect that they work)

I haven't added the numbers, but it must be a few days of machine time.


Out of memory
-------------

The following sessions failed with out-of-memory. lxlabbroy15 has only
2GB of memory, but I also tested HOL-ex on macbroy21 (with 4GB) and it
failed as well. I don't know if the other sessions below could benefit
from more memory.

HOL-ex FAILED
HOL-Decision_Procs FAILED
HOL-Library-Codegenerator_Test FAILED
HOL-MicroJava FAILED
HOL-Nominal-Examples FAILED
HOL-Proofs-Extraction FAILED
HOL-Probability FAILED
Logics HOL FAILED!

This mostly isn't surprising, since these are the large sessions,
except for HOL-Library-Codegenerator_Test, where I don't know what's
happening.


Type errors
------------

The following sessions failed with type errors, due to the more
restrictive (read: more stupid) checking in NJ. Since the commit
message traditionally is "made smlnj happy", I call them "happiness
errors" below:

HOLCF FAILED
HOL-Mirabelle FAILED
HOL-Mutabelle FAILED
HOL-Nitpick_Examples FAILED
HOL-Predicate_Compile_Examples FAILED

I expect that this can be fixed, and probably should, but it is a bit
tedious. For a start, I have the detailed session logs, which point to
concrete errors, but there are probably more hidden behind them.


Now what?
------------

We must decide on how to proceed with testing smlnj.

Remember that the idea behind all this was to ensure that we are not
too much tied to one particular compiler (polyml), but that our
programming actually follows the standard and not some particular
interpretation.

Another reason that came up again recently is that the smlnj
implementation is pretty much stable (in the sense of not changing)
whereas polyml changes faster. This means that some old Isabelle
release is likely to work with new versions of smlnj but not polyml,
where the range is smaller.

Now due to the poor performance, we are in the process of being
actually more and more tied to a particular compiler, at least for
doing non-trivial things. With the new developments in the pipeline,
this will not become better...

So the strategy would be somethig like

- Test the HOL image as often as possible, to find those happiness
   errors quickly and make it less tedious to fix them since they get
   masked less often.

- Test everything regularly as well, particularly before a release.

One could also decide to stop with this tedium completely and drop
support for smlnj. But this decision should not be done hastily, since
it probably cannot be reverted.

Any thoughts?

Alex


More information about the isabelle-dev mailing list