[isabelle-dev] Grouping ISABELLE_FULL_TEST?
Makarius
makarius at sketis.net
Wed Feb 3 22:22:21 CET 2016
On Wed, 3 Feb 2016, Jasmin Blanchette wrote:
> I have a slight worry concerning the characterization and naming as
> "benchmarks". From what I understand, a "benchmark" is a theory file
> that tests the performance of the system.
I have introduced this terminology many years ago. The idea is that we
always do a proper test of all tools, libraries, examples, either via
old-school "isabelle build -a" or some immediate test service (formerly
"testboard").
The things that are run in addition as "nightly build" are slow (and thus
getting in the way), but are not relevant for actual testing. Since the
main purpose of nightly builds is to collect reliably performance figures,
the outcome of these sessions is characterized as "benchmarks".
Since the measurements of isatest have become very inaccurate in recent
years, this key purpose the background tests has been forgotten. We are
flying blind, especially for AFP. Good performance records are important,
especially due to the trend to shrink consumer devices more and more.
I don't mind if there is a more fitting name for
~~/src/Additional_Examples_That_Are_Often_Slow_But_Not_Really_Relevant
instead of ~~/src/Benchmarks.
Getting good performance figures back -- stored over in the long term --
is really important, though.
> Special setup:
> Sudoku
What is special about Sudoku? The relevant changeset is this:
changeset: 58331:054e9a9fccad
user: blanchet
date: Fri Sep 12 17:51:31 2014 +0200
files: src/HOL/ROOT
description:
enabled 'Sudoku' only with 'ISABELLE_FULL_TEST' -- Sudoku is fast enough
on modern hardware (within seconds on my MacBook), but it seems to fail on
older test machines
> SMT_Tests (requires Z3)
The relevant changeset is this:
changeset: 50666:6f48853f08d5
user: blanchet
date: Wed Jan 02 09:31:25 2013 +0100
files: src/HOL/ROOT src/HOL/SMT_Examples/SMT_Examples.thy
src/HOL/SMT_Examples/SMT_Tests.certs src/HOL/SMT_Examples/SMT_Tests.thy
description:
actually run Z3 for "SMT_Tests" when "ISABELLE_FULL_TEST" is enabled
In 2013 we did not have Z3 as default component, usable for everybody. Now
the condition on it is always true -- z3 is always enabled.
The theory takes 40s elapsed time on my machine. That could qualify it as
optional "benchmark", but I don't mind to have it by default. We have
other much bigger things all the time.
> Slow:
> Brackin
> IsaFoR
> Misc_N2M
>
> Probably also in the slow category (the last three might have a benchmarking component):
> Find_Unused_Assms_Examples
> Needham_Schroeder_No_Attacker_Example
> Needham_Schroeder_Guided_Attacker_Example
> Needham_Schroeder_Unguided_Attacker_Example
>
> Slow and benchmarking:
> Record_Benchmark
These are the genuine examples where I understand the arrangement as "test
this continuously in the background, and record good performance figures
for it".
Makarius
More information about the isabelle-dev
mailing list