[isabelle-dev] Grouping ISABELLE_FULL_TEST?

Wed Feb 3 11:04:48 CET 2016

> On 02.02.2016, at 08:28, Lars Hupel <hupel at in.tum.de> wrote:
> 
>> What remains are various benchmarks.  We could move these sessions to
>> ~~/src/Benchmarks and omit that directory in ~~/ROOTS.  Thus it can be
>> included on demand like this:
>> 
>> isabelle build -d '~~/src/Benchmarks' -a
>> 
>> or like this:
>> 
>> isabelle build -D '~~/src/Benchmarks'
>> 
>> This avoids more sophisticated Boolean algebra on session groups.  It
>> also makes clear from the source structure, what is normally not tested.
>> 
>> I basically do the same with -d '$AFP' all the time.
> 
> Sounds reasonable. There's no pressure to do it just yet; I'd be happy
> to wait for post-release mode.

I have a slight worry concerning the characterization and naming as "benchmarks". From what I understand, a "benchmark" is a theory file that tests the performance of the system. Looking at the list

HOL-ex
  Sudoku
HOL-Datatype_Examples
  Brackin
  IsaFoR
  Misc_N2M
HOL-Word-SMT_Examples
  SMT_Tests
HOL-Quickcheck_Benchmark
  Find_Unused_Assms_Examples
  Needham_Schroeder_No_Attacker_Example
  Needham_Schroeder_Guided_Attacker_Example
  Needham_Schroeder_Unguided_Attacker_Example
HOL-Record_Benchmark
  Record_Benchmark

I would say that only "Record_Benchmark" is truly benchmarking. The others are merely slow examples or examples that require a special setup.

Special setup:
 Sudoku
 SMT_Tests (requires Z3)

Slow:
  Brackin
  IsaFoR
  Misc_N2M

Probably also in the slow category (the last three might have a benchmarking component):
  Find_Unused_Assms_Examples
  Needham_Schroeder_No_Attacker_Example
  Needham_Schroeder_Guided_Attacker_Example
  Needham_Schroeder_Unguided_Attacker_Example

Slow and benchmarking:
 Record_Benchmark

Conversely, some examples that are always processed can be considered benchmarking -- e.g. "Nitpick_Examples/Hotel_Nits.thy".

Jasmin