[isabelle-dev] New Testing Infrastructure -- status report

Lawrence Paulson lp15 at cam.ac.uk
Mon May 30 10:03:29 CEST 2011


Awesome! It looks positively industrial in scale.
Larry

On 30 May 2011, at 08:54, Alexander Krauss wrote:

> Hi all,
> 
> In the past weeks, there has been some progress with our new testing
> infrastructure, which I would like to summarize here. Please give
> feedback, ask questions, and discuss.
> 
> 
> Really fast status for "isabelle makeall"
> =========================================
> 
> Thanks to a new fat server with 24 cores and 130G of RAM, we can now
> run "isabelle makeall all" in 17-18 minutes. This is done on all
> changesets as they come in, so we always know quickly when the tip of
> the repository breaks.
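> 
> Conceptually, the tester behaves like the following shell loop (a
> simplified sketch, not the actual implementation: it re-tests only
> when the tip changes, whereas the real system tests every incoming
> changeset):
> 
>  # simplified sketch of a continuous tester (assumes a log/ directory)
>  last=""
>  while true; do
>    hg pull -q                                  # fetch new changesets
>    tip=$(hg log -r tip --template "{node}")
>    if [ "$tip" != "$last" ]; then
>      hg update -q -r "$tip"
>      if isabelle makeall all > "log/$tip.log" 2>&1; then
>        echo "$tip ok"   >> report
>      else
>        echo "$tip FAIL" >> report
>      fi
>      last="$tip"
>    fi
>    sleep 300                                   # poll every five minutes
>  done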
> 
> The reports with logs etc. can all be browsed at
> 
>  http://isabelle.in.tum.de/reports/Isabelle/
> 
> This is a fully functional repository view, annotated with test
> results.
> 
> The first column of status lights shows the status of
> Isabelle_makeall.
> 
> 
> Developer-initiated tests
> =========================
> 
> Apart from changesets from the official repository, the system also
> considers changesets from a special "testboard" repository. All
> developers can push their untested changes there to get automatic
> feedback. Try it out; it is really convenient.
> 
> To use testboard, it is convenient to add the following to your
> ~/.hgrc:
> 
>  [paths]
>  testboard = ssh://USERNAME@macbroy20.informatik.tu-muenchen.de//home/isabelle-repository/repos/testboard
> 
> Then, local changes can be pushed to testboard via
> 
>  hg push -f testboard
> 
> and will usually be considered fairly quickly by the scheduler.
> (Note: -f is required, since the testboard repository will usually
> have multiple heads. Never use -f when pushing to the main development
> repository!)
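> 
> For illustration, a complete round trip might look like this (the
> commit message is just an example; "testboard" is the path alias
> defined above):
> 
>  hg commit -m "tune simp rules (experimental)"
>  hg outgoing testboard    # optional: preview what will be pushed
>  hg push -f testboard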
> 
> The reports from testboard can be viewed at
> 
>  http://isabelle.in.tum.de/testboard/Isabelle
> 
> Note that testboard should be considered write-only. You should never
> pull from it unless you want to get everybody's unfinished stuff!
> 
> 
> SML/NJ testing
> ==============
> 
> With SML/NJ, building the HOL image takes ca. 3-4 hours, and a full
> makeall takes ca. 24 hours. This has been prohibitive for pre-push
> testing, so people have generally fixed issues only after being
> notified of a failure. The new setup supports the same style of
> working, but with much quicker feedback: both the HOL image and the
> full makeall are run continuously.
> 
> The third column in the reports view signals the status of SML/NJ
> builds.  Note that the SML_makeall test still excludes a small number
> of sessions which are too big to fit into memory.
> 
> 
> AFP
> ===
> 
> A reduced build of the AFP runs continuously; it currently still
> excludes Flyspeck_Tame and JinjaThreads. When Gerwin is here, we'll
> try to make this build equivalent to the official AFP builds.
> 
> The results are signalled in the second column of the reports view.
> 
> 
> Data collection
> ===============
> 
> Performance data is collected in every run. When Makarius is
> available again, we must decide on a set of sessions to be run
> regularly in a reproducible environment (i.e., without other jobs or
> interactive activity on the same machine), to reduce fluctuation in
> the measurements.
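> 
> As a minimal sketch (file names and the use of GNU time are just
> placeholders, not the actual setup), such a run could be as simple
> as a timed nightly build on an otherwise idle machine:
> 
>  # nightly: one timed "isabelle makeall all", machine otherwise idle
>  /usr/bin/time -v -o "makeall.$(date +%F).time" isabelle makeall all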
> 
> Judgement Day benchmarks are now run continuously on two lxlabbroy
> machines. This is still somewhat preliminary, and I must coordinate
> with Jasmin and Sascha on what data should best be collected and how.
> 
> Mutabelle benchmarks are run continuously on one lxlabbroy machine in
> pretty much the same way.
> 
> The most important weakness at the moment is the lack of good plotting
> facilities for this sort of data. A student (Michael Kerscher) is
> currently working on that, with promising first results.
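> 
> Until then, even gnuplot over a simple CSV export would help (a
> sketch only; the "timings.csv" format with one "revision,seconds"
> pair per line is an assumption):
> 
>  gnuplot <<'EOF'
>  set datafile separator ","
>  set xlabel "revision"
>  set ylabel "elapsed (s)"
>  set terminal png
>  set output "timings.png"
>  plot "timings.csv" using 1:2 with linespoints title "build time"
>  EOF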
> 
> 
> Limitations
> ===========
> 
> - Missing email notification
> 
>  Here is some room for discussion: What would be a good notification
>  scheme? Since tests are now run continuously, sending an email after
>  each run is not really an option. Should we simply go for a daily
>  summary (see the cron sketch after this list)? More sophisticated
>  options are possible (e.g., notify the author of a broken changeset
>  once all its parents are working), but these require some thought.
> 
> - Scheduling strategies could be better sometimes
> - Plotting (see above)
> - ...
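> 
> For the daily-summary option, a plain cron job would suffice as a
> first step (a hypothetical sketch; the digest script is a
> placeholder, not existing infrastructure):
> 
>  # crontab entry: mail a digest of the last 24 hours of results
>  0 6 * * *  generate-digest.sh | mail -s "Isabelle test digest" isabelle-dev@in.tum.de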
> 
> 
> 
> Alex