[isabelle-dev] New Testing Infrastructure -- status report
Gerwin Klein
gerwin.klein at nicta.com.au
Tue May 31 17:30:34 CEST 2011
This looks very nice!
Looking forward to integrating the AFP.
Cheers,
Gerwin
On 30/05/2011, at 9:54 AM, Alexander Krauss wrote:
> Hi all,
>
> In the past weeks, there has been some progress with our new testing
> infrastructure, which I would like to summarize here. Please give
> feedback, ask questions, and discuss.
>
>
> Really fast status for "isabelle makeall"
> =========================================
>
> Thanks to a new fat server with 24 cores and 130 GB of RAM, we can now
> run "isabelle makeall all" in 17-18 minutes. This is done on all
> changesets as they come in, so we always know quickly when the tip of
> the repository breaks.
>
> The reports with logs etc. can all be browsed at
>
> http://isabelle.in.tum.de/reports/Isabelle/
>
> This is a fully functional repository view, annotated with test
> results.
>
> The first column of status lights shows the result of the
> Isabelle_makeall test.
>
>
> Developer-initiated tests
> =========================
>
> Apart from changesets from the official repository, the system also
> considers changesets from a special "testboard" repository. All
> developers can push their untested changes there to get automatic
> feedback. Try it out; it is really convenient.
>
> To use testboard, add the following to your ~/.hgrc:
>
> [paths]
> testboard = ssh://USERNAME@macbroy20.informatik.tu-muenchen.de//home/isabelle-repository/repos/testboard
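>
> (As a quick check, assuming a standard Mercurial installation,
> running
>
> hg paths testboard
>
> from your working copy should print the URL above.)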
>
> Then, local changes can be pushed to testboard via
>
> hg push -f testboard
>
> and will usually be considered fairly quickly by the scheduler.
> (Note: -f is required, since the testboard repository will usually
> have multiple heads. Never use -f when pushing to the main development
> repository!)
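>
> For illustration, a typical round trip might look like this (the
> commit message here is of course made up):
>
> hg commit -m "tweak simp rules"
> hg outgoing testboard   # optional: list changesets that would be pushed
> hg push -f testboard
>
> Afterwards, watch for your changesets to show up in the testboard
> reports view linked below.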
>
> The reports from testboard can be viewed at
>
> http://isabelle.in.tum.de/testboard/Isabelle
>
> Note that testboard should be considered write-only. You should never
> pull from it unless you want to get everybody's unfinished stuff!
>
>
> SML/NJ testing
> ==============
>
> With SML/NJ, building the HOL image takes ca. 3-4 hours, and a full
> makeall takes ca. 24 hours. This has been prohibitive for pre-push
> testing, so people have generally fixed issues only after getting a
> failure report. The new setup supports the same style of working, but
> since both the HOL image and makeall are now built continuously,
> failures are reported much more quickly.
>
> The third column in the reports view signals the status of SML/NJ
> builds. Note that the SML_makeall test still excludes a small number
> of sessions which are too big to fit into memory.
>
>
> AFP
> ===
>
> Currently, there is a reduced build of the AFP, which is run
> continuously, but it still excludes Flyspeck_Tame and
> JinjaThreads. When Gerwin is here, we'll try to make this build
> equivalent to the official AFP builds.
>
> The results are signalled in the second column in the reports view.
>
>
> Data collection
> ===============
>
> Performance data is collected in every run. When Makarius is available
> again, we must decide on a set of sessions to be run regularly in a
> reproducible environment (i.e., without other jobs or interactive
> activity on the same machine), to reduce fluctuation in the
> measurements.
>
> Judgement Day benchmarks are now run continuously on two lxlabbroy
> machines. This is still somewhat preliminary, and I need to
> coordinate with Jasmin and Sascha on what data should best be
> collected and how.
>
> Mutabelle benchmarks are run continuously on one lxlabbroy machine in
> pretty much the same way.
>
> The most important weakness at the moment is the lack of good plotting
> facilities for this sort of data. A student (Michael Kerscher) is
> currently working on that, with promising first results.
>
>
> Limitations
> ===========
>
> - Missing email notification
>
> Here is some room for discussion: What would be a good notification
> scheme? Since tests are now run continuously, sending an email after
> each run is not really an option. Should we simply go for a daily
> summary? More sophisticated options are possible (e.g., notify the
> author of a broken changeset when all its parents are working), but
> these require some thought.
>
> - Scheduling strategies could be better sometimes
> - Plotting (see above)
> - ...
>
>
>
> Alex
> _______________________________________________
> isabelle-dev mailing list
> isabelle-dev at in.tum.de
> https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev
>