[isabelle-dev] [158c513a39f5] JVM crash
David Matthews
dm at prolingua.co.uk
Wed Nov 8 15:23:34 CET 2017
If these crashes are happening at the end of the build process I would
suspect that it is something to do with either the data sharing or
writing out the heap image. Writing the heap image does not involve any
concurrency but the data sharing does, so I would suspect the latter.
That has changed recently in 5.7.1 but so too has saving the heap. The
fact that it only occurs on specific hardware points to some error in
the memory model i.e. that the code is assuming something about the
consistency of memory that is not in fact true. It's impossible to be
sure about any of this without running tests on the hardware itself. At
least the problem seems to have gone away with the latest version of
Poly/ML.
David
On 08/11/2017 08:21, Manuel Eberl wrote:
> I was not able to find out what exactly crashes and why, but while
> trying to do that, I found out that the problem is actually not present
> in the development version of Isabelle anymore.
>
> After a lengthy bisection, I found that the first revision where no
> crashes occur is this one:
>
> changeset: 66920:aefaaef29c58
> user: wenzelm
> date: Thu Oct 26 13:44:41 2017 +0200
> summary: use Poly/ML 5.7.1 test version as default;
>
> So apparently, something changed in Poly/ML 5.7.1 that made the crashes
> go away. Older versions of Isabelle, i.e. Isabelle2016-1, Isabelle2017,
> and isabelle-dev up to 66919:1f93e376aeb6 /do/ exhibit the crash. It
> thus seems likely that whatever caused it was apparently present in
> multiple older Poly/ML versions.
>
> Note that the crash appears to be highly sensitive to the environment:
> Having two builds run in parallel with different Isabelle versions seems
> to make it considerably less frequent; however, only in Poly/ML 5.7.1
> does it appear to not happen at all no matter how I run it.
>
> Does Poly/ML 5.7.1 contain any changes that could plausibly cause this
> bad behaviour to go away?
>
> Manuel
>
>
> On 2017-11-07 09:51, Manuel Eberl wrote:
>> It seems like this thread is not dead yet.
>>
>> I had my CPU replaced by a new version that supposedly does not have the
>> SMD problem on Linux. The problem with Isabelle did not go away.
>>
>> I still get reproducible crashes of "isabelle build -c Pure", but only
>> with SMT switched on. However, it is worth noting that the crashes
>> always seem to happen at the end of the build process. (A successful
>> build takes about 9 seconds of elapsed and CPU time on my machine and
>> unsuccessful ones always crash at at least that time) However, it is
>> worth noting that some of the failed builds look like this:
>>
>> 0:00:30 elapsed time, 0:00:08 cpu time, factor 0.29
>>
>> Also, the last line that is printed is always
>>
>> ### theory "ML_Bootstrap"
>>
>> so perhaps that also points to some specific problem. However, I don't
>> think it's anything specific to Pure, since the crash is also
>> reproducible with other sessions if I remember correctly (I just use
>> Pure because it can be built quickly).
>>
>> To summarise:
>> - crash happens randomly in about 6% of builds of the "Pure" session
>> - crash seems specific to my Ryzen or at least my system
>> - disabling SMT makes problem go away
>> - crash reproducible on my Arch installation and a fresh install of
>> Arch, but not on Ubuntu
>> - RAM passes memtest without errors
>> - I did not experience instability with any other software
>> - My conjecture would be that it is somehow related to PolyML and
>> multithreading
>>
>> I attached both the console output and the build log.
>>
>> Manuel
>>
>>
>> On 04/09/17 15:12, Manuel Eberl wrote:
>>> Alas, it would appear I have spoken too soon!
>>>
>>> I experienced a strange build failure with RC1 yesterday and, fearing
>>> the worst, did my experiment from a few weeks ago again, with the
>>> following result:
>>>
>>> – building "Pure" fails in around 6 % of the cases
>>> – this does not change even after a cold reboot
>>> – switching SMT off seems to make the problem go away entirely
>>> – switching SMT on makes it reappear
>>>
>>> Sounds very much like this might well be the Ryzen bug. AMD has started
>>> replacing affected CPUs, so I shall enquire about that and see what happens.
>>>
>>> Cheers,
>>>
>>> Manuel
More information about the isabelle-dev
mailing list