[isabelle-dev] Using PolyML's memory consumption profiling on Isabelle

Fri Oct 14 13:37:07 CEST 2011

On Fri, 14 Oct 2011, Andreas Schropp wrote:

>> Isabelle is generating a *lot* of copies of types& terms, particularly 
>> via Term.map_atyps. Since shareCommonData eliminates them, many are 
>> duplicates. It's possible that further use of the "same" variants from 
>> Term_Subst might help. It's also possible that the repeated 
>> reconstruction is necessary (e.g. repeated generalization/abstraction 
>> of type variables) and further use of the new Term_Sharing mechanism 
>> might be the answer.
>> 
>
> The way I learned Isabelle programming one views term-traversal as being 
> cheap, which seems to be true most of the time especially when they are 
> freshly allocated with nice locality properties. Sharing lots of 
> subterms might interfere with this. Isn't this what GC was made for? Why 
> introduce artificial sharing?

There is a spectrum of things that can be done, and the various tendencies 
in optimization are often in conflict.  Andreas is right about his 
intuition how operations in Isabelle are usually done.  GC works pretty 
well and with an increasing tendency of multiple cores and distributed 
memory excessive sharing could become counter-productive.

We used to have global sharing of term structure for results for the sake 
of SML/NJ, which lacks the convenience of share_common_data of Poly/ML. 
When the parallelization of Poly/ML and Isabelle/ML started around 2007, I 
removed that global resource for performance reasons.  As a rule of thumb 
it is better to let parallel threads run freely and independently, even if 
they allocate redundant data -- in different regions of the heap.

I reintroduced some bits of Term_Sharing recently for different reasons, 
to ensure that terms entering the system by the new protocol layers are 
"interned" in the way one normally expects.  (The concrete syntax layer 
did this implicitly due to lookup of formaly entities.)

The details of parallel / distributed memory management are a black art, 
at all levels (hardware, runtime-system, application code).  David 
Matthews is the one who understands that best :-)

> BTW: the Coq kernel does huge amounts of sharing IIRC. Should we be 
> concerned or is this just a thing because of proof terms?

I can't say what Coq does exactly -- the general tendency there (and 
OCaml) seems to target hardware from the mid 1990-ies.  They also have 
serious problems catching up with the multicore trend, but I am actually 
collaborating with some Coq experts on the parallel prover theme so some 
improvements in the general understanding can be anticipated.

> Makarius, please comment on this, because now I feel like a wasteful 
> programmer. ;D

I see no particular problems here.  Just continue with a purely functional 
mindset, with a bit of a sense for the general workings of the 
runtime-system and underlying hardware.

 	Makarius