[isabelle-dev] More context on Isabelle memory consumption
Makarius
makarius at sketis.net
Wed Oct 19 13:24:54 CEST 2011
On Tue, 18 Oct 2011, Thomas Sewell wrote:
> My email about PolyML and memory usage has generated some discussion,
> perhaps I should give some more context.
The continuing improvements of Poly/ML statistics add important new
aspects to the overall performance monitoring of Isabelle, although this
has been an ongoing effort for many years already.
> The L4.verified project has been stuck at Isabelle-2009-2 for years. Our
> proofs seem to be compatible with newer Isabelle versions, but we're
> running out of (virtual) memory. We've been running PolyML in 32 bit
> mode on most of our machines. As images grow beyond 4GB, we must switch
> to 64 bit mode, which roughly doubles memory use to > 8GB, which means
> replacing most of our laptops. Soon 16GB will be standard, I suppose.
Probably yes. Moore's Law for memory is still working. Five years ago 4 GB
was really huge, but now it is not so much, something you can assume for a
"small" laptop. A normal technical workstation of 2011 can be expected
to have 16-32 GB.
Isabelle has traditionally targeted well-equipped workstations. In the
past few years there was an unusual situation in which the difference
between laptops and servers had diminished, but the gap is opening again
for hardware and marketing reasons (think of smartphones, iPads, light
netbooks etc. as opposed to huge "cloud" facilities).
I don't think it is realistic to follow the path of such mobile devices
for local execution of the proof development environment. It is imaginable
to do it remotely, though, say via some "proof cloud service". We have
essentially had this in the past already: in the classic times of the Bali
project, for example, it was commonplace to run the actual proof process
in the background on a big server.
> In the meanwhile I've been having a look at Isabelle memory use. I've
> never really understood why Isabelle images consume gigabytes of memory.
> With full proofs off, the only real inputs should be the statements of
> hypotheses and the definitions of types and terms. Isabelle's output is
> many orders of magnitude bigger than these inputs.
You probably mean heap size of 1 GB, not image size -- the latter is fully
shared and usually quite compact. Taking current hardware, 1 GB is not so
much. Empirically, a software system always grows to fill up the
available space. The question is what is done with that space, i.e.
whether it is put to proper use or just wasted. (I am also the one who
periodically reminds people of Isabelle/HOL bloat, even though it builds
faster on my machine than ever before in history: 2min 16s on 4 cores with
maybe 4-8 GB.)
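For basic poking at the heap from within ML, Poly/ML itself already
offers a few primitives. Here is a minimal sketch, using PolyML.objSize
(size in words of the object graph reachable from a value),
PolyML.shareCommonData (maximize sharing of equal immutable data), and
PolyML.fullGC:

  (* Illustrative only: measure the effect of sharing immutable data. *)
  val xs = List.tabulate (10000, fn i => Int.toString (100 + i mod 10));
    (* 10000 freshly allocated strings, but only 10 distinct values *)
  val unshared = PolyML.objSize xs;
  val () = PolyML.shareCommonData xs;   (* merge equal immutable objects *)
  val shared = PolyML.objSize xs;       (* considerably smaller now *)
  val () = PolyML.fullGC ();            (* compact the heap *)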
Understanding the full details of memory usage is hard, often infeasible.
After years of Isabelle maintenance and optimization, I have myself
developed a certain mental model that approximates this complex situation.
Counting mere types and terms of the core logic is not realistic. The
system is much more than the logic, even though everything happens to be
reduced to some core calculus. And even the logic itself has a lot of
overhead that is not immediately visible in the rather compact source
representation. As a rule of thumb, the explicit type information is 10
times bigger than the overall term structure. You have already
rediscovered some aspects of that in your recent profiling 4 measurements.
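To make this rule of thumb concrete, here is a small sketch in ML -- a
simplified model for illustration, not the actual Isabelle/Pure
datatypes, which additionally carry sorts, indexnames etc.:

  (* Simplified model: every constant and variable carries its full type. *)
  datatype typ =
      Type of string * typ list   (* type constructor application *)
    | TFree of string;            (* type variable, sorts omitted *)

  datatype term =
      Const of string * typ
    | Free of string * typ
    | Abs of string * typ * term
    | App of term * term;

  (* Count constructor nodes of types and terms separately. *)
  fun typ_size (Type (_, Ts)) = 1 + foldl (op +) 0 (map typ_size Ts)
    | typ_size (TFree _) = 1;

  fun term_size (Const (_, T)) = (1, typ_size T)
    | term_size (Free (_, T)) = (1, typ_size T)
    | term_size (Abs (_, T, t)) =
        let val (n, m) = term_size t in (n + 1, m + typ_size T) end
    | term_size (App (t, u)) =
        let val (n1, m1) = term_size t and (n2, m2) = term_size u
        in (n1 + n2 + 1, m1 + m2) end;

  (* "x + y" over the naturals: the operator alone carries the full
     function type nat => nat => nat. *)
  val natT = Type ("nat", []);
  val plusT = Type ("fun", [natT, Type ("fun", [natT, natT])]);
  val t = App (App (Const ("plus", plusT), Free ("x", natT)), Free ("y", natT));
  val sizes = term_size t;   (* (5, 7): more typ nodes than term nodes *)

With larger terms and polymorphic constants the imbalance grows quickly,
although sharing of type instances reduces the factor in practice.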
> The initial thought was that the record package would be representative.
> The HOL image has grown because it has more in it, but the theorems
> generated in defining a large record stay roughly constant.
> ...
> The record package uses multiple typedefs to simplify the types of some
> terms. In 2009-2 a typedef caused a theory to be checkpointed 8 times,
> in 2011-2 30 times.
>
> This gives us a large collection of hypotheses, all of which are tedious
> to investigate. Please appreciate how valuable empirical data from
> PolyML is. Up until now we've been poking at these issues in the dark.
One more performance figure is good, but many other things have been
measured over the years, with an ever-improving tendency. So profiling 4
is just one more step.
For a couple of years now, my morning ritual has been to look first at the
charts that are generated from the isatest runs, e.g. see
http://isabelle.in.tum.de/devel/stats/at-poly.html
Every tiny unexplained change in performance usually leads to some
investigation of changesets, often with a private mail exchange with the
people involved. Thus we have managed to keep an ever-growing behemoth
manageable, and actually deliver some performance to end-users, who in
turn produce larger and larger applications to fill up the available space
and time.
Here is a list of things that can be easily contributed externally to
continue the ongoing efforts to manage the system, without postulating
arbitrary quick changes to the incredibly complex Isabelle/Pure and main
HOL/Tools:
* Improve isatest to include additional measurements, say from profiling
  4 and the new (still experimental) Poly/ML Statistics facility (see the
  sketch after this list).
* Improve isatest to remove some assumptions about shared NFS home
directories, to make it work reliably on vmbroy9 under Windows/Cygwin.
Since that platform has more resource limitations, it will indirectly
result in a leaner Isabelle system and library over time.
* Improve Mira to take over the main statistics functions of old
isatest. So a better Mira could make isatest obsolete eventually.
* Ensure that the big applications, which are now mostly in AFP, produce
usable statistics. Recently there was very little information coming
from this source, as I have pointed out several times already.
* Publish the L4.verified sources together with results from the
  checking process. The closed nature of the application makes it
  much more subject to performance issues -- this then really means
  "poking in the dark".
* Procure financial resources to have David Matthews improve the Poly/ML
  statistics facilities even further. Not just for small mobile
  devices in 32-bit mode, but also for bigger iron with a large degree of
  parallelism. (This is technically rather delicate, and there are big
  companies making big money just with performance monitoring tools for
  certain platforms.)
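For illustration, reading these statistics from within ML might look
roughly as follows. This is a sketch only: the facility is experimental,
and the name PolyML.Statistics.getLocalStats as well as the record fields
below are assumptions that may not match every Poly/ML version.

  (* Hedged sketch: structure and field names are assumptions about the
     experimental statistics interface; check the Poly/ML version at hand
     before relying on them. *)
  fun report_gc () =
    let val stats = PolyML.Statistics.getLocalStats ()  (* assumed *)
    in
      print ("full GCs: " ^ Int.toString (#gcFullGCs stats) ^
        ", heap size: " ^ Int.toString (#sizeHeap stats) ^ "\n")
    end;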
Makarius