[isabelle-dev] mira disk usage
Makarius
makarius at sketis.net
Mon Sep 3 13:57:52 CEST 2012
On Sat, 1 Sep 2012, Lars Noschinski wrote:
> On 01.09.2012 01:15, Gerwin Klein wrote:
>> I guess the problem is that build -b produces heap images for all sessions.
>> These are many, and each image is hundreds of MB. The previous setup had
>> images for selected base sessions only and no images for the rest (almost
>> everything).
>
> The "AFP" session in mira does only produce the base images, AFAICS:
>
> $ ls AFP_c5dd6e9db74c4e9fadb373ec946c413a/polyml-5.4.1_x86_64-linux/
> Collections HOL-Nominal Jinja Pure
> Group-Ring-Module HOL-Probability LatticeProperties Refine_Monadic
> HOL HOL-Word List-Infinite Simpl
> HOL-Multivariate_Analysis HOLCF Nat-Interval-Logic log
>
>> ps: "isabelle build" takes about 5 sec on my system to figure out
>> dependencies (I think). Should this be faster?
Isabelle build produces only the required images, i.e. "inner" nodes of
the hierarchy. A more ambitious version would not even do that, forking
processes directly without going through image save/load first.
In the mira configuration
http://isabelle.in.tum.de/repos/isabelle/file/7b6beb7e99c1/Admin/mira.py I
don't see redundant -b options, in agreement with the above observation of
images found later in the file-system.
> I guess this time is spent hashing the dependencies to figure out,
> whether they have changed -- at least, it is not time based.
>
> I know that Git uses stat info as a first step and only if this changes
> proceeds reading the file.
Ever since I am taking care of theory loading and session management
(approx. 1997) I am aware of the various ways how file identifies are
managed, including their advantages and disadvantages. The "stat" model
is tied to a physical file-system, and was gradually phased from Isabelle
in the past couple of years. SHA1 computation on the whole file content
is fast enough, and has the advantage that it is well-defined for
non-local files, say via HTTP. Or files that are not files at all, just a
vector of characters sent from the front-end to the back-end.
The slowness of the preparatory stage of Isabelle build has a different
reason: reworking the old 'use' command to fit into the IDE model (without
a file-system present) I've realized that the prover can now tell about
auxiliary file-dependencies, via the new 'ML_file' command. This requires
a full outer-parse of files that might contain such commands: it also
explains the renaming from unspecific 'use' to specific 'ML_file', because
the latter does not occur by accident in source files so often.
This is a significant conceptual improvement, so the question if it is
possible to do this superficial parsing a bit faster can be sorted out in
the coming months. It might be just a matter to use some of Oderski's
.par combinators.
It also indicates how huge the HOL image has now become, And how great
Poly/ML 5.5.0 is at digesting that so quickly.
Generally, Scala/JVM technology has a tradition of slow startup, but once
it is running it runs quite fast. When I start Proof General Emacs now
and again by accident, I am surprised by its quick startup, but very slow
running afterwards. The same for Python tools like "hg view" or "hgtk".
The EPLF guys have a funny trick in their Scala TTY loop to make it appear
faster on startup as it is, but I have no plans to imitate this
illusionistic approach.
Makarius
More information about the isabelle-dev
mailing list