[isabelle-dev] eq_list

Sat Oct 27 18:55:27 CEST 2012

On Wed, 24 Oct 2012, Christian Sternagel wrote:

> browsing through the Isabelle sources (as one does once in a while) I noticed 
> that 'eq_list' could be implemented slightly more efficient. As suggested on 
> this mailing-list I also consulted the history tomes and found that in
>
>
> http://isabelle.in.tum.de/repos/isabelle/diff/d59364649bcc/src/Pure/library.ML
>
> where 'eq_list' and 'equal_lists' where unified, the old 'eq_list' avoided to 
> separately go through the input lists to compute their lengths (as I would 
> have suggested).

The older function is actually equal_lists, introduced in 6ce03d2f7d91 
(wenzelm) Sat Sep 04 21:00:20 1999, while eq_list was introduced much 
later in 46a7e133f802 (haftmann) Mon Apr 24 16:35:30 2006.  Later Florian 
noticed the overlap, and IIRC we had a brief oral discussion about it, and 
he then only re-tained his newer name for the function for a variant of my 
older implementation from 1999, which is usually more efficient in the way 
it is typically used.

> 1) Why not get rid of the length computations again?
>
> 2) The alternating (un)wrapping of pairs in the auxiliary local function of 
> 'eq_list' seems unnecessary. (I'm not sure whether the compiler does optimize 
> this away anyway or not.)
>
> Those are just peephole optimizations, but they seem easy enough.
>
> What about the following? (As always, I'm interested in historical lore as 
> well as your opinions.)
>
> fun eq_list eq (list1, list2) =
>  pointer_eq (list1, list2) orelse
>    let
>      fun eq_lst (x :: xs) (y :: ys) = eq (x, y) andalso eq_lst xs ys
>        | eq_lst [] [] = true
>        | eq_lst _ _ = false;
>    in eq_lst list1 list2 end;
>
> This partly avoids the repeated (un)wrapping of pairs (I guess curried 
> infix operators are a historical SML accident, but I digress ...) and 
> does not separately compute the length of the given lists.

I think we can forget about currying vs. tupling.  I've got used to the 
mental model that Poly/ML treats that mostly equivalent (already in the 
mid-1990-ies).  David Matthews or the PolyML.Compiler flags to emit 
assembly code mit tell more, but that issue is not really important here.

What remains is how you traverse the two tree structures that are compared 
here: depth-first (your version) or a little-bit more breadth-first (my 
version) by looking at the lengths first.

Your wording of "compute their lengths" sounds like this would be an 
expensive operation, but normally it isn't.  If you do some 
grep/hypersearch over the sources of Isabelle99, Isabelle2002, 
Isabelle2005, to see how equal_lists was originally intended and used, it 
is mainly for short lists of terms or thms.  So the structure stored in 
each cons cell is much larger than the whole length of the list. Walking 
rather painfully through a prefix with "eq" comparisons might turn out 
quite slow, if the lists disagree only later after several equal entries.

You should also take a look of the "ord" family of functions: list_ord, 
typ_ord, fast_term_ord etc.  They first take the outer structure into 
account, before going into the depth.  It was not always like that in the 
old times, and this change gave some significant performance improvement 
(apart from moving to eq-based association lists to ord-based tables in 
the first place).

 	Makarius