[isabelle-dev] Subscripts within identifiers

Makarius makarius at sketis.net
Tue Jul 2 13:15:08 CEST 2013


This issue is so old that hardly anybody remembers that it is an issue: 
using subscripts within identifiers without special precautions.

Historically, I was the first to use subscripts in identifiers in my 
doctoral thesis -- with some perl script over the generated latex sources 
for proper visual appearance of the printed document.  A bit later in 
Isabelle2003, a variety of uses of sub/superscripts within the formal 
language emerged, with some conflicts and potential for surprise that 
resulted in slightly odd approaches that we still have today in 
Isabelle2013 (although the side-conditions have changed a lot).

When visiting TUM last December, I was surprised that in a room packed 
with seasoned Isabelle experts there was approx. 1 person who knew the 
precise rules for subscripts within identifiers.  Here is the formal joke 
I had improvised in that "jam session":

definition [simp]: "⇣ = 1"
definition [simp]: "⇣⇣ = 2"
definition [simp]: "⇣⇣⇣ = 3"

lemma "⇣ + ⇣⇣ = ⇣⇣⇣" by simp

(In Proof General some of these symbols are invisible, so better use 
Isabelle/jEdit here.)


After 10 years we actually have a chance to stop such jokes about 
identifiers starting or ending with control symbols, or control symbols 
getting out of sync, and especially the confusion about which of the two 
subscript controls is what.

So here is a reform that works with minimal impact on existing sources:

   * Block sub/superscripts (\<^bsub>..\<^esub> and \<^bsup>..\<^esup>)
     are not affected at all -- this is a different concept.

   * There is just one \<^sub> and \<^sup> control symbol to that effect;
     \<^isub> and \<^isup> remain within the infinite collection of
     Isabelle symbols without any special meaning.

   * \<^sub> and \<^sup> are no longer "letters".

   * Superscript is for explicit notation, e.g. "x\<^sup>2" for some
     operator "_\<^sup>2" applied to term x.

   * Subscript may get used within identifiers, e.g. "x\<^sub>2" for a
     variable of that name.

The traditional ML identifier syntax is like this:

   LETTER = A .. Z a .. Z
   DIGIT = 0 .. 9
   LETDIG = LETTER | DIGIT | _ | '

   IDENTIFIER = LETTER LETDIG*

So it is a strict letter followed by a sequence of liberal letters.

Instead of making \<^sub> a "letter" as was done in Isabelle2003, the 
syntax is changed like this:

   LETTER = A .. Z a .. Z \<alpha> ... \<A> ... \<a> ... etc.
   DIGIT = 0 .. 9
   LETDIG = LETTER | DIGIT | _ | '
   SUBSCRIPT = \<^sub>

   IDENTIFIER = LETTER ( SUBSCRIPT? LETDIG )*

This means, the trailing sequence of liberal letters is as before, but 
each one might have a subscript control before it.  This is conceptually 
simple, and ensures that the control symbols stay in sync (it also reduces 
the potential for ambiguity wrt. notation that uses tokens involving 
subscripts as well).

Note that over the years I had occasionally pondered more sophisticated 
schemes for identifiers that impose more structure on the sub-components 
(spans of latin vs. greek vs. underscore vs. primes etc.) but that turned 
out both complicated and not quite realistic when applied to the existing 
body of sources.  I had also studied the syntax of Fortress, but was not 
covinced in the end -- they admit themselves that things are complicated.


The included changesets for main Isabelle + AFP demonstrate that the above 
is indeed rather conservative. The actual theories are then edited 
automatically like this:

   perl -pi~ -e 's,\Q\<^isub>\E,\\<^sub>,g;' -e 's,\Q\<^isup>\E,\\<^sup>,g;' $(find -name "*.thy")

Exploring the concrete situation (over the past few years) I learned more 
about the confusion and quite different understanding of the few users who 
actually ventured to use sub/superscripts in their theories.  For example, 
the "identifier sub/superscripts" \<^isub> \<^isup> were getting used in 
notation, where \<^sub> \<^sub> would be expected.  (This can be explained 
by certain defaults of user interfaces, i.e. different generations of 
Proof General, and Isabelle/jEdit.)

With the above simplification I would hope to see more people getting 
acquainted with sub/superscripts, and enjoying the good rendering quality 
of it in Isabelle/jEdit, without requiring arcane lore.


I am posting this here as an opportunity to point out further fine points, 
side-conditions, or giving counter examples -- lets say over a few weeks.


 	Makarius
-------------- next part --------------
# HG changeset patch
# User wenzelm
# Date 1372716484 -7200
# Node ID c412cf71db5e0d90ed234596336611673193a49e
# Parent  d802431fe3569404c86ce9ba48984a3e4706296b
test

diff -r d802431fe356 -r c412cf71db5e src/Pure/General/scan.scala
--- a/src/Pure/General/scan.scala	Mon Jul 01 15:08:29 2013 +0200
+++ b/src/Pure/General/scan.scala	Tue Jul 02 00:08:04 2013 +0200
@@ -348,7 +348,16 @@
     private def other_token(is_command: String => Boolean)
       : Parser[Token] =
     {
-      val id = one(Symbol.is_letter) ~ many(Symbol.is_letdig) ^^ { case x ~ y => x + y }
+      val letdigs1 = many1(Symbol.is_letdig)
+      val subscript =
+        one(s =>
+          s == Symbol.sub_decoded || s == "\\<^sub>" ||
+          s == Symbol.isub_decoded || s == "\\<^isub>")
+      val id =
+        one(Symbol.is_letter) ~
+          (rep(letdigs1 | (subscript ~ letdigs1 ^^ { case x ~ y => x + y })) ^^ (_.mkString)) ^^
+        { case x ~ y => x + y }
+
       val nat = many1(Symbol.is_digit)
       val natdot = nat ~ "." ~ nat ^^ { case x ~ y ~ z => x + y + z }
       val id_nat = id ~ opt("." ~ nat) ^^ { case x ~ Some(y ~ z) => x + y + z case x ~ None => x }
diff -r d802431fe356 -r c412cf71db5e src/Pure/General/symbol.ML
--- a/src/Pure/General/symbol.ML	Mon Jul 01 15:08:29 2013 +0200
+++ b/src/Pure/General/symbol.ML	Tue Jul 02 00:08:04 2013 +0200
@@ -46,7 +46,6 @@
   val decode: symbol -> sym
   datatype kind = Letter | Digit | Quasi | Blank | Other
   val kind: symbol -> kind
-  val is_letter_symbol: symbol -> bool
   val is_letter: symbol -> bool
   val is_digit: symbol -> bool
   val is_quasi: symbol -> bool
@@ -377,9 +376,7 @@
       "\\<Upsilon>",
       "\\<Phi>",
       "\\<Psi>",
-      "\\<Omega>",
-      "\\<^isub>",
-      "\\<^isup>"
+      "\\<Omega>"
     ];
 in
 
@@ -520,7 +517,6 @@
 
 fun symbolic_end (_ :: "\\<^sub>" :: _) = true
   | symbolic_end (_ :: "\\<^isub>" :: _) = true
-  | symbolic_end (_ :: "\\<^isup>" :: _) = true
   | symbolic_end (s :: _) = is_symbolic s
   | symbolic_end [] = false;
 
diff -r d802431fe356 -r c412cf71db5e src/Pure/General/symbol.scala
--- a/src/Pure/General/symbol.scala	Mon Jul 01 15:08:29 2013 +0200
+++ b/src/Pure/General/symbol.scala	Tue Jul 02 00:08:04 2013 +0200
@@ -342,9 +342,7 @@
       "\\<tau>", "\\<upsilon>", "\\<phi>", "\\<chi>", "\\<psi>",
       "\\<omega>", "\\<Gamma>", "\\<Delta>", "\\<Theta>", "\\<Lambda>",
       "\\<Xi>", "\\<Pi>", "\\<Sigma>", "\\<Upsilon>", "\\<Phi>",
-      "\\<Psi>", "\\<Omega>",
-
-      "\\<^isub>", "\\<^isup>")
+      "\\<Psi>", "\\<Omega>")
 
     val blanks = recode_set(" ", "\t", "\n", "\u000B", "\f", "\r", "\r\n", "\\<^newline>")
 
diff -r d802431fe356 -r c412cf71db5e src/Pure/General/symbol_pos.ML
--- a/src/Pure/General/symbol_pos.ML	Mon Jul 01 15:08:29 2013 +0200
+++ b/src/Pure/General/symbol_pos.ML	Tue Jul 02 00:08:04 2013 +0200
@@ -215,51 +215,17 @@
 
 local
 
-val latin = Symbol.is_ascii_letter;
-val digit = Symbol.is_ascii_digit;
-fun underscore s = s = "_";
-fun prime s = s = "'";
-fun subscript s = s = "\\<^sub>" orelse s = "\\<^isub>";
-fun script s = s = "\\<^sub>" orelse s = "\\<^isub>" orelse s = "\\<^isup>";
-fun special_letter s = Symbol.is_letter_symbol s andalso not (script s);
-
-val scan_plain = Scan.one ((latin orf digit orf prime) o symbol) >> single;
-val scan_digit = Scan.one (digit o symbol) >> single;
-val scan_prime = Scan.one (prime o symbol) >> single;
-val scan_extended =
-  Scan.one ((latin orf digit orf prime orf underscore orf special_letter) o symbol) >> single;
-
-val scan_subscript =
-  Scan.one (subscript o symbol) --
-  Scan.one ((latin orf digit orf prime orf special_letter) o symbol)
-  >> (fn (x, y) => [x, y]);
-
-val scan_ident_part1 =
-  Scan.one (latin o symbol) ::: (Scan.repeat (scan_plain || scan_subscript) >> flat) ||
-  Scan.one (special_letter o symbol) :::
-    (Scan.repeat (scan_digit || scan_prime || scan_subscript) >> flat);
-
-val scan_ident_part2 =
-  Scan.repeat1 (scan_plain || scan_subscript) >> flat ||
-  scan_ident_part1;
+val scan_letter = Scan.one (symbol #> Symbol.is_letter);
+val scan_letdigs1 = Scan.many1 (symbol #> Symbol.is_letdig);
+val scan_subscript = Scan.one (symbol #> (fn s => s = "\\<^sub>" orelse s = "\\<^isub>"));
 
 in
 
-val scan_ident0 =
-  Scan.one (Symbol.is_letter o symbol) ::: Scan.many (Symbol.is_letdig o symbol);
-
-val scan_ident1 =
-  Scan.one ((latin orf special_letter) o symbol) :::
-    (Scan.repeat (scan_extended || Scan.one (subscript o symbol) ::: scan_extended) >> flat);
-
-val scan_ident2 =
-  scan_ident_part1 @@@
-    (Scan.repeat (Scan.many1 (underscore o symbol) @@@ scan_ident_part2) >> flat);
+val scan_ident =
+  scan_letter ::: (Scan.repeat (scan_letdigs1 || scan_subscript ::: scan_letdigs1) >> flat);
 
 end;
 
-val scan_ident = scan_ident0;
-
 fun is_identifier s =
   Symbol.is_ascii_identifier s orelse
     (case try (Scan.finite stopper scan_ident) (explode (s, Position.none)) of
diff -r d802431fe356 -r c412cf71db5e src/Pure/Syntax/lexicon.ML
--- a/src/Pure/Syntax/lexicon.ML	Mon Jul 01 15:08:29 2013 +0200
+++ b/src/Pure/Syntax/lexicon.ML	Tue Jul 02 00:08:04 2013 +0200
@@ -296,7 +296,6 @@
     fun chop_idx [] ds = idxname [] ds
       | chop_idx (cs as (_ :: "\\<^sub>" :: _)) ds = idxname cs ds
       | chop_idx (cs as (_ :: "\\<^isub>" :: _)) ds = idxname cs ds
-      | chop_idx (cs as (_ :: "\\<^isup>" :: _)) ds = idxname cs ds
       | chop_idx (c :: cs) ds =
           if Symbol.is_digit c then chop_idx cs (c :: ds)
           else idxname (c :: cs) ds;
diff -r d802431fe356 -r c412cf71db5e src/Pure/term.ML
--- a/src/Pure/term.ML	Mon Jul 01 15:08:29 2013 +0200
+++ b/src/Pure/term.ML	Tue Jul 02 00:08:04 2013 +0200
@@ -980,7 +980,6 @@
       (case rev (Symbol.explode x) of
         _ :: "\\<^sub>" :: _ => false
       | _ :: "\\<^isub>" :: _ => false
-      | _ :: "\\<^isup>" :: _ => false
       | c :: _ => Symbol.is_digit c
       | _ => true);
   in
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ch2.gz
Type: application/x-gzip
Size: 13039 bytes
Desc: 
URL: <https://mailmanbroy.informatik.tu-muenchen.de/pipermail/isabelle-dev/attachments/20130702/08d280fe/attachment-0001.bin>


More information about the isabelle-dev mailing list