[isabelle-dev] Subscripts within identifiers
Makarius
makarius at sketis.net
Tue Jul 2 13:15:08 CEST 2013
This issue is so old that hardly anybody remembers that it is an issue:
using subscripts within identifiers without special precautions.
Historically, I was the first to use subscripts in identifiers in my
doctoral thesis -- with some perl script over the generated latex sources
for proper visual appearance of the printed document. A bit later in
Isabelle2003, a variety of uses of sub/superscripts within the formal
language emerged, with some conflicts and potential for surprise that
resulted in slightly odd approaches that we still have today in
Isabelle2013 (although the side-conditions have changed a lot).
When visiting TUM last December, I was surprised that in a room packed
with seasoned Isabelle experts there was approx. 1 person who knew the
precise rules for subscripts within identifiers. Here is the formal joke
I had improvised in that "jam session":
definition [simp]: "⇣ = 1"
definition [simp]: "⇣⇣ = 2"
definition [simp]: "⇣⇣⇣ = 3"
lemma "⇣ + ⇣⇣ = ⇣⇣⇣" by simp
(In Proof General some of these symbols are invisible, so better use
Isabelle/jEdit here.)
After 10 years we actually have a chance to stop such jokes about
identifiers starting or ending with control symbols, or control symbols
getting out of sync, and especially the confusion about which of the two
subscript controls is what.
So here is a reform that works with minimal impact on existing sources:
* Block sub/superscripts (\<^bsub>..\<^esub> and \<^bsup>..\<^esup>)
are not affected at all -- this is a different concept.
* There is just one \<^sub> and \<^sup> control symbol to that effect;
\<^isub> and \<^isup> remain within the infinite collection of
Isabelle symbols without any special meaning.
* \<^sub> and \<^sup> are no longer "letters".
* Superscript is for explicit notation, e.g. "x\<^sup>2" for some
operator "_\<^sup>2" applied to term x.
* Subscript may get used within identifiers, e.g. "x\<^sub>2" for a
variable of that name.
The traditional ML identifier syntax is like this:
LETTER = A .. Z a .. Z
DIGIT = 0 .. 9
LETDIG = LETTER | DIGIT | _ | '
IDENTIFIER = LETTER LETDIG*
So it is a strict letter followed by a sequence of liberal letters.
Instead of making \<^sub> a "letter" as was done in Isabelle2003, the
syntax is changed like this:
LETTER = A .. Z a .. Z \<alpha> ... \<A> ... \<a> ... etc.
DIGIT = 0 .. 9
LETDIG = LETTER | DIGIT | _ | '
SUBSCRIPT = \<^sub>
IDENTIFIER = LETTER ( SUBSCRIPT? LETDIG )*
This means, the trailing sequence of liberal letters is as before, but
each one might have a subscript control before it. This is conceptually
simple, and ensures that the control symbols stay in sync (it also reduces
the potential for ambiguity wrt. notation that uses tokens involving
subscripts as well).
Note that over the years I had occasionally pondered more sophisticated
schemes for identifiers that impose more structure on the sub-components
(spans of latin vs. greek vs. underscore vs. primes etc.) but that turned
out both complicated and not quite realistic when applied to the existing
body of sources. I had also studied the syntax of Fortress, but was not
covinced in the end -- they admit themselves that things are complicated.
The included changesets for main Isabelle + AFP demonstrate that the above
is indeed rather conservative. The actual theories are then edited
automatically like this:
perl -pi~ -e 's,\Q\<^isub>\E,\\<^sub>,g;' -e 's,\Q\<^isup>\E,\\<^sup>,g;' $(find -name "*.thy")
Exploring the concrete situation (over the past few years) I learned more
about the confusion and quite different understanding of the few users who
actually ventured to use sub/superscripts in their theories. For example,
the "identifier sub/superscripts" \<^isub> \<^isup> were getting used in
notation, where \<^sub> \<^sub> would be expected. (This can be explained
by certain defaults of user interfaces, i.e. different generations of
Proof General, and Isabelle/jEdit.)
With the above simplification I would hope to see more people getting
acquainted with sub/superscripts, and enjoying the good rendering quality
of it in Isabelle/jEdit, without requiring arcane lore.
I am posting this here as an opportunity to point out further fine points,
side-conditions, or giving counter examples -- lets say over a few weeks.
Makarius
-------------- next part --------------
# HG changeset patch
# User wenzelm
# Date 1372716484 -7200
# Node ID c412cf71db5e0d90ed234596336611673193a49e
# Parent d802431fe3569404c86ce9ba48984a3e4706296b
test
diff -r d802431fe356 -r c412cf71db5e src/Pure/General/scan.scala
--- a/src/Pure/General/scan.scala Mon Jul 01 15:08:29 2013 +0200
+++ b/src/Pure/General/scan.scala Tue Jul 02 00:08:04 2013 +0200
@@ -348,7 +348,16 @@
private def other_token(is_command: String => Boolean)
: Parser[Token] =
{
- val id = one(Symbol.is_letter) ~ many(Symbol.is_letdig) ^^ { case x ~ y => x + y }
+ val letdigs1 = many1(Symbol.is_letdig)
+ val subscript =
+ one(s =>
+ s == Symbol.sub_decoded || s == "\\<^sub>" ||
+ s == Symbol.isub_decoded || s == "\\<^isub>")
+ val id =
+ one(Symbol.is_letter) ~
+ (rep(letdigs1 | (subscript ~ letdigs1 ^^ { case x ~ y => x + y })) ^^ (_.mkString)) ^^
+ { case x ~ y => x + y }
+
val nat = many1(Symbol.is_digit)
val natdot = nat ~ "." ~ nat ^^ { case x ~ y ~ z => x + y + z }
val id_nat = id ~ opt("." ~ nat) ^^ { case x ~ Some(y ~ z) => x + y + z case x ~ None => x }
diff -r d802431fe356 -r c412cf71db5e src/Pure/General/symbol.ML
--- a/src/Pure/General/symbol.ML Mon Jul 01 15:08:29 2013 +0200
+++ b/src/Pure/General/symbol.ML Tue Jul 02 00:08:04 2013 +0200
@@ -46,7 +46,6 @@
val decode: symbol -> sym
datatype kind = Letter | Digit | Quasi | Blank | Other
val kind: symbol -> kind
- val is_letter_symbol: symbol -> bool
val is_letter: symbol -> bool
val is_digit: symbol -> bool
val is_quasi: symbol -> bool
@@ -377,9 +376,7 @@
"\\<Upsilon>",
"\\<Phi>",
"\\<Psi>",
- "\\<Omega>",
- "\\<^isub>",
- "\\<^isup>"
+ "\\<Omega>"
];
in
@@ -520,7 +517,6 @@
fun symbolic_end (_ :: "\\<^sub>" :: _) = true
| symbolic_end (_ :: "\\<^isub>" :: _) = true
- | symbolic_end (_ :: "\\<^isup>" :: _) = true
| symbolic_end (s :: _) = is_symbolic s
| symbolic_end [] = false;
diff -r d802431fe356 -r c412cf71db5e src/Pure/General/symbol.scala
--- a/src/Pure/General/symbol.scala Mon Jul 01 15:08:29 2013 +0200
+++ b/src/Pure/General/symbol.scala Tue Jul 02 00:08:04 2013 +0200
@@ -342,9 +342,7 @@
"\\<tau>", "\\<upsilon>", "\\<phi>", "\\<chi>", "\\<psi>",
"\\<omega>", "\\<Gamma>", "\\<Delta>", "\\<Theta>", "\\<Lambda>",
"\\<Xi>", "\\<Pi>", "\\<Sigma>", "\\<Upsilon>", "\\<Phi>",
- "\\<Psi>", "\\<Omega>",
-
- "\\<^isub>", "\\<^isup>")
+ "\\<Psi>", "\\<Omega>")
val blanks = recode_set(" ", "\t", "\n", "\u000B", "\f", "\r", "\r\n", "\\<^newline>")
diff -r d802431fe356 -r c412cf71db5e src/Pure/General/symbol_pos.ML
--- a/src/Pure/General/symbol_pos.ML Mon Jul 01 15:08:29 2013 +0200
+++ b/src/Pure/General/symbol_pos.ML Tue Jul 02 00:08:04 2013 +0200
@@ -215,51 +215,17 @@
local
-val latin = Symbol.is_ascii_letter;
-val digit = Symbol.is_ascii_digit;
-fun underscore s = s = "_";
-fun prime s = s = "'";
-fun subscript s = s = "\\<^sub>" orelse s = "\\<^isub>";
-fun script s = s = "\\<^sub>" orelse s = "\\<^isub>" orelse s = "\\<^isup>";
-fun special_letter s = Symbol.is_letter_symbol s andalso not (script s);
-
-val scan_plain = Scan.one ((latin orf digit orf prime) o symbol) >> single;
-val scan_digit = Scan.one (digit o symbol) >> single;
-val scan_prime = Scan.one (prime o symbol) >> single;
-val scan_extended =
- Scan.one ((latin orf digit orf prime orf underscore orf special_letter) o symbol) >> single;
-
-val scan_subscript =
- Scan.one (subscript o symbol) --
- Scan.one ((latin orf digit orf prime orf special_letter) o symbol)
- >> (fn (x, y) => [x, y]);
-
-val scan_ident_part1 =
- Scan.one (latin o symbol) ::: (Scan.repeat (scan_plain || scan_subscript) >> flat) ||
- Scan.one (special_letter o symbol) :::
- (Scan.repeat (scan_digit || scan_prime || scan_subscript) >> flat);
-
-val scan_ident_part2 =
- Scan.repeat1 (scan_plain || scan_subscript) >> flat ||
- scan_ident_part1;
+val scan_letter = Scan.one (symbol #> Symbol.is_letter);
+val scan_letdigs1 = Scan.many1 (symbol #> Symbol.is_letdig);
+val scan_subscript = Scan.one (symbol #> (fn s => s = "\\<^sub>" orelse s = "\\<^isub>"));
in
-val scan_ident0 =
- Scan.one (Symbol.is_letter o symbol) ::: Scan.many (Symbol.is_letdig o symbol);
-
-val scan_ident1 =
- Scan.one ((latin orf special_letter) o symbol) :::
- (Scan.repeat (scan_extended || Scan.one (subscript o symbol) ::: scan_extended) >> flat);
-
-val scan_ident2 =
- scan_ident_part1 @@@
- (Scan.repeat (Scan.many1 (underscore o symbol) @@@ scan_ident_part2) >> flat);
+val scan_ident =
+ scan_letter ::: (Scan.repeat (scan_letdigs1 || scan_subscript ::: scan_letdigs1) >> flat);
end;
-val scan_ident = scan_ident0;
-
fun is_identifier s =
Symbol.is_ascii_identifier s orelse
(case try (Scan.finite stopper scan_ident) (explode (s, Position.none)) of
diff -r d802431fe356 -r c412cf71db5e src/Pure/Syntax/lexicon.ML
--- a/src/Pure/Syntax/lexicon.ML Mon Jul 01 15:08:29 2013 +0200
+++ b/src/Pure/Syntax/lexicon.ML Tue Jul 02 00:08:04 2013 +0200
@@ -296,7 +296,6 @@
fun chop_idx [] ds = idxname [] ds
| chop_idx (cs as (_ :: "\\<^sub>" :: _)) ds = idxname cs ds
| chop_idx (cs as (_ :: "\\<^isub>" :: _)) ds = idxname cs ds
- | chop_idx (cs as (_ :: "\\<^isup>" :: _)) ds = idxname cs ds
| chop_idx (c :: cs) ds =
if Symbol.is_digit c then chop_idx cs (c :: ds)
else idxname (c :: cs) ds;
diff -r d802431fe356 -r c412cf71db5e src/Pure/term.ML
--- a/src/Pure/term.ML Mon Jul 01 15:08:29 2013 +0200
+++ b/src/Pure/term.ML Tue Jul 02 00:08:04 2013 +0200
@@ -980,7 +980,6 @@
(case rev (Symbol.explode x) of
_ :: "\\<^sub>" :: _ => false
| _ :: "\\<^isub>" :: _ => false
- | _ :: "\\<^isup>" :: _ => false
| c :: _ => Symbol.is_digit c
| _ => true);
in
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ch2.gz
Type: application/x-gzip
Size: 13039 bytes
Desc:
URL: <https://mailmanbroy.informatik.tu-muenchen.de/pipermail/isabelle-dev/attachments/20130702/08d280fe/attachment-0001.bin>
More information about the isabelle-dev
mailing list