Received: from localhost (daemon@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id AAA09634; Tue, 31 Mar 1998 00:11:21 -0500 (EST)
Received: by cs.cs.utk.edu (bulk_mailer v1.9); Tue, 31 Mar 1998 00:10:54 -0500
Received: by CS.UTK.EDU (cf v2.9s-UTK) id AAA09561; Tue, 31 Mar 1998 00:10:53 -0500 (EST)
Received: from koobera.math.uic.edu (koobera.math.uic.edu [131.193.178.247]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id AAA09538; Tue, 31 Mar 1998 00:10:46 -0500 (EST)
Received: (qmail 16917 invoked by uid 666); 31 Mar 1998 05:29:58 -0000
Date: 31 Mar 1998 05:29:57 -0000
Message-ID: <19980331052957.16915.qmail@cr.yp.to>
From: "D. J. Bernstein"
To: drums@cs.utk.edu
Subject: Re: Syntax issues in draft-ietf-drums-msg-fmt-04.txt
Mail-Followup-To: drums@cs.utk.edu
References: <19980330232102.14521.qmail@cr.yp.to> <10176.891312961@aussie.cs.mu.OZ.AU>

Robert Elz writes:
> In 3.3, on lexical tokens, DIGIT and atom (and CHAR and ALPHA and specials
> and a whole bunch more) are all there as equals. If DIGIT wasn't meant to
> be a lexical token, it wouldn't have needed to be there.

The lexical analyzer is explained in 3.1.4. The structured field body

   29 Mar 1998 21:20:19 -0000

contains the atoms 29 and Mar and 1998 and 21 and 20 and 19 and -0000,
along with two specials, specifically colons.

There are no choices here. There is no ambiguity. There are many MUA
implementors who have never read RFC 822, but that's a different issue.

> So, once again, and for the last time, I am not claiming that this is the
> correct interpretation of 822, or even a rational one, but I fail to see
> anything so blatant or compelling in 822 to make me conclude that the "20"
> and "82" are atoms,

That's how the lexical analyzer is specified. Structured field bodies
consist of specials, quoted-strings, domain-literals, comments, atoms,
and white space, with rules laid out quite clearly in RFC 822.
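
For concreteness, here is a rough sketch of that kind of lexical analyzer in Python. It is an illustration only, not code from RFC 822 or from any existing mailer: the names SPECIALS, is_atom_char, and tokenize are made up for this example, header unfolding is assumed to have happened already, and stray control or non-ASCII characters are simply skipped.

```python
# Illustrative sketch only; names and simplifications are assumptions.
SPECIALS = set('()<>@,;:\\".[]')   # the RFC 822 specials
WHITESPACE = " \t\r\n"

def is_atom_char(ch):
    # Atom characters: printable ASCII other than space and the specials.
    return 32 < ord(ch) < 127 and ch not in SPECIALS

def tokenize(body):
    """Split a structured field body into (kind, text) tokens.

    White space separates tokens and is not returned.
    """
    tokens = []
    i, n = 0, len(body)
    while i < n:
        c = body[i]
        if c in WHITESPACE:
            i += 1
        elif c == '"' or c == "[":
            # quoted-string or domain-literal: scan to the closing
            # delimiter, stepping over backslash-quoted characters.
            close = '"' if c == '"' else "]"
            kind = "quoted-string" if c == '"' else "domain-literal"
            j = i + 1
            while j < n and body[j] != close:
                j += 2 if body[j] == "\\" else 1
            tokens.append((kind, body[i:j + 1]))
            i = j + 1
        elif c == "(":
            # Comments may nest, so track the parenthesis depth.
            depth, j = 1, i + 1
            while j < n and depth > 0:
                if body[j] == "\\":
                    j += 1
                elif body[j] == "(":
                    depth += 1
                elif body[j] == ")":
                    depth -= 1
                j += 1
            tokens.append(("comment", body[i:j]))
            i = j
        elif c in SPECIALS:
            tokens.append(("special", c))
            i += 1
        elif is_atom_char(c):
            # An atom runs until white space, a special, or a control.
            j = i
            while j < n and is_atom_char(body[j]):
                j += 1
            tokens.append(("atom", body[i:j]))
            i = j
        else:
            i += 1  # assumption: ignore stray control/non-ASCII bytes
    return tokens

for kind, text in tokenize("29 Mar 1998 21:20:19 -0000"):
    print(kind, text)
```

Fed the date above, the sketch reports seven atoms and two colons. The same function is applied to every structured field body; nothing in it depends on which field is being parsed.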

> it would seem just as plausible to me that they could
> be interpreted as each being a 2DIGIT, which is a DIGIT DIGIT, and since
> each digit is (100% unambiguously) a single character, there is no space
> needed between them to separate them.

No, it's not plausible. RFC 822 says quite clearly that atoms are
delimited by specials, quoted-strings, domain-literals, comments, and
white space. In

   20 Jun 82

for example, the atom 20 stops at the subsequent space.

You claim that digits are special characters. You're wrong. The specials
are ()<>@,;:\".[]. Digits are not in the list. They don't delimit atoms.

> That, in another context, the character sequences "20" and "82" could
> also be atoms is irrelevant, and that this would make building a parser
> without lots of context basically impossible is also irrelevant (it's
> perfectly simple to build one with context, if you know you're parsing a
> time, you can easily look for individual digits, which in other fields
> you would coalesce into atoms).

But that's not how RFC 822 is specified. Every structured field body is
fed through the same lexical analyzer. The behavior of that analyzer is
perfectly clear. There is _no way_ that 20 can turn into two atoms.

---Dan
Smaller, faster, safer than inetd+tcpd. http://pobox.com/~djb/ucspi-tcp.html