Received: from localhost (daemon@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA27084; Fri, 6 Dec 1996 15:12:19 -0500
Received: by CS.UTK.EDU (bulk_mailer v1.7); Fri, 6 Dec 1996 15:12:06 -0500
Received: by CS.UTK.EDU (cf v2.9s-UTK) id PAA27051; Fri, 6 Dec 1996 15:12:04 -0500
Received: from muenster.westfalen.de (root@muenster.westfalen.de [193.174.5.2]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA26847; Fri, 6 Dec 1996 15:08:59 -0500
Received: from khms.westfalen.de by muenster.westfalen.de via rsmtp with bsmtp id for ; Fri, 6 Dec 1996 21:01:02 +0100 (MET) (Smail-3.2 1996-Jul-4 #1 built 1996-Nov-13)
Received: by khms.westfalen.de (CrossPoint v3.1 R/C435); 06 Dec 1996 20:52:18 +0200
Date: 06 Dec 1996 19:54:00 +0200
From: kai@khms.westfalen.de (Kai Henningsen)
To: drums@cs.utk.edu
Message-ID: <6MKAwovzcsB@khms.westfalen.de>
In-Reply-To:
Subject: Re: Form of the Message Format document
X-Mailer: CrossPoint v3.1 R/C435
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Organization: Organisation? Me?! Are you kidding?
X-No-Junk-Mail: I do not want to get *any* junk mail.
Comment: Unsolicited commercial mail will incur an US$100 handling fee per received mail.

presnick@qualcomm.com (Pete Resnick) wrote on 04.12.96 in :

> On 12/4/96 at 3:15 AM -0800, Paul Overell wrote:
>
> >1) Mixing whitespace considerations in with the ABNF is a mess.
>
> I think with the correction that Dan pointed out from the get-go (removing
> WS after the ":"), the number of occurrences of whitespace in the grammar
> drops tremendously. (Remember, *CLWSP occurs in the tokens at the beginning
> of the grammar, so it doesn't have to appear in the grammar for most of the
> structured headers themselves.) So, I don't think the next draft will be
> nearly as messy.
>
> >Reinstate the two level grammar, first level: tokenizer, terminal
> >symbols are characters, deals with whitespace, folding and comments;
> >second level: header syntax, terminal symbols are the tokens from the
> >first level.
>
> The problem with the way 822 is currently written is that there is no
> definitive way to tell where things like whitespace and comments are
> allowed. Sometimes it is clear from the syntax, but often it's either
> overridden in the text or not in the text at all.

For verification 822 got one thing exactly right: first tokenize your
input, then analyse the token sequences.

I think deviating from this was a *serious* mistake. The longer I read
arguments for putting these two levels together, and the more I see the
confusion this has brought, the more I believe that we should undo this.
It just doesn't work.

Doing it all-in-one can work for simple languages. 822 is much too
complex for this. Layering is a good thing.

It is true that 822 had problems with this layering. The problems were
that

1. 822 did not specify explicitly enough which rules apply to which
   layer, and

2. the tokens 822 used were, in part, not defined in terms of the
   output of the tokenizer, but instead in terms of its input.

However, the direction you have taken in 822bis is, in fact, exactly the
wrong one - you did not eliminate these two errors, but instead made
them into design decisions.

> So I guess that's why I approached the problem the way I did. The important
> thing was to be able to tell whether what you were sending was
> *reasonable*, if not "legal" (taking "legal" in scare quotes for what it's
> worth). It's also important to be able to tell what you can expect as a
> receiver of messages and be able to parse it, but people can implement
> parsers in all sorts of ways; what you need is a good grammar to let you
> know if you've done it correctly. As written, the grammar is more useful
> for writing a verifier than a parser.
> Over the years, I think the
> verification is what people go to the document for.

1. Most of the problems I've seen on the net involve people getting
   parsers wrong.

2. Most of the parser problems whose origin I have some idea about came
   from people trying to do without a separate tokenizer.

3. I'm afraid I know lots of people who, on seeing "you must support
   obsolete syntax, this is the nonobsolete syntax, this is the obsolete
   syntax", will try to get away without implementing the obsolete
   syntax, and if someone complains, will answer "the other side is in
   error, the doc says this is obsolete".

I might not have come so completely full circle on the obsolete part,
except that the way it is done in the current spec is something I find
*highly* confusing. It doesn't work. Maybe the obsolete stuff could be
made to work, but certainly not without going back to the separate
tokenizer.

And I certainly prefer that rules about, say, not placing comments in a
place where a parser should still understand them, be given in text
form, with MUST and/or SHOULD, instead of in grammar form as in the
current document.

It's nice to philosophize about what the IETF should or should not do,
but the unfortunate fact is that lots of people implement mail parsers,
and lots of people get that wrong. The purpose of 822bis, IMO, is not to
have a law to which we can point to assign blame, but to keep people
from getting it wrong in the first place (at least once they see the
RFC) - people should not need someone like a programming language lawyer
to understand it. Currently, they do.

(I might add that I do enjoy playing a language lawyer now and then in
places like comp.std.c. For some reason, most C programmers seem to have
trouble interpreting the standard. However, they usually do understand
what a token is, once the preprocessor has done its ugly work.)

MfG Kai
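
The two-level scheme argued for in this message (tokenize first, then
analyse the token sequence) can be illustrated with a minimal Python
sketch. It is not taken from RFC 822 or from the 822bis draft: the names
`tokenize` and `parse_addr_spec` are hypothetical, the token classes are
simplified, and the second level covers only a reduced rule of the form
word *("." word) "@" atom *("." atom).

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]          # (kind, text); kinds are an assumption
SPECIALS = set('()<>@,;:\\".[]')


def tokenize(value: str) -> List[Token]:
    """Level 1: characters in, tokens out.

    Folding, whitespace and (nested) comments are consumed here, so the
    second level never sees them.
    """
    # Unfold: a line break followed by whitespace is a continuation.
    value = re.sub(r"\r?\n(?=[ \t])", "", value)
    tokens: List[Token] = []
    i, n = 0, len(value)
    while i < n:
        c = value[i]
        if c in " \t":
            i += 1
        elif c == "(":                       # comment, may nest
            depth = 0
            while i < n:
                ch = value[i]
                if ch == "\\":
                    i += 1                   # skip the escaped character
                elif ch == "(":
                    depth += 1
                elif ch == ")":
                    depth -= 1
                    if depth == 0:
                        break
                i += 1
            if i >= n:
                raise ValueError("unterminated comment")
            i += 1
        elif c == '"':                       # quoted string
            j, buf = i + 1, []
            while j < n and value[j] != '"':
                if value[j] == "\\" and j + 1 < n:
                    j += 1
                buf.append(value[j])
                j += 1
            if j >= n:
                raise ValueError("unterminated quoted string")
            tokens.append(("quoted", "".join(buf)))
            i = j + 1
        elif c in SPECIALS:
            tokens.append(("special", c))
            i += 1
        else:                                # atom
            j = i
            while j < n and value[j] not in SPECIALS and value[j] not in " \t":
                j += 1
            tokens.append(("atom", value[i:j]))
            i = j
    return tokens


def parse_addr_spec(tokens: List[Token]) -> Tuple[str, str]:
    """Level 2: tokens in. Returns (local_part, domain)."""
    pos = 0

    def dotted(kinds: Tuple[str, ...]) -> str:
        nonlocal pos
        parts = []
        while True:
            if pos >= len(tokens) or tokens[pos][0] not in kinds:
                raise ValueError(f"expected one of {kinds} at token {pos}")
            parts.append(tokens[pos][1])
            pos += 1
            if pos < len(tokens) and tokens[pos] == ("special", "."):
                pos += 1
            else:
                return ".".join(parts)

    local = dotted(("atom", "quoted"))
    if pos >= len(tokens) or tokens[pos] != ("special", "@"):
        raise ValueError("expected '@'")
    pos += 1
    domain = dotted(("atom",))
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return local, domain


folded = "kai (Kai Henningsen) @ khms . westfalen\r\n .de"
print(parse_addr_spec(tokenize(folded)))
```

Because comments and folding disappear in `tokenize`, the second-level
rule does not need to mention whitespace at all, which is the property
the layering is meant to provide.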