Received: from localhost (daemon@localhost) by CS.UTK.EDU with SMTP
    (cf v2.9s-UTK) id BAA24857; Thu, 5 Dec 1996 01:01:45 -0500
Received: by CS.UTK.EDU (bulk_mailer v1.7); Thu, 5 Dec 1996 01:01:12 -0500
Received: by CS.UTK.EDU (cf v2.9s-UTK) id BAA24808;
    Thu, 5 Dec 1996 01:01:09 -0500
Received: from koobera.math.uic.edu (qmailr@koobera.math.uic.edu
    [128.248.178.247]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
    id BAA24796; Thu, 5 Dec 1996 01:01:03 -0500
Received: (qmail 23405 invoked by uid 666); 5 Dec 1996 06:06:41 -0000
Date: 5 Dec 1996 06:06:41 -0000
Message-ID: <19961205060641.23404.qmail@koobera.math.uic.edu>
From: "D. J. Bernstein"
To: drums@cs.utk.edu
Subject: Re: Form of the Message Format document

> the standard should not be implementation instructions; ``Do this. Do that.''

What's wrong with ``Do what this code does''? Look at the as-if
principle in the C standard. Tremendously useful.

> but people can implement
> parsers in all sorts of ways; what you need is a good grammar to let you
> know if you've done it correctly.

``Need''? What for?

A message creator could feed sample outputs through a syntax checker,
which would test the MUSTs and SHOULDs. Is this the application you're
talking about?

Why is ``a good grammar'' the only way to provide such a checker? Are
you sure that religious adherence to a single tool is going to produce
the best result? Don't you think strcasecmp() might help?

Yes or no: Is DRUMS going to provide a usable syntax checker? If yes,
let's actually _do_ it. Right now we're not even close.

> As written, the grammar is more useful for writing a verifier than a parser.

I don't believe you. Show me a verifier.

> The problem with the way 822 is currently written is that there is no
> definitive way to tell where things like whitespace and comments are
> allowed.

RFC 822 states clearly that whitespace and comments are allowed between
tokens in structured fields.

I don't like the RFC 822 exposition of the tokenizer. ABNF really hurts.
Pseudocode would have been much easier to read and much easier to apply.
However, the spec is definitive; there's no dispute about how the
tokenizer is supposed to work.

In the high-level syntax, RFC 822 fails to explicitly state its notation
for _tokens_. The string "Mar" refers to an atom with contents Mar; the
string ":" refers to a colon token; the string 4DIGIT refers to an atom
whose contents are 4 digits. Fix: Define the notation.

> I go to 822 to answer the question "Can free insertion of linear whitespace
> go *here*?", where "here" is any number of places in a header, it's very
> difficult to tell.

A document written my way would make it crystal clear---and would only
have to say it _once_.

> Having everything in the grammar
> leaves no ambiguity, and having them in the prose is almost guaranteeing it.

Really? How come your grammar allows ``To: anything I want''? How come
your grammar allows the string ``foo'' to be parsed as three atoms?

---Dan

Put an end to unauthorized mail relaying. http://pobox.com/~djb/qmail.html
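
A minimal sketch, in Python, of the lexical behaviour discussed above:
whitespace and (nested) comments may appear between the tokens of a
structured field and are discarded, and a high-level item such as 4DIGIT
is a test on a whole atom. The names tokenize, is_4digit and the
(kind, text) token shape are assumptions made for this sketch, as are the
sample inputs; domain-literals and header unfolding are left out, so this
is an illustration under those assumptions rather than a complete RFC 822
tokenizer.

```python
# Sketch of an RFC 822-style tokenizer for structured field bodies.
# Assumptions: all names and token shapes here are hypothetical;
# domain-literals ("[...]") and line unfolding are not handled.

SPECIALS = set('()<>@,;:\\".[]')   # the RFC 822 "specials"
WHITESPACE = " \t\r\n"


def is_atom_char(ch):
    """Atom characters: printable ASCII other than SPACE and specials."""
    return 32 < ord(ch) < 127 and ch not in SPECIALS


def skip_comment(s, i):
    """Return the index just past the comment that starts at s[i] == '('."""
    depth = 0
    while i < len(s):
        ch = s[i]
        if ch == "\\":
            i += 1              # quoted-pair: skip the escaped character too
        elif ch == "(":
            depth += 1          # comments nest
        elif ch == ")":
            depth -= 1
        i += 1
        if depth == 0:
            return i
    raise ValueError("unterminated comment")


def read_quoted_string(s, i):
    """Return (contents, index past the closing quote) for s[i] == '"'."""
    out = []
    i += 1
    while i < len(s):
        ch = s[i]
        if ch == '"':
            return "".join(out), i + 1
        if ch == "\\" and i + 1 < len(s):
            i += 1              # quoted-pair: take the next character literally
            ch = s[i]
        out.append(ch)
        i += 1
    raise ValueError("unterminated quoted-string")


def tokenize(body):
    """Split a structured field body into a list of (kind, text) tokens."""
    tokens = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch in WHITESPACE:
            i += 1                              # allowed between tokens
        elif ch == "(":
            i = skip_comment(body, i)           # allowed between tokens
        elif ch == '"':
            text, i = read_quoted_string(body, i)
            tokens.append(("quoted-string", text))
        elif ch in SPECIALS:
            tokens.append(("special", ch))
            i += 1
        elif is_atom_char(ch):
            j = i
            while j < len(body) and is_atom_char(body[j]):
                j += 1
            tokens.append(("atom", body[i:j]))  # longest run of atom chars
            i = j
        else:
            raise ValueError("character not allowed here: %r" % ch)
    return tokens


def is_4digit(token):
    """True if the token is an atom whose contents are exactly 4 digits."""
    kind, text = token
    return kind == "atom" and len(text) == 4 and text.isdigit()
```

Under these assumptions, tokenize("foo") gives a single atom, and
tokenize("Thu (a comment), 5 Dec 1996") gives the same five tokens as the
input without the comment. The rule about where whitespace and comments
may go lives in one place, the main loop, instead of being repeated in
every high-level production.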