Received: from localhost (daemon@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id TAA27025; Mon, 31 Mar 1997 19:46:43 -0500 Received: by CS.UTK.EDU (bulk_mailer v1.7); Mon, 31 Mar 1997 19:46:24 -0500 Received: by CS.UTK.EDU (cf v2.9s-UTK) id TAA26975; Mon, 31 Mar 1997 19:46:21 -0500 Received: from ig.cs.utk.edu (IG.CS.UTK.EDU [128.169.94.149]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id TAA26966; Mon, 31 Mar 1997 19:46:16 -0500 Received: from localhost by ig.cs.utk.edu with SMTP (cf v2.11c-UTK) id TAA11237; Mon, 31 Mar 1997 19:46:11 -0500 (EST) Message-Id: <199704010046.TAA11237@ig.cs.utk.edu> X-Mailer: exmh version 2.0gamma 1/27/96 X-URI: http://www.cs.utk.edu/~moore/ From: Keith Moore To: Pete Resnick cc: Bill McQuillan , drums@cs.utk.edu, moore@cs.utk.edu Subject: Re: Comments on abnf-02 In-reply-to: Your message of "Mon, 31 Mar 1997 13:33:39 CST." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Mon, 31 Mar 1997 19:46:11 -0500 Sender: moore@cs.utk.edu > That is to say, how Set combines with other constructs is pretty funky, but > it can be done. The same kind of thing happens with repeat rules; a repeat > inside of a Set means that the items repeated can be ordered in any way > within the Set. However, strictly ordered items, like within parentheses or > square brackets, can't be broken out and re-ordered. > > I think this works just fine, so long as the definition is clear. I'm still of the opinion that, due to the way that sets combine with other constructs, they just make things more complicated. If we want to specify a set of things that can occur zero or more times that can occur in any order, we can do that with RFC822 constructs: foo = *(a / b / c) means that within a "foo", a, b, and c can occur any number of times, in any order. So the only thing that sets might buy us is the ability to specify "these things can occur in any order" while still enforcing rules like "this must occur at least once" or "this must not occur more than once". Note that draft-ietf-drums-abnf-02.txt doesn't say how to do this. The "obvious" fix is to use n*m within a set to specify how many times the element can occur within the set. But this leads to really counter-intuitive results: rule rfc822 equivalent ( 1*2foo ) ( foo ) / ( foo foo ) ( 1*2foo 1*2bar ) ( foo bar ) / ( foo foo bar ) / ( foo bar bar ) / ( foo foo bar bar ) { 1*2foo } ( foo ) / ( foo foo ) { 1*2foo 1*2bar } ( foo bar ) / ( bar foo ) / ( foo foo bar ) / ( foo bar foo ) / ( foo bar bar ) / ( bar foo foo ) / ( bar foo bar ) / ( bar bar foo ) / ( foo foo bar bar ) / ( foo bar foo bar ) / ( foo bar bar foo ) / ( bar foo foo bar ) / ( bar foo bar foo ) / ( bar bar foo foo ) / { (1*2foo) 1*2bar } ( foo bar ) / ( bar foo ) / ( foo foo bar ) / ( foo bar bar ) / ( bar foo foo ) / ( bar foo bar ) / ( bar bar foo ) / ( foo foo bar bar ) / ( bar foo foo bar ) / ( bar bar foo foo ) / { (1*2foo) (1*2bar) } ( foo bar ) / ( bar foo ) / ( foo foo bar ) / ( foo bar bar ) / ( bar foo foo ) / ( bar bar foo ) / ( foo foo bar bar ) / ( bar bar foo foo ) / ...but presumably (for clarity?) you want to be able to use nonterminals within sets, and have those nonterminals expand to other sets, or even to sequences. So if zot = { 1*2bar } then { 1*2foo zot } is equivalent to { 1*2foo 1*2bar } rather than { 1*2foo (1*2bar) } which is probably not what you expected. So symbols that appear as set elements are treated differently than elsewhere...they become "macros" to be expanded *before* the set is evaluated, rather than real nonterminals (sub-goals) of the grammar. But as annoying as it would be to have symbols which are sometimes nonterminals and sometimes macros, even that's not quite right. I haven't been able to come up with a precise explanation of how n*m works within a set, which doesn't violate my intuition in some way or another. I'd far rather do away with sets entirely, and accept that any grammar will accept more strings than the actual language, and that some of the constraints must be described in prose. (this is true of all programming language grammars, so it won't hurt much in practice) -- A grammar written with this notation is going to be read by either a human or a parser generator. + If it's read by a human...no matter how we fix this notation, I don't think it's going to be any easier to read (or more likely to be interpreted correctly) than a simple set of prose rules (probably collected together under a "constraints" section) that say things like: 1. The following fields MUST NOT occur more than once: To, Cc, Bcc, From, Sender, Reply-To, Subject, ... 2. The following fields MUST occur exactly once: Return-Path, Date, ... 3. If a From field with multiple addresses appears in a message header, a Sender field MUST be present. If only one address is present in the From field, the Sender field is optional. etc. Such prose is both easy to understand and easy to implement in code. + If it's read by a parser generator...none of the parser generators I've seen can deal with anything resembling sets. So the ABNF grammar would have to be pre-processed to expand the sets into long sequences of symbols. If this preprocessing is very difficult people aren't likely to use parser generators at all, even to construct 822bis verifiers, so we might as well express things in a notation that is easy for humans to understand. If the grammar notation isn't both precise and intuitive for people, those who do try to preprocess the grammar (either by hand or by writing a preprocessor) to convert it to parser generator input are likely to mess it up and obtain incorrect parsers. Even for people who get the conversion right, the output of the preprocessor for any grammar that uses many sets (say 822bis) is likely to generate very large and slow parsing tables...assuming the resulting grammar can still be parsed given the lookahead limitations. So I don't think sets are worth the trouble even if we can work out all of the notational bugs. We more-or-less know how to write clear prose that specifies what we want to do with sets. Given that parser generators don't grok sets anyway, why should we spend more time developing a set notation and trying to get it right, than it takes to express what we want to say in prose? Keith