Date: Thu, 24 Jul 1997 10:06:41 -0700 (PDT)
From: Chris Newman
Subject: Re: questions about draft-ietf-drums-abnf-03.txt
In-reply-to: <18100.869754630@munnari.OZ.AU>
To: Robert Elz
Cc: Detailed Revision/Update of Message Standards

On Fri, 25 Jul 1997, Robert Elz wrote:
> To a single lexical token actually, which at the level ABNF is usually
> used is an 8 bit character (whether or not implementations of the grammars
> then use a parser to take single characters, or have the lexer also recognise
> higher level objects).
>
> However, ABNF isn't (and shouldn't be) restricted to 8 bit characters.
> This is, however, something that I'm not sure has received much thought
> in the WG, and it is something that we ought to make sure that we are
> considering properly.

How about something along the lines of:

  An ABNF grammar may specify an output size other than the default octet
  size. When a larger output size is used and encoded as octets, the
  numbers are represented in network octet order (most significant octet
  first).

> They ought be, so grammars where the lexical elements are (say) UCS-4
> (if I have the right magic name - I mean 10646 4 byte things) can be
> written.

Yes.
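To make the network octet order wording proposed above concrete, here is a minimal Python sketch. It assumes a fixed output size of `width` octets per value; the function names `encode_values` and `decode_values` are hypothetical and do not come from the draft.

```python
def encode_values(values, width):
    """Encode each integer as `width` octets, most significant octet first.

    `width` is an assumed fixed output size (e.g. 4 for UCS-4 values).
    Raises OverflowError if a value does not fit in `width` octets.
    """
    return b"".join(v.to_bytes(width, byteorder="big") for v in values)


def decode_values(octets, width):
    """Split an octet string into `width`-octet big-endian integers."""
    if len(octets) % width != 0:
        raise ValueError("octet string is not a whole number of values")
    return [
        int.from_bytes(octets[i:i + width], byteorder="big")
        for i in range(0, len(octets), width)
    ]
```

With a 4-octet output size, the value 0x263A would go on the wire as the octets 00 00 26 3A, in that order.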
UTF-16 and UCS-4 should eventually be acceptable character encoding
schemes. For now, the IETF is recommending UTF-8 for compatibility with
the installed base.

> | I suggest that % in front of a string could mean case insensitive,

The problem with all these notations is they're too non-obvious so are
likely to result in implementation bugs.

> | 4) You removed the list notation with "#". I have no need for it
> | currently, but I wondered why it disappeared.
>
> Because it turned out that no-one had a use for it, nor could anyone think
> of one. In 822 it made some sense, but only because of the lexically
> implied LWSP that could be inserted anywhere. Without that the use of "#"
> is a little hard (though not totally impossible). But given the extremes
> one would need to go to to use it in many cases, and the ease with which
> the same syntax can be derived using the "*" and concatenation operators,
> it just wasn't worth keeping.

Actually I think the compelling argument was that half the specifications
which use "#" end up re-defining its meaning so it's not used
consistently. Since it's easy enough to do without, it's worth it for
consistency.
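As a rough illustration of how a "#" list can be written with only "*" and concatenation, the Python sketch below expands a comma-separated list rule into that form. The helper name `expand_list_rule` is hypothetical. It also deliberately ignores the implied LWSP and the empty list elements that RFC 822 allows, so treat it as a simplification under those assumptions and not an exact equivalent of the 822 notation.

```python
def expand_list_rule(element, minimum=1):
    """Rewrite `<minimum>#element` using only "*" and concatenation.

    Simplifying assumption: no LWSP around the commas and no empty
    list elements, unlike the RFC 822 list notation.
    """
    if minimum < 0:
        raise ValueError("minimum must be non-negative")
    if minimum == 0:
        # A list that may be empty becomes an optional group.
        return f'[ {element} *( "," {element} ) ]'
    # After the first element, require (minimum - 1) further ones.
    repeat = "*" if minimum == 1 else f"{minimum - 1}*"
    return f'{element} {repeat}( "," {element} )'


print(expand_list_rule("mailbox"))     # mailbox *( "," mailbox )
print(expand_list_rule("mailbox", 2))  # mailbox 1*( "," mailbox )
```

Each specification that wants different whitespace or empty-element behaviour can then spell it out explicitly in the expanded rule, which avoids the silent redefinition problem.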