Received: from localhost by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
    id SAA07896; Wed, 13 Mar 1996 18:17:38 -0500
Received: by CS.UTK.EDU (bulk_mailer v1.4); Wed, 13 Mar 1996 18:17:19 -0500
Received: from wilma.cs.utk.edu by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
    id SAA07874; Wed, 13 Mar 1996 18:17:18 -0500
Received: from LOCALHOST by wilma.cs.utk.edu with SMTP (cf v2.11c-UTK)
    id SAA18978; Wed, 13 Mar 1996 18:17:16 -0500
Message-Id: <199603132317.SAA18978@wilma.cs.utk.edu>
X-Mailer: exmh version 1.6.5 12/11/95
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore
To: Mark Crispin
cc: John Gardiner Myers, drums@cs.utk.edu, moore@cs.utk.edu
Subject: Re: specials
In-reply-to: Your message of "Wed, 13 Mar 1996 13:29:42 PST."
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 13 Mar 1996 18:17:09 -0500
Sender: moore@cs.utk.edu

> On Wed, 13 Mar 1996 15:51:42 -0500 (EST), John Gardiner Myers wrote:
> > My understanding is that there are two key issues:
> >
> > * A large number of implementations that can't handle leading,
> > trailing, or doubled dots unquoted in local-parts.
>
> Is this really an issue?  I suspect that there are many more
> implementations that are clueless about the full semantics of RFC 822.
> Such implementations (usually ancient ones, but certain recent
> commercial products as well) do really stupid rules like:
> 1) for each substring delimited by comma, search for something inside
> a pair of <>'s.  If found, collect its contents as an address.
> 2) if not found, then collect the entire substring as an address, but
> disregard anything within a pair of ()'s.
> 3) mailbox is the left half of an @, domain is the right half.  Trim
> leading and trailing whitespace.
>
> It's been at least 18 years since RFC 733 and people should know
> better, but they still write address parsers which use rules such
> as the above.

Somehow I don't think 733 is the culprit for any implementation written
in the last ten years or so.
Rather, RFC 822 messages look simple enough that people are tempted to
implement it without reading the spec.  Either that, or they try to
read the spec once, their eyes glaze over, and they've got a product
deadline in 2 weeks, so they just ignore it.  After all, the product
"works just fine" for the one or two test cases they try ...

> What is remarkable is that, for the 99% case, such parsers are viable and
> sometimes "work better" (from the user's perspective) than parsers which
> adhere more closely to 822's rules.  Many of us had had the unpleasant
> experience of explaining to our users why our fancy RFC-obeying
> parsers fail when a stupid parser program "works just fine".

I can see this as an argument for not generating addresses that will
confuse stupid parsers.  I can't see this as an argument for generating
addresses that will confuse non-stupid parsers.

Just as there are lots of UAs with stupid parsers out there, there are
also lots of UAs with smart parsers that have extra productions in there
to accommodate the most common protocol errors.  Relaxing the protocol
breaks those parsers.

> I've never encountered a leading, trailing, or doubled dot in a
> local-part.  I agree that it would be somewhat surprising/silly.

One reason for this may have nothing to do with 822.  If MTAs tend not
to use such patterns in mailbox names, we're not likely to see them in
822.  On the other hand, were we suddenly to start allowing such
patterns, somebody would find a use for them.  Using "." in such places
might seem odd at first, but if it "works just fine" for some people,
the practice may catch on even though it doesn't work well for others.

> But on the other hand I think we're going to extremes to prevent something
> that may never happen, or that if it does happen will be caused by someone
> who won't care about the impact on the result of the community anyway.

Judging by their implementations, lots of implementors don't seem to
care about the result on the community.
(rumor even has it that some implementors have deliberately introduced
bugs so that their MTAs or gateways will work fine with their own
products but only barely work with other vendors' products).  We can't
really stop that.  What we can do is:

a) give good guidance to someone who wants to do the right thing, and
b) give users, those in charge of procurement, etc., something with
   which they can measure how well a particular implementation complies.

> EVERY SINGLE instance of a misparse related to "." has been because I
> treated "." as a special in the parse.  The most common example is
> unquoted MIME parameter values, to the point that I'm tempted to use
> the "."-free specials list for parsing MIME parameters too.

In fact, for MIME headers, this is the correct thing to do.  Since
RFC 1521, MIME tspecials do not include ".".

> I have *NEVER* encountered a misparse
> due to failure to treat "." as a special.
>
> My conclusion is that there are far more negative consequences from
> treating "." as a special when parsing than there are from not doing so.

RFC 822 doesn't define the behavior of a UA when it sees an error; as
long as you correctly parse well-formed messages, you're fine.  If you
can parse some messages that aren't well-formed, that's fine too, as
long as the results aren't surprising (like, if an attempt to parse the
recipient list in a broken message misinterprets the recipient list and
causes mail to be sent to the wrong person).

I also have experience with using a strict parser for an email server,
and have had to add lots of hacks to deal with several common kinds of
brain damage.  I don't recommend being strict as an implementation
strategy, but neither do I recommend a parser as lax as the one you
describe above.  (I have experience with one similar to that one also;
it was more error-prone than the strict parser.)

Obviously, there are problems with both "strict" and "lax".
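
For reference, the lax parser described by the three quoted rules (split
on commas, prefer whatever is inside <>, otherwise drop anything in ()'s,
then split at the @) can be sketched roughly as below.  This is only an
illustration in Python: the name naive_parse and the example.org address
are made up, and it is not taken from any actual product.

```python
import re

def naive_parse(header_value):
    """Apply the three quoted rules to an address-list header value."""
    results = []
    for chunk in header_value.split(","):            # rule 1: split on commas
        m = re.search(r"<([^>]*)>", chunk)
        if m:
            addr = m.group(1)                        # rule 1: keep what is inside <>
        else:
            addr = re.sub(r"\([^)]*\)", "", chunk)   # rule 2: drop (comments)
        mailbox, _, domain = addr.partition("@")     # rule 3: split at the first "@"
        results.append((mailbox.strip(), domain.strip()))
    return results

# The common case comes out right:
print(naive_parse("Keith Moore <moore@cs.utk.edu>, someone@example.org (Some One)"))

# A comma inside a quoted display name is split anyway, so the first
# fragment is collected as if it were a recipient of its own:
print(naive_parse('"Moore, Keith" <moore@cs.utk.edu>'))
```

The second call shows the kind of surprising result mentioned above: a
well-formed header with a quoted comma produces an extra entry with no
domain, because the sketch never looks at quoting.
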
We can define what "strict" means, and we can specify that a UA must
have a certain amount of laxity when reading messages (when the
interpretation is unambiguous) to deal with common errors.

But I'm against changing the language to the point that it becomes legal
to emit things that can't be dealt with by a significant set of users
without a very good reason (say, IPv6 domain literals), and also against
changing the language to make it legal to emit things that we have no
real use for, even if it doesn't appear to have a big operational
impact.  (we could always be wrong.  just because people don't tend to
use usernames like .blank.@some.domain now, doesn't mean that the mail
system would tolerate it if we did.)

Keith

--------
Terrestrial governments have no authority in cyberspace.