Received: from localhost (daemon@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA10958; Sat, 13 Apr 1996 10:20:26 -0400
Received: by CS.UTK.EDU (bulk_mailer v1.6); Sat, 13 Apr 1996 10:18:34 -0400
Received: from muenster.westfalen.de (root@muenster.westfalen.de [193.174.5.2]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA10756; Sat, 13 Apr 1996 10:18:05 -0400
Received: by muenster.westfalen.de (/\oo/\ Smail3.1.29.1 #29.3) id ; Sat, 13 Apr 96 16:02 MET DST
Received: by khms.westfalen.de (CrossPoint v3.1 R/C435); 13 Apr 1996 15:55:04 +0200
Date: 13 Apr 1996 15:50:00 +0200
From: kai@khms.westfalen.de (Kai Henningsen)
To: drums@cs.utk.edu
Message-ID: <66oTZwLjcsB@khms.westfalen.de>
In-Reply-To: <828474959.10043.0@nifty.andrew.cmu.edu>
Subject: Character set issues
X-Mailer: CrossPoint v3.1 R/C435
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Organization: Organisation? Me?! Are you kidding?
X-No-Junk-Mail: I do not want to get *any* junk mail.
Comment: Unsolicited commercial mail will incur an US$100 handling fee per received mail.

821/822 contain two design decisions that, at least nowadays and from my point of view, were bad ideas.

First, they explicitly disallowed characters in the range 0x80..0xff. We already had this discussion; I suggest that 82[12]bis should, at least, say that mailers MUST handle those chars sanely. With respect to headers, I believe "sanely" means "like any other odd non-special". This will allow a later standard (say, 82[12]ter) to define a meaning for those chars and/or lift any remaining ban on sending them.

Second, the old standards allow any control chars in headers, including local-parts. It is rather easy to see that actually using this "feature" will have all sorts of ugly results - for example, think of the well-known "ANSI bomb".
Or how many mail handlers, which, after all, are most often written in C or C++, do you expect to handle random '\0' chars in a way remotely similar to what was intended?

It's probably time for another mail archive survey. I think we'll find that

a. No mailbox needs the use of control chars.
b. Only very few control chars are both used and at least marginally safe in headers (I'd personally limit those to CRLF strictly as a line delimiter, tab, and backspace, and only in places like unstructured fields, comments, or quoted phrases).

Let's look at the litmus test:

> 1) The legal 822 structure will often break in the installed base.

Check.

> 2) The legal 822 structure is mostly unused.

Check, unless the survey proves otherwise - I would be very surprised.

> 3) There is no legitimate reason to use the legal 822 structure.

Well, you might argue for ISO 2022 type escapes in subjects; however, allowing ESC _is_ dangerous, as noted above. I can think of no other reason.

> 4) Removal of the legal 822 structure would simplify parsers, the
> document, or have other technical benefits.

Removing ESC certainly improves security. Removing NUL, and CR and LF except in CRLF line breaks, certainly makes correct parsers far simpler to write.

In conclusion, I believe control chars should be mostly forbidden in the bis version.

Regards,
Kai
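
The restriction proposed in the message can be illustrated with a small check on a header field body. This is only a sketch under assumptions, not anything taken from 821/822 or from a draft: the function name find_bad_octets and the constant ALLOWED_CONTROLS are made up for illustration, DEL (0x7f) is treated as a control char, and CRLF is accepted wherever it appears without checking that it is followed by whitespace as header folding would require. Octets in 0x80..0xff are passed through untouched, in the spirit of "like any other odd non-special".

```python
from typing import List, Tuple

# Hypothetical allow-list following the message's suggestion:
# tab and backspace only. CRLF is handled separately as a line delimiter.
ALLOWED_CONTROLS = {0x09, 0x08}


def find_bad_octets(field_body: bytes) -> List[Tuple[int, int]]:
    """Return (offset, octet) pairs for control chars the proposal would forbid.

    field_body is the raw body of one header field, without the final CRLF.
    """
    bad = []
    i = 0
    n = len(field_body)
    while i < n:
        b = field_body[i]
        # A CR immediately followed by LF counts as a line delimiter.
        if b == 0x0D and i + 1 < n and field_body[i + 1] == 0x0A:
            i += 2
            continue
        # Assumption: "control chars" means 0x00..0x1f plus DEL (0x7f).
        if (b < 0x20 or b == 0x7F) and b not in ALLOWED_CONTROLS:
            bad.append((i, b))
        # Octets 0x80..0xff fall through here and are not flagged.
        i += 1
    return bad
```

For example, find_bad_octets(b"x\x1b[2Jy") flags the ESC at offset 1, while a body containing only a tab or a CRLF line break yields an empty list. A bare CR or bare LF is flagged, which matches the point that only CRLF pairs should survive.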