Rant/peeve of the day: E-mailing Photos
Monday, November 27, 2006
At the risk of sounding like a zealous e-mail nutcase, I am going to put my foot down and firmly declare my belief that sending large binary attachments (e.g., photos, big PowerPoint presentations, MP3s, etc.) by e-mail is simply immoral and wrong. They are, dare I say it, sinful. Here are the three reasons why attachments in e-mails are evil:
First, e-mail was designed as a way to send messages, not parcels. From the start, it was designed to carry plain text, not parcels of binary data; there are a number of other, more suitable ways to do that, such as UUCP (though it too was 6-bit), FTP, and later, HTTP, SFTP, and various P2P protocols. If one ever looks at the source of an e-mail containing a binary attachment (e.g., the "Show Original" option in Gmail or the "Message Source" option in Outlook Express), one will quickly see just how text-oriented e-mail is. Plain SMTP simply has no way to send raw binary data (SMTP extensions such as 8BITMIME and BINARYMIME can potentially alleviate this, but they are rarely used for attachments). As a result, binary data must somehow be encoded as plain text if it is to be transmitted via SMTP. The most common encoding used today, Base64 (other encoding methods, e.g., uuencode, work in a similar way and suffer from the same problems), converts 8-bit binary data (base-256) into 6-bit plain text (base-64; 26 capital letters, 26 lowercase letters, 10 digits, and two other characters make up the 64 characters). Each output character is still stored and sent as a full 8-bit byte but carries only 6 bits of data, so every 3 bytes of input become 4 bytes of output. This means that any binary data transmitted via SMTP automatically incurs a 33% storage and bandwidth penalty (slightly more once the encoded text is broken into lines), in addition to a processing penalty from encoding and decoding between 8-bit and 6-bit data.
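To put a number on that 33%, here is a small Python sketch using the standard `base64` module (the two function names are mine, purely for illustration). `b64encode` produces one unbroken string, while `encodebytes` wraps the output into 76-character lines the way MIME does. Real mail uses CRLF line endings rather than the single newline `encodebytes` emits, so the on-the-wire size is a little larger still.

```python
import base64

def base64_size(n_bytes: int) -> int:
    """Length of the Base64 encoding of n_bytes of data, without line breaks."""
    # Only the length of the input matters, not its content.
    return len(base64.b64encode(bytes(n_bytes)))

def mime_base64_size(n_bytes: int) -> int:
    """Length when wrapped into 76-character lines, each ending in a newline."""
    return len(base64.encodebytes(bytes(n_bytes)))

if __name__ == "__main__":
    for size in (3, 57, 3_000_000):
        plain = base64_size(size)
        wrapped = mime_base64_size(size)
        print(f"{size:>9} bytes -> {plain:>9} chars ({plain / size:.1%}), "
              f"{wrapped:>9} with line breaks")
```

A 3,000,000-byte photo comes out as 4,000,000 characters of Base64, and 4,052,632 once broken into lines.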
Second, in addition to this overhead inefficiency, attachments add a certain inelegance to e-mail. There now needs to be a way to tell regular text apart from binary data that has been re-encoded into a block of text. This is done rather inelegantly by randomly generating a string of text, checking that this random string does not already occur anywhere in the encoded e-mail, and then inserting it wherever needed to serve as a sort of ad-hoc boundary (MIME calls it exactly that, a "boundary") between the text and attachment sections of the e-mail. Coupled with the encoding and decoding, this adds a certain degree of complexity to e-mail handling software, making the process more error-prone and putting more hurdles in the way of anyone writing custom e-mail handling tools. Although modern webmail interfaces and e-mail programs are now so good at handling attachments that all the underlying grotesqueness is hidden far away from the user, this was not the case a decade ago, when it was not uncommon to run into problems sending, receiving, and decoding attachments (been there, done that, got the t-shirt). Just because e-mail attachments work smoothly nowadays does not change the fact that underneath the veneer, they are still an ugly bastardization.
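The boundary trick is easier to appreciate when spelled out. Below is a hand-rolled Python sketch of a two-part `multipart/mixed` message; `make_boundary` and `build_multipart` are hypothetical names of mine, and in practice you would let the standard `email` package do this for you. It follows the recipe described above: generate a random boundary, make sure it does not appear anywhere in the content, then use it to fence off the text and the Base64-encoded attachment.

```python
import base64
import secrets

def make_boundary(parts: list[str]) -> str:
    """Pick a random boundary string that appears in none of the message parts."""
    while True:
        # "=_" cannot occur in Base64 output; the text part still has to be checked.
        boundary = "=_" + secrets.token_hex(16)
        if not any(boundary in part for part in parts):
            return boundary

def build_multipart(text: str, filename: str, data: bytes) -> str:
    """Assemble a text part plus one Base64-encoded attachment, using CRLF line endings."""
    b64 = base64.b64encode(data).decode("ascii")
    # Break the encoded attachment into 76-character lines.
    encoded_lines = [b64[i:i + 76] for i in range(0, len(b64), 76)]
    boundary = make_boundary([text, b64])
    lines = [
        "MIME-Version: 1.0",
        f'Content-Type: multipart/mixed; boundary="{boundary}"',
        "",
        f"--{boundary}",
        "Content-Type: text/plain; charset=us-ascii",
        "",
        text,
        f"--{boundary}",
        f'Content-Type: application/octet-stream; name="{filename}"',
        f'Content-Disposition: attachment; filename="{filename}"',
        "Content-Transfer-Encoding: base64",
        "",
        *encoded_lines,
        f"--{boundary}--",  # the closing delimiter gets two extra hyphens
        "",
    ]
    return "\r\n".join(lines)

if __name__ == "__main__":
    print(build_multipart("Here is the photo.", "photo.jpg", bytes(range(256))))
```

Printing the result shows the same structure you see in Gmail's "Show Original": headers, a delimiter line, the text, another delimiter, and then a wall of Base64.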
The final argument against e-mail attachments is one of infrastructure. Unlike HTTP, FTP, etc., e-mail does not send data directly between two computers. For example, when someone at gmail.com e-mails someone at hmc.edu, the e-mail first travels from the sender's computer to Gmail's "server". It then travels from Gmail's "server" to a server run by Postini, which scans the e-mail for viruses and determines whether it is spam (this filtering used to be handled by HMC's own server, but it was eventually outsourced to a commercial company, presumably because the filtering was overloading the system). After processing, the Postini server sends the e-mail on to yet another server at HMC, from which students retrieve their e-mail. That is, unless they have a forward set up, in which case the e-mail is transmitted once again through yet another chain of mail servers. So in this scenario, in getting from one person's computer to another's, an e-mail message passes through at least four (and maybe five or six, if there is a forward) different servers! Compounding the problem, SMTP relaying involves no pipelining. An SMTP server must fully receive the entire e-mail, save it to memory or to disk, perform any necessary processing (spam checking, virus scanning, or digital signature verification in the case of DomainKeys, all of which are somewhat taxing, which is why more and more organizations are outsourcing e-mail processing), and only then retransmit it. (SMTP does have a PIPELINING extension, but it only lets a client send several commands without waiting for each reply; it does not let a relay start forwarding a message before it has received all of it.) In contrast, when a file passes through a series of routers on its way from one computer to another, there is pipelining: each router forwards packets as soon as they arrive instead of waiting for the whole file before sending it on to the next.
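A back-of-the-envelope model makes the difference concrete. The Python below is a deliberate simplification: it assumes every hop has the same link speed and ignores propagation delay and queueing, and the 1,500-byte packet size, the 8 Mbit/s links, and the optional per-hop processing time are illustrative assumptions, not measurements of Gmail, Postini, or HMC. With store-and-forward, each relay must receive the whole message before passing it on, so the full transfer time is paid once per hop; with packet-level pipelining, it is paid roughly once in total.

```python
def store_and_forward_seconds(size_bytes: int, hops: int, link_bps: float,
                              processing_s: float = 0.0) -> float:
    """Each hop receives the whole message, processes it, then sends it on."""
    per_hop = size_bytes * 8 / link_bps + processing_s
    return hops * per_hop

def pipelined_seconds(size_bytes: int, hops: int, link_bps: float,
                      packet_bytes: int = 1500) -> float:
    """Routers forward each packet as soon as it arrives."""
    transmit = size_bytes * 8 / link_bps       # time to push the whole file over one link
    per_packet = packet_bytes * 8 / link_bps   # extra delay each additional hop adds
    return transmit + (hops - 1) * per_packet

if __name__ == "__main__":
    size = 4_000_000   # roughly a 4 MB Base64-encoded photo
    link = 8_000_000   # 8 Mbit/s links (assumption)
    print(store_and_forward_seconds(size, hops=4, link_bps=link))  # 16.0
    print(pipelined_seconds(size, hops=4, link_bps=link))           # just over 4.0
```

Under these assumptions, the four-hop store-and-forward path takes four times as long, and any per-hop spam and virus scanning (the `processing_s` parameter) only widens the gap.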
While this store-and-forward design of SMTP (which is what gives e-mail the robustness needed for reliable communication) is not very problematic for small messages a few kilobytes in size, multi-megabyte files are poorly suited to this form of message transmission and will result in long delivery delays. In the worst case, some SMTP servers will simply reject or fail to deliver a message once it becomes too large for them to process.
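Because of those limits, a mail client could, in principle, check up front whether an attachment will fit. Here is a hedged Python sketch that estimates the size of a Base64-encoded attachment with CRLF line breaks every 76 characters and compares it with a server's limit. The function names are hypothetical, and the 2,048-byte allowance for headers and boundaries is an arbitrary assumption. Servers that support the SMTP SIZE extension (RFC 1870) advertise their limit in the EHLO response; Python's `smtplib` keeps the advertised extensions in the `esmtp_features` dictionary after `ehlo()`, which is where a real client could look the limit up.

```python
def encoded_attachment_size(raw_bytes: int, line_length: int = 76) -> int:
    """Base64 size of raw_bytes of data, wrapped with CRLF after every line."""
    encoded = 4 * ((raw_bytes + 2) // 3)                  # 3 input bytes -> 4 characters
    lines = (encoded + line_length - 1) // line_length    # ceiling division
    return encoded + 2 * lines                            # CRLF adds 2 bytes per line

def fits_size_limit(raw_bytes: int, limit_bytes: int, overhead: int = 2048) -> bool:
    """overhead is an assumed allowance for headers, the text part, and boundaries."""
    return encoded_attachment_size(raw_bytes) + overhead <= limit_bytes

if __name__ == "__main__":
    photo = 10_000_000                                     # a 10 MB photo
    print(encoded_attachment_size(photo))                  # 13684214
    print(fits_size_limit(photo, limit_bytes=10_000_000))  # False
```

In other words, a 10 MB photo is already close to 13.7 MB by the time it hits the wire, so a server with a 10 MB limit turns it away even though the original file would have fit.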
Trying to shoehorn the ability to send files into a system that was never designed for such use introduces a significant inefficiency in the packaging of the message, introduces ugliness into the structure of the message, and relies on a method of message transmission that is far from ideal for large data. The problem, unfortunately, is that none of the other ways to transmit data are as accessible. A peer-to-peer method, such as the file-sending feature of various instant messaging systems, is very efficient, but works only if both people are online at the same time. HTTP, FTP, SFTP, etc. will work, but they require that people either run their own servers or have easy and quick access to one. Unfortunately, while companies seem perfectly happy to promote and offer an inefficient system like e-mail for transmitting large amounts of data, they often put a lot of restrictions on whatever proper file storage and transfer services they do offer (and usually fail to offer easy, user-friendly ways for people to upload and manage data). This is probably because people who would abuse such services, for example to distribute warez, would never bother with e-mail, since it is so inefficient and unsuitable. E-mail providers are therefore not worried about nefarious uses of large-file transmission through SMTP the way they are about other means, which is unfortunate, because it effectively forces everyone else to resort to SMTP.
So make this your New Year's Resolution: Try to send files via a proper medium whenever possible. Oh, and while you are at it, please set your webmail or mail software to compose in plain text by default instead of formatted HTML (my three e-mail pet peeves are attachments, HTML mail, and people forgetting to use BCC for multi-recipient messages).
Together, we can help purify the Internet... either that, or at least hold out like a bunch of Luddites, but the former sounds better. ;)