From:     "Erik E. Fair" (Your Friendly Postmaster) <fair@APPLE.COM>
To:       tcp-ip@NIC.DDN.MIL, unicode@SUN.COM, [...]
Subject:  Case of the Replicated Errors: An Internet Postmaster's Horror Story
Date:     Thu, 09 May 91 23:26:50 -0700


    [Forwarded to RISKS by Jerry Leichter <leichter@lrw.com> and Jim Horning]

This Is The Network: The Apple Engineering Network.

The Apple Engineering Network has about 100 IP subnets, 224 AppleTalk zones,
and over 600 AppleTalk networks. It stretches from Tokyo, Japan, to Paris,
France, with half a dozen locations in the U.S., and 40 buildings in the
Silicon Valley. It is interconnected with the Internet in three places: two in
the Silicon Valley, and one in Boston. It supports almost 10,000 users every
day.

When things go wrong with E-mail on this network, it's my problem.
My name is Fair. I carry a badge.

[insert theme from "Dragnet"]

The story you are about to read is true. The names have not been
changed so as to finger the guilty.

It was early evening, on a Monday. I was working the swing shift out of
Engineering Computer Operations under the command of Richard Herndon.  I don't
have a partner.

While I was reading my E-mail that evening, I noticed that the load
average on apple.com, our VAX-8650, had climbed way out of its normal
range to just over 72.

Upon investigation, I found that thousands of Internet hosts were trying to
send us an error message. I also found 2,000+ copies of this error message
already in our queue.

I immediately shut down the sendmail daemon which was offering SMTP service on
our VAX.

I examined the error message, and reconstructed the following sequence
of events:

We have a large community of users who use QuickMail, a popular Macintosh-based
E-mail system from CE Software. In order to make it possible for these users to
communicate with other users who have chosen to use other E-mail systems, ECO
supports a QuickMail to Internet E-mail gateway. We use RFC822 Internet mail
format, and RFC821 SMTP as our common intermediate E-mail standard, and we
gateway everything that we can to that standard, to promote interoperability.

The gateway that we installed for this purpose is MAIL*LINK SMTP from Starnine
Systems. This product is also known as GatorMail-Q from Cayman Systems. It does
gateway duty for all of the 3,500 QuickMail users on the Apple Engineering
Network.

Many of our users subscribe, from QuickMail, to Internet mailing lists which
are delivered to them through this gateway. One such user, Mark E. Davis, is on
the unicode@sun.com mailing list, to discuss some alternatives to ASCII with
the other members of that list.

Sometime on Monday, he replied to a message that he received from the mailing
list. He composed a one paragraph comment on the original message, and hit the
"send" button.

Somewhere in the process of that reply, either QuickMail or MAIL*LINK SMTP
mangled the "To:" field of the message.

The important part is that the "To:" field contained exactly one "<" character,
without a matching ">" character. This minor point caused the massive
devastation, because it interacted with a bug in sendmail.

Note that this syntax error in the "To:" field has nothing whatsoever to do
with the actual recipient list, which is handled separately, and which, in this
case, was perfectly correct.
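
The split between the "To:" header and the separately handled recipient list,
and the kind of check that would flag an unmatched "<", can be sketched roughly
as follows. This is a minimal Python sketch, not code from QuickMail, MAIL*LINK
SMTP, or sendmail. The function name and the sample header strings are
hypothetical (the actual mangled header is not reproduced in this account), and
the check ignores RFC822 comments in parentheses.

```python
def angle_brackets_balanced(header_value):
    """Return True if every '<' outside a quoted string is closed by a '>'.

    Simplified illustration only: nesting is rejected, quoted strings are
    skipped, and parenthesized comments are not handled.
    """
    open_bracket = False
    in_quotes = False
    skip_next = False
    for ch in header_value:
        if skip_next:
            # Character following a backslash inside a quoted string.
            skip_next = False
        elif in_quotes:
            if ch == "\\":
                skip_next = True
            elif ch == '"':
                in_quotes = False
        elif ch == '"':
            in_quotes = True
        elif ch == "<":
            if open_bracket:
                return False
            open_bracket = True
        elif ch == ">":
            if not open_bracket:
                return False
            open_bracket = False
    return not open_bracket


# Hypothetical message: the envelope recipients are carried separately from
# the header text, so the envelope can be correct while the header is not.
message = {
    "envelope_recipients": ["unicode@sun.com"],
    "to_header": "Unicode list <unicode@sun.com",  # made-up mangled value
}

header_is_valid = angle_brackets_balanced(message["to_header"])
```

Here `header_is_valid` comes out false even though the envelope recipient list
is untouched, which is the situation described above.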

The message made it out of the Apple Engineering Network, and over to Sun
Microsystems, where it was exploded out to all the recipients of the
unicode@sun.com mailing list.

Sendmail, arguably the standard SMTP daemon and mailer for UNIX, doesn't like
"To:" fields which are constructed as described. What it does about this is the
real problem: it sends an error message back to the sender of the message, AND
delivers the original message onward to whatever specified destinations are
listed in the recipient list.
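
To see why bounce-and-forward multiplies, here is a small Python model of the
behavior just described. It is a sketch under stated assumptions, not sendmail's
code: the function names, host names, and delivery paths are all hypothetical,
the header check is reduced to counting "<" and ">", and each path is treated as
carrying its own copy of the message.

```python
def process(message, host):
    """Model one host handling the message as described in the text."""
    actions = []
    header = message["to_header"]
    if header.count("<") != header.count(">"):
        # Complain to the original sender about the header...
        actions.append(("bounce", host, message["sender"]))
    # ...and pass the message along anyway, so the next host sees it too.
    actions.append(("forward", host, tuple(message["envelope_recipients"])))
    return actions


def count_bounces(message, delivery_paths):
    """Total error messages sent back when every host on every path bounces."""
    total = 0
    for path in delivery_paths:
        for host in path:
            total += sum(1 for a in process(message, host) if a[0] == "bounce")
    return total


# Hypothetical example: three recipients, two of them behind a relay.
bad_message = {
    "sender": "user@apple.com",
    "to_header": "Unicode list <unicode@sun.com",  # made-up mangled value
    "envelope_recipients": ["someone@example.org"],
}
paths = [["relay-a", "host-1"], ["host-2"], ["relay-a", "host-3"]]

bounces = count_bounces(bad_message, paths)
```

In this model the number of error messages grows with the number of hosts the
message touches, not with the number of people who wrote it. If the bad message
were rejected instead of forwarded, only the first host would complain.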

This is deadly.

The effect was that every sendmail daemon on every host which touched
the bad message sent an error message back to us about it. I have
often dreaded the possibility that one day, every host on the Internet
(all 400,000 of them) would try to send us a message, all at once.

On Monday, we got a taste of what that must be like.

I don't know how many people are on the unicode@sun.com mailing list, but I've
heard from Postmasters in Sweden, Japan, Korea, Australia, Britain, France, and
all over the U.S. I speculate that the list has at least 200 recipients, and
about 25% of them are actually UUCP sites that are MX'd on the Internet.

I destroyed about 4,000 copies of the error message in our queues here
at Apple Computer.

After I turned off our SMTP daemon, our secondary MX sites got whacked.
We have a secondary MX site so that when we're down, someone else will
collect our mail in one place, and deliver it to us in an orderly
fashion, rather than have every host which has a message for us jump on
us the very second that we come back up.
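
The fallback that sending hosts perform can be sketched in Python as below.
This is an illustration only: the preference values, the function name, and the
reachability callback are assumptions, and a real mailer also spreads load among
exchangers of equal preference rather than always taking the first.

```python
def pick_mail_exchanger(mx_records, is_reachable):
    """Return the reachable host with the lowest MX preference, or None.

    mx_records is a list of (preference, hostname) pairs; lower preference
    values are tried first.
    """
    for _preference, host in sorted(mx_records):
        if is_reachable(host):
            return host
    return None


# Host names are from the account above; the preference numbers are assumed.
records = [
    (20, "relay.cs.net"),
    (10, "apple.com"),
    (20, "relay2.cs.net"),
]

# With the primary refusing SMTP connections, mail lands on a secondary.
chosen = pick_mail_exchanger(records, lambda host: host != "apple.com")
```

With apple.com unreachable, every sender that follows this logic converges on
the secondary MX hosts, which then hold the mail for later delivery.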

Our secondary MX is the CSNET Relay (relay.cs.net and relay2.cs.net).  They
eventually destroyed over 11,000 copies of the error message in the queues on
the two relay machines. Their postmistress was at wit's end when I spoke to
her. She wanted to know what had hit her machines.

It seems that for every one machine that had successfully contacted apple.com
and delivered a copy of that error message, there were three hosts which
couldn't get ahold of apple.com because we were overloaded from all the mail,
and so they contacted the CSNET Relay instead.

I also heard from CSNET that UUNET, a major MX site for many other hosts, had
destroyed 2,000 copies of the error message. I presume that their modems were
very busy delivering copies of the error message from outlying UUCP sites back
to us at Apple Computer.


This instance of the problem has abated for the moment, but I'm still
spending a lot of time answering E-mail queries from postmasters all over the
world.

The next day, I replaced the current release of MAIL*LINK SMTP with a
beta test version of their next release. It has not shown the header
mangling bug, yet.


The final chapter of this horror story has yet to be written.

The versions of sendmail with this behavior are still out there on hundreds of
thousands of computers, waiting for another chance to bury some unlucky site in
error messages.

Are you next?

[insert theme from "The Twilight Zone"]

        just the vax, ma'am,

        Erik E. Fair    apple!fair      fair@apple.com

