A primer on how the train wrecks of yesteryear paved the way for technology that keeps data transfer safe today

On July 17, 1856, a hot, humid day in southeastern Pennsylvania, two passenger trains traveling in opposite directions on a single track crashed head-on just outside Fort Washington.

It was a tragedy of monstrous proportions. The westbound train, a special excursion, was crammed full of boisterous schoolchildren en route from Philadelphia; the eastbound local carried sleepy commuters on their way to work. All told, 60 people died, and 60 more were injured. That reverberations from the disaster would be heard more than 100 years later seemed as likely as snow in summer.

Yet those echoes are everywhere -- from the way telephone calls are forwarded from one party to the next to the way E-mail flies safely from computer to computer. Traffic, after all, is traffic -- whether it's made up of steam engines on tracks or high-speed data streams in fiber-optic cable. What caused the two trains to taste metal could just as easily make a computer network crash: faulty protocol design. Systems engineers have taken to heart the lessons learned from the crash in Pennsylvania in 1856 and other railway accidents, and have developed sophisticated software to make sure that E-mail and other traveling data arrive intact.

In the early days of railroads, train engineers had tried to do the same for passengers. They developed sets of rules to govern traffic on the rails ("protocols," in computer parlance), to make sure that trains traveling in opposite directions over a single track could do so without mishap.

In 1856 one such protocol was the North Pennsylvania Railroad's rule for handling train delays: Any train awaiting the passing of a train traveling in the opposite direction had to wait at least an additional 15 minutes if the oncoming train failed to appear. The conductor of a delayed train had to make sure to move his train onto a siding if he could not make it to the regular meeting point within that 15-minute interval.

On that fateful day in July, the trains' respective conductors had tried to follow the rule to the letter. The special-excursion train was scheduled to arrive in Fort Washington at 6:00 a.m. The local train was scheduled to leave the Fort Washington station at 6:15 a.m. However, the special-excursion train was held up; it did not make it to Fort Washington by 6:00 a.m. So the delay rule had to be applied. But there was a fatal flaw to the rule: it did not specify whether the 15-minute grace period started at the time the first train (the special-excursion train, in this case) was scheduled to arrive or at the time the second train (the commuter) was scheduled to depart. As it happened, the conductor of the local train assumed it was the former, and the conductor of the special assumed it was the latter. The two trains collided shortly after 6:15 a.m.
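The ambiguity is easy to see in a few lines of arithmetic. This sketch (the times are taken from the schedule described above) computes when each conductor, under his own reading of the rule, believed the single track was his:

```python
# A minimal sketch of the North Pennsylvania Railroad's ambiguous delay rule.
# Times are minutes after midnight; the schedule is the one described above.

GRACE = 15  # minutes of additional waiting required by the rule

excursion_due = 6 * 60       # special-excursion train due at Fort Washington, 6:00 a.m.
local_departs = 6 * 60 + 15  # local train scheduled to depart, 6:15 a.m.

# Reading 1 (the local's conductor): the grace period runs from the
# excursion's scheduled ARRIVAL. The local believes the track is his at 6:15.
local_claims_track_at = excursion_due + GRACE

# Reading 2 (the excursion's conductor): the grace period runs from the
# local's scheduled DEPARTURE. The excursion believes it may run until 6:30.
excursion_runs_until = local_departs + GRACE

def fmt(minutes):
    """Render minutes-after-midnight as a clock time."""
    return f"{minutes // 60}:{minutes % 60:02d} a.m."

print("Local enters the single track at", fmt(local_claims_track_at))
print("Excursion still on that track until", fmt(excursion_runs_until))

# Both readings are internally consistent -- and jointly fatal: the two
# trains share the single track between 6:15 and 6:30.
assert local_claims_track_at < excursion_runs_until
```

Either reading alone is perfectly safe; the flaw is that the rule's text permits both at once.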

Today communications protocols formalize the rules computers must follow to exchange information such as E-mail and documents. Some of those rules are simple, among them identifying the sender and the receiver, describing how the information should be encoded, and spelling out how the integrity of information can be verified. But some of them are very complicated, among them the protocol for resolving right-of-way issues -- that is, which computer has priority when there are conflicts.
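The simpler rules can be made concrete in a few lines. This sketch uses an invented frame layout -- not any real protocol -- in which each frame names its sender and receiver and carries a CRC-32 checksum, so the receiver can verify that the contents arrived intact:

```python
# A toy frame format (invented for illustration): sender, receiver, and
# payload, followed by a CRC-32 checksum over everything that precedes it.

import zlib

def make_frame(sender: str, receiver: str, payload: str) -> bytes:
    """Build a frame and append a checksum of its body."""
    body = f"{sender}|{receiver}|{payload}".encode()
    return body + b"|" + str(zlib.crc32(body)).encode()

def check_frame(frame: bytes) -> bool:
    """Recompute the body's checksum and compare it with the one received."""
    body, _, checksum = frame.rpartition(b"|")
    return zlib.crc32(body) == int(checksum)

frame = make_frame("hostA", "hostB", "hello")
assert check_frame(frame)                                  # intact frame verifies
assert not check_frame(frame.replace(b"hello", b"jello"))  # corruption is caught
```

The hard part, as the article goes on to say, is not rules like these but the ones that arbitrate conflicts.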

Once everyone has agreed on the rules, data transfers can take place at lightning speeds. But what if a protocol contains an error, the way the North Pennsylvania Railroad's did so long ago? What if two computers simultaneously conclude, say, that they must yield the right-of-way? Deadlock will result -- a costly and embarrassing mistake. Or what if an error leads two computers to conclude that they both have the right-of-way? They're in for the equivalent of a virtual train wreck.
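Both failure modes -- mutual yielding (deadlock) and mutual right-of-way (collision) -- can be found mechanically, even in a toy model. This sketch, with hypothetical states and rules not drawn from any real protocol, enumerates the joint states of two symmetric nodes sharing one track and flags both kinds of trouble:

```python
# A toy model of two symmetric nodes sharing a single track, each applying
# the same flawed rule: "on conflict, yield to the other side." Enumerating
# the joint states exposes a deadlock and a collision.

from itertools import product

STATES = ("wants_track", "yielding", "on_track")

def moves(mine: str, other: str) -> list:
    """Possible next states for one node, given both nodes' current states."""
    if mine == "wants_track":
        if other == "wants_track":
            return ["yielding"]  # the flawed rule: both sides yield
        return ["yielding"] if other == "on_track" else ["on_track"]
    if mine == "yielding":
        # A yielding node resumes only after the other has used the track;
        # if the other is also yielding, there is nothing to wait for.
        return ["wants_track"] if other == "on_track" else []
    return []  # on_track: journey complete, in this sketch

deadlocks, collisions = [], []
for a, b in product(STATES, repeat=2):
    if (a, b) == ("on_track", "on_track"):
        collisions.append((a, b))  # both nodes claimed the right-of-way
    elif not moves(a, b) and not moves(b, a):
        deadlocks.append((a, b))   # neither node can ever move again

print("deadlocked states:", deadlocks)
print("collision states: ", collisions)
```

Real verification tools do essentially this, exhaustively, over state spaces many orders of magnitude larger.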

Today finely tuned software acts as a safety net for faulty protocol design. Take, for example, the approach to what's known in the communications industry as the "feature interaction problem." Currently there are 40 or so features available to telephone subscribers. The features, from call forwarding to return calls on busy lines, are controlled by computers. Each feature on its own makes perfect sense and can be checked for proper functioning. But it's virtually impossible to predict how the features will interact when they're turned on in some unexpected combination. The nightmare scenario is that a freak combination of events could crash a switch or, even worse, bring down part of a network.
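A classic illustration of the problem -- here with hypothetical subscribers and a deliberately simplified forwarding table -- is two call-forwarding entries that are each sensible alone but form a loop together:

```python
# A toy feature-interaction example: each call-forwarding entry is sensible
# on its own, yet two entries together send a call around in circles.
# A cycle check catches the bad combination.

forwarding = {
    "alice": "bob",   # Alice forwards her calls to Bob
    "bob": "alice",   # Bob, independently, forwards his calls to Alice
    "carol": "dave",  # Carol forwards to Dave -- harmless on its own
}

def route(callee: str):
    """Follow forwarding entries; report the path and whether it loops."""
    seen = []
    while callee in forwarding:
        if callee in seen:
            return seen + [callee], True  # forwarding loop detected
        seen.append(callee)
        callee = forwarding[callee]
    return seen + [callee], False

path, looped = route("alice")
print("call to alice routes via", " -> ".join(path), "(loop!)" if looped else "")
path, looped = route("carol")
print("call to carol routes via", " -> ".join(path))
```

With 40 or so features, the number of possible combinations is far too large to test by hand -- which is exactly why the problem calls for automated verification.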

That's where verification technology comes in. We have software today that can thoroughly verify the safety of feature code before it's implemented. The same verification techniques are being applied to the software that defines the interaction of other multiple-computer arrangements.

Would that the conductors of the North Pennsylvania Railroad had had a similar ability to test the train-delay rule before it was too late.

Gerard J. Holzmann (gerard@research.att.com) is a distinguished member of the technical staff at AT&T Bell Laboratories in Murray Hill, N.J. He is the author of several books, including Design and Validation of Computer Protocols (Prentice Hall, 1991) and The Early History of Data Networks (IEEE Computer Society Press, 1995).