Bernie Cosell
I know that one of the things that impressed Will was there was some bug
that they could not find and I found it. It turns out it was a bug in the
handling of some protocol for the modems and it was sending the wrong
packet at the wrong time. I put together a series of patches so that I could
put a marker in a packet, and when the system saw that particular packet, it
installed a patch that looked for this other thing happening, and as soon
as it saw it, it stopped the system. With the system stopped, we
could use debuggers to figure out what was going on. Once I had done that,
it took about two minutes to find the bug because the offending packet was
still in memory; it hadn’t been written over.
I don’t remember the exact problem, but it was one of these problems that
was not fatal. There was a bad pointer corrupting memory and the
corruption wasn’t causing any trouble, but thousands and thousands of
machine cycles later, the program crashed because some data structure was
corrupt. But it turns out the data structure was used all the time, so we
couldn’t put in code that says, “Stop when it changes.” So I thought about it
for a while and eventually I put in this two- or three-stage patch: when
this first thing happened, it enabled another patch that went through a
different part of the code. When that happened, it enabled another patch to
put in another thing. And then when it noticed something bad happening, it
froze the system. I managed to figure out how to delay it until the right time by
doing a dynamic patching hack where one path through the code was
patched dynamically to another piece of the code. And I was lucky because I
guessed the right thing and we immediately found the problem.
Seibel: What enables that kind of intuition?
Cosell: On the systems I’m very good with like that, like the IMP system
when I had it all in my head, or the PDP-1 time-sharing system, even though
the system is a multiprogramming, multilayered, interrupt-driven system, I
have all the dynamics of the system in my head. I know what order things
are supposed to happen in. I know somehow what’s not supposed to happen,
when things are supposed to not be happening. That lets me build up a
model for, “How could this thing possibly have happened?”
And at least some of those were two-machine problems, which also
required some odd creativity to find. That is, the trouble is something goes
wrong on my machine and the evidence of it shows up on yours. I can’t