lease databases. This is why it’s a mistake to copy the lease database from a stand-
alone server to its partner when you convert it to a failover pair; if you do that, both
servers have identical lease files, and they take twice as long to synchronize.
When the servers are synchronized, they might both wait out the MCLT before
beginning to serve clients. This is the behavior required by the failover protocol
when a server is in the RECOVER state. However, if you are starting up for the first
time, both servers are in the RECOVER state, which isn’t a desirable situation. Some
DHCP servers, including the ISC DHCP server, bypass the waiting period if they
detect that both servers are in the RECOVER state because this can usually only
happen the first time two servers are configured to do failover.
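The waiting rule described above can be sketched as a small decision function. This is only an illustration of the logic in the text, not code from any DHCP server; the names State and must_wait_mclt are hypothetical.

```python
from enum import Enum


class State(Enum):
    """A few failover states, named as in the text (illustrative subset)."""
    RECOVER = "recover"
    NORMAL = "normal"
    COMMUNICATIONS_INTERRUPTED = "communications-interrupted"
    PARTNER_DOWN = "partner-down"


def must_wait_mclt(local: State, peer: State) -> bool:
    """Return True if a freshly synchronized server should wait out the MCLT.

    Assumption: the wait applies only to a server in the RECOVER state,
    and is bypassed when the peer is also in RECOVER, which normally
    happens only the first time the pair is configured for failover.
    """
    if local is not State.RECOVER:
        return False
    return peer is not State.RECOVER
```

For example, must_wait_mclt(State.RECOVER, State.RECOVER) is False (first-time startup), whereas must_wait_mclt(State.RECOVER, State.PARTNER_DOWN) is True.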
Normal Operations
After the servers have synchronized, they begin normal operations. This doesn’t
mean the NORMAL failover state specifically; normal operations refers to all the
failover states described in Chapter 10. During normal operations, two sorts of
failover log messages are worth watching for: failover state messages and lease
update messages.
When the state of either failover partner changes, you see a message in the log for
that state change. The most common state changes are from NORMAL to
COMMUNICATIONS-INTERRUPTED and from COMMUNICATIONS-INTERRUPTED to NORMAL. You see
a message about this on one server whenever the other server is stopped. More
rarely, you see this message when the network connection between the two servers
has failed.
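If you want to watch for state changes automatically, a short script can pick them out of the log. The sketch below assumes log lines of the form "failover peer NAME: I move from OLD to NEW" and "failover peer NAME: peer moves from OLD to NEW"; the exact wording may differ between server versions, so check it against your own logs. The function name parse_state_change is hypothetical.

```python
import re

# Assumed message format; verify against the log output of your server.
STATE_CHANGE = re.compile(
    r"failover peer (?P<peer>\S+): "
    r"(?P<who>I move|peer moves) from (?P<old>[\w-]+) to (?P<new>[\w-]+)"
)


def parse_state_change(line):
    """Return (peer_name, side, old_state, new_state), or None if no match.

    side is "local" when this server changed state and "partner" when
    the log reports a state change on the other server.
    """
    match = STATE_CHANGE.search(line)
    if match is None:
        return None
    side = "local" if match.group("who") == "I move" else "partner"
    return (match.group("peer"), side, match.group("old"), match.group("new"))
```

Feeding each syslog line through parse_state_change and alerting on any transition away from normal is usually enough to notice a stopped partner or a failed link.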
The second sort of log message is a binding update (lease update) message. The ISC
DHCP server is usually quiet about binding updates: they appear in the log only
when they fail. The only real reasons a binding update would fail are that the
server is buggy or that the two servers’ lease databases have gone out of sync.
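Because failed binding updates are the only ones that get logged, counting them is a cheap way to spot lease databases that are drifting apart. The sketch below assumes rejection lines of the form "bind update on ADDRESS from NAME rejected: REASON"; treat that format as an assumption and adjust the pattern to match what your server actually logs. The function name count_rejections is hypothetical.

```python
import re
from collections import Counter

# Assumed message format; verify against the log output of your server.
BIND_REJECT = re.compile(
    r"bind update on (?P<ip>\d+\.\d+\.\d+\.\d+) "
    r"from (?P<peer>\S+) rejected: (?P<reason>.*)"
)


def count_rejections(lines):
    """Count rejected binding updates per failover peer name."""
    counts = Counter()
    for line in lines:
        match = BIND_REJECT.search(line)
        if match:
            counts[match.group("peer")] += 1
    return counts
```

A count that keeps growing suggests the two lease databases no longer agree and is worth investigating.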
Operational Problems
During operations, a variety of problems can come up. Some of them have to do
with the fact that the failover protocol is very new, and existing implementations
might still have bugs to work out. Others are just normal operational problems that
can come up even if the DHCP servers are not at all buggy.
Server Down
When one server in a failover pair goes down, the other server continues to provide
service, but in a limited mode called the COMMUNICATIONS-INTERRUPTED state. To learn
more about this state, see Chapter 10. Because of the limitations of
COMMUNICATIONS-INTERRUPTED, if the server that has gone down isn’t expected to come
back up quickly, it’s good to put the other server into the PARTNER-DOWN state. In
the PARTNER-DOWN state, the remaining DHCP server can, after waiting for the MCLT,
completely take over DHCP service on the network, including reclaiming all of the
down server’s IP addresses.
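The MCLT wait in the PARTNER-DOWN state amounts to simple date arithmetic. The sketch below is an illustration under a simplifying assumption: that the surviving server may take over completely once one MCLT has elapsed since it entered PARTNER-DOWN. The function names are hypothetical, and the 3600-second MCLT in the usage note is only an example value.

```python
from datetime import datetime, timedelta


def earliest_takeover(entered_partner_down: datetime, mclt_seconds: int) -> datetime:
    """Return the earliest time the surviving server may take over completely."""
    return entered_partner_down + timedelta(seconds=mclt_seconds)


def can_take_over(now: datetime, entered_partner_down: datetime, mclt_seconds: int) -> bool:
    """Return True once the MCLT has elapsed since entering PARTNER-DOWN."""
    return now >= earliest_takeover(entered_partner_down, mclt_seconds)
```

With an MCLT of 3600 seconds, a server placed in PARTNER-DOWN at 5:00 PM could begin reclaiming its partner’s addresses at 6:00 PM.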