2 servers show different behavior when DHCP sever goes down

jodlgc asked:

i dont have any control of the DHCP servers on my network and every 2 weeks or so there is an outage of a few hours where my CentOS 7.8 servers do not get responses to DHCP renewal requests. As far as I know these servers are configured exactly the same. Some of the servers continuously request DHCP during this outage until DHCP renewal is successful and the systems are back on the network. But, some of the servers seem to hit some corner case and stop DHCP requests after some time and then never make it back on the network. Can any one please advise what different is happening when seeing the differences in the logs I’m posting.
server003 is a failure case
server004 is a good case
thanks!

some oddities I see is on failed server003
"bound: renewal in 2134686840 seconds"
"Trying recorded lease 192.168.2.72"
192.168.2.72 is a very old network we used to be on is dhclient actually setting this IP on the interface?

server003logs:

Nov 18 07:01:41 got DHCP
Nov 18 07:21:02 something killed, MFE?
Nov 18 09:00:09 DHCP started failing
Nov 18 09:00:09 server003 dhclient[45214]: DHCPREQUEST on enp4s0 to 10.20.193.131 port 67 (xid=0x44d64e6c)
-- DHCPREQUEST on enp4s0 repeatedly till 12:01 --
Nov 18 12:01:27 server003 dhclient[45214]: DHCPREQUEST on enp4s0 to 255.255.255.255 port 67 (xid=0x44d64e6c)
Nov 18 12:01:41 server003 avahi-daemon[1973]: Withdrawing address record for 10.20.232.222 on enp4s0.
Nov 18 12:01:41 server003 avahi-daemon[1973]: Leaving mDNS multicast group on interface enp4s0.IPv4 with address 10.20.232.222.
Nov 18 12:01:41 server003 avahi-daemon[1973]: Interface enp4s0.IPv4 no longer relevant for mDNS.
Nov 18 12:01:42 server003 NetworkManager[2553]: <info>  [1605718902.1357] dhcp4 (enp4s0): state changed bound -> expire
Nov 18 12:01:42 server003 NetworkManager[2553]: <info>  [1605718902.1364] device (enp4s0): DHCPv4: 480 seconds grace period started
Nov 18 12:01:42 server003 NetworkManager[2553]: <info>  [1605718902.1469] dhcp4 (enp4s0): state changed expire -> unknown
Nov 18 12:02:41 server003 dhclient[45214]: DHCPDISCOVER on enp4s0 to 255.255.255.255 port 67 interval 2 (xid=0x40f44748)
Nov 18 12:02:43 server003 dhclient[45214]: No DHCPOFFERS received.
Nov 18 12:02:43 server003 dhclient[45214]: Trying recorded lease 192.168.2.72
Nov 18 12:02:43 server003 NetworkManager[2553]: <info>  [1605718963.8053] dhcp4 (enp4s0): state changed unknown -> timeout
Nov 18 12:02:43 server003 dhclient[45214]: bound: renewal in 2134686840 seconds.
Nov 18 12:09:42 server003 NetworkManager[2553]: <info>  [1605719382.2119] device (enp4s0): DHCPv4: grace period expired
Nov 18 13:06:02 server003 NetworkManager[2553]: <info>  [1605722762.3311] policy: set 'enp4s0' (enp4s0) as default for IPv6 routing and DNS
-- nothing else after this --

server004 logs

Nov 18 07:27:10 got DHCP
Nov 18 09:26:55 DHCP started failing
Nov 18 09:26:55 server004 dhclient[5179]: DHCPREQUEST on enp4s0 to 10.20.193.131 port 67 (xid=0x26458456)
-- DHCPREQUEST on enp4s0 repeatedly till 12:27 --
Nov 18 12:27:04 server004 dhclient[5179]: DHCPREQUEST on enp4s0 to 255.255.255.255 port 67 (xid=0x26458456)
Nov 18 12:27:10 server004 avahi-daemon[1869]: Withdrawing address record for 10.20.232.229 on enp4s0.
Nov 18 12:27:10 server004 avahi-daemon[1869]: Leaving mDNS multicast group on interface enp4s0.IPv4 with address 10.20.232.229.
Nov 18 12:27:10 server004 avahi-daemon[1869]: Interface enp4s0.IPv4 no longer relevant for mDNS.
Nov 18 12:27:11 server004 NetworkManager[2609]: <info>  [1605720431.3993] dhcp4 (enp4s0): state changed bound -> expire
Nov 18 12:27:11 server004 NetworkManager[2609]: <info>  [1605720431.4000] device (enp4s0): DHCPv4: 480 seconds grace period started
Nov 18 12:27:11 server004 NetworkManager[2609]: <info>  [1605720431.4106] dhcp4 (enp4s0): state changed expire -> unknown
Nov 18 12:27:11 server004 dhclient[5179]: DHCPDISCOVER on enp4s0 to 255.255.255.255 port 67 interval 5 (xid=0x1e6890d0)
Nov 18 12:27:16 server004 dhclient[5179]: DHCPDISCOVER on enp4s0 to 255.255.255.255 port 67 interval 9 (xid=0x1e6890d0)
Nov 18 12:27:25 server004 dhclient[5179]: DHCPDISCOVER on enp4s0 to 255.255.255.255 port 67 interval 10 (xid=0x1e6890d0)
Nov 18 12:27:35 server004 dhclient[5179]: DHCPDISCOVER on enp4s0 to 255.255.255.255 port 67 interval 15 (xid=0x1e6890d0)
Nov 18 12:27:50 server004 dhclient[5179]: DHCPDISCOVER on enp4s0 to 255.255.255.255 port 67 interval 15 (xid=0x1e6890d0)
Nov 18 12:28:05 server004 dhclient[5179]: DHCPDISCOVER on enp4s0 to 255.255.255.255 port 67 interval 7 (xid=0x1e6890d0)
Nov 18 12:28:12 server004 dhclient[5179]: No DHCPOFFERS received.
Nov 18 12:28:12 server004 dhclient[5179]: No working leases in persistent database - sleeping.
Nov 18 12:28:12 server004 NetworkManager[2609]: <info>  [1605720492.1971] dhcp4 (enp4s0): state changed unknown -> fail
Nov 18 12:32:08 server004 dhclient[5179]: DHCPDISCOVER on enp4s0 to 255.255.255.255 port 67 interval 8 (xid=0x47045c24)
Nov 18 12:32:16 server004 dhclient[5179]: DHCPDISCOVER on enp4s0 to 255.255.255.255 port 67 interval 13 (xid=0x47045c24)
Nov 18 12:32:29 server004 dhclient[5179]: DHCPDISCOVER on enp4s0 to 255.255.255.255 port 67 interval 20 (xid=0x47045c24)
Nov 18 12:32:49 server004 dhclient[5179]: DHCPDISCOVER on enp4s0 to 255.255.255.255 port 67 interval 14 (xid=0x47045c24)
Nov 18 12:33:03 server004 dhclient[5179]: DHCPDISCOVER on enp4s0 to 255.255.255.255 port 67 interval 6 (xid=0x47045c24)
Nov 18 12:33:09 server004 dhclient[5179]: No DHCPOFFERS received.
Nov 18 12:33:09 server004 dhclient[5179]: No working leases in persistent database - sleeping.
-- DHCPDISCOVER -> No DHCPOFFERS received -> DHCPDISCOVER happens repeatedly every 5 mins until 13:05 and then got DHCP
Nov 18 13:05:04 server004 dhclient[5179]: DHCPDISCOVER on enp4s0 to 255.255.255.255 port 67 interval 7 (xid=0x6a52e1ae)
Nov 18 13:05:11 server004 dhclient[5179]: DHCPDISCOVER on enp4s0 to 255.255.255.255 port 67 interval 14 (xid=0x6a52e1ae)
Nov 18 13:05:25 server004 dhclient[5179]: DHCPDISCOVER on enp4s0 to 255.255.255.255 port 67 interval 15 (xid=0x6a52e1ae)
Nov 18 13:05:40 server004 dhclient[5179]: DHCPDISCOVER on enp4s0 to 255.255.255.255 port 67 interval 17 (xid=0x6a52e1ae)
Nov 18 13:05:40 server004 dhclient[5179]: DHCPREQUEST on enp4s0 to 255.255.255.255 port 67 (xid=0x6a52e1ae)
Nov 18 13:05:40 server004 dhclient[5179]: DHCPOFFER from 10.20.232.1
Nov 18 13:05:40 server004 dhclient[5179]: DHCPACK from 10.20.232.1 (xid=0x6a52e1ae)
Nov 18 13:05:40 server004 NetworkManager[2609]: <info>  [1605722740.0476] dhcp4 (enp4s0):   address 10.20.232.229
Nov 18 13:05:40 server004 NetworkManager[2609]: <info>  [1605722740.0483] dhcp4 (enp4s0):   plen 22 (255.255.252.0)
Nov 18 13:05:40 server004 NetworkManager[2609]: <info>  [1605722740.0483] dhcp4 (enp4s0):   gateway 10.20.232.1
Nov 18 13:05:40 server004 NetworkManager[2609]: <info>  [1605722740.0483] dhcp4 (enp4s0):   lease time 18000
Nov 18 13:05:40 server004 NetworkManager[2609]: <info>  [1605722740.0483] dhcp4 (enp4s0):   nameserver '10.20.10.49'
Nov 18 13:05:40 server004 NetworkManager[2609]: <info>  [1605722740.0483] dhcp4 (enp4s0):   nameserver '10.20.10.48'
Nov 18 13:05:40 server004 NetworkManager[2609]: <info>  [1605722740.0484] dhcp4 (enp4s0):   domain name 'company.com'
Nov 18 13:05:40 server004 NetworkManager[2609]: <info>  [1605722740.0484] dhcp4 (enp4s0): state changed fail -> bound
Nov 18 13:05:40 server004 dhclient[5179]: bound to 10.20.232.229 -- renewal in 7548 seconds.
Nov 18 13:05:40 server004 avahi-daemon[1869]: Joining mDNS multicast group on interface enp4s0.IPv4 with address 10.20.232.229.
Nov 18 13:05:40 server004 NetworkManager[2609]: <info>  [1605722740.0519] policy: set 'enp4s0' (enp4s0) as default for IPv4 routing and DNS
Nov 18 13:05:40 server004 avahi-daemon[1869]: New relevant interface enp4s0.IPv4 for mDNS.
Nov 18 13:05:40 server004 avahi-daemon[1869]: Registering new address record for 10.20.232.229 on enp4s0.IPv4.
Nov 18 13:05:40 server004 dbus[1896]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Nov 18 13:05:40 server004 systemd: Starting Network Manager Script Dispatcher Service...
Nov 18 13:05:40 server004 dbus[1896]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Nov 18 13:05:40 server004 systemd: Started Network Manager Script Dispatcher Service.
Nov 18 13:05:40 server004 nm-dispatcher: req:1 'dhcp4-change' [enp4s0]: new request (4 scripts)
Nov 18 13:05:40 server004 nm-dispatcher: req:1 'dhcp4-change' [enp4s0]: start running ordered scripts...
Nov 18 13:06:02 server004 NetworkManager[2609]: <info>  [1605722762.4182] policy: set 'enp4s0' (enp4s0) as default for IPv6 routing and DNS

My answer:


Your server003 has remembered an old DHCP lease for an address 192.168.2.72 with a very long expiration time and is falling back to that when the DHCP server is unavailable. This isn’t even on the same network as the last legitimate IP address it received 10.20.193.131.

You should clear out that server’s DHCP leases and then restart the DHCP client.

Something like this should do (as root) while keeping the network link up:

rm -f /var/lib/NetworkManager/*.lease; killall dhclient; nmcli device reapply enp4s0

View the full question and any other answers on Server Fault.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.