Finally, TIL, what can all be the reason for systemd services to hang indefinitely. The internet is flooded with numerous reports on this topic but no clear answers. So no more uselessly marked workarounds like: systemctl daemon-reload
and systemctl-daemon-reexec
for this scenario.
The scene would be something along the lines of:
rrs 6467 0.0 0.0 23088 15852 pts/1 Ss 12:53 0:00 | | \_ /bin/bash
rrs 11512 0.0 0.0 14876 4608 pts/1 S+ 13:18 0:00 | | | \_ systemctl restart snapper-timeline.timer
rrs 11513 0.0 0.0 14984 3076 pts/1 S+ 13:18 0:00 | | | \_ /bin/systemd-tty-ask-password-agent --watch
rrs 11514 0.0 0.0 234756 6752 pts/1 Sl+ 13:18 0:00 | | | \_ /usr/bin/pkttyagent --notify-fd 5 --fallback
The snapper-timeline
service is important to me and it not running for months is a complete failure. Disappointingly, commands like systemctl --failed
do not report of this oddity. The overall system status is reported to be fine, which is completely incorrect.
Thankfully, a kind soul’s comment gave the hint. The problem is that you could be having certain services in Activating
status, which thus blocks all other services; quietly. So much for the unnecessary fun.
Looking further, in my case, it was:
rrs@priyasi:~$ systemctl list-jobs
JOB UNIT TYPE STATE
81 timers.target start waiting
85 man-db.timer start waiting
88 fstrim.timer start waiting
3832 snapper-timeline.service start waiting
83 snapper-timeline.timer start waiting
39 systemd-time-wait-sync.service start running
87 logrotate.timer start waiting
84 debspawn-clear-caches.timer start waiting
89 plocate-updatedb.timer start waiting
91 dpkg-db-backup.timer start waiting
93 e2scrub_all.timer start waiting
40 time-sync.target start waiting
86 apt-listbugs.timer start waiting
13 jobs listed.
13:12 ♒ ॐ ♅ ♄ ⛢ ☺ 😄
That was it. I knew the systemd-timesyncd
service, in the past, had given me enough headaches. And so was it this time, just quietly doing it all again.
rrs@priyasi:~$ systemctl status systemd-time-wait-sync.service
● systemd-time-wait-sync.service - Wait Until Kernel Time Synchronized
Loaded: loaded (/lib/systemd/system/systemd-time-wait-sync.service; enabled; vendor preset>
Active: activating (start) since Fri 2022-04-22 13:14:25 IST; 1min 38s ago
Docs: man:systemd-time-wait-sync.service(8)
Main PID: 11090 (systemd-time-wa)
Tasks: 1 (limit: 37051)
Memory: 836.0K
CPU: 7ms
CGroup: /system.slice/systemd-time-wait-sync.service
└─11090 /lib/systemd/systemd-time-wait-sync
Apr 22 13:14:25 priyasi systemd[1]: Starting Wait Until Kernel Time Synchronized...
Apr 22 13:14:25 priyasi systemd-time-wait-sync[11090]: adjtime state 5 status 40 time Fri 2022->
13:16 ♒ ॐ ♅ ♄ ⛢ ☹ 😟=> 3
Dear LazyWeb, anybody knows of why the systemd-time-wait-sync
service would hang indefinitely? I’ve had identical setups on many machines, in the same network, where others don’t exhibit this problem.
rrs@priyasi:~$ systemctl cat systemd-time-wait-sync.service
...snipped...
[Service]
Type=oneshot
ExecStart=/lib/systemd/systemd-time-wait-sync
TimeoutStartSec=infinity
RemainAfterExit=yes
[Install]
WantedBy=sysinit.target
The TimeoutStartSec=infinity
is definitely an attribute that shouldn’t be shipped in any system services. There are use cases for it but that should be left for local admins to explicitly decide. Hanging for infinity
is not a desired behavior for a system service.
In figuring all this out, today I learnt the handy systemctl list-jobs
command, which will give the list of active running/blocked/waiting
jobs.
Update: 2024-08-15
This week I finally found the cause of the issue. I have a bunch of bridge interfaces defined on my machine. And all of them, most of their time, would be DOWN
@ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host noprefixroute
valid_lft forever preferred_lft forever
2: withnet: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether XXXXXXXXXXXXXXXXX brd ff:ff:ff:ff:ff:ff
3: nonet: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether XXXXXXXXXXXXXXXXX brd ff:ff:ff:ff:ff:ff
4: tap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
link/ether XXXXXXXXXXXXXXXXX brd ff:ff:ff:ff:ff:ff
6: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether XXXXXXXXXXXXXXXXX brd ff:ff:ff:ff:ff:ff
inet 10.42.1.66/24 brd 10.42.1.255 scope global dynamic noprefixroute wlan0
valid_lft 5325sec preferred_lft 5325sec
inet6 fe80::9bc0:e362:7c9d:be7c/64 scope link noprefixroute
valid_lft forever preferred_lft forever
7: email-laptop: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1280 qdisc noqueue state UNKNOWN group default qlen 1000
link/none
inet XXXXXXXXXXXXXXXXX scope global noprefixroute email-laptop
valid_lft forever preferred_lft forever
inet6 XXXXXXXXXXXXXXXXX scope global noprefixroute
valid_lft forever preferred_lft forever
inet6 XXXXXXXXXXXXXXXXXXXXXXXX/64 scope link noprefixroute
valid_lft forever preferred_lft forever
⛢ 13:55:21 rrs@priyasi ~/pu/researchut-hugo researchut|+1…2
And as it stands, that is the cause of the problem.
systemd-timesyncd
would invoke an attempt to sync time for all the interfaces defined under systemd-networkd
. And given the setup, that is bound to fail timeout; which is why I had added the override against:
TimeoutStartSec=infinity
So, apparently, for every interface in networkd
, it’d attempt the ntp sync. It reports along the lines of:
Aug 15 13:24:06 priyasi systemd-timesyncd[1987]: Network configuration changed, trying to establish connection.
So I learnt this week, of the attribute to handle this:
@ cat sysbr0.network
[Match]
Name=withnet
[Network]
DHCPServer=yes
IPv4Forwarding=yes
IPv6Forwarding=yes
IPMasquerade=both
Address=192.168.1.1/24
LLMNR=yes
[Link]
RequiredForOnline=no
#[DHCPServer]
#DNS=192.168.1.1
With that, NTP time synchronization is back to what it should be. Precision synchronization of the system time.
ॐ 14:11:35 rrs@priyasi /etc/systemd/network
@ timedatectl show
LocalRTC=no
CanNTP=yes
NTP=yes
NTPSynchronized=yes
TimeUSec=Thu 2024-08-15 14:12:45 IST
RTCTimeUSec=Thu 2024-08-15 14:12:45 IST