Skip to content

Commit 7aa9ed5

Browse files
authored
Merge pull request #474 from finit-project/multi-depend
Multi depend fix
2 parents 92a2861 + a1a92a0 commit 7aa9ed5

15 files changed

Lines changed: 402 additions & 47 deletions

File tree

doc/ChangeLog.md

Lines changed: 26 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@ Change Log
33

44
All relevant changes are documented in this file.
55

6-
[4.16][] - Unreleased
6+
[4.16][UNRELEASED] - Unreleased
77
---------------------
88

99
### Changes
@@ -18,13 +18,37 @@ All relevant changes are documented in this file.
1818
systemd `RemainAfterExit=yes`, by Aaron Andersen
1919
- Clear service conditions on `initctl reload NAME` to ensure dependent
2020
services are properly updated
21+
- Run service `stop:` and `reload:` scripts as non-blocking processes,
22+
preventing Finit from stalling on long-running helper scripts
23+
- Guard shutdown with timer watchdog to detect and debug shutdown hangs
24+
- Add `~` condition prefix to propagate reload from a dependency to the
25+
dependent service. E.g., `<!~pid/netd>` means not only a regular
26+
condition, but when `netd` reloads, restart this service too. Similar
27+
to systemd's directive `PropagatesReloadTo=`, but declared on the
28+
consumer side. Issue #416
2129

2230
### Fixes
2331
- Fix #464: invalid user:group examples in cgroups.md
2432
- Fix #466: elogind path for Debian-based distros, by Jackie Liu
2533
- Fix #467: TTY services stuck in restart state after non-zero exit.
2634
Throttling logic introduced in v4.15 had duplicate checks causing
2735
infinite timer loop, and TTYs lacked default restart timeout
36+
- Fix #475: clear pid condition on service collection to fix stale
37+
deps. When a service crashes (SIGKILL), the RUNNING → HALTED path
38+
bypasses STOPPING where `cond_clear()` is normally called, leaving
39+
dependents stuck
40+
- Fix #476: dependents not restarted after SIGHUP reload of service in
41+
dependency chain. Add `service_step_all()` at end of reload cycle to
42+
guarantee convergence after conditions are reasserted. See also the
43+
new `~` condition prefix (above) to propagate reload to dependents
44+
- Fix reload of SIGHUP-capable services incorrectly disrupting their
45+
dependents. E.g., `initctl reload syslogd` would stop dbus, dnsmasq,
46+
etc. even though syslogd handles SIGHUP gracefully and its PID persists
47+
- Only remove managed pidfiles in service cleanup. For SysV services
48+
with `pid:!/path`, the pidfile belongs to the service itself and Finit
49+
should not delete it
50+
- Silence spurious cgroup warnings for short-lived processes where the
51+
kernel reaps the child before cgroup assignment completes
2852
- Fix handling of already-mounted cgroups in `cgroup_init()`, can occur
2953
after switch_root or in container environments
3054
- Improve cgroups documentation clarity, grammar, and examples
@@ -1941,6 +1965,7 @@ Major bug fix release.
19411965
* Initial release
19421966

19431967
[UNRELEASED]: https://github.com/finit-project/finit/compare/4.15...HEAD
1968+
[4.16]: https://github.com/finit-project/finit/compare/4.15...4.16
19441969
[4.15]: https://github.com/finit-project/finit/compare/4.14...4.15
19451970
[4.14]: https://github.com/finit-project/finit/compare/4.13...4.14
19461971
[4.13]: https://github.com/troglobit/finit/compare/4.12...4.13

doc/conditions.md

Lines changed: 33 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -14,11 +14,11 @@ specified separated by comma. Multiple conditions are logically AND'ed
1414
during evaluation, i.e. all conditions must be satisfied in order for a
1515
service to run.
1616

17-
A special syntax, using a leading `!` in run/task/service conditions,
18-
denote if a:
17+
Two special prefixes can be used inside the angle brackets:
1918

20-
- service does not support `SIGHUP`
21-
- run/task should not block runlevel changes (i.e., bootstrap)
19+
- `!` -- service does not support `SIGHUP` (noreload), or run/task
20+
should not block runlevel changes (i.e., bootstrap)
21+
- `~` -- propagate reload from this dependency, see below
2222

2323
Finit guarantees by default that all run/tasks run (at least) once
2424
per runlevel. For most tasks this is a good default, for example
@@ -53,6 +53,30 @@ moving to the next runlevel after bootstrap, so we set `<!>`:
5353
task [S0123456789] <!sys/pwr/fail> name:pwrfail initctl poweroff -- Power failure, shutting down
5454

5555

56+
Propagating Reload in Dependencies
57+
-----------------------------------
58+
59+
By default, when a service reloads during `initctl reload`, dependent
60+
services are paused (`SIGSTOP`) and simply resumed (`SIGCONT`) when the
61+
condition is reasserted. This is correct for barrier-style dependencies
62+
like `<pid/syslogd>`, where dependents just need syslogd running and do
63+
not care if it reloads its config.
64+
65+
For services that need to react when their upstream reloads, the `~`
66+
prefix propagates the reload from the dependency:
67+
68+
service <pid/svc_a> name:svc_b /sbin/svc_b -- Needs A (barrier)
69+
service <!~pid/svc_b> name:svc_c /sbin/svc_c -- Propagate reload from B
70+
71+
Here, `<~pid/svc_b>` means: propagate a reload of `svc_b` to `svc_c`.
72+
When `svc_b` reloads, `svc_c` will be restarted (because of `!`,
73+
noreload) instead of merely resumed. If `svc_c` supported `SIGHUP`
74+
(no `!` prefix), it would be sent `SIGHUP` instead.
75+
76+
This is similar to systemd's `PropagatesReloadTo=` directive, but
77+
declared on the consumer side rather than the provider side.
78+
79+
5680
Triggering
5781
----------
5882

@@ -281,6 +305,11 @@ This STOP/CONT handling minimizes the number of unnecessary service
281305
restarts that would otherwise occur because a depending service was sent
282306
`SIGHUP` for example.
283307

308+
Services with the `~` prefix are an exception to this rule: when their
309+
conditions return to `on` after being in `flux`, the reload is propagated
310+
-- the service is reloaded (SIGHUP) or restarted (noreload `!`) instead
311+
of simply being resumed.
312+
284313
Therefore, any plugin that supplies Finit with conditions must ensure
285314
that their state is updated after each reconfiguration. This can be
286315
done by binding to the `HOOK_SVC_RECONF` hook. For an example of how

doc/config/services.md

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -47,6 +47,14 @@ waits for another service, `zebra`, to have created its PID file in
4747
*all* files in `/var/run`, for each file named `*.pid`, or `*/pid`,
4848
Finit opens it and find the matching `NAME:ID` using the PID.
4949

50+
The condition can be prefixed with `!` and/or `~`:
51+
52+
- `<!pid/zebra>` -- `ospfd` does not support `SIGHUP` (noreload)
53+
- `<~pid/zebra>` -- propagate reload from `zebra` to `ospfd`
54+
- `<!~pid/zebra>` -- both: noreload and propagate reload
55+
56+
For details, see the [Finit Conditions](../conditions.md) document.
57+
5058
Some services do not maintain a PID file and rather than patching each
5159
application Finit provides a workaround. A `pid` keyword can be set
5260
to have Finit automatically create (when starting) and later remove

plugins/pidfile.c

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -59,7 +59,7 @@ static int pidfile_add_path(struct iwatch *iw, char *path)
5959
}
6060
}
6161

62-
return iwatch_add(iw, path, IN_ONLYDIR | IN_CLOSE_WRITE);
62+
return iwatch_add(iw, path, IN_ONLYDIR | IN_CLOSE_WRITE | IN_ATTRIB);
6363
}
6464

6565
static void pidfile_update_conds(char *dir, char *name, uint32_t mask)
@@ -276,6 +276,7 @@ static void pidfile_reconf(void *arg)
276276
if (cond_get(cond) == COND_ON)
277277
continue;
278278

279+
dbg("Reassert condition %s", cond);
279280
cond_set_path(cond_path(cond), COND_ON);
280281
}
281282

src/api.c

Lines changed: 10 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -126,17 +126,22 @@ static int reload(svc_t *svc, void *user_data)
126126

127127
/*
128128
* Clear conditions before reload to ensure dependent services
129-
* are properly updated. The conditions are reasserted when
130-
* the service touches its PID file after processing SIGHUP.
129+
* are properly updated. Only needed when the service does NOT
130+
* support SIGHUP (noreload), because then it will be stopped
131+
* and restarted, so conditions genuinely go away. When the
132+
* service handles SIGHUP, its PID and pidfile persist, so the
133+
* condition stays valid and dependents should not be disrupted.
131134
*
132135
* Note: only clear 'ready' for services where the pidfile
133136
* inotify handler reasserts it (pid/none). For s6/systemd
134137
* services readiness relies on their respective notification
135138
* mechanism which may not re-trigger on SIGHUP.
136139
*/
137-
cond_clear(mkcond(svc, cond, sizeof(cond)));
138-
if (svc->notify == SVC_NOTIFY_PID || svc->notify == SVC_NOTIFY_NONE)
139-
service_ready(svc, 0);
140+
if (svc_is_noreload(svc)) {
141+
cond_clear(mkcond(svc, cond, sizeof(cond)));
142+
if (svc->notify == SVC_NOTIFY_PID || svc->notify == SVC_NOTIFY_NONE)
143+
service_ready(svc, 0);
144+
}
140145

141146
svc_mark_dirty(svc);
142147
service_step(svc);

src/cgroup.c

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -190,7 +190,8 @@ int cgroup_move_pid(const char *group, const char *name, int pid, int delegate)
190190
}
191191

192192
if (rc) {
193-
warn("Failed moving pid %d to cgroup %s", pid, path);
193+
if (errno != ESRCH)
194+
warn("Failed moving pid %d to cgroup %s", pid, path);
194195
return -1;
195196
}
196197
}
@@ -379,7 +380,8 @@ static int cgroup_leaf_init(const char *group, const char *name, int pid, const
379380
if (!strcmp(group, "root"))
380381
return fnwrite(str("%d", pid), FINIT_CGPATH "/cgroup.procs");
381382

382-
if (cgroup_move_pid(group, name, pid, delegate))
383+
/* Error if error is not "No such process" */
384+
if (cgroup_move_pid(group, name, pid, delegate) && errno != ESRCH)
383385
err(1, "Failed moving pid %d to group %s/%s", pid, group, name);
384386

385387
/* Set up inotify watch for cgroup cleanup */

src/conf.c

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -745,8 +745,21 @@ void conf_parse_cond(svc_t *svc, char *cond)
745745
return;
746746
}
747747

748+
/*
749+
* The '~' prefix means a reload of the upstream service is
750+
* propagated to this service -- it will be reloaded (SIGHUP)
751+
* or restarted (noreload), not just paused and resumed.
752+
* Syntax: <!~pid/foo,~pid/bar> or <~pid/foo>
753+
*
754+
* Strip '~' from each condition and set the service-level
755+
* flag. This allows '~' on any condition in the list.
756+
*/
748757
svc->cond[0] = 0;
749758
for (i = 0, c = strtok(ptr, ","); c; c = strtok(NULL, ","), i++) {
759+
if (c[0] == '~') {
760+
c++;
761+
svc->flux_reload = 1;
762+
}
750763
devmon_add_cond(c);
751764
if (i)
752765
strlcat(svc->cond, ",", sizeof(svc->cond));

src/service.c

Lines changed: 43 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -1031,7 +1031,6 @@ static int service_run_script(svc_t *svc, char *script)
10311031
{
10321032
const char *id = svc_ident(svc, NULL, 0);
10331033
pid_t pid = service_fork(svc);
1034-
int status, rc;
10351034

10361035
if (pid < 0) {
10371036
err(1, "%s: failed forking off script %s", id, script);
@@ -1047,6 +1046,8 @@ static int service_run_script(svc_t *svc, char *script)
10471046
};
10481047
char pidbuf[16];
10491048

1049+
redirect(svc);
1050+
10501051
snprintf(pidbuf, sizeof(pidbuf), "%d", svc->pid);
10511052
setenv("MAINPID", pidbuf, 1);
10521053

@@ -1056,23 +1057,7 @@ static int service_run_script(svc_t *svc, char *script)
10561057
}
10571058

10581059
dbg("%s: script '%s' started as PID %d", id, script, pid);
1059-
if (waitpid(pid, &status, 0) == -1) {
1060-
warn("%s: failed calling script %s", id, script);
1061-
return -1;
1062-
}
1063-
1064-
rc = WEXITSTATUS(status);
1065-
if (WIFEXITED(status)) {
1066-
dbg("%s: script '%s' exited without signal, status: %d", id, script, rc);
1067-
} else if (WIFSIGNALED(status)) {
1068-
dbg("%s: script '%s' terminated by signal %d", id, script, WTERMSIG(status));
1069-
if (!rc)
1070-
rc = 1;
1071-
} else {
1072-
dbg("%s: script '%s' exited with status: %d", id, script, rc);
1073-
}
1074-
1075-
return rc;
1060+
return service_script_add(svc, pid, svc->killdelay);
10761061
}
10771062

10781063
/* Ensure we don't have any notify socket lingering */
@@ -1093,15 +1078,31 @@ static void service_notify_stop(svc_t *svc)
10931078
*/
10941079
static void service_cleanup(svc_t *svc)
10951080
{
1096-
char *fn;
1081+
char cond[MAX_COND_LEN];
10971082

10981083
/* PID collected, cancel any pending SIGKILL */
10991084
service_timeout_cancel(svc);
11001085

1101-
fn = pid_file(svc);
1102-
if (fn && remove(fn) && errno != ENOENT)
1103-
logit(LOG_CRIT, "Failed removing service %s pidfile %s",
1104-
svc_ident(svc, NULL, 0), fn);
1086+
/* Only clean up process' pidfile if managed by us */
1087+
if (svc_has_pidfile(svc)) {
1088+
char *fn = pid_file(svc);
1089+
1090+
if (remove(fn) && errno != ENOENT)
1091+
logit(LOG_CRIT, "Failed removing service %s pidfile %s",
1092+
svc_ident(svc, NULL, 0), fn);
1093+
}
1094+
1095+
/*
1096+
* Invalidate the pid/ condition for this service to ensure
1097+
* dependent services are properly stopped and restarted.
1098+
* Without this, the condition is only cleared asynchronously
1099+
* via inotify on pidfile removal, which may not trigger when
1100+
* the daemon fails to clean up its own pidfile, or when the
1101+
* service dies during a reload cycle and goes directly from
1102+
* RUNNING to HALTED (skipping STOPPING where cond_clear()
1103+
* is normally called).
1104+
*/
1105+
cond_clear(mkcond(svc, cond, sizeof(cond)));
11051106

11061107
service_notify_stop(svc);
11071108

@@ -1140,7 +1141,7 @@ int service_stop(svc_t *svc)
11401141
service_timeout_cancel(svc);
11411142

11421143
if (svc->stop_script[0]) {
1143-
logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], calling stop:%s ...", id, svc->pid, svc->stop_script);
1144+
logit(LOG_CONSOLE | LOG_NOTICE, "Stopping %s[%d], calling stop:%s ...", id, svc->pid, svc->stop_script);
11441145
} else if (!svc_is_sysv(svc)) {
11451146
char *nm = pid_get_name(svc->pid, NULL, 0);
11461147
const char *sig = sig_name(svc->sighalt);
@@ -1155,10 +1156,10 @@ int service_stop(svc_t *svc)
11551156
}
11561157

11571158
dbg("Sending %s to pid:%d name:%s(%s)", sig, svc->pid, id, nm);
1158-
logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], stopping, sending %s ...", id, svc->pid, sig);
1159+
logit(LOG_CONSOLE | LOG_NOTICE, "Stopping %s[%d], sending %s ...", id, svc->pid, sig);
11591160
} else {
11601161
compose_cmdline(svc, cmdline, sizeof(cmdline));
1161-
logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], calling '%s stop' ...", id, svc->pid, cmdline);
1162+
logit(LOG_CONSOLE | LOG_NOTICE, "Stopping %s[%d], calling '%s stop' ...", id, svc->pid, cmdline);
11621163
}
11631164

11641165
/*
@@ -1278,16 +1279,16 @@ static int service_reload(svc_t *svc)
12781279
print_desc("Restarting ", svc->desc);
12791280

12801281
if (svc->reload_script[0]) {
1281-
logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], calling reload:%s ...", id, svc->pid, svc->reload_script);
1282+
logit(LOG_CONSOLE | LOG_NOTICE, "Reloading %s[%d], calling reload:%s ...", id, svc->pid, svc->reload_script);
12821283
rc = service_run_script(svc, svc->reload_script);
12831284
} else if (svc->sighup) {
12841285
if (svc->pid <= 1) {
12851286
dbg("%s[%d]: bad PID, cannot reload service", id, svc->pid);
12861287
svc->start_time = svc->pid = 0;
12871288
goto done;
12881289
}
1289-
dbg("%s[%d], sending SIGHUP", id, svc->pid);
1290-
logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], sending SIGHUP ...", id, svc->pid);
1290+
dbg("Reloading %s[%d], sending SIGHUP", id, svc->pid);
1291+
logit(LOG_CONSOLE | LOG_NOTICE, "Reloading %s[%d], sending SIGHUP ...", id, svc->pid);
12911292
rc = kill(svc->pid, SIGHUP);
12921293
if (rc == -1 && (errno == ESRCH || errno == ENOENT)) {
12931294
/* nobody home, reset internal state machine */
@@ -3134,6 +3135,19 @@ int service_step(svc_t *svc)
31343135
case COND_ON:
31353136
kill(svc->pid, SIGCONT);
31363137
svc_set_state(svc, SVC_RUNNING_STATE);
3138+
3139+
/*
3140+
* Propagate reload (~): upstream reloaded, so
3141+
* we must also reload/restart, not just resume.
3142+
*/
3143+
if (svc->flux_reload && !svc_is_changed(svc)) {
3144+
if (svc_is_noreload(svc))
3145+
service_stop(svc);
3146+
else
3147+
service_reload(svc);
3148+
break;
3149+
}
3150+
31373151
/* Reassert condition if we go from waiting and no change */
31383152
if (!svc_is_changed(svc)) {
31393153
if (svc->notify == SVC_NOTIFY_PID) {

src/sig.c

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -303,6 +303,8 @@ void do_shutdown(shutop_t op)
303303
int in_cont = in_container();
304304
int signo = SIGTERM;
305305

306+
if (SHUTDOWN_DEBUG) cprintf("do_shutdown: entered, op=%d\n", op);
307+
306308
if (!in_cont) {
307309
/*
308310
* On a PREEMPT-RT system, Finit must run as the highest prioritized
@@ -322,13 +324,15 @@ void do_shutdown(shutop_t op)
322324
* Tell remaining non-monitored processes to exit, give them
323325
* time to exit gracefully, 2 sec was customary, we go for 1.
324326
*/
327+
if (SHUTDOWN_DEBUG) cprintf("do_shutdown: killing remaining processes ...\n");
325328
iterate_proc(kill_cb, &signo);
326329
if (do_wait(1)) {
327330
signo = SIGKILL;
328331
iterate_proc(kill_cb, &signo);
329332
}
330333

331334
/* Exit plugins gracefully */
335+
if (SHUTDOWN_DEBUG) cprintf("do_shutdown: plugin_exit() + cond_exit() ...\n");
332336
plugin_exit();
333337

334338
/* We can no longer act on conditions, activate script runner */
@@ -351,12 +355,14 @@ void do_shutdown(shutop_t op)
351355
for (int fd = 3; fd < 128; fd++)
352356
close(fd);
353357

358+
if (SHUTDOWN_DEBUG) cprintf("do_shutdown: vfork, child will reboot ...\n");
354359
if (vfork()) {
355360
/*
356361
* Put PID 1 aside and let child perform reboot/halt
357362
* kernel may exit child and we don't want to exit PID 1
358363
* ... causing "aiii killing init" during reboot ...
359364
*/
365+
if (SHUTDOWN_DEBUG) cprintf("do_shutdown: PID 1 returning from vfork\n");
360366
return;
361367
}
362368

0 commit comments

Comments
 (0)