Can process grouping make a big difference in Linux service processes?

Post date: Mar 01, 2017 9:8:45 PM

Discovering the problem:

I was recently doing an urgent troubleshoot. At some time during the troubleshoot, I needed to bring down a bunch of related processes on a group of RHEL 6 Linux hosts. Normally, this task is made easy with cluster ssh and a one-line shell script. But, after running initctl stop on all of these services and initctl status to make sure they were stopped: 'initctl status' said every one of them was stopped, but every one of them was still running. If this was a single process on a single host, I'd just send the kill signal and continue on without much further thought, but in this case I needed to manually kill a lot of processes (somewhere around 50 processes total). This led me to believe something was really wrong with the setup, since it was every single one of the processes that upstart failed to stop.

Reproducing the problem:

It turns out, this problem was extremely easy to reproduce on a local Virtual Machine. I installed a fresh CentOS 6 VM and set out to recreate the issue I had in production. First, I created a really simple java program that didn't do much but would run forever as a 'service'.

Here is what this service looks like, it just prints the date every second in an endless loop.

[root@vm6 ~]# cat fake.java

import java.util.*;

public class fake {

public static void main(String args[])

throws InterruptedException {

while (true) {

System.out.println(new Date() + " Testing");

Thread.sleep(1000);

}

[root@vm6 ~]# javac fake.java

[root@vm6 ~]# java fake

Wed Mar 01 11:24:01 CST 2017 Testing

Wed Mar 01 11:24:02 CST 2017 Testing

Wed Mar 01 11:24:03 CST 2017 Testing

[root@vm6 ~]#

Now, I needed to write a simple upstart script, with a shell script to launch the java process.

Here is the upstart script, I made copies and changed variable PARAM to easily simulate multiple services:

[root@vm6 init]# cat /etc/init/fake1.conf

start on runlevel [345]

stop on runlevel [016]

env PARAM=test1

console none

respawn

exec su - testuser -c "/usr/local/bin/fake.sh $PARAM"

The shell script is quite simple:

[root@vm6 init]# cat /usr/local/bin/fake.sh

cd /usr/local/bin

java fake $1 >/tmp/fake_$1.log

Now to fire these up and see if they do the same thing when stopped.

[root@vm6 init]# start fake1

fake1 start/running, process 4413

[root@vm6 init]# start fake2

fake2 start/running, process 4436

[root@vm6 init]# start fake3

fake3 start/running, process 4459

[root@vm6 init]# ps -fp $(pgrep -f fake)

UID PID PPID C STIME TTY STAT TIME CMD

root 4413 1 0 11:39 ? Ss 0:00 su - testuser -c /usr/local/bin/fake.sh test1

testuser 4414 4413 0 11:39 ? Ss 0:00 -bash -c /usr/local/bin/fake.sh test1

testuser 4425 4414 0 11:39 ? Sl 0:00 java fake test1

root 4436 1 0 11:39 ? Ss 0:00 su - testuser -c /usr/local/bin/fake.sh test2

testuser 4437 4436 0 11:39 ? Ss 0:00 -bash -c /usr/local/bin/fake.sh test2

testuser 4448 4437 0 11:39 ? Sl 0:00 java fake test2

root 4459 1 0 11:39 ? Ss 0:00 su - testuser -c /usr/local/bin/fake.sh test3

testuser 4460 4459 0 11:39 ? Ss 0:00 -bash -c /usr/local/bin/fake.sh test3

testuser 4471 4460 0 11:39 ? Sl 0:00 java fake test3

[root@vm6 init]# stop fake1

fake1 stop/waiting

[root@vm6 init]# stop fake2

fake2 stop/waiting

[root@vm6 init]# stop fake3

fake3 stop/waiting

[root@vm6 init]# status fake1

fake1 stop/waiting

[root@vm6 init]# status fake2

fake2 stop/waiting

[root@vm6 init]# status fake3

fake3 stop/waiting

[root@vm6 init]# ps -fp $(pgrep -f fake)

UID PID PPID C STIME TTY STAT TIME CMD

testuser 4425 1 0 11:39 ? Sl 0:00 java fake test1

testuser 4448 1 0 11:39 ? Sl 0:00 java fake test2

testuser 4471 1 0 11:39 ? Sl 0:00 java fake test3

Well, that reproduced the problem quite nicely! I stopped all three services, and the shell scripts got killed but all three java processes remain running.

Now, for the hard part...

Troubleshooting / debugging

I started by using strace to trace system calls / signals on the three processes for one of the fake "services".

[root@vm6 ~]# start fake1

fake1 start/running, process 4515

[root@vm6 ~]# pgrep -fl fake

4515 su - testuser -c /usr/local/bin/fake.sh test1

4516 -bash -c /usr/local/bin/fake.sh test1

4527 java fake test1

[root@vm6 ~]# strace -tt -ff -p 4515 -p 4516 -p 4527 2>/tmp/trace1 &

[1] 4559

[root@vm6 ~]# stop fake1

fake1 stop/waiting

[root@vm6 ~]# pgrep -fl fake

4527 java fake test1

[root@vm6 ~]# kill 4527

[root@vm6 ~]#

[1]+ Done strace -tt -ff -p 4515 -p 4516 -p 4527 2> /tmp/trace1

Lets take a look at signals sent to our processes

[root@vm6 ~]# grep '\-\- SIG' /tmp/trace1

[pid 4515] 11:52:42.021765 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---

[pid 4516] 11:52:42.022852 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=4515, si_uid=0} ---

[pid 4515] 11:52:44.033677 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=4515, si_uid=0} ---

[pid 4527] 11:52:58.028242 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=3931, si_uid=0} ---

So, I can see that init (si_pid=1) sent a kill to the "su" process. In turn, the "su" process (si_pid=4515) sent a kill to the shell script (pid 4516). But, the kill signal to the java process came from my shell (si_pid=3931), because the service's shell didn't propagate the kill signal to the java process. After researching this a little bit, I found that this is expected. If a shell process gets a kill signal, it won't kill sub-processes before exiting.

While messing around with the upstart script, I stumbled onto something. Here is an alternate version of the upstart script

[root@vm6 init]# cat fake1.conf

start on runlevel [345]

stop on runlevel [016]

env PARAM=test1

console none

respawn

exec su - testuser -- /usr/local/bin/fake.sh $PARAM

First, an observation on the change in behavior

[root@vm6 init]# start fake1

fake1 start/running, process 4610

[root@vm6 init]# start fake2

fake2 start/running, process 4633

[root@vm6 init]# start fake3

fake3 start/running, process 4656

[root@vm6 init]# pgrep -fl fake

4610 su - testuser -- /usr/local/bin/fake.sh test1

4611 -bash /usr/local/bin/fake.sh test1

4622 java fake test1

4633 su - testuser -- /usr/local/bin/fake.sh test2

4634 -bash /usr/local/bin/fake.sh test2

4645 java fake test2

4656 su - testuser -- /usr/local/bin/fake.sh test3

4657 -bash /usr/local/bin/fake.sh test3

4668 java fake test3

[root@vm6 init]# stop fake1

fake1 stop/waiting

[root@vm6 init]# stop fake2

fake2 stop/waiting

[root@vm6 init]# stop fake3

fake3 stop/waiting

[root@vm6 init]# pgrep -fl fake

[root@vm6 init]#

So, I have to wonder, why removing the "-c" in favor of "--" fixed the issue I had experienced. Lets do another strace to see what's going on.

[root@vm6 init]# start fake1

fake1 start/running, process 4687

[root@vm6 init]# pgrep -fl fake

4687 su - testuser -- /usr/local/bin/fake.sh test1

4688 -bash /usr/local/bin/fake.sh test1

4699 java fake test1

[root@vm6 init]# strace -tt -ff -p 4687 -p 4688 -p 4699 2>/tmp/trace2 &

[1] 4710

[root@vm6 init]# stop fake1

fake1 stop/waiting

[root@vm6 init]#

[1]+ Done strace -tt -ff -p 4687 -p 4688 -p 4699 2> /tmp/trace2

Let's see those kill signals this time:

[root@vm6 init]# grep '\-\- SIG' /tmp/trace2

[pid 4699] 12:09:07.521458 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---

[pid 4688] 12:09:07.521507 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---

[pid 4687] 12:09:07.521531 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---

12:09:09.538486 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=4687, si_uid=0} ---

Wait, what?!? Upstart sent the kill signal to all three processes. Now I start to feel like I might know how Alice felt after discovering the rabbit hole went much deeper than it first appeared. So, what's going on here? Why would init care if I used "-c" or any other parameter for that matter?

Ok, let's trace that again, including the init process in our trace. I'll have to manually stop the trace, as (I hope) the init process won't exit.

[root@vm6 init]# start fake

start: Unknown job: fake

[root@vm6 init]# start fake1

fake1 start/running, process 4722

[root@vm6 init]# pgrep -fl fake

4722 su - testuser -- /usr/local/bin/fake.sh test1

4723 -bash /usr/local/bin/fake.sh test1

4734 java fake test1

[root@vm6 init]# strace -tt -ff -p 1 -p 4722 -p 4723 -p 4734 2>/tmp/trace3 &

[1] 4745

[root@vm6 init]# stop fake1

fake1 stop/waiting

[root@vm6 init]# fg

strace -tt -ff -p 1 -p 4722 -p 4723 -p 4734 2> /tmp/trace3

[root@vm6 init]#

Let's take a look, including kill syscall and signals in our grep filter this time

[root@vm6 init]# egrep -e 'kill\(|\-\- SIG' /tmp/trace3

[pid 1] 12:14:42.940553 kill(4294962574, SIGTERM <unfinished ...>

[pid 4734] 12:14:42.940958 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---

[pid 4723] 12:14:42.941006 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---

[pid 4722] 12:14:42.941848 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---

[pid 4722] 12:14:42.943613 kill(4723, SIGTERM <unfinished ...>

[pid 1] 12:14:42.963620 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=4734, si_status=143, si_utime=11, si_stime=11} ---

[pid 4722] 12:14:44.958977 kill(4723, SIGKILL) = 0

[pid 4722] 12:14:44.959284 kill(4722, SIGTERM) = 0

[pid 4722] 12:14:44.959349 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=4722, si_uid=0} ---

12:14:44.959954 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=4723, si_status=SIGTERM, si_utime=0, si_stime=2} ---

So, upstart sends a kill to... pid 4294962574. That doesn't seem right. I think this is a display error in strace; my guess is that strace is displaying an unsigned long, and the pid is negative. That's right, a negative pid. From kill's man page we can see what the intent is.

-n where n is larger than 1. All processes in process group n are signaled. When an argument of the

form ‘-n’ is given, and it is meant to denote a process group, either the signal must be specified

first, or the argument must be preceded by a ‘--’ option, otherwise it will be taken as the signal to

send.

OK, so here is the magic. Upstart is sending the kill signal to a process group, not to a pid. That explains why all three of my latest processes got the kill signal. Now, lets back up a step and look at what the difference is between "-c" vs. "--".

One service with "--" and one with "-c"

4949 su - testuser -- /usr/local/bin/fake.sh test1

4950 -bash /usr/local/bin/fake.sh test1

4961 java fake test1

4972 su - testuser -c /usr/local/bin/fake.sh test2

4973 -bash -c /usr/local/bin/fake.sh test2

4984 java fake test2

So, what's different? Let's check process groups using "-O pgrp" with ps

[root@vm6 init]# ps -O pgrp -p $(pgrep -f fake)

PID PGRP S TTY TIME COMMAND

4949 4949 S ? 00:00:00 su - testuser -- /usr/local/bin/fake.sh test1

4950 4949 S ? 00:00:00 -bash /usr/local/bin/fake.sh test1

4961 4949 S ? 00:00:00 java fake test1

4972 4972 S ? 00:00:00 su - testuser -c /usr/local/bin/fake.sh test2

4973 4973 S ? 00:00:00 -bash -c /usr/local/bin/fake.sh test2

4984 4973 S ? 00:00:00 java fake test2

Here is the "Aha!" moment for me. I can see that with "-c", the shell script is in a new process group and with "--" it is in same process group as su. In this case, when we stop fake1, init sends kill to process group 4949, killing all three. When we stop fake2, init sends kill to process group 4972, sending kill to only the "su" process. "su" intercepts the kill signal and sends kill to the shell subprocess, pid 4973. The shell exits, leaving java running. With this knowledge, I've come up with a couple more ways to fix the issue.

The "Fix"

1. use "--" instead of "-c" to keep same process group across the 3 service processes
2. use "exec java" in the shell script so that java replaces the shell process and gets killed by "su"
3. trap kill signal in shell script and send kill to java process before exiting

Now, I have to wonder... upstart sends kill to process group. What about SYSV init (CentOS 5), or systemd (CentOS 7)?

So, option 1 is out for RHEL 7, because systemd sends kill signal to the pid and not to the process group like sysv or upstart.

Page updated

Report abuse