On one particular host, a process crashes from time to time, so core dumps were enabled for it. A dump was written once, but the process has crashed three more times since then and no dumps were produced.
    $ cat /etc/security/limits.conf | grep core | grep -v '#'
    * - core unlimited
    $ cat /proc/sys/kernel/core_pattern
    /tmp/core.%e.%p.%h.%t
    $ ulimit -a
    core file size          (blocks, -c) unlimited
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 2063246
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 10240
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 2063246
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited
    $ cat /proc/$(pgrep myprocess)/limits | grep core
    Max core file size        unlimited            unlimited            bytes

There is enough free space (about 17 GB; a dump takes 2-4 GB). Nobody stopped the process manually. It was definitely a crash, judging by the logs:
monit:
    [NOVT Feb 12 05:07:21] error : 'myprocess' process is not running
    [NOVT Feb 12 05:07:21] info : 'myprocess' trying to restart
    [NOVT Feb 12 05:07:21] info : 'myprocess' start: /etc/init.d/myprocess
    [NOVT Feb 12 05:09:21] info : 'myprocess' process is running with pid 22233

nginx, which sends requests to this process (only the relevant lines are kept). It was already returning 502 at 05:06:20:
    1.2.3.4 myhost - [12/Feb/2016:05:05:49 +0600] "POST someurl" 200 2659 ...
    1.2.3.4 myhost - [12/Feb/2016:05:05:49 +0600] "POST someurl" 200 933 ...
    1.2.3.4 myhost - [12/Feb/2016:05:06:20 +0600] "POST someurl" 502 166 ...

I deliberately tested this and confirmed that core dumps are written with exactly the same configuration, including when one dump already exists (I used kill -SIGSEGV pid). UPD: I tested it on this very host: the dump was written.
The documentation lists possible causes, but none of those conditions seem to change between crashes, so dumps should be written either always or never.
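Some of those documented conditions can be checked directly on the host. Below is a rough sketch, assuming a Linux /proc layout and enough permissions to read the target process's /proc entries. The function names core_dir_from_pattern and check_core_prereqs are made up for illustration, and the script covers only a few of the listed causes (target directory, suid_dumpable, the limits of the running process, free space).

```shell
#!/bin/sh
# Hypothetical helper: print where a core_pattern sends the dump.
# "pipe" = handed to a program, "cwd" = relative to the process's
# working directory, otherwise the absolute directory of the pattern.
core_dir_from_pattern() {
    case $1 in
        '|'*) echo "pipe" ;;
        /*)   dirname "$1" ;;
        *)    echo "cwd" ;;
    esac
}

# Hypothetical helper: print a few settings that can suppress a dump.
check_core_prereqs() {
    pid=$1
    pattern=$(cat /proc/sys/kernel/core_pattern)
    target=$(core_dir_from_pattern "$pattern")
    if [ "$target" = "cwd" ]; then
        target=$(readlink "/proc/$pid/cwd")
    fi
    echo "core_pattern:  $pattern"
    echo "target:        $target"
    # 0 means processes that are set-user-ID or have changed
    # credentials are not dumped
    echo "suid_dumpable: $(cat /proc/sys/fs/suid_dumpable)"
    # limits of the running process, not of the current shell
    grep 'Max core file size' "/proc/$pid/limits"
    # permissions and free space where the dump would land
    if [ -d "$target" ]; then
        ls -ld "$target"
        df -h "$target"
    fi
}
```

It could be called as check_core_prereqs "$(pgrep myprocess)". The suid_dumpable check may matter for a daemon that is started as root from an init script and then switches to another user, since that counts as a change of credentials.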
Questions:
- Can a Linux process terminate abnormally in a way that produces no core dump, even though none of the documented conditions for skipping the dump apply?
- What else could be the cause? Where should I dig, and what should I investigate?
cat /proc/$(pgrep myprocess)/limits | grep core confirms that the already-running process has the correct limits. For now we are investigating the exit code with which the process terminates. Some codes other than 0 do not cause a dump to be written either. Once we figure it out, I will write a detailed answer. :) - Nick Volynkin ♦
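To narrow down how the process ended, the exit status seen by the parent shell can be decoded. This is a minimal sketch, assuming the usual shell convention that a status above 128 means "killed by signal (status - 128)" and the Linux signal numbers used on x86 and ARM. The name describe_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: explain an exit status as reported by the shell ($?).
describe_status() {
    status=$1
    if [ "$status" -le 128 ]; then
        echo "exit code $status: process exited on its own, no core dump"
        return 0
    fi
    sig=$((status - 128))
    case $sig in
        # QUIT ILL TRAP ABRT BUS FPE SEGV XCPU XFSZ SYS:
        # the default action for these signals is to dump core
        3|4|5|6|7|8|11|24|25|31)
            echo "signal $sig: default action is to dump core" ;;
        *)
            echo "signal $sig: default action does not dump core" ;;
    esac
}
```

For example, after sleep 60 & kill -SEGV $!; wait $! the value of $? can be passed to describe_status. Two cases end without a dump even with correct limits: a process that catches the fatal signal or detects the error itself and then calls exit with a nonzero code, and a process killed by SIGKILL (signal 9, which the kernel OOM killer also uses). Note that a status above 128 is ambiguous, because a program may also return such a code on its own. Since myprocess is started from an init script, seeing its status would likely require a small wrapper that records $?, which is an assumption about the setup.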