Description
Things get killed, probably because we have not yet got scheduling and resource limits right. We need to know why, both to self-heal and to support customers in the field. So, in lieu of forcing the world to install platform monitoring, we need to collect some process, memory and node stats ourselves. That way we can see when we have a leak, when we are responsible for blowing a limit, or when something else is putting the platform under pressure, which in our case is probably Server. We'll cross that bridge when we come to it.
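As a rough illustration of the kind of collection meant here, the sketch below reads process and node memory figures from Linux procfs and compares them against a cgroup memory limit. It is only a minimal sketch under stated assumptions: a Linux host, the cgroup v2 path `/sys/fs/cgroup/memory.max`, and the names `collect_stats`, `near_limit` and the 0.9 threshold are all hypothetical, not part of any existing code.

```python
from pathlib import Path
from typing import Optional


def parse_kb_field(text: str, field: str) -> Optional[int]:
    """Return the kB value of a 'Field:   123 kB' line, or None if absent."""
    for line in text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    return None


def read_text(path: str) -> Optional[str]:
    """Read a file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


def collect_stats() -> dict:
    """Collect a small snapshot of process, node and limit figures."""
    status = read_text("/proc/self/status") or ""
    meminfo = read_text("/proc/meminfo") or ""
    # Assumption: cgroup v2 layout. The file holds a byte count or "max".
    limit = read_text("/sys/fs/cgroup/memory.max")
    limit_bytes = int(limit) if limit and limit.strip().isdigit() else None
    return {
        "process_rss_kb": parse_kb_field(status, "VmRSS"),
        "node_mem_available_kb": parse_kb_field(meminfo, "MemAvailable"),
        "cgroup_limit_bytes": limit_bytes,
    }


def near_limit(rss_kb: Optional[int], limit_bytes: Optional[int],
               threshold: float = 0.9) -> bool:
    """True when resident memory is at or above threshold * limit."""
    if rss_kb is None or limit_bytes is None:
        return False
    return rss_kb * 1024 >= threshold * limit_bytes


if __name__ == "__main__":
    stats = collect_stats()
    print(stats)
    print("near limit:", near_limit(stats["process_rss_kb"],
                                    stats["cgroup_limit_bytes"]))
```

Sampling this periodically and keeping the history would be one way to tell a steady leak (RSS climbing towards our own limit) from node-level pressure (available memory dropping while our RSS stays flat).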