Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Duplicate
-
Cheshire-Cat
-
None
-
1
Description
We've seen occurrences where an unhandled exception causes the default crash dump report to dump the call stack along with the arguments, which can potentially be very large terms (ns_rebalancer_report, for example).
The Erlang logging framework will attempt to pretty print the large term, which causes high memory and CPU consumption. It has been observed that when that happens, ns_server becomes unresponsive for several minutes and is not able to process any REST calls.
We can consider a number of actions:
1) Further reduce the size and depth limits applied to the terms being printed.
2) Chronicle has a customized logging filter to sanitize sensitive information. We can use the same technique to filter out terms that are too large.
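To make option 2 concrete, here is a minimal sketch of a size-limiting filter. It assumes the log events pass through the OTP `logger` filter API; whether the ns_server crash reports actually take that path is not established in this ticket. The module name, filter id, depth of 30 and word threshold are all hypothetical, and this is not Chronicle's actual filter.

```erlang
-module(large_term_filter).
-export([install/1, filter/2]).

%% Hypothetical helper: registers the filter for all handlers.
%% MaxWords is an illustrative threshold in machine words.
install(MaxWords) ->
    logger:add_primary_filter(large_term_filter,
                              {fun ?MODULE:filter/2, MaxWords}).

%% Logger filter callback: fun(LogEvent, Extra) -> LogEvent | stop | ignore.
%% Oversized payloads are replaced by a depth-limited string so the
%% formatter never has to pretty print the full term.
filter(#{msg := {report, Report}} = Event, MaxWords) ->
    case too_large(Report, MaxWords) of
        true -> Event#{msg := {string, render(Report)}};
        false -> Event
    end;
filter(#{msg := {string, _}} = Event, _MaxWords) ->
    Event;
filter(#{msg := {Format, Args}} = Event, MaxWords) when is_list(Args) ->
    case too_large(Args, MaxWords) of
        true -> Event#{msg := {string, render({Format, Args})}};
        false -> Event
    end;
filter(Event, _MaxWords) ->
    Event.

%% erts_debug:flat_size/1 still walks the whole term, but it does not
%% build any output, unlike pretty printing.
too_large(Term, MaxWords) ->
    erts_debug:flat_size(Term) > MaxWords.

%% ~P prints the term with an explicit depth limit (30 is arbitrary).
render(Term) ->
    io_lib:format("[large term truncated] ~P", [Term, 30]).
```

Replacing the whole message with a string loses the structured report, so the report callback no longer runs for truncated events. That trade-off would need to be weighed against option 1.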
Hareen Kancharla, as the first step of the investigation, we want to be confident and conclusive that the so-called "pretty print" is what causes this large resource impact. It would also be very interesting to establish specifically why the REST APIs are no longer responsive.
For that, I think we can write a simple program that creates a term that is large in both size and hierarchy depth, passes it through a few function calls, and causes a crash at the leaf function. It would be nice if we could capture a few "observer" screenshots to back our theory.
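A minimal sketch of such a reproduction program follows. The module name, function names and default sizes are hypothetical and would need tuning to approach the size of a real ns_rebalancer_report. The crash is a `function_clause` error because that error class keeps the call arguments in the stack trace, which is what puts the large term into the crash report.

```erlang
-module(crash_repro).
-export([run/0, run/2, big_term/2]).

%% Illustrative defaults: 8^6 (about 262k) leaves, nested 6 levels deep.
run() ->
    run(8, 6).

%% Builds the term, crashes a monitored process with it, and returns
%% only the error class so the shell does not print the term itself.
run(Width, Depth) when Depth > 0 ->
    Term = big_term(Width, Depth),
    {Pid, Ref} = spawn_monitor(fun() -> level1(Term) end),
    receive
        {'DOWN', Ref, process, Pid, {Error, _Stack}} ->
            {crashed, Error}
    end.

%% A tree of {Index, Subtree} tuples: Width children per node,
%% Depth levels, with the atom 'leaf' at the bottom.
big_term(_Width, 0) ->
    leaf;
big_term(Width, Depth) when Depth > 0 ->
    [{N, big_term(Width, Depth - 1)} || N <- lists:seq(1, Width)].

%% Non-tail calls, so every level stays on the call stack.
level1(Term) -> {level1, level2(Term)}.
level2(Term) -> {level2, leaf(Term)}.

%% Only accepts atoms; a list argument raises function_clause with
%% the argument included in the stack trace.
leaf(Term) when is_atom(Term) -> ok.
```

Starting `observer:start()` before calling `crash_repro:run()` should make it possible to watch memory and scheduler load while the crash report is being formatted.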
If and when we confirm this, we can look at the proposals offered here, and others, and validate that they indeed resolve the issue.