Couchbase Server: MB-45814

Logging may cause large resource utilization

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: Cheshire-Cat
    • Fix Version/s: CheshireCat.Next
    • Component/s: ns_server
    • Labels: None
    • 1

    Description

      We've seen occurrences where an unhandled exception causes the default crash dump report to dump the call stack along with the arguments, which can potentially be very large terms (ns_rebalancer_report, for example).

      The Erlang logging framework will attempt to pretty-print the large term, which consumes a large amount of memory and CPU. It has been observed that when this happens, ns_server becomes unresponsive for several minutes and is not able to process any REST calls.

      We can consider a number of actions:
      1) Further reduce the size and depth limits on printed terms.
      2) Chronicle has a customized logging filter to sanitize sensitive information. We can use the same technique to also filter out terms that are too large.
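
As a rough illustration of option 2, the sketch below shows what a size-limiting filter could look like. It assumes the OTP `logger` framework is the one handling the event; the module name `large_term_filter` and the limit value are hypothetical, and this is not Chronicle's actual filter. It relies on `io_lib:format/3` with the `chars_limit` option, which is a soft limit, so the output can slightly exceed it.

```erlang
-module(large_term_filter).
-export([install/1, filter/2]).

%% Register the filter for all handlers. CharsLimit is a soft limit on
%% the number of characters produced for one log message.
install(CharsLimit) when is_integer(CharsLimit), CharsLimit > 0 ->
    logger:add_primary_filter(?MODULE, {fun ?MODULE:filter/2, CharsLimit}).

%% Logger filter callback: fun(LogEvent, Extra) -> LogEvent | stop | ignore.
%% Already-formatted strings are passed through untouched.
filter(#{msg := {string, _}} = Event, _Limit) ->
    Event;
%% Reports (crash reports are reports) are pre-formatted with a limit.
%% Note: this bypasses any report_cb the caller supplied.
filter(#{msg := {report, Report}} = Event, Limit) ->
    Event#{msg := {string, limited("~p", [Report], Limit)}};
%% Format-and-arguments messages are pre-formatted with the same limit.
filter(#{msg := {Format, Args}} = Event, Limit) when is_list(Args) ->
    try limited(Format, Args, Limit) of
        Chars -> Event#{msg := {string, Chars}}
    catch
        %% Leave malformed format calls for the handler to report.
        error:_ -> Event
    end;
filter(Event, _Limit) ->
    Event.

limited(Format, Args, Limit) ->
    io_lib:format(Format, Args, [{chars_limit, Limit}]).
```

Pre-formatting inside the filter means the handler only ever sees a bounded string. Whether this bounds CPU use enough for terms with heavy internal sharing would still need to be measured.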

        Activity

          meni.hillel Meni Hillel (Inactive) added a comment

          Hareen Kancharla, as the first step of the investigation, we want to confirm conclusively that the so-called "pretty print" is what causes this large resource impact. It would also be useful to establish exactly why the REST API stops responding.

          To do that, I think we can write a simple program that creates a term that is large in both size and nesting depth, passes it through a few function calls, and crashes in the leaf function. It would be nice to capture a few "observer" screenshots to back our theory.

          If and when we confirm this, we can look at the proposals offered here, and others, and validate that they indeed resolve the issue.
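
A minimal sketch of the kind of test program described above, under stated assumptions: the module name `crash_repro` and all function names are hypothetical, and the term shape is arbitrary. The leaf function fails with `function_clause`, an error whose top stack frame carries the full argument list rather than just the arity, which is how a large term ends up in a crash report.

```erlang
-module(crash_repro).
-export([build/1, run/1, spawn_crash/1]).

%% Build a nested term. Each level references the previous level twice,
%% so the printed form grows roughly exponentially with Depth.
build(Depth) -> build(Depth, []).

build(0, Acc) -> Acc;
build(N, Acc) -> build(N - 1, [{N, "large_term", Acc}, Acc]).

%% Crash in the leaf and return the stack trace for inspection.
run(Depth) ->
    Term = build(Depth),
    try level1(Term)
    catch error:function_clause:Stack -> Stack
    end.

%% Same crash, but unhandled in a proc_lib process, so a crash report
%% is emitted through the logging framework.
spawn_crash(Depth) ->
    proc_lib:spawn(fun() -> level1(build(Depth)) end).

%% Non-tail calls keep each caller's frame on the stack.
level1(T) -> {ok, level2(T)}.
level2(T) -> {ok, leaf(T)}.

%% The built term is always a list, so this clause never matches.
leaf(T) when is_atom(T) -> ok.
```

Calling `crash_repro:run/1` with a small depth is a cheap way to confirm that the arguments appear in the trace before using `spawn_crash/1` with a larger depth while watching observer.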
          hareen.kancharla Hareen Kancharla added a comment - edited

          The attachment observer-mem-usage.png shows the memory usage. The large term was approximately 3.75 MB in size (per erts_debug:flat_size(X)), and printing it in full took around 300K lines in the log file.

          1) The spike at 50 secs is when the large term was created.

          2) The spike at 20 secs is when the large term was printed.

          The functions used to generate and print the grotesquely large term:

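          %% Build a deeply nested term; X is the nesting depth.
          recurse_large_term_start(X) ->
              recurse_large_term(X, []).

          %% Each step embeds Acc twice (once in the head tuple and once in the
          %% improper-list tail), so the flattened size roughly doubles per level.
          recurse_large_term(0, Acc) ->
              Acc;
          recurse_large_term(X, Acc) ->
              recurse_large_term(X - 1, [{{X, "large_term"}, Acc} | {Acc}]).

          %% ~p pretty-prints the whole term into the log.
          print_large_term(X) ->
              ?log_error("HKHK: formatting testing - ~p", [X]).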
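
For comparing measurements like the ones above, the following sketch may help; the module name `term_stats` and its function names are hypothetical. `erts_debug:flat_size/1` and `erts_debug:size/1` return counts of machine words rather than bytes, so the helpers multiply by the word size. The last helper uses the `~P` directive, which takes an explicit depth, to show how much a depth limit shortens the printed output.

```erlang
-module(term_stats).
-export([flat_bytes/1, shared_bytes/1, printed_chars/2]).

%% Size in bytes with every shared subterm counted each time it occurs.
%% This is closest to what a full pretty-print has to walk.
flat_bytes(Term) ->
    erts_debug:flat_size(Term) * erlang:system_info(wordsize).

%% Size in bytes with shared subterms counted once, i.e. roughly what
%% the term occupies on the heap.
shared_bytes(Term) ->
    erts_debug:size(Term) * erlang:system_info(wordsize).

%% Number of characters produced when printing Term with ~P limited to
%% Depth levels; deeper or longer structures are elided as "...".
printed_chars(Term, Depth) when is_integer(Depth), Depth > 0 ->
    length(lists:flatten(io_lib:format("~P", [Term, Depth]))).
```

A large gap between `flat_bytes/1` and `shared_bytes/1` indicates heavy sharing, which is the case where printing costs far more than the term's memory footprint suggests.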

          meni.hillel Meni Hillel (Inactive) added a comment - Fixed by MB-45793

          People

            dfinlay Dave Finlay
            meni.hillel Meni Hillel (Inactive)
            Votes: 0
            Watchers: 4
