Couchbase Server / MB-47094

Slow processing of audit messages might lead to increase in RSS memory


Details

    • Untriaged
    • 1
    • Unknown

    Description

      =proc:<0.241.0>
      State: Garbing
      Name: couch_audit
      Spawned as: proc_lib:init_p/5
      Spawned by: <0.240.0>
      Started: Wed May  5 21:51:04 2021
      Message queue length: 362791
      Number of heap fragments: 1
      Heap fragment data: 57293
      Link list: [#Port<0.9683>, <0.240.0>, {from,<0.187.0>,#Ref<0.474767987.3728211972.78559>}]
      Reductions: 1945363027
      Stack+heap: 45988046
      OldHeap: 850246556
      Heap unused: 38139
      OldHeap unused: 56877
      BinVHeap: 34289
      OldBinVHeap: 2184
      BinVHeap unused: 12133
      OldBinVHeap unused: 72926   
      Memory: 7201540200
      New heap start: 7F474AEE5028
      New heap top: 7F4760D76EC0
      Stack top: 7F4760DC1540
      Stack end: 7F4760DC1698
      Old heap start: 7F42C490F028
      Old heap top: 7F4459F7DBA0
      Old heap end: 7F4459FECD08
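As a rough cross-check of the dump above, the heap sizes can be converted to bytes and compared with the reported Memory value. This is a minimal sketch that assumes a 64-bit Erlang VM (8-byte words, which the 48-bit heap addresses suggest) and that "Stack+heap" and "OldHeap" are reported in words while "Memory" is in bytes. The function name `heap_bytes` is hypothetical and used only for illustration.

```python
# Assumption: 64-bit Erlang VM, so one heap word is 8 bytes.
WORD_SIZE_BYTES = 8

def heap_bytes(stack_heap_words, old_heap_words, word_size=WORD_SIZE_BYTES):
    """Convert the young (stack+heap) and old heap sizes from words to bytes."""
    return (stack_heap_words + old_heap_words) * word_size

# Values copied from the couch_audit process section of the crash dump.
stack_heap_words = 45988046    # "Stack+heap"
old_heap_words = 850246556     # "OldHeap"
memory_bytes = 7201540200      # "Memory"

heap = heap_bytes(stack_heap_words, old_heap_words)
remainder = memory_bytes - heap   # memory not covered by the two heap figures

print(f"heap total : {heap} bytes")
print(f"Memory     : {memory_bytes} bytes ({memory_bytes / 2**30:.2f} GiB)")
print(f"remainder  : {remainder} bytes")
```

Under these assumptions the two heaps account for nearly all of the roughly 6.7 GiB held by couch_audit, with most of it in the old heap while 362791 messages wait in the queue.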
      


        Activity

          vikas.chaudhary Vikas Chaudhary added a comment -

          Jeelan Poola, we are running the following tests with different types of queries: basic, large fields, range, groups, compute-intensive, and multiple emitted.

          1 node, 20M docs, 3 views, 100 updates/sec, 100 queries/sec, stale=false

          ankit.prabhu Ankit Prabhu added a comment -

          Jeelan Poola, Vikas Chaudhary: for audit events to be dropped to the INFO log, there must be many threads (maybe >= 5000/(number of data nodes)) running a heavy query workload at the same time.
          I tested it by changing the couchdb code:

          [couchdb:error,2021-07-20T13:19:16.152+05:30,couchdb_n_0@cb.local:couch_audit<0.289.0>:couch_log:error:33]Dropping audit records to info log
          [couchdb:info,2021-07-20T13:19:16.152+05:30,couchdb_n_0@cb.local:couch_audit<0.289.0>:couch_log:info:30]Dropped audit entry: <ud>{40963,
                                    [{view_name,<<"test">>},
                                     {ddoc_name,<<"Testing">>},
                                     {bucket,<<"src">>},
                                     {request_type,external},
                                     {query_parameters,{[]}},
                                     {status,200},
                                     {timestamp,<<"2021-07-20T13:19:16.152+05:30">>},
                                     {user_agent,<<"curl/7.64.1">>},
                                     {auth,<<"Administrator">>},
                                     {real_userid,
                                         {[{domain,builtin},
                                           {user,<<"Administrator">>}]}},
                                     {local,{[{ip,<<"127.0.0.1">>},{port,58381}]}},
                                     {remote,{[{ip,<<"127.0.0.1">>},{port,58381}]}}]}</ud>
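The drop behaviour described in the comment above can be illustrated with a bounded queue that diverts records to an info log once too many are pending. This is only a sketch of the general technique, not the couchdb implementation: the class name `AuditQueue`, the `max_pending` threshold, and the `info_log` callback are all hypothetical, and the real threshold and code path are not shown in this ticket.

```python
from collections import deque

class AuditQueue:
    """Hypothetical bounded audit queue: overflow goes to an info log instead of memory."""

    def __init__(self, max_pending, info_log):
        self.max_pending = max_pending   # assumed threshold, not taken from couchdb
        self.info_log = info_log         # callable that receives one log line
        self.pending = deque()
        self.dropped = 0

    def submit(self, record):
        """Queue the record, or log and drop it if the backlog is already full."""
        if len(self.pending) >= self.max_pending:
            self.info_log(f"Dropped audit entry: {record!r}")
            self.dropped += 1
            return False
        self.pending.append(record)
        return True

    def drain(self, sink):
        """Deliver all pending records to the (possibly slow) audit sink."""
        delivered = 0
        while self.pending:
            sink(self.pending.popleft())
            delivered += 1
        return delivered

# Usage: three records arrive while the sink is stalled and only two may wait.
log_lines = []
queue = AuditQueue(max_pending=2, info_log=log_lines.append)
results = [queue.submit({"view_name": "test", "status": 200, "seq": i}) for i in range(3)]
```

Capping the backlog this way trades completeness of the audit trail for bounded memory, which is why a functional test of the drop path is discussed in the following comments.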

          vikas.chaudhary Vikas Chaudhary added a comment -

          Jeelan Poola, we can have a functional test to cover the drop part rather than a performance test. RSS in 6.6.2 and 6.6.3 is similar:

          http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_662-9588_access_35cf&snapshot=leto_663-9762_access_57f9
          jeelan.poola Jeelan Poola added a comment -

          Thank you, Vikas Chaudhary! Yes, we should have a functional test to cover the drop part. Could you please help with that?
          Ankit Prabhu, is 100 queries per second sufficient to test a customer-equivalent views query workload?

          vikas.chaudhary Vikas Chaudhary added a comment -

          Filed CBQE-7150 for the functional test.

          People

            vikas.chaudhary Vikas Chaudhary
            malarky Chris Malarky