Couchbase Server / MB-19791

Investigate performance of DCP 'Processor' task and nonIO thread utilization

Details

    • Task
    • Resolution: Unresolved
    • Major
    • None
    • 4.1.1
    • couchbase-bucket
    • AWS

    Description

      Context

      During analysis of a customer issue (stalled DCP / high memory usage), it was observed that all the nonIO threads were very busy compared to the front-end. From `top`:

         PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
       39747 couchba+  20   0  0.104t 0.104t   3732 R 94.4 88.6  59:35.91 mc:writer_6
       39751 couchba+  20   0  0.104t 0.104t   3732 R 94.4 88.6 203:08.92 mc:nonio_10
       39752 couchba+  20   0  0.104t 0.104t   3732 R 88.9 88.6 203:10.47 mc:nonio_11
       22790 couchba+  20   0  0.104t 0.104t   3732 S 50.0 88.6 131:27.50 mc:worker +
      ... no other memcached threads above 0.1% ...
      

      Looking at the "dispatcher" stats, a DCP Processor task had been running for 2m28s!

      nonio_worker_11
           cur_time: 1464692987176104
           runtime:  2m:28s
           state:    running
           task:     Processing buffered items for eq_dcpq:replication:ns_1@cb-104->ns_1@cb-115:BucketName
           waketime: 1464680706630924
      

      Note this is similar to MB-18452, where these tasks never yield if they still have work to do. However, in this instance I'm more concerned about how backed up the nonIO threads have become: even though they are not yielding, they still have a large amount of work pending. Several streams have more than 500,000 items pending, even though only 50% of one front-end thread is busy:

      $ grep -E "stream_[0-9]+_buffer_items" stats.log
       eq_dcpq:replication:ns_1@cb-104->ns_1@cb-115:Bucket:stream_76_buffer_items:           45452
       eq_dcpq:replication:ns_1@cb-104->ns_1@cb-115:Bucket:stream_77_buffer_items:           0
       eq_dcpq:replication:ns_1@cb-107->ns_1@cb-115:Bucket:stream_131_buffer_items:          0
       eq_dcpq:replication:ns_1@cb-107->ns_1@cb-115:Bucket:stream_132_buffer_items:          0
       eq_dcpq:replication:ns_1@cb-110->ns_1@cb-115:Bucket:stream_182_buffer_items:          0
       eq_dcpq:replication:ns_1@cb-111->ns_1@cb-115:Bucket:stream_199_buffer_items:          69149
       eq_dcpq:replication:ns_1@cb-111->ns_1@cb-115:Bucket:stream_200_buffer_items:          0
       eq_dcpq:replication:ns_1@cb-122->ns_1@cb-115:Bucket:stream_398_buffer_items:          315964
       eq_dcpq:replication:ns_1@cb-122->ns_1@cb-115:Bucket:stream_399_buffer_items:          67237
       eq_dcpq:replication:ns_1@cb-132->ns_1@cb-115:Bucket:stream_580_buffer_items:          0
       eq_dcpq:replication:ns_1@cb-132->ns_1@cb-115:Bucket:stream_581_buffer_items:          175071
       eq_dcpq:replication:ns_1@cb-135->ns_1@cb-115:Bucket:stream_633_buffer_items:          0
       eq_dcpq:replication:ns_1@cb-135->ns_1@cb-115:Bucket:stream_634_buffer_items:          556340
       eq_dcpq:replication:ns_1@cb-140->ns_1@cb-115:Bucket:stream_723_buffer_items:          537877
       eq_dcpq:replication:ns_1@cb-144->ns_1@cb-115:Bucket:stream_792_buffer_items:          0
       eq_dcpq:replication:ns_1@cb-144->ns_1@cb-115:Bucket:stream_793_buffer_items:          583909
       eq_dcpq:replication:ns_1@cb-156->ns_1@cb-115:Bucket:stream_1010_buffer_items:         84960
       eq_dcpq:replication:ns_1@cb-156->ns_1@cb-115:Bucket:stream_1011_buffer_items:         0
      
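      To make the backlog easier to compare between runs (for example 4.1.1 against Watson), the per-stream counters above can be totalled with a small script. This is only a sketch: the function name, the 500,000 threshold default and the summary fields are illustrative choices, and the line pattern is inferred from the output shown in this ticket rather than from any documented stats format.

```python
import re

# Pattern inferred from the grep output in this ticket, e.g.
#   eq_dcpq:replication:ns_1@cb-104->ns_1@cb-115:Bucket:stream_76_buffer_items:   45452
STAT_RE = re.compile(
    r"^\s*(?P<conn>\S+):stream_(?P<stream>\d+)_buffer_items:\s+(?P<items>\d+)\s*$"
)


def summarise_buffer_items(lines, threshold=500_000):
    """Summarise stream_N_buffer_items lines (hypothetical helper)."""
    per_stream = {}
    for line in lines:
        m = STAT_RE.match(line)
        if m:  # silently skip anything that is not a buffer_items stat
            per_stream[(m["conn"], int(m["stream"]))] = int(m["items"])

    # Only streams that actually have buffered items count as backlog.
    backlog = {k: v for k, v in per_stream.items() if v > 0}
    return {
        "streams": len(per_stream),
        "streams_with_backlog": len(backlog),
        "streams_over_threshold": sum(1 for v in backlog.values() if v > threshold),
        "total_items": sum(backlog.values()),
        "worst": max(backlog.items(), key=lambda kv: kv[1], default=None),
    }


# Four of the lines from the output above, used as sample input.
SAMPLE = [
    " eq_dcpq:replication:ns_1@cb-104->ns_1@cb-115:Bucket:stream_76_buffer_items:           45452",
    " eq_dcpq:replication:ns_1@cb-104->ns_1@cb-115:Bucket:stream_77_buffer_items:           0",
    " eq_dcpq:replication:ns_1@cb-135->ns_1@cb-115:Bucket:stream_634_buffer_items:          556340",
    " eq_dcpq:replication:ns_1@cb-144->ns_1@cb-115:Bucket:stream_793_buffer_items:          583909",
]

if __name__ == "__main__":
    for key, value in summarise_buffer_items(SAMPLE).items():
        print(f"{key}: {value}")
```

      In practice the input would be the lines of stats.log instead of SAMPLE, e.g. `summarise_buffer_items(open("stats.log"))`.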

      Task

      Investigate the performance of this component to determine how it has become so backed up. Check whether this still occurs on Watson.
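      As background for the investigation, the difference between a task that never yields while it has work (the MB-18452 behaviour mentioned in the Context) and one that yields after a bounded batch can be illustrated with a toy model. This is not the ep-engine scheduler or the Processor task code; `process_batch`, `run_round_robin` and the batch size are hypothetical names and values chosen purely for illustration.

```python
from collections import deque


def process_batch(buffer, batch_size):
    """Consume at most batch_size items; return True if work remains."""
    for _ in range(min(batch_size, len(buffer))):
        buffer.popleft()  # stand-in for applying one buffered item
    return len(buffer) > 0


def run_round_robin(buffers, batch_size):
    """Service each buffer in turn, yielding after every bounded batch.

    Returns the order in which the buffers were serviced, so the
    interleaving is visible.
    """
    ready = deque(buffers.keys())
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        if process_batch(buffers[name], batch_size):
            ready.append(name)  # reschedule instead of looping in place
    return order


if __name__ == "__main__":
    # Two toy streams: one with a larger backlog, one with a small one.
    buffers = {"a": deque(range(5)), "b": deque(range(2))}
    print(run_round_robin(buffers, batch_size=2))
```

      With a bounded batch the smaller stream "b" is serviced before "a" has drained; a task that looped until its own buffer was empty would instead hold the thread for the whole of "a".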

      See below for logs etc.

            People

              Assignee: Unassigned
              Reporter: Dave Rigby (drigby, Inactive)