Details
- Type: Task
- Resolution: Unresolved
- Priority: Major
- Fix Version/s: None
- Affects Version/s: 4.1.1
- Environment: AWS

Description
Context
During analysis of a customer issue (stalled DCP / high memory usage), it was observed that all the nonIO threads were very busy compared to the front-end. From `top`:
  PID USER      PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
39747 couchba+  20   0  0.104t 0.104t  3732 R  94.4 88.6  59:35.91 mc:writer_6
39751 couchba+  20   0  0.104t 0.104t  3732 R  94.4 88.6 203:08.92 mc:nonio_10
39752 couchba+  20   0  0.104t 0.104t  3732 R  88.9 88.6 203:10.47 mc:nonio_11
22790 couchba+  20   0  0.104t 0.104t  3732 S  50.0 88.6 131:27.50 mc:worker +
... no other memcached threads above 0.1% ...
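As part of the investigation it would also be worth confirming how many threads of each type the ExecutorPool has created on this node, given how few nonIO threads appear busy above. A minimal sketch, assuming the "workload" stat group is available via cbstats on this build (the group and key names are an assumption and may differ on 4.1.1):

# Assumed stat group: "workload" reports the ExecutorPool thread counts
# (readers / writers / auxIO / nonIO) servicing this bucket.
$ /opt/couchbase/bin/cbstats localhost:11210 workload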
|
Looking at the "dispatcher" stats, a DCP Processor task had been running for 2m28s!
nonio_worker_11
  cur_time: 1464692987176104
  runtime:  2m:28s
  state:    running
  task:     Processing buffered items for eq_dcpq:replication:ns_1@cb-104->ns_1@cb-115:BucketName
  waketime: 1464680706630924
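To confirm whether the task ever yields (rather than just happening to be running at the moment of this snapshot), the same "dispatcher" stats can be sampled repeatedly while the backlog drains. A rough sketch, assuming cbstats exposes the stat group shown above and that the worker block can be picked out with a simple grep (exact flags and output layout may vary by release):

# Poll the per-worker dispatcher stats; if the same DCP Processor task stays
# in state "running" on nonio_worker_11 across samples, it is not yielding.
$ for i in $(seq 1 6); do
    /opt/couchbase/bin/cbstats localhost:11210 dispatcher | grep -A 5 "nonio_worker_11"
    sleep 10
  done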
|
Note this is similar to MB-18452, where these tasks never yield if they still have work to do. However, in this instance I'm more concerned by how "backed up" the nonIO threads have become: even though they are not yielding, they have a large amount of work pending. Many streams have more than 500,000 items pending, even though only 50% of one front-end thread is busy:
$ grep -E "stream_\d+_buffer_items" stats.log
eq_dcpq:replication:ns_1@cb-104->ns_1@cb-115:Bucket:stream_76_buffer_items: 45452
eq_dcpq:replication:ns_1@cb-104->ns_1@cb-115:Bucket:stream_77_buffer_items: 0
eq_dcpq:replication:ns_1@cb-107->ns_1@cb-115:Bucket:stream_131_buffer_items: 0
eq_dcpq:replication:ns_1@cb-107->ns_1@cb-115:Bucket:stream_132_buffer_items: 0
eq_dcpq:replication:ns_1@cb-110->ns_1@cb-115:Bucket:stream_182_buffer_items: 0
eq_dcpq:replication:ns_1@cb-111->ns_1@cb-115:Bucket:stream_199_buffer_items: 69149
eq_dcpq:replication:ns_1@cb-111->ns_1@cb-115:Bucket:stream_200_buffer_items: 0
eq_dcpq:replication:ns_1@cb-122->ns_1@cb-115:Bucket:stream_398_buffer_items: 315964
eq_dcpq:replication:ns_1@cb-122->ns_1@cb-115:Bucket:stream_399_buffer_items: 67237
eq_dcpq:replication:ns_1@cb-132->ns_1@cb-115:Bucket:stream_580_buffer_items: 0
eq_dcpq:replication:ns_1@cb-132->ns_1@cb-115:Bucket:stream_581_buffer_items: 175071
eq_dcpq:replication:ns_1@cb-135->ns_1@cb-115:Bucket:stream_633_buffer_items: 0
eq_dcpq:replication:ns_1@cb-135->ns_1@cb-115:Bucket:stream_634_buffer_items: 556340
eq_dcpq:replication:ns_1@cb-140->ns_1@cb-115:Bucket:stream_723_buffer_items: 537877
eq_dcpq:replication:ns_1@cb-144->ns_1@cb-115:Bucket:stream_792_buffer_items: 0
eq_dcpq:replication:ns_1@cb-144->ns_1@cb-115:Bucket:stream_793_buffer_items: 583909
eq_dcpq:replication:ns_1@cb-156->ns_1@cb-115:Bucket:stream_1010_buffer_items: 84960
eq_dcpq:replication:ns_1@cb-156->ns_1@cb-115:Bucket:stream_1011_buffer_items: 0
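To quantify the overall backlog rather than eyeballing individual streams, the per-stream counts can be summed from the same capture. A small example over stats.log (using [0-9] rather than \d for portability with grep -E, and assuming the key: value layout shown above):

# Total buffered items across all replication streams in the captured stats.
$ grep -E "stream_[0-9]+_buffer_items" stats.log |
    awk '{ total += $NF } END { print total, "items buffered across", NR, "streams" }'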
|
Task
Investigate the performance of this component to understand how it has become so backed up. Check if this still occurs on Watson.
See below for logs etc.
Attachments
Issue Links
- relates to: MB-19837 Increase default number of ep-engine nonIO threads (Closed)