Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 7.1.0
- Build number: 7.1.0-1361
  OS: Amazon Linux 2
  ARM instance: m6g.large
  2vCPU
  8GB Memory
  40GB EBS
- Triaged
- 1
- Unknown
- KV 2021-Nov
Description
During rebalance performance tests on ARM AWS instances, the tests consistently hang. An example job, along with its logs, can be found here:
http://perf.jenkins.couchbase.com/job/Cloud-Tester/600/
https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2021-10-07T223241-ns_1%40ec2-3-219-56-9.compute-1.amazonaws.com.zip
https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2021-10-07T223241-ns_1%40ec2-3-223-6-164.compute-1.amazonaws.com.zip
https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2021-10-07T223241-ns_1%40ec2-44-195-22-82.compute-1.amazonaws.com.zip
The rebalance appears to hang on 'Still waiting for backfill on connection'; this message appears 115 times in the logs:
[rebalance:debug,2021-10-07T22:35:41.445Z,ns_1@ec2-44-195-22-82.compute-1.amazonaws.com:<0.1108.3>:dcp_replicator:wait_for_data_move_on_one_node:192]Still waiting for backfill on connection "replication:ns_1@ec2-44-195-22-82.compute-1.amazonaws.com->ns_1@ec2-3-223-6-164.compute-1.amazonaws.com:bucket-1" bucket "bucket-1", partition 745, last estimate {0,0, <<"calculating-item-count">>}
During this time memcached keeps returning <<"calculating-item-count">> with no estimate, and CPU usage also spikes.
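To reproduce a count like the one above from a collected log, a small script along these lines may help. It is only a sketch: the regular expression is modelled on the single log line quoted in this description, and the function name `summarize_backfill_stalls` is hypothetical, so other ns_server versions or message variants may need a different pattern.

```python
import re
from collections import Counter

# Pattern modelled on the one log line quoted in this ticket (assumption:
# other builds may format the message differently).
STALL_RE = re.compile(
    r'Still waiting for backfill on connection "(?P<conn>[^"]+)" '
    r'bucket "(?P<bucket>[^"]+)", partition (?P<partition>\d+), '
    r'last estimate \{(?P<estimate>[^}]*)\}'
)

# The estimate tuple ends with a status binary, e.g. <<"calculating-item-count">>.
STATUS_RE = re.compile(r'<<"([^"]*)">>')


def summarize_backfill_stalls(lines):
    """Count stall messages per (bucket, partition) and per estimate status.

    `lines` is any iterable of log lines, for example an open file object.
    Lines that do not contain the stall message are ignored.
    """
    per_partition = Counter()
    per_status = Counter()
    for line in lines:
        match = STALL_RE.search(line)
        if match is None:
            continue
        key = (match["bucket"], int(match["partition"]))
        per_partition[key] += 1
        status = STATUS_RE.search(match["estimate"])
        per_status[status.group(1) if status else "unknown"] += 1
    return per_partition, per_status
```

Passing an open debug log from one of the collectinfo archives to `summarize_backfill_stalls` returns two counters. Summing the first gives the total number of stall messages, and the second shows how many of them reported `calculating-item-count` rather than a real estimate.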
Activity
Field | Original Value | New Value |
---|---|---|
Component/s | couchbase-bucket [ 10173 ] | |
Component/s | memcached [ 11621 ] |
Affects Version/s | Neo [ 17615 ] |
Fix Version/s | Neo [ 17615 ] |
Assignee | Trond Norbye [ trond ] | Daniel Owen [ owend ] |
Assignee | Daniel Owen [ owend ] | Dave Rigby [ drigby ] |
Attachment | Screenshot 2021-10-20 at 13.37.03.png [ 164992 ] |
Attachment | Screenshot 2021-10-20 at 13.41.23.png [ 164993 ] |
Attachment | Screenshot 2021-10-20 at 13.45.15.png [ 164994 ] |
Attachment | Screenshot 2021-10-20 at 13.41.23.png [ 164993 ] |
Attachment | Screenshot 2021-10-20 at 13.37.03.png [ 164992 ] |
Attachment | Screenshot 2021-10-20 at 13.45.15.png [ 164994 ] |
Attachment | Screenshot 2021-10-20 at 13.37.03.png [ 164995 ] |
Attachment | Screenshot 2021-10-20 at 13.49.48.png [ 164996 ] |
Attachment | Screenshot 2021-10-20 at 13.52.05.png [ 164997 ] |
Attachment | x86 dashboard.png [ 165003 ] |
Assignee | Dave Rigby [ drigby ] | Paolo Cocchi [ paolo.cocchi ] |
Summary | AWS ARM m6g.large Stuck Calculating Item Count | AWS m6g.large rebalance hung due to backfilling paused |
Rank | Ranked higher |
Epic Link | | |
Rank | Ranked higher |
Attachment | cbcollect_info_ns_1@ec2-3-235-136-83.compute-1.amazonaws.com_20211020-120410.zip [ 167716 ] | |
Attachment | cbcollect_info_ns_1@ec2-3-237-95-29.compute-1.amazonaws.com_20211020-120409.zip [ 167717 ] | |
Attachment | cbcollect_info_ns_1@ec2-3-238-93-68.compute-1.amazonaws.com_20211020-120410.zip [ 167718 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Sprint | KV 2021-Nov [ 1866 ] |
Rank | Ranked lower |
Attachment | MB-49037_dcp-backoff.png [ 168523 ] | |
Attachment | MB-49037_mem.png [ 168524 ] |
Attachment | MB-49037_ht-mem.png [ 168926 ] |
Attachment | MB-49037_HT-ejection.png [ 169207 ] |
Attachment | | |
Attachment | MB-49037_HT-ejection.png [ 169440 ] |
Attachment | MB-49037_b1695.png [ 169483 ] |
Triage | Untriaged [ 10351 ] | Triaged [ 10350 ] |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Resolved [ 5 ] |
Labels | arm memcached | arm memcached performance |
Assignee | Paolo Cocchi [ paolo.cocchi ] | Daniel Owen [ owend ] |
Status | Resolved [ 5 ] | Closed [ 6 ] |