Couchbase Server: MB-49037

AWS m6g.large rebalance hung due to backfilling paused

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 7.1.0
    • Fix Version/s: 7.1.0
    • Component/s: couchbase-bucket
    • Build number: 7.1.0-1361

      OS: Amazon Linux 2
      ARM instance: m6g.large

      2 vCPU
      8 GB memory
      40 GB EBS

    Description

      During rebalance performance tests on AWS ARM instances, the tests consistently hang. An example job and the logs from each node are available here:

      http://perf.jenkins.couchbase.com/job/Cloud-Tester/600/

       

      https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2021-10-07T223241-ns_1%40ec2-3-219-56-9.compute-1.amazonaws.com.zip
      https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2021-10-07T223241-ns_1%40ec2-3-223-6-164.compute-1.amazonaws.com.zip
      https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2021-10-07T223241-ns_1%40ec2-44-195-22-82.compute-1.amazonaws.com.zip

       

      The rebalance appears to hang on "Still waiting for backfill on connection"; this message appears 115 times in the logs:

      [rebalance:debug,2021-10-07T22:35:41.445Z,ns_1@ec2-44-195-22-82.compute-1.amazonaws.com:<0.1108.3>:dcp_replicator:wait_for_data_move_on_one_node:192]Still waiting for backfill on connection "replication:ns_1@ec2-44-195-22-82.compute-1.amazonaws.com->ns_1@ec2-3-223-6-164.compute-1.amazonaws.com:bucket-1" bucket "bucket-1", partition 745, last estimate {0,0, <<"calculating-item-count">>}

      During this time, memcached keeps returning <<"calculating-item-count">> with no estimate. CPU usage also spikes during this period.
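To reproduce a count like the one quoted above, the "Still waiting for backfill" messages can be tallied per partition. The following is a minimal sketch in Python, not part of the original report. The regular expression is derived only from the single sample line quoted above, so it is an assumption that every such message follows the same layout; the names `WAITING_RE` and `tally_backfill_waits` are hypothetical.

```python
import re
from collections import Counter

# The log line quoted in this report, used here as sample input.
SAMPLE_LINE = (
    '[rebalance:debug,2021-10-07T22:35:41.445Z,'
    'ns_1@ec2-44-195-22-82.compute-1.amazonaws.com:<0.1108.3>:'
    'dcp_replicator:wait_for_data_move_on_one_node:192]'
    'Still waiting for backfill on connection '
    '"replication:ns_1@ec2-44-195-22-82.compute-1.amazonaws.com->'
    'ns_1@ec2-3-223-6-164.compute-1.amazonaws.com:bucket-1" '
    'bucket "bucket-1", partition 745, '
    'last estimate {0,0, <<"calculating-item-count">>}'
)

# Assumption: all messages of this kind share the layout of SAMPLE_LINE.
WAITING_RE = re.compile(
    r'Still waiting for backfill on connection "(?P<conn>[^"]+)" '
    r'bucket "(?P<bucket>[^"]+)", partition (?P<partition>\d+), '
    r'last estimate \{(?P<estimate>[^}]*)\}'
)


def tally_backfill_waits(lines):
    """Count matching messages.

    Returns (total, per_partition, calculating), where per_partition is a
    Counter keyed by (bucket, partition) and calculating is the number of
    matches whose estimate still reads "calculating-item-count".
    """
    per_partition = Counter()
    calculating = 0
    for line in lines:
        match = WAITING_RE.search(line)
        if match is None:
            continue
        key = (match.group("bucket"), int(match.group("partition")))
        per_partition[key] += 1
        if "calculating-item-count" in match.group("estimate"):
            calculating += 1
    return sum(per_partition.values()), per_partition, calculating


total, per_partition, calculating = tally_backfill_waits([SAMPLE_LINE])
print(total, dict(per_partition), calculating)
```

To run this over a real log, pass an open file object instead of the one-element list, for example the ns_server debug log extracted from one of the collectinfo archives above (the exact file name inside the archive is an assumption to verify). A partition whose count keeps growing while its estimate stays at "calculating-item-count" is a candidate for the hang described here.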

       

          Activity

            sean.corrigan Sean Corrigan created issue -
            drigby Dave Rigby made changes -
            Field Original Value New Value
            Component/s couchbase-bucket [ 10173 ]
            Component/s memcached [ 11621 ]
            drigby Dave Rigby made changes -
            Affects Version/s Neo [ 17615 ]
            drigby Dave Rigby made changes -
            Fix Version/s Neo [ 17615 ]
            drigby Dave Rigby made changes -
            Assignee Trond Norbye [ trond ] -> Daniel Owen [ owend ]
            drigby Dave Rigby made changes -
            Link This issue relates to MB-48825 [ MB-48825 ]
            drigby Dave Rigby made changes -
            Assignee Daniel Owen [ owend ] -> Dave Rigby [ drigby ]
            drigby Dave Rigby made changes -
            Attachment Screenshot 2021-10-20 at 13.37.03.png [ 164992 ]
            drigby Dave Rigby made changes -
            Attachment Screenshot 2021-10-20 at 13.41.23.png [ 164993 ]
            drigby Dave Rigby made changes -
            Attachment Screenshot 2021-10-20 at 13.45.15.png [ 164994 ]
            drigby Dave Rigby made changes -
            Attachment Screenshot 2021-10-20 at 13.41.23.png [ 164993 ]
            drigby Dave Rigby made changes -
            Attachment Screenshot 2021-10-20 at 13.37.03.png [ 164992 ]
            drigby Dave Rigby made changes -
            Attachment Screenshot 2021-10-20 at 13.45.15.png [ 164994 ]
            drigby Dave Rigby made changes -
            Attachment x86 dashboard.png [ 165003 ]
            drigby Dave Rigby made changes -
            Assignee Dave Rigby [ drigby ] -> Paolo Cocchi [ paolo.cocchi ]
            drigby Dave Rigby made changes -
            Summary "AWS ARM m6g.large Stuck Calculating Item Count" -> "AWS m6g.large rebalance hung due to backfilling paused"
            owend Daniel Owen made changes -
            Rank Ranked higher
            owend Daniel Owen made changes -
            Link This issue is duplicated by MB-48825 [ MB-48825 ]
            owend Daniel Owen made changes -
            Link This issue relates to MB-48825 [ MB-48825 ]
            owend Daniel Owen made changes -
            Epic Link MB-38441 [ 123649 ]
            owend Daniel Owen made changes -
            Rank Ranked higher
            owend Daniel Owen made changes -
            Link This issue relates to MB-49134 [ MB-49134 ]
            paolo.cocchi Paolo Cocchi made changes -
            Status Open [ 1 ] -> In Progress [ 3 ]
            paolo.cocchi Paolo Cocchi made changes -
            Sprint KV 2021-Nov [ 1866 ]
            owend Daniel Owen made changes -
            Rank Ranked lower
            paolo.cocchi Paolo Cocchi made changes -
            Attachment MB-49037_dcp-backoff.png [ 168523 ]
            Attachment MB-49037_mem.png [ 168524 ]
            paolo.cocchi Paolo Cocchi made changes -
            Attachment MB-49037_ht-mem.png [ 168926 ]
            paolo.cocchi Paolo Cocchi made changes -
            Attachment MB-49037_HT-ejection.png [ 169207 ]
            paolo.cocchi Paolo Cocchi made changes -
            Attachment MB-49037_HT-ejection.png [ 169207 ]
            paolo.cocchi Paolo Cocchi made changes -
            Attachment MB-49037_HT-ejection.png [ 169440 ]
            paolo.cocchi Paolo Cocchi made changes -
            Attachment MB-49037_b1695.png [ 169483 ]
            paolo.cocchi Paolo Cocchi made changes -
            Triage Untriaged [ 10351 ] -> Triaged [ 10350 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] -> Resolved [ 5 ]
            ritam.sharma Ritam Sharma made changes -
            Labels "arm memcached" -> "arm memcached performance"
            owend Daniel Owen made changes -
            Assignee Paolo Cocchi [ paolo.cocchi ] -> Daniel Owen [ owend ]
            Status Resolved [ 5 ] -> Closed [ 6 ]

            People

              Assignee: Daniel Owen [ owend ]
              Reporter: Sean Corrigan [ sean.corrigan ]
              Votes: 0
              Watchers: 5
