Couchbase Server / MB-25993

vbucket move doesn't seem to work when vbuckets are large

Details

    Description

      Steps:

      • Create a cluster with 3 nodes, 1 bucket, and 4 vbuckets (no replicas)
      • Disable auto-compaction (because of MB-25982)
      • Load 800M items (~512 B each)
      • Add a new node to the cluster
      • Start rebalance
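
For a sense of scale, here is a rough back-of-the-envelope calculation of how large each vbucket ends up in this setup. It is only a sketch: it assumes items are spread evenly across the 4 vbuckets and counts ~512 B per item, ignoring keys, metadata, and on-disk overhead.

```python
# Rough sizing of the test data set (assumptions: even key distribution,
# ~512 bytes per item, no metadata or storage overhead counted).
ITEMS = 800_000_000
ITEM_SIZE_BYTES = 512
NUM_VBUCKETS = 4

items_per_vbucket = ITEMS // NUM_VBUCKETS
bytes_per_vbucket = items_per_vbucket * ITEM_SIZE_BYTES
total_bytes = ITEMS * ITEM_SIZE_BYTES

print(f"items per vbucket: {items_per_vbucket:,}")
print(f"data per vbucket:  {bytes_per_vbucket / 1e9:.1f} GB")
print(f"total data:        {total_bytes / 1e9:.1f} GB")
```

That works out to about 200M items and roughly 100 GB per vbucket, which is consistent with the seqno of 200022292 seen in the master events for vbucket 1.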

      Obviously the test is absolutely artificial, but it does highlight some important issues.

      First of all, memcached on the receiving node goes far above the high watermark and gets swapped out. The resulting slowdown is massive.

      Rebalance fails after about 4 hours:

      [user:error,2017-09-10T05:18:59.756-07:00,ns_1@172.23.96.117:<0.1380.0>:ns_orchestrator:do_log_rebalance_completion:1249]Rebalance exited with reason {failed_to_initiate_compaction,"bucket-1",
                                       'ns_1@172.23.96.120',#Ref<21717.0.11.15106>}
      

      master events:

      {"vbucket":1,"type":"vbucketMoveStart","ts":1505030966.350933,"pid":"<0.1726.34>","node":"ns_1@172.23.96.117","bucket":"bucket-1","chainBefore":["172.23.96.117:11209"],"chainAfter":["172.23.96.120:11209"]}
      {"type":"compactionInhibited","ts":1505030966.407942,"node":"172.23.96.117:11209","bucket":"bucket-1"}
      {"type":"compactionInhibited","ts":1505030966.407959,"node":"172.23.96.120:11209","bucket":"bucket-1"}
      {"type":"compactionInhibited","ts":1505030966.408001,"node":"172.23.96.117:11209","bucket":"bucket-1"}
      {"type":"compactionInhibited","ts":1505030966.408019,"node":"172.23.96.120:11209","bucket":"bucket-1"}
      {"type":"dcpReplicatorStart","ts":1505030966.750747,"producerNode":"ns_1@172.23.96.117","producerConn":"<21717.26063.17>","pid":"<21717.26025.17>","consumerNode":"ns_1@172.23.96.120","consumerConn":"<21717.26031.17>","connectionName":"replication:ns_1@172.23.96.117->ns_1@172.23.96.120:bucket-1","bucket":"bucket-1"}
      {"vbucket":1,"type":"dcpAddStream","ts":1505030966.750924,"streamType":"add","side":"consumer","pid":"<21717.26031.17>","opaque":1,"node":"ns_1@172.23.96.120","connectionName":"replication:ns_1@172.23.96.117->ns_1@172.23.96.120:bucket-1","bucket":"bucket-1"}
      {"vbucket":1,"type":"dcpAddStreamResponse","ts":1505030966.768431,"success":true,"status":"success","side":"consumer","rawStatus":0,"pid":"<21717.26031.17>","opaque":1,"node":"ns_1@172.23.96.120","connectionName":"replication:ns_1@172.23.96.117->ns_1@172.23.96.120:bucket-1","bucket":"bucket-1"}
      {"vbucket":1,"type":"indexingInitiated","ts":1505030966.770189,"node":"172.23.96.120:11209","bucket":"bucket-1"}
      {"vbucket":1,"type":"backfillPhaseEnded","ts":1505030966.771487,"bucket":"bucket-1"}
      {"vbucket":1,"type":"seqnoWaitingStarted","ts":1505030966.77191,"seqno":200022292,"node":"172.23.96.120:11209","bucket":"bucket-1"}
      {"vbucket":1,"type":"seqnoWaitingEnded","ts":1505045939.149706,"seqno":200022292,"node":"172.23.96.120:11209","bucket":"bucket-1"}
      {"vbucket":1,"type":"vbucketStateChange","ts":1505045939.150008,"state":"active","host":"172.23.96.117:11209","bucket":"bucket-1"}
      {"vbucket":1,"type":"seqnoWaitingStarted","ts":1505045939.244877,"seqno":200022292,"node":"172.23.96.120:11209","bucket":"bucket-1"}
      {"vbucket":1,"type":"seqnoWaitingEnded","ts":1505045939.250352,"seqno":200022292,"node":"172.23.96.120:11209","bucket":"bucket-1"}
      {"vbucket":1,"type":"waitIndexUpdatedStarted","ts":1505045939.250386,"node":"172.23.96.120:11209","bucket":"bucket-1"}
      {"vbucket":1,"type":"waitIndexUpdatedEnded","ts":1505045939.315197,"node":"172.23.96.120:11209","bucket":"bucket-1"}
      {"vbucket":1,"type":"takeoverStarted","ts":1505045939.315302,"oldMaster":"172.23.96.117:11209","node":"172.23.96.120:11209","bucket":"bucket-1"}
      {"vbucket":1,"type":"dcpCloseStream","ts":1505045939.421114,"side":"consumer","pid":"<21717.26031.17>","opaque":1,"node":"ns_1@172.23.96.120","connectionName":"replication:ns_1@172.23.96.117->ns_1@172.23.96.120:bucket-1","bucket":"bucket-1"}
      {"vbucket":1,"type":"dcpCloseStream","ts":1505045939.421163,"side":"producer","pid":"<21717.26063.17>","opaque":1,"node":"ns_1@172.23.96.120","connectionName":"replication:ns_1@172.23.96.117->ns_1@172.23.96.120:bucket-1","bucket":"bucket-1"}
      {"vbucket":1,"type":"dcpCloseStreamResponse","ts":1505045939.449897,"success":true,"status":"success","side":"consumer","rawStatus":0,"pid":"<21717.26031.17>","opaque":1,"node":"ns_1@172.23.96.120","connectionName":"replication:ns_1@172.23.96.117->ns_1@172.23.96.120:bucket-1","bucket":"bucket-1"}
      {"vbucket":1,"type":"dcpCloseStreamResponse","ts":1505045939.449964,"success":true,"status":"success","side":"producer","rawStatus":0,"pid":"<21717.26031.17>","opaque":1,"node":"ns_1@172.23.96.120","connectionName":"replication:ns_1@172.23.96.117->ns_1@172.23.96.120:bucket-1","bucket":"bucket-1"}
      {"vbucket":1,"type":"dcpAddStream","ts":1505045939.450316,"streamType":"takeover","side":"consumer","pid":"<21717.26031.17>","opaque":1,"node":"ns_1@172.23.96.120","connectionName":"replication:ns_1@172.23.96.117->ns_1@172.23.96.120:bucket-1","bucket":"bucket-1"}
      {"vbucket":1,"type":"dcpSetVbucketState","ts":1505045939.508115,"state":"pending","pid":"<21717.26031.17>","node":"ns_1@172.23.96.120","connectionName":"replication:ns_1@172.23.96.117->ns_1@172.23.96.120:bucket-1","bucket":"bucket-1"}
      {"vbucket":1,"type":"dcpSetVbucketState","ts":1505045939.510226,"state":"active","pid":"<21717.26031.17>","node":"ns_1@172.23.96.120","connectionName":"replication:ns_1@172.23.96.117->ns_1@172.23.96.120:bucket-1","bucket":"bucket-1"}
      {"vbucket":1,"type":"takeoverEnded","ts":1505045939.510412,"oldMaster":"172.23.96.117:11209","node":"172.23.96.120:11209","bucket":"bucket-1"}
      {"vbucket":1,"type":"vbucketStateChange","ts":1505045939.510961,"state":"active","host":"172.23.96.120:11209","bucket":"bucket-1"}
      {"vbucket":1,"type":"vbucketMoverTerminate","ts":1505045939.587338,"reason":"normal","pid":"<0.1726.34>","node":"ns_1@172.23.96.117","bucket":"bucket-1"}
      {"vbucket":1,"type":"updateMap","ts":1505045939.587381,"bucket":"bucket-1","chainBefore":["172.23.96.117:11209"],"chainAfter":["172.23.96.120:11209"]}
      {"type":"dcpReplicatorTerminate","ts":1505045939.588695,"reason":"shutdown","producerNode":"ns_1@172.23.96.117","producerConn":"<21717.26063.17>","pid":"<21717.26025.17>","consumerNode":"ns_1@172.23.96.120","consumerConn":"<21717.26031.17>","connectionName":"replication:ns_1@172.23.96.117->ns_1@172.23.96.120:bucket-1","bucket":"bucket-1"}
      {"vbucket":1,"type":"vbucketMoveDone","ts":1505045939.705132,"bucket":"bucket-1"}
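
To see where the time goes, the master events can be paired up by phase. The snippet below is a minimal sketch, not an ns_server tool: `phase_durations` is a hypothetical helper that matches each `...Started` event with the following `...Ended` event for the same vbucket and reports the elapsed seconds. The sample lines are trimmed copies of the events above (only `vbucket`, `type`, and `ts` are kept).

```python
import json

def phase_durations(lines):
    """Pair '<phase>Started' / '<phase>Ended' events per vbucket.

    Returns a list of (vbucket, phase, seconds) tuples in completion order.
    Events without a matching 'Started' (e.g. backfillPhaseEnded) are skipped.
    """
    open_phases = {}
    durations = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        etype = event.get("type", "")
        vb = event.get("vbucket")
        if etype.endswith("Started"):
            open_phases[(vb, etype[:-len("Started")])] = event["ts"]
        elif etype.endswith("Ended"):
            phase = etype[:-len("Ended")]
            start = open_phases.pop((vb, phase), None)
            if start is not None:
                durations.append((vb, phase, event["ts"] - start))
    return durations

# Trimmed copies of master events from this ticket (extra fields dropped).
SAMPLE = """
{"vbucket":1,"type":"backfillPhaseEnded","ts":1505030966.771487}
{"vbucket":1,"type":"seqnoWaitingStarted","ts":1505030966.77191}
{"vbucket":1,"type":"seqnoWaitingEnded","ts":1505045939.149706}
{"vbucket":1,"type":"takeoverStarted","ts":1505045939.315302}
{"vbucket":1,"type":"takeoverEnded","ts":1505045939.510412}
""".splitlines()

results = phase_durations(SAMPLE)
for vb, phase, seconds in results:
    print(f"vb {vb} {phase}: {seconds:.3f} s ({seconds / 3600:.2f} h)")
```

On these events, nearly all of the move (about 14972 seconds, a bit over 4 hours) is spent between the first `seqnoWaitingStarted` and `seqnoWaitingEnded`, while takeover itself takes a fraction of a second.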
      

      Graphs: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hera_510-1137_rebalance_5ecc

      Attachments

        1. mem_used.png (1.12 MB)
        2. swap_120.png (357 kB)

            People

              pavelpaulau Pavel Paulau (Inactive)