Couchbase Server / MB-32642

High swap rebalance time for kv nodes


Details

    Description

      Build 6.0.0-1693

      Observed that one node swap rebalance time is ~50-60% higher than one node rebalance in/out time.

      First one-node swap rebalance (3 -> 3) time: 230 min
      Second one-node swap rebalance (3 -> 3) time: 295 min
      One-node rebalance in (3 -> 4) time: 173 min
      One-node rebalance out (4 -> 3) time: 176 min

      Job- http://perf.jenkins.couchbase.com/job/arke-multi-bucket/249
      Logs-
      KV node- https://s3.amazonaws.com/bugdb/jira/index_reb_multibucket/collectinfo-2019-01-08T151840-ns_1%40172.23.97.12.zip
      KV node- https://s3.amazonaws.com/bugdb/jira/index_reb_multibucket/collectinfo-2019-01-08T151840-ns_1%40172.23.97.13.zip
      KV node- https://s3.amazonaws.com/bugdb/jira/index_reb_multibucket/collectinfo-2019-01-08T151840-ns_1%40172.23.97.14.zip

      Attachments


        Activity

          mahesh.mandhare Mahesh Mandhare (Inactive) added a comment -

          Poonam Dhavale, updated description with #nodes involved.

           

          poonam Poonam Dhavale added a comment -

          The 3 -> 3 swap rebalance moves 680 vBuckets, whereas rebalance in (3 -> 4) and rebalance out (4 -> 3) each move 512 vBuckets.

          So it is expected that the 3 -> 3 swap takes longer than rebalance in (3 -> 4) and rebalance out (4 -> 3).
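          For reference, a back-of-the-envelope sketch of where these counts come from, assuming a single bucket with the standard 1024 vBuckets and one replica (the exact 680 figure comes from the actual rebalance plan; this sketch rounds slightly differently):

          # Rough estimate of vBucket moves per rebalance type. Assumes a single
          # bucket with 1024 vBuckets, 1 replica, and a balanced cluster; the
          # exact counts (680 / 512) come from ns_server's actual plan.
          NUM_VBUCKETS = 1024

          def swap_moves(cluster_size):
              # Swap: every active and replica vBucket on the leaving node moves
              # to the incoming node.
              per_node = NUM_VBUCKETS // cluster_size       # ~341 actives, ~341 replicas
              return 2 * per_node

          def in_or_out_moves(larger_cluster_size):
              # Rebalance in (3 -> 4) or out (4 -> 3): only the joining/leaving
              # node's share of actives and replicas in the 4-node layout moves.
              per_node = NUM_VBUCKETS // larger_cluster_size   # 256 actives, 256 replicas
              return 2 * per_node

          print("swap 3 -> 3:", swap_moves(3))                       # ~682 (ticket reports 680)
          print("in   3 -> 4:", in_or_out_moves(4))                  # 512
          print("out  4 -> 3:", in_or_out_moves(4))                  # 512
          print("ratio      :", swap_moves(3) / in_or_out_moves(4))  # ~1.33, i.e. ~33% more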

           

          Is it expected that swap will always take 50-60% longer? No; that depends on various other factors, such as the amount of data in the bucket, the load on the system, and the CPU and other resources available.

          In the showfast link I had posted earlier, swap took ~70% longer. That was for DGM.

          Here is the showfast link for the rebalance sanity tests run by the perf team; swap takes around 41-55% longer.

          http://showfast.sc.couchbase.com/#/timeline/Linux/reb/kv/Sanity 

           

          Given that the observations in this ticket (swap rebalance time is ~50-60% higher) are within the range seen during the weekly perf tests, I think we can close this ticket.

           

           


          shivani.gupta Shivani Gupta added a comment -

          Thanks Poonam for investigating this issue.

          It is still a cause for concern that moving 25% more vBuckets takes 50% more time. I understand that this is what we see in the weekly perf tests as well. It feels like there is room for improvement here.

          We don't have to commit to fixing it immediately but can we investigate further to nail down the root cause?


          poonam Poonam Dhavale added a comment -

          Hi Shivani,

          In the 3->3 swap case described above, it is moving 33% more vBuckets. 

          Regarding why it is taking 50-60% more time when it is moving only 33% more vBuckets:

          In addition to moving a higher # of vBuckets, swap also has different rebalance characteristics compared to reb-in/reb-out. This affects the vBucket scheduling logic, which also plays a role in how fast a rebalance can go.

          The vBucket scheduling logic (described in the link below) allows a limited # of backfills and moves for nodes that are acting as the old or the new master.
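          For illustration, the kind of per-node admission check this implies might look like the sketch below; the function name, node names, and limit value are hypothetical, not ns_server's actual implementation.

          from collections import Counter

          BACKFILLS_PER_NODE = 1   # assumed per-node limit, purely illustrative

          def can_start_move(old_master, new_master, in_flight, limit=BACKFILLS_PER_NODE):
              # A move is admitted only if both the old master and the new master
              # are still below their concurrent backfill/move quota.
              return in_flight[old_master] < limit and in_flight[new_master] < limit

          in_flight = Counter({"A": 1, "D": 1})         # one move from A to D is already running
          print(can_start_move("A", "B", in_flight))    # False: A is already at its quota
          print(can_start_move("C", "B", in_flight))    # True: C and B are both under quota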

          Consider a 3 node cluster, N0, N1, N2.

          • 3->3 swap rebalance to remove N2 and replace it with N3.
            • 341 active vBuckets will move from N2 to N3. N2 is the old master for all of these.
            • 341 replica vBuckets will move, the master for these is one of N0 or N1.
          • 3->4 rebalance in to add N3:
            • 256 active vBuckets will move to N3. The master for these is one of N0, N1, N2.
            • 256 replica vBuckets will move to N3. The master for these is one of N0, N1, N2.
          • 4 -> 3 rebalance out to remove N3 has similar characteristics to those described above for the 3 -> 4 rebalance in.

          So, in the above swap rebalance, one node (N2) is the old master for the majority of the vBuckets (341).

          Whereas for reb-in and reb-out, the current/old masters for the vBucket movements are more or less evenly distributed across the 3 nodes (~170 each).

          This affects the order in which vBuckets are moved and how many are moved at a time.
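          To make that distribution concrete, here is a small tally of how many moves each of the original nodes (N0, N1, N2) is the old master for in the two scenarios above (illustration only; the exact move list comes from ns_server):

          from collections import Counter

          # 3 -> 3 swap: N2 is the old master for all 341 active moves, and the
          # ~341 replica moves have their masters split between N0 and N1.
          swap_old_masters = Counter({"N2": 341, "N0": 170, "N1": 171})

          # 3 -> 4 rebalance in: 512 moves in total, with the old masters spread
          # more or less evenly across N0, N1 and N2 (~170 each).
          reb_in_old_masters = Counter({"N0": 171, "N1": 171, "N2": 170})

          for name, load in [("swap 3 -> 3  ", swap_old_masters),
                             ("reb-in 3 -> 4", reb_in_old_masters)]:
              node, count = load.most_common(1)[0]
              print(f"{name}: {sum(load.values())} moves, busiest old master {node} = {count}")

          With per-node limits of the kind sketched above, the node that is the old master for 341 of the moves (N2) constrains scheduling in the swap case much more than any single node does for reb-in/reb-out.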

          But, I have added a note to the design doc below to investigate whether we can improve on swap rebalance time. This will be for Cheshire Cat.

          https://docs.google.com/document/d/1pqNY7GufVCyiEk8ikkltyCu-15KtqZKYlzpNSKiV2mI/edit#heading=h.4iy0vndbwik5 

           

           


          shivani.gupta Shivani Gupta added a comment -

          Thanks Poonam for patiently explaining in detail. It makes more sense now why swap takes longer.

          Since you are already tracking this investigation for Cheshire Cat in your document, we can close this ticket.


          People

            ajit.yagaty Ajit Yagaty [X] (Inactive)
            mahesh.mandhare Mahesh Mandhare (Inactive)
            Votes: 0
            Watchers: 5

