  1. Couchbase Server
  2. MB-32642

High swap rebalance time for kv nodes


Details

    Description

      Build 6.0.0-1693

      Observed that one node swap rebalance time is ~50-60% higher than one node rebalance in/out time.

      First one node swap rebalance(3 -> 3) time: 230 min
      Second one node swap rebalance(3 -> 3) time: 295 min
      One node rebalance in(3 -> 4) time: 173 min
      One node rebalance out(4 -> 3) time: 176 min

      Job- http://perf.jenkins.couchbase.com/job/arke-multi-bucket/249
      Logs-
      KV node- https://s3.amazonaws.com/bugdb/jira/index_reb_multibucket/collectinfo-2019-01-08T151840-ns_1%40172.23.97.12.zip
      KV node- https://s3.amazonaws.com/bugdb/jira/index_reb_multibucket/collectinfo-2019-01-08T151840-ns_1%40172.23.97.13.zip
      KV node- https://s3.amazonaws.com/bugdb/jira/index_reb_multibucket/collectinfo-2019-01-08T151840-ns_1%40172.23.97.14.zip


        Activity

          mahesh.mandhare Mahesh Mandhare (Inactive) created issue -
          drigby Dave Rigby added a comment -

          Could you give a bit more background on this issue? You've marked it as a bug, but this just sounds like an observation that swap rebalance takes longer than a single rebalance in?

          As such, that doesn't sound like a bug to me (possibly an improvement?) - unless it's a regression from some previous build.

          (As an aside, swap-rebalance is still quicker than node_in + node_out, so not clear this is even unexpected behaviour).

          If you think this is a bug, please update to include expected behaviour and actual behaviour. If not then change to an improvement making clear what you think should be improved.

          drigby Dave Rigby made changes -
          Field Original Value New Value
          Assignee Dave Rigby [ drigby ] Mahesh Mandhare [ mahesh.mandhare ]
          mahesh.mandhare Mahesh Mandhare (Inactive) made changes -
          Component/s ns_server [ 10019 ]
          Component/s couchbase-bucket [ 10173 ]
          shivani.gupta Shivani Gupta added a comment - - edited

          Dave Rigby this is more of an investigation at this point i.e. why is the swap rebalance time so much more than rebalance-in? Is there a legitimate explanation for this? Or is it odd behavior?

          This has been observed on the high bucket density testing with 30 buckets where rebalance time is already high. So swap rebalance increasing by 50% really spikes the rebalance time.

          If you think it should first be investigated by ns_server team, please re-assign.

          I am changing it to improvement and re-assigning to you so that you can suggest next steps.

          shivani.gupta Shivani Gupta made changes -
          Issue Type Bug [ 1 ] Task [ 3 ]
          shivani.gupta Shivani Gupta made changes -
          Issue Type Task [ 3 ] Improvement [ 4 ]
          shivani.gupta Shivani Gupta made changes -
          Assignee Mahesh Mandhare [ mahesh.mandhare ] Dave Rigby [ drigby ]
          drigby Dave Rigby added a comment -

          ns_server orchestrates rebalances, including when vBuckets are moved. I’d suggest any analysis starts there. Reassigning.

          drigby Dave Rigby made changes -
          Assignee Dave Rigby [ drigby ] Ajit Yagaty [ ajit.yagaty ]

          poonam Poonam Dhavale added a comment -

          I have the other high bucket density rebalance ticket, MB-32645.

          So, assigning this to me as well.
          poonam Poonam Dhavale made changes -
          Assignee Ajit Yagaty [ ajit.yagaty ] Poonam Dhavale [ poonam ]
          lynn.straus Lynn Straus added a comment -

          setting initial fix version to Mad Hatter so that investigation occurs in MH timeframe.  Please update fix version once investigation completes.

          lynn.straus Lynn Straus made changes -
          Fix Version/s Mad-Hatter [ 15037 ]

          poonam Poonam Dhavale added a comment -

          Different types of rebalance can take different amounts of time because they may move different numbers of vBuckets.

          The description does not say how many nodes were involved in the swap, rebalance-in and rebalance-out. That will tell us how many vBuckets are moved during each.

          Also, the links to the logs are not working.

          For example, see the weekly results from the tests run by the perf team below.

          A 4 -> 4 swap took 63.9 min, whereas rebalance-in (4 -> 5) took 37.2 min and rebalance-out (5 -> 4) took 34.4 min.

          Swap took ~70% more time than the above rebalance-in and rebalance-out.

          A 4 -> 4 swap rebalance moves 512 vBuckets, whereas 4 -> 5 rebalance-in and 5 -> 4 rebalance-out move only 408 vBuckets each.

          http://showfast.sc.couchbase.com/#/timeline/Linux/reb/kv/DGM

          Please specify the # of nodes involved during each rebalance.
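          The comparison in the comment above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the helper name `pct_more` is made up for this example, and the inputs are the 4-node figures quoted in the comment (63.9 vs 37.2 minutes, 512 vs 408 vBuckets).

```python
def pct_more(value: float, baseline: float) -> float:
    """Return how much larger `value` is than `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# Figures quoted in the comment above (4-node DGM test).
swap_minutes, reb_in_minutes = 63.9, 37.2
swap_vbuckets, reb_in_vbuckets = 512, 408

extra_time = pct_more(swap_minutes, reb_in_minutes)
extra_moves = pct_more(swap_vbuckets, reb_in_vbuckets)

print(f"swap vs rebalance-in: {extra_time:.1f}% more time, "
      f"{extra_moves:.1f}% more vBucket moves")
```

          With these inputs, swap takes roughly 72% more time than rebalance-in while moving roughly 25% more vBuckets, so the vBucket count alone does not account for the whole difference.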
          poonam Poonam Dhavale made changes -
          Assignee Poonam Dhavale [ poonam ] Mahesh Mandhare [ mahesh.mandhare ]
          mahesh.mandhare Mahesh Mandhare (Inactive) made changes -
          Description (original value):
          Build 6.0.0-1693

          Observed that one node swap rebalance time is ~50-60% higher than one node rebalance in/out time.

          First one node swap rebalance time: 230 min
          Second one node swap rebalance time: 295 min
          One node rebalance in time: 173 min
          One node rebalance out time: 176 min


          Job- http://perf.jenkins.couchbase.com/job/arke-multi-bucket/249
          Logs-
          KV node- https://s3.amazonaws.com/bugdb/jira/index_reb_multibucket/collectinfo-2019-01-08T151840-ns_1%40172.23.97.12.zip
          KV node- https://s3.amazonaws.com/bugdb/jira/index_reb_multibucket/collectinfo-2019-01-08T151840-ns_1%40172.23.97.13.zip
          KV node- https://s3.amazonaws.com/bugdb/jira/index_reb_multibucket/collectinfo-2019-01-08T151840-ns_1%40172.23.97.14.zip
          Description (new value):
          Build 6.0.0-1693

          Observed that one node swap rebalance time is ~50-60% higher than one node rebalance in/out time.

          First one node swap rebalance(3 -> 3) time: 230 min
          Second one node swap rebalance(3 -> 3) time: 295 min
          One node rebalance in(3 -> 4) time: 173 min
          One node rebalance out(4 -> 3) time: 176 min


          Job- http://perf.jenkins.couchbase.com/job/arke-multi-bucket/249
          Logs-
          KV node- https://s3.amazonaws.com/bugdb/jira/index_reb_multibucket/collectinfo-2019-01-08T151840-ns_1%40172.23.97.12.zip
          KV node- https://s3.amazonaws.com/bugdb/jira/index_reb_multibucket/collectinfo-2019-01-08T151840-ns_1%40172.23.97.13.zip
          KV node- https://s3.amazonaws.com/bugdb/jira/index_reb_multibucket/collectinfo-2019-01-08T151840-ns_1%40172.23.97.14.zip

          mahesh.mandhare Mahesh Mandhare (Inactive) added a comment -

          Poonam Dhavale, updated description with #nodes involved.

           

          poonam Poonam Dhavale added a comment -

          A 3 -> 3 swap rebalance moves 680 vBuckets, whereas rebalance-in (3 -> 4) and rebalance-out (4 -> 3) each move 512 vBuckets.

          So a 3 -> 3 swap is expected to take longer than rebalance-in (3 -> 4) and rebalance-out (4 -> 3).

          Is swap expected to always take 50-60% longer? No, that depends on various other factors such as the amount of data in the bucket, the load on the system, and the CPU and other resources available.

          In the showfast link I posted earlier, swap took ~70% longer. That was for DGM.

          Here is the showfast link for the rebalance sanity tests run by the perf team. Swap takes ~41-55% longer.

          http://showfast.sc.couchbase.com/#/timeline/Linux/reb/kv/Sanity

          Given that the observations in this ticket (swap rebalance time is ~50-60% higher) are within the range seen during the weekly perf tests, I think we can close this ticket.

          shivani.gupta Shivani Gupta added a comment -

          Thanks Poonam for investigating this issue.

          It is still a cause for concern that moving 25% more vBuckets takes 50% more time. I understand that this is what we see in the weekly perf tests as well. It feels like there is room for improvement here.

          We don't have to commit to fixing it immediately, but can we investigate further to nail down the root cause?

           

          poonam Poonam Dhavale added a comment -

          Hi Shivani,

          In the 3 -> 3 swap case described above, it is moving 33% more vBuckets.

          Regarding why it takes 50-60% more time when it moves only 33% more vBuckets:

          In addition to moving a higher # of vBuckets, swap also has different rebalance characteristics compared to reb-in/reb-out. This affects the vBucket scheduling logic, which also plays a role in how fast a rebalance can go.

          The vBucket scheduling logic (described in the link below) allows a limited # of backfills and moves for nodes that are acting as the old or the new master.

          Consider a 3 node cluster, N0, N1, N2.

          • 3 -> 3 swap rebalance to remove N2 and replace it with N3:
            • 341 active vBuckets will move from N2 to N3. N2 is the old master for all of these.
            • 341 replica vBuckets will move. The master for these is one of N0 or N1.
          • 3 -> 4 rebalance in to add N3:
            • 256 active vBuckets will move to N3. The master for these is one of N0, N1, N2.
            • 256 replica vBuckets will move to N3. The master for these is one of N0, N1, N2.
          • 4 -> 3 rebalance out to remove N3 will have similar characteristics to those described above for 3 -> 4 rebalance in.

          So, in the above swap rebalance, one node (N2) is the old master for the majority of the vBuckets (341).

          Whereas for reb-in & reb-out, the current/old masters for the vBucket movements are more or less evenly distributed across the 3 nodes (170 each).

          This affects the order in which vBuckets are moved and how many are moved at a time.

          But I have added a note to the design doc below to investigate whether we can improve swap rebalance time. This will be for Cheshire Cat.

          https://docs.google.com/document/d/1pqNY7GufVCyiEk8ikkltyCu-15KtqZKYlzpNSKiV2mI/edit#heading=h.4iy0vndbwik5
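          The per-node counts in the explanation above can be reproduced with a rough model. This is a sketch under stated assumptions, not ns_server's scheduling code: it assumes 1024 vBuckets per bucket, one replica, and perfectly even distribution, and the function names are hypothetical. It only counts how many vBucket moves have each node as the current/old master.

```python
NUM_VBUCKETS = 1024  # assumption: vBuckets per bucket, one replica each


def master_load_swap(nodes: int) -> dict:
    """Moves per old master when the last node is swapped for a new one."""
    per_node = NUM_VBUCKETS // nodes  # actives (and replicas) held per node
    outgoing = f"N{nodes - 1}"
    load = {f"N{i}": 0.0 for i in range(nodes)}
    # Active moves: the outgoing node is the old master for all of them.
    load[outgoing] += per_node
    # Replica moves: their masters are spread over the remaining nodes.
    remaining = [name for name in load if name != outgoing]
    for name in remaining:
        load[name] += per_node / len(remaining)
    return load


def master_load_rebalance_in(nodes_before: int) -> dict:
    """Moves per old master when one node is added to the cluster."""
    per_node = NUM_VBUCKETS // (nodes_before + 1)  # share the new node receives
    total_moves = 2 * per_node  # active moves plus replica moves
    # Masters of the moved vBuckets are spread evenly over the existing nodes.
    return {f"N{i}": total_moves / nodes_before for i in range(nodes_before)}


swap = master_load_swap(3)
reb_in = master_load_rebalance_in(3)
print("swap:", swap)
print("rebalance-in:", reb_in)
print("busiest-master ratio:", max(swap.values()) / max(reb_in.values()))
```

          Under these assumptions, the busiest old master in the swap (N2, 341 moves) carries about twice the load of any master in the rebalance-in (about 170 moves each). If moves are limited per master node, as the comment describes, that imbalance is one plausible reason swap takes disproportionately longer.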

          shivani.gupta Shivani Gupta added a comment -

          Thanks Poonam for patiently explaining in detail. It makes more sense now why swap takes longer.

          Since you are already tracking the Cheshire Cat improvement investigation in your document, we can close this ticket.
          mahesh.mandhare Mahesh Mandhare (Inactive) made changes -
          Assignee Mahesh Mandhare [ mahesh.mandhare ] Poonam Dhavale [ poonam ]
          dfinlay Dave Finlay made changes -
          Fix Version/s Cheshire-Cat [ 15915 ]
          Fix Version/s Mad-Hatter [ 15037 ]
          poonam Poonam Dhavale made changes -
          Assignee Poonam Dhavale [ poonam ] Ajit Yagaty [ ajit.yagaty ]
          dfinlay Dave Finlay made changes -
          Fix Version/s CheshireCat.Next [ 16908 ]
          Fix Version/s Cheshire-Cat [ 15915 ]
          meni.hillel Meni Hillel (Inactive) made changes -
          Fix Version/s backlog [ 15925 ]
          Fix Version/s CheshireCat.Next [ 16908 ]

          People

            ajit.yagaty Ajit Yagaty [X] (Inactive)
            mahesh.mandhare Mahesh Mandhare (Inactive)