Couchbase Server / MB-33381

update_vbucket_map_history can lose latest maps, causing delta recovery to fail

Details

    • Untriaged
    • Unknown

    Description

      Currently, update_vbucket_map_history works as follows.

      When a new map is being set, it checks whether that map is already present in the history. If it is, we do nothing: the history is left exactly as it was, with no reordering.

      If the map isn't present, we add it to the head of the list and delete the tail if the list has grown beyond 10 entries.
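
The behaviour described above can be sketched as follows. This is a hypothetical Python model, not the ns_server code: the function name `update_history_current`, the list-of-strings representation, and the `limit` parameter are assumptions made for illustration. The list is ordered newest first.

```python
HISTORY_LIMIT = 10  # the description says the tail is dropped once the list exceeds 10


def update_history_current(history, new_map, limit=HISTORY_LIMIT):
    """Model of the described behaviour; history[0] is the newest entry."""
    if new_map in history:
        # Already present: nothing changes, the entry keeps its old position.
        return list(history)
    # Not present: add at the head and drop whatever falls off the tail.
    return ([new_map] + list(history))[:limit]


full_history = [f"map-{i}" for i in range(10)]
unchanged = update_history_current(full_history, "map-9")
with_new = update_history_current(full_history, "map-new")
```

Here `unchanged` equals `full_history` (including the position of `map-9` at the tail), while `with_new` starts with `map-new` and no longer contains `map-9`.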

       

      Consider a situation where we have a large number of buckets, say 9, named bucket-1 to bucket-9, each with a different vbucket map. The vbucket_map_history list is full, holding 10 past maps.

      On rebalance, bucket-1 goes first. It tries to update vbucket_map_history and sees that its map is already present in the history, but as the last entry in the list. We do nothing in this case: the history is left as it was, with no reordering.

      Next, the rebalance of bucket-2 runs and we discover that it has a new map. We add that map to vbucket_map_history at the expense of bucket-1's latest vbucket map. The issue is that vbucket_map_history is per cluster, not per bucket, and we don't reorder it so that the latest maps come first.

       

      So after a successful rebalance, we may have lost bucket-1's latest vbucket map.

       

      Now assume we fail over a node and add it back for delta recovery: we cannot recover bucket-1, because we have lost that map.
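
The scenario can be replayed with a small simulation. This is a sketch under assumptions: maps are represented by placeholder strings, and `update_history_current` is a hypothetical Python model of the behaviour described in this ticket, not the actual ns_server function.

```python
def update_history_current(history, new_map, limit=10):
    """Described behaviour: no-op if present, else push to head and trim the tail."""
    if new_map in history:
        return list(history)
    return ([new_map] + list(history))[:limit]


# The history is full (10 entries, newest first) and bucket-1's
# current map happens to be the last entry.
history = [f"old-map-{i}" for i in range(9)] + ["bucket-1-map"]

# Rebalance of bucket-1: its map is already present, so nothing moves.
history = update_history_current(history, "bucket-1-map")

# Rebalance of bucket-2 produces a map that is not yet in the history.
history = update_history_current(history, "bucket-2-new-map")

# bucket-1's map was at the tail, so it is the entry that got dropped.
bucket1_map_lost = "bucket-1-map" not in history
```

After both updates `bucket1_map_lost` is true: the map that delta recovery of bucket-1 would need is no longer in the history.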

       

          Activity

             

             In update_vbucket_map_history(), if NewEntry is a member of History, we should move it to the head of the list.
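
A minimal sketch of the move-to-head idea, assuming the same newest-first list model as the ticket describes. The name `update_history_move_to_head` and the string placeholders are hypothetical, and this is not necessarily how the eventual fix was implemented.

```python
def update_history_move_to_head(history, new_entry, limit=10):
    """If new_entry is already in the history, move it to the head;
    otherwise insert it at the head. Trim the tail in both cases."""
    rest = [entry for entry in history if entry != new_entry]
    return ([new_entry] + rest)[:limit]


# Same starting point as in the description: a full history with
# bucket-1's current map at the tail.
history = [f"old-map-{i}" for i in range(9)] + ["bucket-1-map"]

history = update_history_move_to_head(history, "bucket-1-map")      # moved to head
history = update_history_move_to_head(history, "bucket-2-new-map")  # evicts old-map-8

bucket1_map_kept = "bucket-1-map" in history
```

With the reordering, the entry evicted by bucket-2's new map is the oldest unused one (`old-map-8`), and `bucket1_map_kept` is true.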

            However, this is not a complete solution.

            Consider 10 buckets, each with a slightly different map.

            During rebalance, the vbucket map history is updated with the fast-forward map. If the rebalance fails after the history has been updated, the fast-forward map of one bucket could evict an existing map for the same bucket or for another one.

            We need enough slots in the vbucket map history. That is MB-33365, although that ticket is about certifying more than 10 buckets.

            I think we need to fix MB-33365 even for 10 buckets to handle issues like the one above.
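
The slot shortage can be illustrated with the same kind of model. Everything here is an assumption for illustration: the helper names, the placeholder map strings, and in particular the larger limit of 20, which is an arbitrary value and not one taken from MB-33365.

```python
def update_history_move_to_head(history, new_entry, limit):
    """Move-to-head variant: reorder if present, insert if not, then trim."""
    rest = [entry for entry in history if entry != new_entry]
    return ([new_entry] + rest)[:limit]


# 10 buckets, each with its own current map, all present in the history.
current_maps = [f"bucket-{i}-current" for i in range(1, 11)]


def maps_lost_after_failed_rebalance(limit):
    """Record bucket-1's fast-forward map, then assume the rebalance fails,
    so every bucket still depends on its current map."""
    history = list(current_maps)
    history = update_history_move_to_head(history, "bucket-1-fast-forward", limit)
    return [m for m in current_maps if m not in history]


lost_with_10_slots = maps_lost_after_failed_rebalance(limit=10)
lost_with_20_slots = maps_lost_after_failed_rebalance(limit=20)
```

With 10 slots, `lost_with_10_slots` is `["bucket-10-current"]` even though the move-to-head rule is in place; with the hypothetical 20 slots, `lost_with_20_slots` is empty.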

             

            This is just off the top of my head; I have not thought it through.

             

            Comment above added by Poonam Dhavale (poonam).

            Build couchbase-server-6.5.0-3595 contains ns_server commit c77abde with commit message:
            MB-33381,MB-33365: Don't lose vbmaps required for delta recovery

            Comment above added by Couchbase Build Team (build-team).

            People

              Abhijeeth.Nuthan Abhijeeth Nuthan
              Abhijeeth.Nuthan Abhijeeth Nuthan