Couchbase Server / MB-49079

CBSE: Do not fail Rebalance for merge replica counter failure

Details

    Description

      [Split out of MB-47873 per the 2021-10-20 GSI scrum. Originally opened by Varun Velamuri from CBSE-10499.
      MB-47874 was also opened from the same CBSE to find and fix the root cause; this MB aims to keep Rebalance from failing even when the bug is hit.]

      If merging the replica counter fails, the indexer can skip replica repair for the index definition whose merge failed, rather than failing the rebalance.
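      A minimal Go sketch of the proposed behaviour, under stated assumptions: `ReplicaCounter`, its `Merge` method, and `mergeCounters` are hypothetical names, not the indexer's actual API, and the merge rule (refuse different bases, otherwise keep the larger increment and decrement) is assumed purely for illustration. The point it shows is structural: a merge failure for one definition is recorded and that definition is excluded from replica repair, instead of the error aborting the whole layout read.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// ReplicaCounter is a hypothetical stand-in for a per-definition replica
// counter: a base value with increments and decrements recorded on top.
type ReplicaCounter struct {
	HasValue bool
	Base     uint32
	Incr     uint32
	Decr     uint32
}

// errDifferentBase reuses the message text seen in the CBSE-10499 log.
var errDifferentBase = errors.New("Cannot merge counter with different base values")

// Merge combines two views of the same counter. Assumed rule: views with
// different bases cannot be merged; otherwise keep the larger Incr and Decr.
func (c ReplicaCounter) Merge(other ReplicaCounter) (ReplicaCounter, error) {
	if !c.HasValue {
		return other, nil
	}
	if !other.HasValue {
		return c, nil
	}
	if c.Base != other.Base {
		return c, errDifferentBase
	}
	merged := c
	if other.Incr > merged.Incr {
		merged.Incr = other.Incr
	}
	if other.Decr > merged.Decr {
		merged.Decr = other.Decr
	}
	return merged, nil
}

// mergeCounters merges the counters reported for each definition. A failed
// merge does not abort the whole read: the definition ID is added to
// skipRepair so replica repair can be skipped for that definition only.
func mergeCounters(perDefn map[string][]ReplicaCounter) (merged map[string]ReplicaCounter, skipRepair []string) {
	merged = make(map[string]ReplicaCounter)
	for defnID, counters := range perDefn {
		var acc ReplicaCounter
		failed := false
		for _, c := range counters {
			var err error
			if acc, err = acc.Merge(c); err != nil {
				failed = true
				break
			}
		}
		if failed {
			skipRepair = append(skipRepair, defnID)
			continue
		}
		merged[defnID] = acc
	}
	sort.Strings(skipRepair) // map iteration order is random; sort for stable output
	return merged, skipRepair
}

func main() {
	perDefn := map[string][]ReplicaCounter{
		"defn-1": {{HasValue: true, Base: 2, Incr: 1}, {HasValue: true, Base: 2, Incr: 3}},
		"defn-2": {{HasValue: true, Base: 2}, {HasValue: true, Base: 3}},
	}
	merged, skipRepair := mergeCounters(perDefn)
	fmt.Println(len(merged), skipRepair) // 1 [defn-2]
}
```

      With the sample data, `defn-1` merges normally while `defn-2` lands in the skip list, so the caller could still proceed with the rebalance for every other definition.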

      From the CBSE-10499 description, which is the source of the original error:

      Background and Analysis:

       Analytics nodes were added on `2021-07-27`, and Rebalance has been failing since then:

      2021-07-27T10:08:18.443000+0100 10.114.141.4 added node 10.114.141.16
      2021-07-27T10:14:59.570000+0100 10.114.141.4 added node 10.114.141.17

       Rebalance fails with the following error:

      [ns_server:error,2021-07-27T13:19:53.907+01:00,ns_1@10.114.141.10:service_status_keeper-index<0.725.0>:service_status_keeper:handle_cast:119]Service service_index returned incorrect status
      [ns_server:error,2021-07-27T13:22:33.398+01:00,ns_1@10.114.141.10:service_agent-index<0.669.0>:service_agent:handle_info:287]Rebalancer <13523.17251.1> died unexpectedly: {worker_died, {'EXIT',<13523.17280.1>, {rebalance_failed, {service_error, <<"Unable to read index layout from cluster 127.0.0.1:8091. err = Cannot merge counter with different base values">>}}}} ..


          Activity

            kevin.cherkauer Kevin Cherkauer created issue -
            kevin.cherkauer Kevin Cherkauer made changes -
            Field Original Value New Value
            Link This issue Clones MB-47873 [ MB-47873 ]
            kevin.cherkauer Kevin Cherkauer made changes -
            Link This issue relates to MB-48497 [ MB-48497 ]
            kevin.cherkauer Kevin Cherkauer made changes -
            Link This issue relates to CBSE-10499 [ CBSE-10499 ]
            kevin.cherkauer Kevin Cherkauer made changes -
            Description
            Original value: There are multiple phases in index rebalance, each of which can fail with a variety of errors. The current behaviour is to fail rebalance for any error encountered during any of the phases. Because a failed index service rebalance can block system-wide progress, it is not a good idea to fail rebalance for every error.

            This is a blanket ticket with the goals of:

            a. Investigating all possible errors that can be encountered during rebalance

            b. Identifying the error cases where the index service can continue the rebalance without failing it, perhaps by taking a work-around path. E.g., if merging the replica counter fails, the indexer can avoid replica repair for the definition on which the merge failed, rather than failing rebalance.
            New value: [Split out of MB-47873 per 2021-10-20 GSI scrum. Originally opened by [~varun.velamuri]]

            If merging replica counter fails, indexer can avoid replica repair for the definition on which merge failed, rather than failing rebalance.
            kevin.cherkauer Kevin Cherkauer made changes -
            Summary
            Original value: Do not fail Rebalance for merge replica counter failure
            New value: CBSE: Do not fail Rebalance for merge replica counter failure
            kevin.cherkauer Kevin Cherkauer made changes -
            Description (edited: added "from CBSE-10499" to the split-out note)
            kevin.cherkauer Kevin Cherkauer made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            kevin.cherkauer Kevin Cherkauer made changes -
            Status In Progress [ 3 ] Open [ 1 ]
            kevin.cherkauer Kevin Cherkauer made changes -
            Link This issue relates to MB-47874 [ MB-47874 ]
            kevin.cherkauer Kevin Cherkauer made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            kevin.cherkauer Kevin Cherkauer made changes -
            Description (edited: added the CBSE-10499 background, analysis, and error log)
            kevin.cherkauer Kevin Cherkauer made changes -
            Description (edited: added the note that MB-47874 tracks the root cause)
            kevin.cherkauer Kevin Cherkauer made changes -
            Affects Version/s 7.0.0 [ 17233 ]
            Affects Version/s 6.6.0 [ 16787 ]
            kevin.cherkauer Kevin Cherkauer made changes -
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Resolved [ 5 ]
            mihir.kamdar Mihir Kamdar (Inactive) made changes -
            Labels request-dev-verify
            kevin.cherkauer Kevin Cherkauer made changes -
            Status Resolved [ 5 ] Closed [ 6 ]

            People

              Assignee: Kevin Cherkauer
              Reporter: Kevin Cherkauer
              Votes: 0
              Watchers: 5
