  1. Couchbase Server
  2. MB-46246

[Upgrade] - Online upgrade with graceful failover repeatedly fails with Rebalance exited with reason {service_rebalance_failed,eventing, {agent_died,<31275.26925.17>, {lost_connection,

Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • 7.0.0
    • Cheshire-Cat
    • eventing
    • 6.6.2-9588 -> 7.0.0-5141
    • Untriaged
    • Centos 64-bit
    • 1
    • Yes

    Description

      Scripts to Repro
      1. Run the 6.6.2 longevity test for 3 days.

      ./sequoia -client 172.23.96.162:2375 -provider file:centos_third_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.2-9588 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
      

      2. It had 27 nodes at the end of the test.
      3. Added six 7.0.0 nodes (172.23.105.102, 172.23.105.62, 172.23.106.232, 172.23.106.239, 172.23.106.37, 172.23.106.246) and removed six 6.6.2 nodes (172.23.110.75, 172.23.110.76, 172.23.105.61, 172.23.106.191, 172.23.106.209, 172.23.106.70)
      to do a swap rebalance of all the services (one of each kind).
      4. Failed over 6 nodes, all of which were on 6.6.2; one of them was 172.23.105.29 (eventing). Stopped Couchbase Server, upgraded to 7.0.0-5141, did a recovery, and then did a rebalance. The rebalance failed as shown below:

      Rebalance exited with reason {service_rebalance_failed,eventing,
      {agent_died,<31275.26925.17>,
      {lost_connection,
      {'ns_1@172.23.105.29',shutdown}}}}.
      Rebalance Operation Id = d5decdeac0700bd3bd8609dafc785a5c
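
To make the failure reason above easier to triage, here is a minimal sketch that pulls the failing service and the node named in the `lost_connection` term out of the message. The regular expressions and the `summarize` helper are illustrative assumptions written against this one message; they are not part of ns_server or any Couchbase tool.

```python
import re

# Failure reason copied verbatim from the rebalance error above.
REASON = """Rebalance exited with reason {service_rebalance_failed,eventing,
{agent_died,<31275.26925.17>,
{lost_connection,
{'ns_1@172.23.105.29',shutdown}}}}."""

# Hypothetical patterns: they only cover the shape of this particular message.
SERVICE_RE = re.compile(r"\{service_rebalance_failed,\s*(\w+),")
NODE_RE = re.compile(r"\{lost_connection,\s*\{'([^']+)',\s*(\w+)\}")


def summarize(reason):
    """Extract the service, node, and exit reason from a rebalance failure."""
    service = SERVICE_RE.search(reason)
    node = NODE_RE.search(reason)
    return {
        "service": service.group(1) if service else None,
        "node": node.group(1) if node else None,
        "exit_reason": node.group(2) if node else None,
    }


summary = summarize(REASON)
print(summary)
```

For this message the node that comes out is the recovered eventing node, 172.23.105.29, which is the node whose logs are quoted further down.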
      

      Retried the rebalance repeatedly. It kept failing with the same error.

      See https://issues.couchbase.com/browse/MB-46198?focusedCommentId=500912&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-500912 for more details.

      Attaching cbcollect_info. This test last passed when we upgraded from 6.6.2-9588 to 7.0.0-5033. Looking for a workaround so that we can get out of this sticky situation.

      It took a good 4 days to reach this stage of the upgrade.

      172.23.105.102 : rebalance

      [user:error,2021-05-11T07:59:12.657-07:00,ns_1@172.23.105.102:<0.31265.5>:ns_orchestrator:log_rebalance_completion:1405]Rebalance exited with reason {service_rebalance_failed,eventing,
      [user:error,2021-05-11T08:36:07.522-07:00,ns_1@172.23.105.102:<0.31265.5>:ns_orchestrator:log_rebalance_completion:1405]Rebalance exited with reason {service_rebalance_failed,eventing,
      [user:error,2021-05-11T09:00:44.444-07:00,ns_1@172.23.105.102:<0.31265.5>:ns_orchestrator:log_rebalance_completion:1405]Rebalance exited with reason {service_rebalance_failed,eventing,
      

      172.23.105.29 : crash

      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T07:46:53.936-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T07:53:03.145-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T07:59:12.653-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T08:05:21.990-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T08:11:31.097-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T08:17:40.313-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T08:23:49.122-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T08:29:58.156-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T08:36:07.517-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T08:42:16.661-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T08:48:25.870-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T08:54:35.387-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T09:00:44.442-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T09:06:54.248-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T09:13:03.517-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T09:19:12.815-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
      /opt/couchbase/var/lib/couchbase/logs/info.log:[user:info,2021-05-11T09:25:22.441-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1. Restarting. Messages:
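
The two excerpts above can be lined up by timestamp. The sketch below pairs each rebalance failure on 172.23.105.102 with the nearest eventing restart on 172.23.105.29. The sample lines are abridged from the excerpts (only a subset of the restarts is included), and the `timestamps` and `correlate` helpers and the 1-second tolerance are assumptions made for illustration, not part of any Couchbase tooling.

```python
import re
from datetime import datetime, timedelta

# Timestamp field of an ns_server user log line, e.g.
# [user:error,2021-05-11T07:59:12.657-07:00,ns_1@...
TS_RE = re.compile(
    r"\[user:\w+,(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{2}:\d{2}),"
)


def timestamps(lines):
    """Return the parsed timestamp of every line that carries one."""
    found = []
    for line in lines:
        match = TS_RE.search(line)
        if match:
            found.append(datetime.fromisoformat(match.group(1)))
    return found


def correlate(failures, crashes, tolerance=timedelta(seconds=1)):
    """Pair each rebalance failure with the nearest crash within tolerance."""
    pairs = []
    if not crashes:
        return pairs
    for failure in failures:
        nearest = min(crashes, key=lambda crash: abs(failure - crash))
        if abs(failure - nearest) <= tolerance:
            pairs.append((failure, nearest))
    return pairs


# Abridged from the 172.23.105.102 excerpt (message text shortened).
REBALANCE_LINES = [
    "[user:error,2021-05-11T07:59:12.657-07:00,ns_1@172.23.105.102:<0.31265.5>:ns_orchestrator:log_rebalance_completion:1405]Rebalance exited",
    "[user:error,2021-05-11T08:36:07.522-07:00,ns_1@172.23.105.102:<0.31265.5>:ns_orchestrator:log_rebalance_completion:1405]Rebalance exited",
    "[user:error,2021-05-11T09:00:44.444-07:00,ns_1@172.23.105.102:<0.31265.5>:ns_orchestrator:log_rebalance_completion:1405]Rebalance exited",
]

# Abridged subset of the 172.23.105.29 excerpt.
CRASH_LINES = [
    "[user:info,2021-05-11T07:53:03.145-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1.",
    "[user:info,2021-05-11T07:59:12.653-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1.",
    "[user:info,2021-05-11T08:36:07.517-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1.",
    "[user:info,2021-05-11T09:00:44.442-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1.",
    "[user:info,2021-05-11T09:06:54.248-07:00,ns_1@172.23.105.29:<0.1059.0>:ns_log:crash_consumption_loop:63]Service 'eventing' exited with status 1.",
]

pairs = correlate(timestamps(REBALANCE_LINES), timestamps(CRASH_LINES))
for failure, crash in pairs:
    delta_ms = (failure - crash).total_seconds() * 1000
    print(f"rebalance failure {failure.time()} follows crash {crash.time()} by {delta_ms:.0f} ms")
```

With the timestamps quoted in this report, each of the three rebalance failures lands within a few milliseconds of an eventing restart on 172.23.105.29, and the restarts themselves recur roughly every six minutes. This suggests the rebalance fails whenever the eventing process on that node exits, though the logs shown here do not say why it exits.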
      

          People

            ankit.prabhu Ankit Prabhu
            Balakumaran.Gopal Balakumaran Gopal