Couchbase Server / MB-45969

[Windows] Rebalance failed due to service_rebalance_failed, eventing, worker died

Details

    Description

      Build: 7.0.0-5017

      Scenario:

      • Multi-node cluster with the index, n1ql, cbas, eventing, and fts services deployed
      • Running various rebalances in parallel with a constant data load
      • Deploying eventing functions when rebalance_in is triggered
      • Rebalance failed during the rebalance_in operation of a kv node

        +----------------+---------------+-----------------------+---------------+--------------+
        | Nodes          | Services      | Version               | CPU           | Status       |
        +----------------+---------------+-----------------------+---------------+--------------+
        | 172.23.136.114 | index, n1ql   | 7.0.0-5017-enterprise | 14.5080915318 | Cluster node |
        | 172.23.136.106 | kv            | 7.0.0-5017-enterprise | 95.3309016852 | Cluster node |
        | 172.23.138.127 | cbas          | 7.0.0-5017-enterprise | 23.905        | Cluster node |
        | 172.23.136.108 | kv            | 7.0.0-5017-enterprise | 94.7126321842 | Cluster node |
        | 172.23.136.112 | backup        | 7.0.0-5017-enterprise | 1.21166666667 | Cluster node |
        | 172.23.136.115 | eventing, fts | 7.0.0-5017-enterprise | 8.803946568   | Cluster node |
        | 172.23.136.113 | index, n1ql   | 7.0.0-5017-enterprise | 13.9064348928 | Cluster node |
        | 172.23.136.110 | kv            | 7.0.0-5017-enterprise | 92.4732706106 | Cluster node |
        | 172.23.136.105 | kv            | 7.0.0-5017-enterprise | 94.7908767427 | Cluster node |
        | 172.23.136.107 | ['kv']        |                       |               | <--- IN ---  |
        +----------------+---------------+-----------------------+---------------+--------------+
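
      The scenario above overlaps eventing function deployment with rebalance_in. A test harness that wants to keep the two apart could poll function state before starting the rebalance. The following is a minimal sketch under stated assumptions: `get_statuses` is a hypothetical caller-supplied callable, and the status strings "deploying" and "resuming" are taken from the wording of the error in this ticket, not from a confirmed API.

```python
import time

# States treated as "in flight". The words come from the error text in this
# ticket; using them as status strings is an assumption.
TRANSITIONAL = {"deploying", "resuming"}


def wait_until_settled(get_statuses, timeout_s=300.0, poll_s=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until no function is in a transitional state.

    get_statuses is a hypothetical callable returning a dict of
    {function_name: status_string}. Returns True once nothing is in
    flight, or False if timeout_s elapses first.
    """
    deadline = clock() + timeout_s
    while True:
        busy = [name for name, status in get_statuses().items()
                if status in TRANSITIONAL]
        if not busy:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

      A harness would call `wait_until_settled(...)` just before triggering rebalance_in and skip or delay the rebalance when it returns False. The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.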
        

      Observation:

      The eventing rebalance failed with the reason:

      "Some apps are deploying or resuming on nodeId: c2e16dfe88967da8a18a2a76462c6b93 Apps: map[a2_users_search:2021-04-27 22:27:20.2836704 -0700 PDT m=+2452.559014101]"

      Rebalance exited with reason {service_rebalance_failed,eventing,
       {worker_died,
        {'EXIT',<0.20190.18>,
         {{badmatch,
           {error,
            {bad_nodes,eventing,prepare_rebalance,
             [{'ns_1@172.23.136.115',
               {error,
                {unknown_error,
                 <<"Some apps are deploying or resuming on nodeId: c2e16dfe88967da8a18a2a76462c6b93 Apps: map[a2_users_search:2021-04-27 22:27:20.2836704 -0700 PDT m=+2452.559014101]">>}}}]}}},
          [{service_rebalancer,rebalance_worker,1,
            [{file,"src/service_rebalancer.erl"},
             {line,158}]},
           {proc_lib,init_p,3,
            [{file,"proc_lib.erl"},{line,234}]}]}}}}.
      Rebalance Operation Id = e197b89281485206bb3f29fba4e1f1ca
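
      For triage, the node id and the function names can be pulled out of the failure reason so that this case is told apart from other rebalance failures. The sketch below assumes the message format seen in this one log line (a Go-style `map[name:timestamp ...]` dump); the regular expressions are inferred from that line, not taken from the eventing source, and `parse_deploying_apps` is a name made up for illustration.

```python
import re

# Message layout inferred from the single log line in this ticket.
_MESSAGE = re.compile(
    r"Some apps are deploying or resuming on nodeId: (?P<node>[0-9a-f]+) "
    r"Apps: map\[(?P<apps>[^\]]*)"
)
# Each map entry starts with "<name>:<YYYY-MM-DD> "; capture only the name.
_APP_NAME = re.compile(r"(\S+?):\d{4}-\d{2}-\d{2} ")


def parse_deploying_apps(text):
    """Return (node_id, [function names]), or None if the text does not match."""
    match = _MESSAGE.search(text)
    if match is None:
        return None
    return match.group("node"), _APP_NAME.findall(match.group("apps"))
```

      Applied to the reason string above, this yields the node id c2e16dfe88967da8a18a2a76462c6b93 and the single function a2_users_search. The closing bracket is not required, so the shortened quote of the message parses the same way.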

       


        Activity

          ashwin.govindarajulu Ashwin Govindarajulu created issue -
          ashwin.govindarajulu Ashwin Govindarajulu made changes -
          Field Original Value New Value
          Attachment rebalance_failure_test.log [ 137773 ]
          ritam.sharma Ritam Sharma made changes -
          Priority Major [ 3 ] Critical [ 2 ]
          ankit.prabhu Ankit Prabhu made changes -
          Assignee Jeelan Poola [ jeelan.poola ] Ankit Prabhu [ ankit.prabhu ]
          ankit.prabhu Ankit Prabhu made changes -
          Resolution Not a Bug [ 10200 ]
          Status Open [ 1 ] Resolved [ 5 ]
          ashwin.govindarajulu Ashwin Govindarajulu made changes -
          Assignee Ankit Prabhu [ ankit.prabhu ] Ashwin Govindarajulu [ ashwin.govindarajulu ]
          Status Resolved [ 5 ] Closed [ 6 ]
          lynn.straus Lynn Straus made changes -
          Fix Version/s 7.0.0 [ 17233 ]
          lynn.straus Lynn Straus made changes -
          Fix Version/s Cheshire-Cat [ 15915 ]

          People

            ashwin.govindarajulu Ashwin Govindarajulu
            ashwin.govindarajulu Ashwin Govindarajulu