Couchbase Server / MB-7571

online upgrade 2.0.0 -> 2.0.1 rebalance exited with {unexpected_exit, {'EXIT',<0.2421.1>, {{badmatch, [{'EXIT', {timeout, {gen_server,call, [<0.2412.1>,had_backfill,30000]}}}]}, [{ns_single_vbucket_mover, '-wait_backfill_determination/1-fun-1-',1}]}}}

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.0.1
    • Component/s: installer, ns_server
    • Security Level: Public
    • Labels:
      None
    • Environment:
      centos5.7 with 4GB of RAM
    • Sprint:
      PCI Team - Sprint 1

      Description

      Steps:
      1. Start with a cluster of 3 nodes on build 1976, with 3 different buckets and 1 view in each bucket, and start loading data into each bucket through mcsoda:
      ip:10.3.121.112
      ip:10.3.121.114
      ip:10.3.121.115

      2. Add 2 nodes on build 139 (10.3.121.116, 10.3.121.117) and remove 10.3.121.114 and 10.3.121.115 (swap rebalance).

      Rebalance exited at ~40% progress, when each bucket contained only ~400K at the time:

      <0.2363.1> exited with {unexpected_exit,
       {'EXIT',<0.2421.1>,
        {{badmatch,
          [{'EXIT',
            {timeout,
             {gen_server,call,[<0.2412.1>,had_backfill,30000]}}}]},
         [{ns_single_vbucket_mover,'-wait_backfill_determination/1-fun-1-',1}]}}}
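
      Read from the inside out, the exit reason above says that a gen_server call to <0.2412.1> with the request had_backfill did not return within 30000 ms, and that the resulting {'EXIT', ...} value then failed a pattern match (badmatch) inside ns_single_vbucket_mover. As a triage aid, the timed-out call can be pulled out of such a reason string mechanically. The following is only a sketch in Python, not part of ns_server; the names TimedOutCall and find_timed_out_call are hypothetical, and the regular expression assumes the term is printed in the shape shown in this report.

```python
import re
from typing import NamedTuple, Optional


class TimedOutCall(NamedTuple):
    """Hypothetical container for the pieces of a gen_server call timeout."""
    pid: str
    request: str
    timeout_ms: int


# Matches {timeout,{gen_server,call,[<pid>,request,timeout]}} and tolerates
# the optional whitespace that appears in the printed term.
_TIMEOUT_RE = re.compile(
    r"\{timeout,\s*\{gen_server,\s*call,\s*"
    r"\[\s*(<[\d.]+>)\s*,\s*(\w+)\s*,\s*(\d+)\s*\]"
)


def find_timed_out_call(reason: str) -> Optional[TimedOutCall]:
    """Return the first timed-out gen_server call in the reason, if any."""
    match = _TIMEOUT_RE.search(reason)
    if match is None:
        return None
    return TimedOutCall(match.group(1), match.group(2), int(match.group(3)))


# The exit reason exactly as given in the issue title.
reason = (
    "{unexpected_exit, {'EXIT',<0.2421.1>, {{badmatch, [{'EXIT', {timeout, "
    "{gen_server,call, [<0.2412.1>,had_backfill,30000]}}}]}, "
    "[{ns_single_vbucket_mover, '-wait_backfill_determination/1-fun-1-',1}]}}}"
)

print(find_timed_out_call(reason))
```

      A reason without a gen_server call timeout, such as {linked_process_died,<18357.6680.0>,normal}, makes the helper return None, which is one quick way to tell the two failures in this ticket apart.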

      Also note that the error message does not mention rebalance at all (it only says "<0.2363.1> exited with {unexpected_exit").

      I also noticed that during the rebalance the stats graphs did not show up, and the statistics for some buckets did not change at all. For example, the number of operations per second for the default bucket stayed at 526 the whole time.
      Only after the rebalance failed and I stopped the mcsoda load were all stats updated.

      1. bella_vms.png
        103 kB
        Andrei Baranouski

        Activity

        jin Jin Lim added a comment -

        Per bug scrub, we aren't really sure about the severity of this issue, but we are upgrading it to Critical to get an immediate engineering response.

        farshid Farshid Ghods (Inactive) added a comment -

        Andrei,

        Is this 100% reproducible?

        andreibaranouski Andrei Baranouski added a comment -

        Build 160

        1976 nodes
        ip:10.3.121.112
        ip:10.3.121.113
        ip:10.3.121.114

        2.0.1-160 nodes
        ip:10.3.121.115
        ip:10.3.121.116

        Starting rebalance, KeepNodes = ['ns_1@10.3.121.112','ns_1@10.3.121.115',
        'ns_1@10.3.121.116'], EjectNodes = ['ns_1@10.3.121.113',
        'ns_1@10.3.121.114']

        This time I got

        Rebalance exited with reason {{{linked_process_died,<18357.6680.0>,normal},
        {gen_server,call,
        ['capi_set_view_manager-sasl',
        {set_vbucket_states,

        https://s3.amazonaws.com/bugdb/jira/MB-7571/10.3.121.112-8091-diag.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7571/10.3.121.116-2182013-1018-diag.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7571/10.3.121.114-2182013-1051-diag.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7571/10.3.121.115-2182013-113-diag.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7571/10.3.121.113-2182013-1056-diag.zip

        farshid Farshid Ghods (Inactive) added a comment -

        If the error code is different, we should file a new bug.

        andreibaranouski Andrei Baranouski added a comment -

        Filed as new issue MB-7771: online upgrade 2.0.0 -> 2.0.1 rebalance exited with {{{linked_process_died,<18357.6680.0>,normal}, {gen_server,call, ['capi_set_view_manager-sasl', {set_vbucket_states,


          People

          • Assignee:
            farshid Farshid Ghods (Inactive)
          • Reporter:
            andreibaranouski Andrei Baranouski
          • Votes:
            0
          • Watchers:
            6
