Couchbase Server / MB-7554

Rebalance fails with "bad match wait_backfill_determination" error on a very small load

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.0.1
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
      None
    • Environment:
      2.0.1-125

      Description

      Load 1M items on a 4 node cluster.
      Rebalance in 2 nodes.

      Rebalance and Compaction start in parallel.

      Rebalance is very slow for the first few minutes, then catches up, but eventually fails with a timeout exit:

      The load and cluster use a very basic configuration. The same scenario works on 2.0.

      Reason for termination ==
      {unexpected_exit,
          {'EXIT',<0.896.2>,
              {{badmatch,
                   [{'EXIT',
                        {timeout,
                            {gen_server,call,[<20117.4759.0>,had_backfill,30000]}}}]},
               [{ns_single_vbucket_mover,'-wait_backfill_determination/1-fun-1-',1}]}}}

      [error_logger:error,2013-01-17T23:11:39.476,ns_1@10.176.169.6:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
        crasher:
          initial call: ns_vbucket_mover:init/1
          pid: <0.24152.1>
          registered_name: []
          exception exit: {unexpected_exit,
                              {'EXIT',<0.896.2>,
                                  {{badmatch,
                                       [{'EXIT',
                                            {timeout,
                                                {gen_server,call,[<20117.4759.0>,had_backfill,30000]}}}]},
                                   [{ns_single_vbucket_mover,'-wait_backfill_determination/1-fun-1-',1}]}}}
            in function gen_server:terminate/6
          ancestors: [<0.13807.1>]
          messages: [{backfill_done,
                         {'ns_1@10.176.169.6',1019,
                             ['ns_1@10.176.169.6','ns_1@10.169.54.218'],
                             ['ns_1@10.176.155.132','ns_1@10.168.94.60']}},
                     {move_done_new_style,
                         {'ns_1@10.176.169.6',1019,
                             ['ns_1@10.176.169.6','ns_1@10.169.54.218'],
                             ['ns_1@10.176.155.132','ns_1@10.168.94.60']}},
                     {'EXIT',<0.6884.2>,normal},
                     {backfill_done,
                         {'ns_1@10.169.54.218',678,
                             ['ns_1@10.169.54.218','ns_1@10.168.173.242'],
                             ['ns_1@10.168.94.60','ns_1@10.176.155.132']}},
                     {move_done_new_style,
                         {'ns_1@10.169.54.218',678,
                             ['ns_1@10.169.54.218','ns_1@10.168.173.242'],
                             ['ns_1@10.168.94.60','ns_1@10.176.155.132']}},
                     {'EXIT',<0.7095.2>,normal}]
          links: [<0.13807.1>,<0.24159.1>,<0.57.0>]
          dictionary: [{bucket_name,"default"},
                       {i_am_master_mover,true},
                       {child_processes,[<0.7095.2>,<0.6884.2>,<0.6760.2>,<0.3866.2>,
                                         <0.3858.2>,<0.862.2>,<0.855.2>,<0.852.2>,
                                         <0.787.2>,<0.26181.1>,<0.26131.1>,<0.26086.1>,
                                         <0.24174.1>,<0.24173.1>]}]
          trap_exit: true
          status: running
          heap_size: 28657
          stack_size: 24
          reductions: 1198089
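      The termination reason can be read as follows. A caller of gen_server:call/3 waited 30000 ms for a reply to had_backfill from <20117.4759.0>. The non-zero first field of that pid means the process lives on another node. The call timed out, the exit was caught as {'EXIT',{timeout,...}}, and that value then failed a pattern match inside the fun in wait_backfill_determination/1. The ns_server source is not part of this report, so the Erlang module below is only a hypothetical reconstruction of that failure shape. The following are all illustrative assumptions, not the actual ns_single_vbucket_mover code:
      - the module name backfill_timeout_demo
      - the helpers silent_replicator/0, query_all/2 and check_results/1
      - the idea that every reply must be a boolean

```erlang
-module(backfill_timeout_demo).
-export([run/1, check_results/1]).

%% Hypothetical replicator: it accepts messages but never answers
%% had_backfill, standing in for the unresponsive remote process.
silent_replicator() ->
    spawn(fun() -> receive stop -> ok end end).

%% Ask each replicator whether it had a backfill. A call that gets no reply
%% within Timeout ms exits with {timeout, {gen_server, call, [Pid, had_backfill, Timeout]}};
%% catch turns that exit into {'EXIT', Reason}, so it ends up in the result list.
query_all(Pids, Timeout) ->
    [(catch gen_server:call(Pid, had_backfill, Timeout)) || Pid <- Pids].

%% Assumed check: every reply must be a boolean. Any other value is collected,
%% and matching a non-empty list against [] raises badmatch, the same shape as
%% {badmatch, [{'EXIT', {timeout, ...}}]} in the crash report.
check_results(Results) ->
    [] = [R || R <- Results, not is_boolean(R)],
    ok.

%% Query one silent replicator with the given timeout and return both the raw
%% results and the (caught) outcome of the check.
run(Timeout) ->
    Pid = silent_replicator(),
    Results = query_all([Pid], Timeout),
    Pid ! stop,
    {Results, (catch check_results(Results))}.
```

      In an Erlang shell, backfill_timeout_demo:run(100) returns a pair. The first element is [{'EXIT',{timeout,{gen_server,call,[Pid,had_backfill,100]}}}]. The second is {'EXIT',{{badmatch,[...]},Stacktrace}}. The sketch only reproduces the error shape. As the comments below note, these logs are not enough to tell why the remote process did not answer in time.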

      Logs at


        Activity

        Ketaki Gangal created issue
        Aleksey Kondratenko (Inactive) made changes:
        Assignee: Aleksey Kondratenko → Aliaksey Artamonau
        Aliaksey Artamonau added a comment:

        I need diags from other nodes. From 'ns_1@10.176.155.132' in particular.

        Aliaksey Artamonau made changes:
        Assignee: Aliaksey Artamonau → Ketaki Gangal
        Ketaki Gangal added a comment:

        The cluster is no longer around.

        Do we have an idea of what is causing these timeouts based on these limited logs?

        Aliaksey Artamonau added a comment:

        No, unfortunately there's not enough information there.

        Farshid Ghods (Inactive) added a comment:

        Please reopen if this occurs again.

        Farshid Ghods (Inactive) made changes:
        Status: Open → Resolved
        Resolution: Incomplete
        Farshid Ghods (Inactive) made changes:
        Status: Resolved → Closed
        Sprint Status: Current Sprint [ 10027 ]

          People

          • Assignee: Ketaki Gangal
            Reporter: Ketaki Gangal
          • Votes: 0
            Watchers: 1

