Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.5.0
    • Security Level: Public
    • Labels:
      None
    • Environment:

      Description

      The code attached should be built with the following command:

      cc -lcouchbase vb.c

      and run like this:

      ./a.out foo 1024 172.16.32.139:8091

      foo - id of the document
      1024 - size of the value
      172.16.32.139:8091 - entry point

      It then stores a single key in the cluster, creates a design document with a view, and starts continuously fetching the view's result.

      While this is running, add a node to the cluster and then remove it; the rebalance will fail.

      This is what I'm observing on the "Logs" tab in the admin console:

      Rebalance exited with reason
      {{{{badmatch,
          {error,
           {error,<<"Partition 43 not in active nor passive set">>}}},
         [{capi_set_view_manager,handle_call,3},
          {gen_server,handle_msg,5},
          {gen_server,init_it,6},
          {proc_lib,init_p_do_apply,3}]},
        {gen_server,call,
         ['capi_set_view_manager-default',
          {wait_index_updated,43},
          infinity]}},
       {gen_server,call,
        [{'janitor_agent-default','ns_1@172.16.16.152'},
         {if_rebalance,<0.4811.1>,initiate_indexing},
         infinity]}}

      1. ns-diag-20130925152852.txt.gz
        4.95 MB
        Aleksey Kondratenko
      2. ns-diag-20130925152905.txt.gz
        3.76 MB
        Aleksey Kondratenko
      3. ns-diag-20130925161642.txt.gz
        17.67 MB
        Aleksey Kondratenko
      4. vb.c
        8 kB
        Sergey Avseyev

          Activity

          sundar Sundar Sridharan added a comment -

          Assigning to Sergey: could you help with how to reproduce this using vb.c?

          sundar Sundar Sridharan added a comment - edited

          Root cause: the flusher is what notifies when a checkpoint has been persisted. If the flusher operation times out because of a slow disk under heavy disk load, memcached returns ENGINE_TMPFAIL to ns-server. However, returning that error does not clear the state kept inside ep-engine for that connection. As a result, when the next checkpoint persistence request arrives, ep-engine wrongly assumes that persistence has already completed and returns SUCCESS early.
          The fix is to clear the engine-specific state when returning TMPFAIL. Present at …
          http://review.couchbase.org/#/c/30677/

          chiyoung Chiyoung Seo added a comment -

          The fix was merged into both 2.5.0 and master branch.

          iryna iryna added a comment -

          Verified on build 2.5.0-954.

          iryna iryna added a comment -

          build 2.5.0-1015: MB-9800 opened


            People

            • Assignee:
              iryna iryna
            • Reporter:
              avsej Sergey Avseyev
            • Votes:
              0
            • Watchers:
              9
