Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.7
    • Component/s: Core
    • Security Level: Public
    • Labels:
      None

      Description

      During some failover/rebalance scenarios, it can happen that no master node is responsible for a document. This should not occur, but it has been observed when the client still holds an outdated config.

      This leads to RuntimeExceptions being raised, but a reconfigure is never actively triggered. In QE tests, this shows up as errors during the CHANGE and REBOUND phases.

      How those -1 master entries get into place should be investigated separately. In the meantime, checking for them and triggering a reconfigure acts as a safety net for running operations.
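
      The safety net described above could look roughly like the following. This is a minimal sketch, not the actual patch: Reconfigurable, MasterLookup, and masterFor are hypothetical names introduced for illustration. The only detail taken from this report is that a master index of -1 means no node currently owns the document.

```java
/** Hypothetical hook for asking the client to re-fetch the cluster config. */
interface Reconfigurable {
    void requestReconfigure();
}

/** Hypothetical lookup helper; not the client's real locator class. */
final class MasterLookup {
    static final int NO_MASTER = -1;

    // masters[i] holds the index of the master node for slot i, or -1.
    private final int[] masters;
    private final Reconfigurable reconfigurable;

    MasterLookup(int[] masters, Reconfigurable reconfigurable) {
        this.masters = masters.clone();
        this.reconfigurable = reconfigurable;
    }

    /**
     * Returns the master index for the given slot. If no master is
     * assigned, a reconfigure is requested before NO_MASTER is returned,
     * so the caller can retry instead of failing with an exception.
     */
    int masterFor(int slot) {
        int master = masters[slot];
        if (master == NO_MASTER) {
            reconfigurable.requestReconfigure();
        }
        return master;
    }
}
```

      The design choice illustrated here is that the missing master is treated as a signal to refresh the config rather than as a fatal error. How the real client retries or reports the failed operation is not covered by this sketch.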


        Activity

        daschl Michael Nitschinger added a comment -

        No fix applied:

        Will show phase timings..
        --------------------------
        Phase statistics for RAMP
        OK/sec: 3619

        OK: 108596
        ERR: 0
        Phase statistics for CHANGE
        OK/sec: 3204

        OK: 999917
        ERR: 141386
        Phase statistics for REBOUND
        OK/sec: 3966

        OK: 357026
        ERR: 26898
        ---------------------

        Fix applied:

        --------------------------
        Phase statistics for RAMP
        OK/sec: 3536

        OK: 106102
        ERR: 0
        Phase statistics for CHANGE
        OK/sec: 1870

        OK: 471474
        ERR: 825
        Phase statistics for REBOUND
        OK/sec: 4063

        OK: 365714
        ERR: 0
        ---------------------

        stester run: /stester -C 127.0.0.1:8050 -i 20devcluster.ini -c failover.Once --vdsw_dvname ddoc/vquery --hdsw_http_threads 5 --grace_after 30 --ept 1 --ramp 30 --num_nodes 2 --hdsw_mc_threads 10 --workload dsw.Hybrid --action_delay 10 --hdsw_cb_threads 10 --action FO_REBALANCE --dsw_timeres 1 -d -o viewlog_3_f.out
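
        To compare the two runs independently of throughput, the error counts can be expressed as a share of all operations in a phase. The helper below is a small sketch under the assumption that OK and ERR are disjoint counts, which the stester output does not state explicitly. PhaseStats is a hypothetical name.

```java
/** Hypothetical holder for the OK/ERR counters of one stester phase. */
final class PhaseStats {
    final long ok;
    final long err;

    PhaseStats(long ok, long err) {
        this.ok = ok;
        this.err = err;
    }

    /** Fraction of operations that failed, assuming OK and ERR do not overlap. */
    double errorRatio() {
        long total = ok + err;
        return total == 0 ? 0.0 : (double) err / total;
    }
}
```

        Under that assumption, new PhaseStats(999917L, 141386L).errorRatio() for the CHANGE phase without the fix comes to roughly 12%, while new PhaseStats(471474L, 825L).errorRatio() with the fix stays below 0.2%.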

        daschl Michael Nitschinger added a comment - http://review.couchbase.org/#/c/26636/
        daschl Michael Nitschinger added a comment -

        Note that before this change, the RuntimeException bubbled up to the user level and blocked everything there. More importantly, cf.checkConfigUpdate(); never got triggered!

        daschl Michael Nitschinger added a comment -

        This also improves the following scenario run (results before the fix first, then after):

        Effective stester command line
        -C 127.0.0.1:8050 \
        -i 20devcluster.ini \
        -c failover.Once \
        --vdsw_dvname ddoc/vquery \
        --hdsw_http_threads 5 \
        --grace_after 30 \
        --ept 1 \
        --ramp 30 \
        --num_nodes 2 \
        --hdsw_mc_threads 10 \
        --workload dsw.Hybrid \
        --action_delay 10 \
        --hdsw_cb_threads 10 \
        --action FO_REBALANCE \
        --dsw_timeres 1 \
        -d \

        --------------------------
        Phase statistics for RAMP
        OK/sec: 3057

        OK: 91713
        ERR: 0
        Phase statistics for CHANGE
        OK/sec: 3172

        OK: 957997
        ERR: 187000
        Phase statistics for REBOUND
        OK/sec: 4108

        OK: 369758
        ERR: 63598
        ---------------------

        After

        Will show phase timings..
        --------------------------
        Phase statistics for RAMP
        OK/sec: 3453

        OK: 103594
        ERR: 0
        Phase statistics for CHANGE
        OK/sec: 2346

        OK: 731968
        ERR: 549
        Phase statistics for REBOUND
        OK/sec: 4064

        OK: 365817
        ERR: 0
        ---------------------


          People

          • Assignee:
            daschl Michael Nitschinger
            Reporter:
            daschl Michael Nitschinger
          • Votes:
            0
            Watchers:
            1
