  1. Couchbase Server
  2. MB-14984

[system tests] Rebalance exited with reason {wait_seqno_persisted_failed,"RevAB"


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 4.0.0
    • 4.0.0
    • XDCR
    • Security Level: Public
    • None
    • 4.0.0-2093
    • Untriaged
    • Centos 64-bit
    • Unknown
    • Mar 9 - Mar 27

    Description

      steps:
      1. 3 nodes in cluster, 4 buckets. Run data loader for more than a day.
      2. Set up replication from SRC cluster to DEST cluster for all buckets.
      3. Rebalance in at SRC cluster.
      Rebalance in at DEST cluster.
      4. Graceful Fail Over (rebalance) for node in SRC cluster, add back (Delta Recovery).
      5. Click failover, Hard Fail Over for node in SRC cluster A, add back (Full Recovery) and rebalance.
      6. Remove node in SRC cluster, stop rebalance. Cancel removing node and rebalance.
      7. Rebalance out 1 node on SRC cluster.
      8. Rebalance out 1 node on DEST cluster.
      9. Rebalance in 2 nodes on SRC cluster.

      result:
      cat info.log | grep -A 30 -B 30 "exited with reason"
      {stats_collector,handle_info,2,
      [{file,"src/stats_collector.erl"},{line,125}]},
      {gen_server,handle_msg,5,
      [{file,"gen_server.erl"},{line,604}]},
      {proc_lib,init_p_do_apply,3,
      [{file,"proc_lib.erl"},{line,239}]}]}

      [stats:error,2015-05-14T9:56:17.628,ns_1@172.23.105.156:<0.4775.0>:stats_collector:handle_info:133]Exception in stats collector: {exit,
      {{error,closed},
      {gen_server,call,
      ['ns_memcached-RevAB',
      {stats,<<>>},
      180000]}},
      [{gen_server,call,3,
      [{file,"gen_server.erl"},{line,188}]},
      {ns_memcached,do_call,3,
      [{file,"src/ns_memcached.erl"},{line,1425}]},
      {stats_collector,grab_all_stats,1,
      [{file,"src/stats_collector.erl"},{line,84}]},
      {stats_collector,handle_info,2,
      [{file,"src/stats_collector.erl"},{line,125}]},
      {gen_server,handle_msg,5,
      [{file,"gen_server.erl"},{line,604}]},
      {proc_lib,init_p_do_apply,3,
      [{file,"proc_lib.erl"},{line,239}]}]}

      [user:info,2015-05-14T9:56:17.628,ns_1@172.23.105.156:<0.1466.0>:ns_orchestrator:handle_info:482]Rebalance exited with reason {unexpected_exit,
      {'EXIT',<0.16693.67>,
      {wait_seqno_persisted_failed,"RevAB",943,
      25440,
      [{'ns_1@172.23.105.156',
      {'EXIT',
      {{badmatch,{error,closed}},
      {gen_server,call,
      [{'janitor_agent-RevAB','ns_1@172.23.105.156'},
      {if_rebalance,<0.9143.66>,
      {wait_seqno_persisted,943,25440}},
      infinity]}}}}]}}}

      [ns_server:warn,2015-05-14T9:56:17.630,ns_1@172.23.105.156:<0.16930.67>:ns_memcached:connect:1282]Unable to connect: {error,{badmatch,{error,econnrefused}}}, retrying.
      [ns_server:info,2015-05-14T9:56:17.632,ns_1@172.23.105.156:<0.17531.67>:compaction_new_daemon:spawn_scheduled_kv_compactor:467]Start compaction of vbuckets for bucket RevAB with config:
      [{database_fragmentation_threshold,{30,undefined}},
      {view_fragmentation_threshold,{30,undefined}}]
      [ns_server:warn,2015-05-14T9:56:17.641,ns_1@172.23.105.156:<0.17088.67>:ns_memcached:connect:1282]Unable to connect: {error,{badmatch,{error,econnrefused}}}, retrying.
      [ns_server:warn,2015-05-14T9:56:17.678,ns_1@172.23.105.156:<0.17529.67>:ns_memcached:connect:1282]Unable to connect: {error,{badmatch,{error,econnrefused}}}, retrying.
      [ns_server:warn,2015-05-14T9:56:17.678,ns_1@172.23.105.156:<0.17514.67>:ns_memcached:connect:1282]Unable to connect: {error,{badmatch,{error,econnrefused}}}, retrying.
      [ns_server:info,2015-05-14T9:56:17.732,ns_1@172.23.105.156:<0.17554.67>:diag_handler:log_all_tap_and_checkpoint_stats:130]logging tap & checkpoint stats
      [user:info,2015-05-14T9:56:17.790,ns_1@172.23.105.156:<0.1500.0>:ns_log:crash_consumption_loop:70]Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 137. Restarting. Messages: 2015-05-14T09:55:06.527524-07:00 WARNING (AbRegNums) Backfill task (1 to 1154) cancelled for vb 769
      2015-05-14T09:55:06.527555-07:00 WARNING (AbRegNums) Backfill task (983 to 1104) cancelled for vb 451
      2015-05-14T09:55:06.527585-07:00 WARNING (AbRegNums) Backfill task (992 to 1092) cancelled for vb 468
      2015-05-14T09:55:06.527615-07:00 WARNING (AbRegNums) Backfill task (1059 to 1158) cancelled for vb 482
      2015-05-14T09:55:06.527643-07:00 WARNING (AbRegNums) Backfill task (1 to 1109) cancelled for vb 831
      [stats:warn,2015-05-14T9:56:18.704,ns_1@172.23.105.156:<0.4855.0>:goxdcr_stats_collector:latest_tick:53]Dropped 7 ticks
      [stats:warn,2015-05-14T9:56:18.705,ns_1@172.23.105.156:<0.4811.0>:goxdcr_stats_collector:latest_tick:53]Dropped 7 ticks
      [stats:warn,2015-05-14T9:56:18.705,ns_1@172.23.105.156:<0.4781.0>:goxdcr_stats_collector:latest_tick:53]Dropped 7 ticks
      [stats:warn,2015-05-14T9:56:18.706,ns_1@172.23.105.156:<0.5298.0>:goxdcr_stats_collector:latest_tick:53]Dropped 7 ticks

      Please note that MB-14983 was created for the same run.

      Will provide collect info soon.
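
      For triage, the excerpt above can be split into individual entries before searching for "exited with reason". The following is a minimal sketch, not part of ns_server: it assumes the header layout seen in these entries, [category:level,timestamp,node:<pid>:module:function:line], and treats every other non-blank line (Erlang term fragments, memcached messages) as a continuation of the previous entry. The names HEADER, Entry and parse_entries are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed header layout, inferred only from the entries in this ticket:
# [category:level,timestamp,node:<pid>:module:function:line]message
HEADER = re.compile(
    r"^\s*\[(?P<category>\w+):(?P<level>\w+),"
    r"(?P<timestamp>[^,]+),"
    r"(?P<node>[^:]+):(?P<pid><[\d.]+>):"
    r"(?P<module>\w+):(?P<function>\w+):(?P<line>\d+)\]"
    r"(?P<message>.*)$"
)


class Entry(NamedTuple):
    category: str
    level: str
    timestamp: str
    node: str
    module: str
    message: str


def parse_entries(lines: Iterable[str]) -> Iterator[Entry]:
    """Group physical lines into log entries.

    A line matching HEADER starts a new entry; any other non-blank line is
    appended to the message of the entry currently being built.
    """
    current = None
    for raw in lines:
        match = HEADER.match(raw)
        if match:
            if current is not None:
                yield current
            current = Entry(
                match["category"],
                match["level"],
                match["timestamp"],
                match["node"],
                match["module"],
                match["message"].strip(),
            )
        elif current is not None and raw.strip():
            current = current._replace(
                message=current.message + " " + raw.strip()
            )
    if current is not None:
        yield current


def rebalance_failures(lines: Iterable[str]) -> list:
    """Return only the entries whose message reports a rebalance exit."""
    return [e for e in parse_entries(lines) if "exited with reason" in e.message]
```

      With info.log opened as a text file, rebalance_failures(open("info.log")) would return the ns_orchestrator entry shown above with its full multi-line reason joined into one message, which is easier to compare across runs than fixed grep context.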


          People

            xiaomei Xiaomei Zhang (Inactive)
            andreibaranouski Andrei Baranouski