Couchbase Server / MB-11349

KV+XDCR system test: Compaction is failing constantly, but reporting success


Details

    • Bug
    • Resolution: Fixed
    • Test Blocker
    • 3.0
    • 3.0
    • couchbase-bucket
    • Security Level: Public
    • None
    • CentOS 6.x, 8x8 clusters (two 8-node clusters), 2 unidirectional XDCRs
      Each node: 15 GB RAM, 4 cores
    • Untriaged
    • Unknown
    • June 30 - July 18

    Description

      Build
      --------
      3.0.0-786 (XDCR over UPR, internal replication over UPR)

      Clusters
      -----------
      Source : http://172.23.105.44:8091/
      Destination : http://172.23.105.54:8091/
      The clusters are available for investigation; there is no urgency to reclaim them. Please let me know if you need me to collect logs.

      Steps
      --------
      1. Load both clusters until vb_active_resident_items_ratio < 30.
      2. Run an access phase with 98% gets and 2% sets for 3 hours.
      3. Rebalance out 1 node from cluster1 while the workload is running (high DGM, ~4%).

      Every attempt to rebalance out one node fails. The last attempt left 3 nodes in the pending state.

      The first rebalance-out failed with the following error:
      -----------------------------------------------
      Many messages like the following:

      Control connection to memcached on 'ns_1@172.23.105.49' disconnected: {{badmatch, {error, timeout}},
       [{mc_client_binary, stats_recv, 4, [{file, "src/mc_client_binary.erl"}, {line, 163}]},
        {mc_client_binary, stats, 4, [{file, "src/mc_client_binary.erl"}, {line, 411}]},
        {ns_memcached, handle_info, 2, [{file, "src/ns_memcached.erl"}, {line, 725}]},
        {gen_server, handle_msg, 5, [{file, "gen_server.erl"}, {line, 604}]},
        {ns_memcached, init, 1, [{file, "src/ns_memcached.erl"}, {line, 170}]},
        {gen_server, init_it, 6, [{file, "gen_server.erl"}, {line, 304}]},
        {proc_lib, init_p_do_apply, 3, [{file, "proc_lib.erl"}, {line, 239}]}]}

      Subsequent rebalance-out attempts
      -------------------------------------------------
      timeout}} ns_memcached000 ns_1@172.23.105.52 14:20:19 - Fri Jun 6, 2014
      Control connection to memcached on 'ns_1@172.23.105.48' disconnected: {badmatch, {error, timeout}} ns_memcached000 ns_1@172.23.105.48 14:20:19 - Fri Jun 6, 2014
      Control connection to memcached on 'ns_1@172.23.105.45' disconnected: {badmatch, {error, timeout}} ns_memcached000 ns_1@172.23.105.45 14:20:19 - Fri Jun 6, 2014
      Rebalance exited with reason {not_all_nodes_are_ready_yet, ['ns_1@172.23.105.50']} ns_orchestrator002 ns_1@172.23.105.44 14:17:19 - Fri Jun 6, 2014
      Bucket "saslbucket" loaded on node 'ns_1@172.23.105.52' in 0 seconds. ns_memcached000 ns_1@172.23.105.52 14:16:32 - Fri Jun 6, 2014
      Bucket "saslbucket" loaded on node 'ns_1@172.23.105.45' in 0 seconds. ns_memcached000 ns_1@172.23.105.45 14:16:32 - Fri Jun 6, 2014
      Control connection to memcached on 'ns_1@172.23.105.45' disconnected: {badmatch, {error, timeout}} ns_memcached000 ns_1@172.23.105.45 14:16:32 - Fri Jun 6, 2014
      Control connection to memcached on 'ns_1@172.23.105.52' disconnected: {badmatch, {error, timeout}} ns_memcached000 ns_1@172.23.105.52 14:16:32 - Fri Jun 6, 2014
      Started rebalancing bucket standardbucket1
      Starting rebalance, KeepNodes = ['ns_1@172.23.105.44','ns_1@172.23.105.45',
      'ns_1@172.23.105.48','ns_1@172.23.105.49',
      'ns_1@172.23.105.50','ns_1@172.23.105.51',
      'ns_1@172.23.105.52'], EjectNodes = ['ns_1@172.23.105.47'], Failed over and being ejected nodes = []; no delta recovery nodes
      Rebalance exited with reason {not_all_nodes_are_ready_yet, ['ns_1@172.23.105.50']}

      Started rebalancing bucket standardbucket1
      Starting rebalance, KeepNodes = ['ns_1@172.23.105.44','ns_1@172.23.105.45',
      'ns_1@172.23.105.48','ns_1@172.23.105.49',
      'ns_1@172.23.105.50','ns_1@172.23.105.51',
      'ns_1@172.23.105.52'], EjectNodes = ['ns_1@172.23.105.47'], Failed over and being ejected nodes = []; no delta recovery nodes
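      When triaging, it may help to tally which nodes report the memcached control-connection timeout in the log excerpts above. The following is a minimal Python sketch, assuming the UI log text has been saved as plain text; the helper name `count_disconnects` is hypothetical and not part of any Couchbase tooling. Fragments whose message prefix was cut off (such as a line beginning with `timeout}}`) are not counted.

```python
import re
from collections import Counter

# Captures the node name from messages such as:
#   Control connection to memcached on 'ns_1@172.23.105.48' disconnected: ...
DISCONNECT_RE = re.compile(
    r"Control connection to memcached on '([^']+)' disconnected"
)


def count_disconnects(log_text: str) -> Counter:
    """Hypothetical helper: count disconnect messages per node in a log excerpt."""
    return Counter(DISCONNECT_RE.findall(log_text))


if __name__ == "__main__":
    # Sample lines copied from the subsequent rebalance-out attempts.
    sample = (
        "Control connection to memcached on 'ns_1@172.23.105.48' disconnected: "
        "{badmatch, {error, timeout}} ns_memcached000 ns_1@172.23.105.48\n"
        "Control connection to memcached on 'ns_1@172.23.105.45' disconnected: "
        "{badmatch, {error, timeout}} ns_memcached000 ns_1@172.23.105.45\n"
        "Control connection to memcached on 'ns_1@172.23.105.45' disconnected: "
        "{badmatch, {error, timeout}} ns_memcached000 ns_1@172.23.105.45\n"
    )
    for node, n in count_disconnects(sample).most_common():
        print(node, n)
    # prints:
    # ns_1@172.23.105.45 2
    # ns_1@172.23.105.48 1
```

      A regular expression keeps the sketch dependency-free; a line-oriented match is sufficient because each disconnect message starts with the same fixed prefix even when the Erlang term after it wraps across lines.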

      Please feel free to close this as a duplicate if a similar issue is still open.


            People

              apiravi Aruna Piravi (Inactive)