Couchbase Server / MB-36719

[Volume] Rebalance Failed with mover crashed.

    Description

      Steps to Reproduce:

      1. Create a 4-node cluster.

        +-------------+----------+--------------+
        | Nodes       | Services | Status       |
        +-------------+----------+--------------+
        | 172.23.97.3 | [u'kv']  | Cluster node |
        | 172.23.97.4 | None     | <--- IN ---  |
        | 172.23.97.5 | None     | <--- IN ---  |
        | 172.23.97.6 | None     | <--- IN ---  |
        +-------------+----------+--------------+

      2. Create a bucket with compression=off, eviction policy=valueOnly, replicas=1.
      3. Load 50M docs into the bucket with durability=MAJORITY. This step succeeded.

        +----------------+---------+----------+-----+----------+--------------+--------------+-------------+
        | Bucket         | Type    | Replicas | TTL | Items    | RAM Quota    | RAM Used     | Disk Used   |
        +----------------+---------+----------+-----+----------+--------------+--------------+-------------+
        | GleamBookUsers | membase | 1        | 0   | 50000000 | 431270920192 | 136477644848 | 40167732516 |
        +----------------+---------+----------+-----+----------+--------------+--------------+-------------+ 

      4. Rebalance in 1 node (172.23.97.10) while running another 20M updates and 10M creates with durability=MAJORITY in parallel.

      These are performance-testing machines, each with a RAM quota of 101 GB allocated to the data service.
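As a quick sanity check on the bucket table above, the byte counts can be converted to binary units. This is a minimal sketch that assumes the bucket RAM quota is split evenly across the 4 nodes present when the table was captured; the variable and function names are illustrative only.

```python
# Figures copied from the bucket table in step 3 (all values in bytes).
NODES = 4                       # assumption: quota is spread evenly over 4 nodes
BUCKET_RAM_QUOTA = 431270920192
RAM_USED = 136477644848
DISK_USED = 40167732516


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


per_node_quota = BUCKET_RAM_QUOTA // NODES

if __name__ == "__main__":
    print(f"per-node quota: {per_node_quota // 2**20} MiB "
          f"({to_gib(per_node_quota):.2f} GiB)")
    print(f"RAM used:       {to_gib(RAM_USED):.2f} GiB")
    print(f"disk used:      {to_gib(DISK_USED):.2f} GiB")
```

Under that assumption the per-node share is a whole number of MiB and lands close to the roughly 101 GB per node quoted above.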

      The rebalance did not start for 380 seconds. After it reached 1%, memcached crashed on the node being rebalanced in (172.23.97.10).

      Timestamp of the rebalance failure:

      ns_server.info.log:[ns_server:info,2019-10-30T22:41:35.212-07:00,ns_1@172.23.97.3:rebalance_agent<0.648.0>:rebalance_agent:handle_down:296]Rebalancer process <0.27979.5> died (reason {mover_crashed, 

      This corresponds to the following memcached crash on 172.23.97.10, which caused the rebalance to fail:

      memcached.log:2019-10-30T22:41:35.131875-07:00 CRITICAL Breakpad caught a crash (Couchbase version 6.5.0-4724). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/237c4e35-8628-fd84-65d6db96-7f49e040.dmp before terminating.
      memcached.log:2019-10-30T22:41:35.131909-07:00 CRITICAL Stack backtrace of crashed thread:
      memcached.log:2019-10-30T22:41:35.132140-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x13138d]
      memcached.log:2019-10-30T22:41:35.132166-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ce) [0x400000+0x1491ee]
      memcached.log:2019-10-30T22:41:35.132184-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0x94) [0x400000+0x149504]
      memcached.log:2019-10-30T22:41:35.132203-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7fc02423e000+0xf370]
      memcached.log:2019-10-30T22:41:35.132258-07:00 CRITICAL     /lib64/libc.so.6(gsignal+0x37) [0x7fc023e7d000+0x351d7]
      memcached.log:2019-10-30T22:41:35.132321-07:00 CRITICAL     /lib64/libc.so.6(abort+0x148) [0x7fc023e7d000+0x368c8]
      memcached.log:2019-10-30T22:41:35.132387-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7fc024973000+0x91195]
      memcached.log:2019-10-30T22:41:35.132407-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x144cc2]
      memcached.log:2019-10-30T22:41:35.132447-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fc024973000+0x8ef86]
      memcached.log:2019-10-30T22:41:35.132485-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fc024973000+0x8efd1]
      memcached.log:2019-10-30T22:41:35.132520-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fc024973000+0x8f213]
      memcached.log:2019-10-30T22:41:35.132549-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fc01e204000+0x58746]
      memcached.log:2019-10-30T22:41:35.132566-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fc01e204000+0xd95d3]
      memcached.log:2019-10-30T22:41:35.132579-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fc01e204000+0xdadc8]
      memcached.log:2019-10-30T22:41:35.132597-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fc01e204000+0x1915c7]
      memcached.log:2019-10-30T22:41:35.132610-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fc01e204000+0xe5915]
      memcached.log:2019-10-30T22:41:35.132623-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fc01e204000+0x13745e]
      memcached.log:2019-10-30T22:41:35.132635-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fc01e204000+0x137a31]
      memcached.log:2019-10-30T22:41:35.132646-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fc01e204000+0x131594]
      memcached.log:2019-10-30T22:41:35.132657-07:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7fc02681c000+0x8f27]
      memcached.log:2019-10-30T22:41:35.132670-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7fc02423e000+0x7dc5]
      memcached.log:2019-10-30T22:41:35.132734-07:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7fc023e7d000+0xf776d] 
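Most of the ep.so frames in the backtrace above carry no symbol name, only a load base and an offset. The following is a minimal Python sketch for pulling the module path, base, and offset out of each backtrace line so the offsets can be handed to a symbolizer that has matching debug symbols. The regex and the `Frame` and `parse_frame` names are illustrative assumptions derived only from the line layout shown here; they are not part of any Couchbase tooling.

```python
import re
from typing import NamedTuple, Optional

# Frame part of a backtrace line, e.g.
#   CRITICAL     /lib64/libc.so.6(abort+0x148) [0x7fc023e7d000+0x368c8]
FRAME_RE = re.compile(
    r"CRITICAL\s+(?P<module>/\S+?)\((?P<symbol>[^)]*)\)\s+"
    r"\[0x(?P<base>[0-9a-fA-F]+)\+0x(?P<offset>[0-9a-fA-F]+)\]"
)


class Frame(NamedTuple):
    module: str             # path of the binary or shared object
    symbol: Optional[str]   # "name+0xNN" if present, None for empty "()"
    base: int               # load address of the module
    offset: int             # offset of the frame within the module


def parse_frame(line: str) -> Optional[Frame]:
    """Return a Frame for a backtrace line, or None for any other log line."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(
        module=m["module"],
        symbol=m["symbol"] or None,
        base=int(m["base"], 16),
        offset=int(m["offset"], 16),
    )


def unresolved_frames(lines):
    """Yield the frames that have no symbol name and still need symbolizing."""
    for line in lines:
        frame = parse_frame(line)
        if frame is not None and frame.symbol is None:
            yield frame
```

Feeding the memcached.log excerpt through `unresolved_frames` gives the list of module and offset pairs to look up; how those offsets map to addresses for a particular symbolizer depends on how each binary was linked.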

      Error Messages:

      Rebalance exited with reason {mover_crashed,
       {unexpected_exit,
        {'EXIT',<0.26837.6>,
         {{{badmatch,{error,closed}},
           [{mc_client_binary,cmd_vocal_recv,5,
             [{file,"src/mc_client_binary.erl"},{line,155}]},
            {mc_client_binary,wait_for_seqno_persistence,3,
             [{file,"src/mc_client_binary.erl"},{line,696}]},
            {ns_memcached,'-wait_for_seqno_persistence/3-fun-0-',3,
             [{file,"src/ns_memcached.erl"},{line,1272}]},
            {ns_memcached,'-perform_very_long_call/3-fun-0-',2,
             [{file,"src/ns_memcached.erl"},{line,344}]},
            {ns_memcached_sockets_pool,'-executing_on_socket/3-fun-0-',3,
             [{file,"src/ns_memcached_sockets_pool.erl"},{line,92}]},
            {async,'-async_init/4-fun-1-',3,
             [{file,"src/async.erl"},{line,197}]}]},
          {gen_server,call,
           [{'janitor_agent-GleamBookUsers','ns_1@172.23.97.10'},
            {if_rebalance,<0.28325.5>,
             {update_vbucket_state,511,active,undefined,undefined,undefined}},
            infinity]}}}}}.
      Rebalance Operation Id = a7371ab853216cbc3dd2f8df81a329da

      Worker <0.32290.5> (for action {move,{511,
                                            ['ns_1@172.23.97.4','ns_1@172.23.97.6'],
                                            ['ns_1@172.23.97.10','ns_1@172.23.97.6'],
                                            []}}) exited with reason {unexpected_exit,
       {'EXIT',<0.26837.6>,
        {{{badmatch,{error,closed}},
          [{mc_client_binary,cmd_vocal_recv,5,
            [{file,"src/mc_client_binary.erl"},{line,155}]},
           {mc_client_binary,wait_for_seqno_persistence,3,
            [{file,"src/mc_client_binary.erl"},{line,696}]},
           {ns_memcached,'-wait_for_seqno_persistence/3-fun-0-',3,
            [{file,"src/ns_memcached.erl"},{line,1272}]},
           {ns_memcached,'-perform_very_long_call/3-fun-0-',2,
            [{file,"src/ns_memcached.erl"},{line,344}]},
           {ns_memcached_sockets_pool,'-executing_on_socket/3-fun-0-',3,
            [{file,"src/ns_memcached_sockets_pool.erl"},{line,92}]},
           {async,'-async_init/4-fun-1-',3,
            [{file,"src/async.erl"},{line,197}]}]},
         {gen_server,call,
          [{'janitor_agent-GleamBookUsers','ns_1@172.23.97.10'},
           {if_rebalance,<0.28325.5>,
            {update_vbucket_state,511,active,undefined,undefined,undefined}},
           infinity]}}}}
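Both error terms above embed the same Erlang stack trace, where each frame has the shape `{Module, Function, Arity, [{file, Path}, {line, N}]}`. The following is a rough Python sketch for extracting those frames from the pasted text so the call chain (ending in `mc_client_binary:cmd_vocal_recv/5` hitting `{error,closed}`) can be read at a glance. The regex and the `erlang_frames` name are assumptions made for illustration; this is a text scrape, not a real Erlang term parser.

```python
import re

# Typographic quotes sometimes replace ASCII quotes when a term is pasted
# into a ticket; map them back before matching.
QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})

# One stack frame: {Module,Function,Arity,[{file,"..."},{line,N}]}
FRAME_RE = re.compile(
    r"\{(?P<module>[a-z]\w*),\s*"
    r"(?P<function>'[^']*'|[a-z]\w*),\s*"
    r"(?P<arity>\d+),\s*"
    r"\[\{file,\s*\"(?P<file>[^\"]+)\"\},\s*"
    r"\{line,\s*(?P<line>\d+)\}\]\}"
)


def erlang_frames(term_text: str):
    """Return (module, function, arity, file, line) tuples, innermost first."""
    text = term_text.translate(QUOTES)
    return [
        (m["module"], m["function"].strip("'"), int(m["arity"]),
         m["file"], int(m["line"]))
        for m in FRAME_RE.finditer(text)
    ]


def format_frames(frames):
    """Render frames as module:function/arity (file:line), one per line."""
    return "\n".join(
        f"{mod}:{fun}/{arity} ({path}:{line})"
        for mod, fun, arity, path, line in frames
    )
```

Because whitespace between tokens is ignored, the same function handles both the compact layout of the first error and the one-token-per-line layout of the worker exit message.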

            People

              prateek.kumar Prateek Kumar (Inactive)