  Couchbase Server / MB-32933

Memcached crash on KV node rebalance-in


Details

    Description

      Build: 6.5.0-2237

      Steps to reproduce:
      1. Set up a cluster with one node running the KV service.
      2. Install the travel-sample bucket.
      3. Add another KV node to the cluster and start a rebalance.
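The reproduction steps can be scripted against the Couchbase Server REST API. This is a minimal sketch that only builds the requests and does not send them. The helper names `repro_steps` and `build_request`, the credentials and the node addresses are placeholders. The endpoint paths and parameters (`/node/controller/setupServices`, `/settings/web`, `/sampleBuckets/install`, `/controller/addNode`, `/controller/rebalance`) are assumed to match the public REST API, so verify them against the documentation for the build under test. A fresh node may also need setup that is not shown here, such as a memory quota.

```python
import base64
import urllib.parse
import urllib.request

FORM = "application/x-www-form-urlencoded"


def repro_steps(new_node, known_otp_nodes, user="Administrator", password="password"):
    """Return (path, body, content_type) for each reproduction step (hypothetical helper)."""
    def form(**fields):
        return urllib.parse.urlencode(fields)

    return [
        # Step 1: a single node running only the KV (data) service, plus admin credentials.
        ("/node/controller/setupServices", form(services="kv"), FORM),
        ("/settings/web", form(username=user, password=password, port="SAME"), FORM),
        # Step 2: install the travel-sample bucket.
        ("/sampleBuckets/install", '["travel-sample"]', "application/json"),
        # Step 3: add a second KV node, then rebalance it in.
        ("/controller/addNode",
         form(hostname=new_node, user=user, password=password, services="kv"), FORM),
        ("/controller/rebalance",
         form(knownNodes=",".join(known_otp_nodes), ejectedNodes=""), FORM),
    ]


def build_request(base_url, step, user="Administrator", password="password"):
    """Wrap one step in a POST request with HTTP basic auth (the request is not sent)."""
    path, body, content_type = step
    req = urllib.request.Request(base_url + path, data=body.encode(), method="POST")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Content-Type", content_type)
    return req
```

To run the steps, send each request in order with `urllib.request.urlopen`. `knownNodes` must list the `otpNode` names (for example `ns_1@<ip>`) of every node, which can be read from `GET /pools/default`. The rebalance call returns before the rebalance finishes, so its outcome has to be polled, for example via `GET /pools/default/rebalanceProgress`.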

      Rebalance fails, and memcached crashes.

      Service 'memcached' exited with status 134. Restarting. Messages:
      2019-02-05T10:45:41.177287-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fd20fb2f000+0xa6326]
      2019-02-05T10:45:41.177311-08:00 CRITICAL /opt/couchbase/bin/memcached(_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv+0x46) [0x400000+0x29cd6]
      2019-02-05T10:45:41.177328-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fd20fb2f000+0x898bb]
      2019-02-05T10:45:41.177345-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fd20fb2f000+0x60b76]
      2019-02-05T10:45:41.177369-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fd20fb2f000+0xfcb30]
      2019-02-05T10:45:41.177390-08:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7fd215903000+0x91e7]
      2019-02-05T10:45:41.177407-08:00 CRITICAL /lib/x86_64-linux-gnu/libpthread.so.0() [0x7fd213c8d000+0x76ba]
      2019-02-05T10:45:41.177476-08:00 CRITICAL /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fd2138c3000+0x10741d]
      [*** LOG ERROR ***] [2019-02-05 10:45:41] [spdlog_file_logger] async log: thread pool doesn't exist anymore
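The exit status and the backtrace frames above can be decoded mechanically during triage. Under the usual 128 + N convention, status 134 means the process was terminated by signal 6 (SIGABRT). The only named frame, `_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv`, demangles (for example with `c++filt`) to `std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()`, which is the libstdc++ reference-count release behind `std::shared_ptr`. The following minimal Python sketch splits each frame into module, symbol and offset. The regex assumes exactly the line layout shown above, and the helper names are hypothetical.

```python
import re
import signal

# "<module>(<symbol>) [<load base>+<offset>]"; the symbol part may be empty.
FRAME_RE = re.compile(
    r"CRITICAL (?P<module>\S+?)\((?P<symbol>[^)]*)\) "
    r"\[(?P<base>0x[0-9a-f]+)\+(?P<offset>0x[0-9a-f]+)\]"
)


def parse_frames(log_text):
    """Split each backtrace line into module, symbol, offset and absolute address."""
    frames = []
    for m in FRAME_RE.finditer(log_text):
        base, offset = int(m["base"], 16), int(m["offset"], 16)
        frames.append({
            "module": m["module"].rsplit("/", 1)[-1],
            "symbol": m["symbol"] or None,  # None for frames printed as "module()"
            "offset": offset,               # relative to the module's load base
            "address": base + offset,       # absolute address in the crashed process
        })
    return frames


def describe_exit_status(status):
    """Decode a shell-style exit status: values above 128 mean 'killed by signal N'."""
    if status > 128:
        return signal.Signals(status - 128).name
    return f"exit code {status}"
```

For a shared object such as `ep.so`, the module-relative offset is normally what `addr2line -f -C -e ep.so 0xa6326` expects, provided a matching build with debug info is available. `memcached` is loaded at 0x400000, which suggests a non-PIE executable. In that case, pass the absolute address instead.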
      

      Rebalance exited with reason {mover_crashed,
       {unexpected_exit,
        {'EXIT',<0.10065.0>,
         {{{{{child_interrupted,
              {'EXIT',<0.9212.0>,socket_closed}},
             [{dcp_replicator,spawn_and_wait,1,
               [{file,"src/dcp_replicator.erl"},
                {line,249}]},
              {dcp_replicator,handle_call,3,
               [{file,"src/dcp_replicator.erl"},
                {line,121}]},
              {gen_server,try_handle_call,4,
               [{file,"gen_server.erl"},{line,636}]},
              {gen_server,handle_msg,6,
               [{file,"gen_server.erl"},{line,665}]},
              {proc_lib,init_p_do_apply,3,
               [{file,"proc_lib.erl"},{line,247}]}]},
            {gen_server,call,
             [<0.9203.0>,
              {setup_replication,[1020]},
              infinity]}},
           {gen_server,call,
            ['replication_manager-travel-sample',
             {change_vbucket_replication,1020,
              'ns_1@172.23.104.134'},
             infinity]}},
          {gen_server,call,
           [{'janitor_agent-travel-sample',
             'ns_1@172.23.104.126'},
            {if_rebalance,<0.8835.0>,
             {update_vbucket_state,989,active,paused,
              undefined}},
            infinity]}}}}}
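The exit reason nests one layer per failed `gen_server:call`. A call whose callee dies exits with `{Reason, {gen_server, call, [Server, Request, Timeout]}}`. Reading the term from the outside in:

1. The janitor agent on `ns_1@172.23.104.126` was updating vbucket 989.
2. It called the travel-sample replication manager to change replication of vbucket 1020 (source `ns_1@172.23.104.134`).
3. The replication manager called a `dcp_replicator` to run `setup_replication` for vbucket 1020.
4. That call failed because a linked process exited with `socket_closed`. This is presumably the DCP connection dropped by the crashing memcached.

The Python sketch below peels those layers mechanically. It assumes an encoding of Erlang terms as Python values (atoms and pids as strings), abbreviates the stack trace, and uses hypothetical helper names.

```python
# Erlang terms encoded as Python values (an assumption for this sketch):
# tuples -> tuples, lists -> lists, atoms and pids -> strings.
# EXIT_REASON is the term inside {'EXIT',<0.10065.0>, ...}.
STACK = [("dcp_replicator", "spawn_and_wait", 1)]  # abbreviated: first frame only

EXIT_REASON = (
    (
        (
            (("child_interrupted", ("EXIT", "<0.9212.0>", "socket_closed")), STACK),
            ("gen_server", "call",
             ["<0.9203.0>", ("setup_replication", [1020]), "infinity"]),
        ),
        ("gen_server", "call",
         ["replication_manager-travel-sample",
          ("change_vbucket_replication", 1020, "ns_1@172.23.104.134"), "infinity"]),
    ),
    ("gen_server", "call",
     [("janitor_agent-travel-sample", "ns_1@172.23.104.126"),
      ("if_rebalance", "<0.8835.0>",
       ("update_vbucket_state", 989, "active", "paused", "undefined")),
      "infinity"]),
)


def is_call_layer(term):
    """True for a {Reason, {gen_server, call, [Server, Request, Timeout]}} wrapper."""
    return (isinstance(term, tuple) and len(term) == 2
            and isinstance(term[1], tuple) and len(term[1]) == 3
            and term[1][:2] == ("gen_server", "call"))


def unwrap_call_chain(reason):
    """Peel gen_server:call layers; returns (calls outermost first, root cause)."""
    calls = []
    while is_call_layer(reason):
        server, request, _timeout = reason[1][2]
        calls.append((server, request))
        reason = reason[0]
    return calls, reason


calls, root = unwrap_call_chain(EXIT_REASON)
for depth, (server, request) in enumerate(calls):
    print("  " * depth + f"{server}: {request[0]}")
print("root cause:", root[0])
```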
      

      This issue breaks build sanity, so it is marked as a Test Blocker. The last known good build was 6.5.0-2234.
      Changes between 6.5.0-2234 and 6.5.0-2237: http://172.23.123.43:8000/getchangelog?product=couchbase-server&fromb=6.5.0-2234&tob=6.5.0-2237

      Possible commit that caused the regression:
      CHANGELOG for kv_engine

      • Commit: 299a69fef2a5dde3f244a9fc14f75b496ae12b8e in build: 6.5.0-2237
        MB-32807 [SR]: Enable Multiple-Replicas

      With this patch we switch on Multiple-Replicas for Durability.
      Two main changes here:

      1) We remove any hard-coded node name in ReplicationChain. So now a
      chain can be set only by ns_server through set-vbucket-state.

      2) Given that ns_server doesn't pass the topology information yet, we
      temporarily update the Replication Chain at master when the Producer
      receives the 'consumer_name' via DcpControl.
      Note that this is just a workaround, we'll remove it as soon as
      ns_server provides the topology via set-vbucket-state.

      Change-Id: I8413824adf62f5bcca5fca3f7bc91ea8875ab34a
      Reviewed-on: http://review.couchbase.org/104101
      Reviewed-by: Dave Rigby <daver@couchbase.com>
      Tested-by: Build Bot <build@couchbase.com>
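The commit message describes two ways a replication chain can be populated on the active node. The intended way is through set-vbucket-state from ns_server. The temporary workaround is through a Producer receiving `consumer_name` via DcpControl. The sketch below is a hypothetical Python model of that behaviour, not kv_engine code. The class, field and method names are invented, and the "active node first, then replicas" layout is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ActiveVBucketModel:
    """Hypothetical model of the chain handling described in MB-32807 (not kv_engine code)."""
    vbid: int
    active_node: str
    # Assumed: with no hard-coded node names, the chain starts with the active node only.
    replicas: list = field(default_factory=list)

    def set_vbucket_state(self, topology):
        """Intended path: ns_server supplies the whole chain, active node first (assumed)."""
        self.active_node = topology[0]
        self.replicas = list(topology[1:])

    def on_dcp_control(self, key, value):
        """Temporary workaround: a 'consumer_name' DcpControl adds that consumer as a replica."""
        if key == "consumer_name" and value not in self.replicas:
            self.replicas.append(value)

    def chain(self):
        return [self.active_node, *self.replicas]
```

In this model, until ns_server sends the topology, the chain on the active node depends on which DCP consumers have connected and announced their names.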


            People

              drigby Dave Rigby (Inactive)
              mihir.kamdar Mihir Kamdar (Inactive)