Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-30074

[System Test] KV node still in pending state after memcached crashes

    XMLWordPrintable

Details

    Description

      Build : 5.5.0-2884
      Test : GSI component test : -test tests/2i/test_idx_rebalance_replica_vulcan_kv_opt.yml -scope tests/2i/scope_idx_rebalance_replica_vulcan.yml
      Scale : 3
      Iteration : 3rd iteration (~20 hrs of run)
      In the 3rd iteration, the step in the test to rebalance out a KV node fails when memcached crashes on the master node. The cluster becomes pretty unusable after this. The subsequent rebalance has got stuck (see MB-30073), the buckets are in warmup state forever

      The following error is shown on the diag log:

      Service 'memcached' exited with status 137. Restarting. Messages:
      2018-06-11T18:33:16.430335Z WARNING 106: Slow operation. {"cid":"172.23.106.161:40160/81905100","duration":"4193 ms","trace":"request=10723899535396330:4193337","command":"GET","peer":"172.23.106.161:40160"}
      2018-06-11T18:33:16.433047Z WARNING 110: Slow operation. {"cid":"172.23.106.161:40210/31c25100","duration":"4205 ms","trace":"request=10723899526306866:4205138","command":"GET","peer":"172.23.106.161:40210"}
      2018-06-11T18:33:16.433697Z WARNING (other-3) Slow runtime for 'Running a flusher loop: shard 2' on thread writer_worker_0: 4198 ms
      2018-06-11T18:33:16.521086Z WARNING (other-1) Slow runtime for 'Checkpoint Remover on vb 213' on thread nonIO_worker_1: 20 ms
      2018-06-11T18:33:16.581773Z WARNING (other-2) Slow runtime for 'Checkpoint Remover on vb 498' on thread nonIO_worker_0: 33 ms
      2018-06-11T18:33:17.253240Z WARNING (other-2) Slow runtime for 'Backfilling items for a DCP Connection' on thread auxIO_worker_0: 346 ms
      2018-06-11T18:33:18.229913Z WARNING (default) Slow runtime for 'Backfilling items for a DCP Connection' on thread auxIO_worker_0: 623 ms
      2018-06-11T18:33:19.469135Z WARNING (other-1) Slow runtime for 'Checkpoint Remover on vb 339' on thread nonIO_worker_0: 20 ms
      

      Following is seen in the debug log:

      [error_logger:error,2018-06-11T18:33:24.935-07:00,ns_1@172.23.104.16:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
      =========================CRASH REPORT=========================
        crasher:
          initial call: erlang:apply/2
          pid: <0.12839.0>
          registered_name: []
          exception error: no match of right hand side value {error,closed}
            in function  mc_client_binary:stats_recv/4 (src/mc_client_binary.erl, line 164)
            in call from mc_client_binary:stats/4 (src/mc_client_binary.erl, line 406)
            in call from ns_memcached:do_handle_call/3 (src/ns_memcached.erl, line 460)
            in call from ns_memcached:worker_loop/3 (src/ns_memcached.erl, line 228)
          ancestors: ['ns_memcached-default',<0.12824.0>,
                        'single_bucket_kv_sup-default',ns_bucket_sup,
                        ns_bucket_worker_sup,ns_server_sup,ns_server_nodes_sup,
                        <0.170.0>,ns_server_cluster_sup,<0.89.0>]
          messages: []
          links: [<0.12825.0>,#Port<0.9577>]
          dictionary: [{last_call,verify_warmup},{sockname,{{127,0,0,1},51713}}]
          trap_exit: false
          status: running
          heap_size: 6772
          stack_size: 27
          reductions: 241838516
        neighbours:
      
      

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            ajit.yagaty Ajit Yagaty [X] (Inactive)
            mihir.kamdar Mihir Kamdar (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            10 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty