Couchbase Server / MB-48616

[System Test] Missing files on disk prevent memcached shutdown

Details

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: 7.0.2
    • Component/s: memcached

    Description

      Build : 7.0.2-6684
      Test : -test tests/xdcr/test_xdcrStress.yml -scope tests/xdcr/scope_6x4Node.yml
      Scale : 3
      Iteration : 19
      Day : 3rd

      On 172.23.97.237, ever since the node was added to the cluster as part of a swap rebalance at 2021-09-23T00:37:34, the following errors were seen in the projector logs:

      2021-09-23T00:37:43.352-07:00 [Info] Error occurred during cluster info update (HTTP error 500 Internal Server Error getting "http://127.0.0.1:8091/pools/default?uuid=c2c6da925256c4df3b964d20f539c4bc": ["Unexpected server error, request logged."]) .. Retrying(1)
      2021-09-23T00:37:50.371-07:00 [Info] Error occurred during cluster info update (HTTP error 500 Internal Server Error getting "http://127.0.0.1:8091/pools/default?uuid=c2c6da925256c4df3b964d20f539c4bc": ["Unexpected server error, request logged."]) .. Retrying(2)
      ...
      ...
      ...
      2021-09-23T01:12:29.946-07:00 [Info] Error occurred during cluster info update (HTTP error 500 Internal Server Error getting "http://127.0.0.1:8091/pools/default?uuid=c2c6da925256c4df3b964d20f539c4bc": ["Unexpected server error, request logged."]) .. Retrying(299)
      2021-09-23T01:12:36.956-07:00 [Info] Error occurred during cluster info update (HTTP error 500 Internal Server Error getting "http://127.0.0.1:8091/pools/default?uuid=c2c6da925256c4df3b964d20f539c4bc": ["Unexpected server error, request logged."]) .. Retrying(300)
      panic: HTTP error 500 Internal Server Error getting "http://127.0.0.1:8091/pools/default?uuid=c2c6da925256c4df3b964d20f539c4bc": ["Unexpected server error, request logged."]
       
      goroutine 1 [running]:
      github.com/couchbase/indexing/secondary/common.CrashOnError(...)
      	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/common/util.go:456
      github.com/couchbase/indexing/secondary/projector.NewProjector(0x400, 0xc0001dbd10, 0x7ffd01233a0b, 0x3a, 0x7ffd01233a50, 0x39, 0x1)
      	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/projector/projector.go:99 +0x1329
      main.main()
      	goproj/src/github.com/couchbase/indexing/secondary/cmd/projector/main.go:157 +0x563
      

      The projector crashes, then appears to enter the same loop again after restarting, crashing twice more after 300 retries.
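      The log pattern above (Retrying(1) through Retrying(300), then a panic raised from common.CrashOnError in NewProjector) is consistent with a bounded retry loop that treats exhaustion as fatal. The following is a minimal sketch of that pattern under stated assumptions; it is not the projector's actual code. The names errServer, fetchWithRetry, and crashOnError are hypothetical, and the retry interval is shortened for illustration (the timestamps show roughly 7 seconds between attempts).

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"time"
)

// errServer is a hypothetical stand-in for the HTTP 500 returned by
// ns_server's /pools/default endpoint in the logs above.
var errServer = errors.New(`HTTP error 500 Internal Server Error getting "http://127.0.0.1:8091/pools/default"`)

// fetchWithRetry makes one initial attempt, then up to maxRetries more,
// sleeping interval between attempts. It returns the last error (nil on success).
func fetchWithRetry(fetch func() error, maxRetries int, interval time.Duration) error {
	err := fetch()
	for retry := 1; err != nil && retry <= maxRetries; retry++ {
		log.Printf("Error occurred during cluster info update (%v) .. Retrying(%d)", err, retry)
		time.Sleep(interval)
		err = fetch()
	}
	return err
}

// crashOnError mimics the crash-on-exhaustion behavior: any remaining error
// becomes a panic, which terminates the process unless recovered.
func crashOnError(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	attempts := 0
	alwaysFails := func() error { attempts++; return errServer }

	// Recover only so this demo can report the attempt count; the real
	// process would exit here and be restarted by its supervisor.
	defer func() {
		if r := recover(); r != nil {
			fmt.Printf("panic after %d attempts: %v\n", attempts, r)
		}
	}()
	crashOnError(fetchWithRetry(alwaysFails, 300, time.Millisecond))
}
```

      Because the process exits once the retries are exhausted, a supervisor that restarts it would re-enter the same loop for as long as the underlying HTTP 500 persists, which would match the repeated crashes described above.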

      Around the same time, the following errors appear in the debug logs:

      [user:info,2021-09-23T00:37:00.370-07:00,ns_1@172.23.97.237:<0.3774.115>:ns_log:crash_consumption_loop:63]Service 'memcached' exited with status 1. Restarting. Messages:
      2021-09-23T00:37:00.367185-07:00 ERROR Failed to start Prometheus Exposer: null context when constructing CivetServer. Possible problem binding to port.
      2021-09-23T00:37:00.367236-07:00 CRITICAL Failed to start Prometheus exposer on family:inet port:11280
      ...
      ...
      ...
      [ns_server:debug,2021-09-23T00:37:00.373-07:00,ns_1@172.23.97.237:memcached_config_mgr<0.21562.115>:memcached_config_mgr:find_port_pid_loop:135]Found memcached port undefined
      [error_logger:error,2021-09-23T00:37:00.374-07:00,ns_1@172.23.97.237:memcached_config_mgr<0.21562.115>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: memcached_config_mgr:init/1
          pid: <0.21562.115>
          registered_name: memcached_config_mgr
          exception exit: {noproc,{gen_server,call,[undefined,is_active,infinity]}}
            in function  gen_server:call/3 (gen_server.erl, line 223)
            in call from memcached_config_mgr:read_current_memcached_config/1 (src/memcached_config_mgr.erl, line 276)
            in call from memcached_config_mgr:init/1 (src/memcached_config_mgr.erl, line 51)
          ancestors: [ns_server_sup,ns_server_nodes_sup,<0.15711.14>,
                        ns_server_cluster_sup,root_sup,<0.139.0>]
          message_queue_len: 0
          messages: []
          links: [<0.3787.115>]
          dictionary: []
          trap_exit: false
          status: running
          heap_size: 10958
          stack_size: 27
          reductions: 37512
        neighbours:
       
      [ns_server:debug,2021-09-23T00:37:00.374-07:00,ns_1@172.23.97.237:<0.21597.115>:remote_monitors:handle_down:151]Caller of remote monitor <0.21562.115> died with {noproc,
                                                        {gen_server,call,
                                                         [undefined,is_active,
                                                          infinity]}}. Exiting
      
      

      These errors appear to be the cause of the projector crashes.
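      The memcached messages point to a failure binding the Prometheus exposer port (family:inet, port 11280 in the log). One quick check, sketched here as an assumption rather than a documented Couchbase procedure, is to try binding the same IPv4 port yourself: if the bind fails with "address already in use", some other process still holds it. The helper name portFree is hypothetical. Run it on the affected node while memcached is stopped, since a healthy memcached would hold the port too.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// portFree reports whether addr (for example "0.0.0.0:11280") can be bound
// for TCP over IPv4. When it cannot, the bind error is returned.
func portFree(addr string) (bool, error) {
	ln, err := net.Listen("tcp4", addr)
	if err != nil {
		return false, err
	}
	// Release the port immediately; we only wanted to know whether it was bindable.
	ln.Close()
	return true, nil
}

func main() {
	addr := "0.0.0.0:11280" // port taken from the memcached log above
	if len(os.Args) > 1 {
		addr = os.Args[1]
	}
	ok, err := portFree(addr)
	if ok {
		fmt.Printf("%s is free\n", addr)
		return
	}
	fmt.Printf("%s cannot be bound: %v\n", addr, err)
	os.Exit(1)
}
```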


    People

      Assignee: Mihir Kamdar (Inactive)
      Reporter: Mihir Kamdar (Inactive)