Couchbase Server / MB-46481

[Magma] Bucket taking over 8 minutes to initialise after memcached kill

Details

    Description

      Steps:
      1. Create a 6 node cluster:

      +----------------+-----------+-----------------------+---------------+--------------+
      | Nodes          | Services  | Version               | CPU           | Status       |
      +----------------+-----------+-----------------------+---------------+--------------+
      | 172.23.106.71  | kv        | 7.0.0-5219-enterprise | 1.48036632794 | Cluster node |
      | 172.23.106.216 | ['kv']    |                       |               | <--- IN ---  |
      | 172.23.106.215 | ['kv']    |                       |               | <--- IN ---  |
      | 172.23.106.206 | ['kv']    |                       |               | <--- IN ---  |
      | 172.23.106.211 | ['n1ql']  |                       |               | <--- IN ---  |
      | 172.23.106.214 | ['index'] |                       |               | <--- IN ---  |
      +----------------+-----------+-----------------------+---------------+--------------+
      

      2. Create a default Magma bucket and load some initial data into it.
      3. Start a new data loading thread.
      4. Start a crash thread that runs kill -9 on memcached on the data nodes. After every kill, the test waits for the bucket to come back online. To check this, we run cbstats continuously on the nodes until it reports the warmup=complete status.
      5. While cbstats is running, the bucket is stuck in warmup and cannot come out of it. As soon as cbstats is stopped, the bucket status turns green.
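
      The polling in step 4 can be sketched as below. This is a minimal illustration, not the test framework's actual code: `wait_for_warmup`, `parse_stats`, and the `fetch` callable are hypothetical names, and the stat key `ep_warmup_thread` with value `complete` is an assumption about what the warmup stats output looks like. The sketch backs off between polls instead of polling continuously, and treats a timeout as "not ready yet".

```python
import time
from typing import Callable, Dict


def parse_stats(text: str) -> Dict[str, str]:
    """Parse 'key: value' lines (assumed cbstats-style output) into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def wait_for_warmup(
    fetch: Callable[[], str],           # hypothetical: returns raw stats text
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    max_interval_s: float = 60.0,
    stat_key: str = "ep_warmup_thread",  # assumed stat name
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the warmup stat reads 'complete' or the deadline passes."""
    deadline = clock() + timeout_s
    delay = interval_s
    while True:
        try:
            if parse_stats(fetch()).get(stat_key) == "complete":
                return True
        except OSError:
            # Timeouts and connection errors mean "not ready yet".
            pass
        if clock() >= deadline:
            return False
        sleep(delay)
        # Double the wait each round so polling does not load the node.
        delay = min(delay * 2, max_interval_s)
```

      In a real run, `fetch` would wrap whatever command the test uses to collect warmup stats from a node. Spacing the polls out may also help confirm whether the continuous cbstats calls themselves are delaying warmup.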

      Node 71 cbstats

      2021-05-24 07:14:23,694 | test  | WARNING | Thread-1   | [bucket_ready_functions:_wait_warmup_completed:3893] Exception during cbstat all cmd: Traceback (most recent call last):
        File "/opt/couchbase/lib/python/cbstats", line 1010, in <module>
          main()
        File "/opt/couchbase/lib/python/cbstats", line 1007, in main
          c.execute()
        File "/opt/couchbase/lib/python/clitool.py", line 83, in execute
          f[0](mc, *args[2:], **opts.__dict__)
        File "/opt/couchbase/lib/python/cbstats", line 49, in g
          f(*args, **kwargs)
        File "/opt/couchbase/lib/python/cli_auth_utils.py", line 79, in g
          mc.sasl_auth_plain(username, password)
        File "/opt/couchbase/lib/python/mc_bin_client.py", line 488, in sasl_auth_plain
          return self.sasl_auth_start('PLAIN', '\0'.join([foruser, user, password]))
        File "/opt/couchbase/lib/python/mc_bin_client.py", line 484, in sasl_auth_start
          return self._doCmd(memcacheConstants.CMD_SASL_AUTH, mech, data)
        File "/opt/couchbase/lib/python/mc_bin_client.py", line 303, in _doCmd
          return self._handleSingleResponse(opaque)
        File "/opt/couchbase/lib/python/mc_bin_client.py", line 296, in _handleSingleResponse
          cmd, opaque, cas, keylen, extralen, data = self._handleKeyedResponse(myopaque)
        File "/opt/couchbase/lib/python/mc_bin_client.py", line 281, in _handleKeyedResponse
          cmd, errcode, opaque, cas, keylen, extralen, rv = self._recvMsg()
        File "/opt/couchbase/lib/python/mc_bin_client.py", line 250, in _recvMsg
          data = self._socketRecv(MIN_RECV_PACKET - len(response))
        File "/opt/couchbase/lib/python/mc_bin_client.py", line 245, in _socketRecv
          raise TimeoutError(30)
      mc_bin_client.TimeoutError: Error: Operation timed out (30 seconds)
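
      The traceback fails in the receive path during SASL authentication, which suggests the request was sent but no reply arrived within the timeout. The sketch below reproduces that pattern in isolation with a local socket that accepts connections at the TCP level but never answers. It is only an illustration of a receive timeout, not memcached's or cbstats' behaviour; `recv_with_timeout` and `demo` are hypothetical names.

```python
import socket


def recv_with_timeout(host: str, port: int, timeout_s: float) -> str:
    """Connect, send a few bytes, and report whether a reply arrived in time."""
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        conn.sendall(b"ping")
        try:
            data = conn.recv(24)
        except socket.timeout:
            return "timed out"
        return "replied" if data else "closed"


def demo() -> str:
    """Listen locally without ever accepting or replying, then probe it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    server.listen(1)                # kernel completes the handshake for us
    try:
        port = server.getsockname()[1]
        return recv_with_timeout("127.0.0.1", port, timeout_s=0.2)
    finally:
        server.close()
```

      Here the connect succeeds because the kernel queues the connection, yet the receive times out because nothing ever reads or answers the request. A server that is busy and slow to service new connections would look the same from the client side.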
      

      Note: We have not seen this issue on couchstore; it appears to have started occurring on magma only very recently. We are not sure which build introduced the regression. A quick analysis of the issue would be helpful.

      QE Test

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.10991.ini bucket_storage=magma,rerun=false,GROUP=P1;recovery,randomize_value=true,doc_size=256,bucket_eviction_policy=fullEviction,replicas=0,nodes_init=4,dcp_services=n1ql-index,enable_dp=True,collect_pcaps=True,transaction_version=1.1.5,upgrade_version=7.0.0-5219 -t magma.magma_crash_recovery.MagmaCrashTests.test_crash_during_recovery,num_items=10000000,doc_size=1024,sdk_timeout=60,doc_ops=create:delete,replicas=0,GROUP=P1;recovery'
      

      Test Run(test_1): http://qa.sc.couchbase.com/job/test_suite_executor-TAF/116874/console

            People

              sarath Sarath Lakshman
              ritesh.agarwal Ritesh Agarwal