  1. Couchbase Server
  2. MB-44103

[Magma] cbstats command fails/times out intermittently


      Description:
      The cbstats command fails intermittently with a timeout. The issue is mostly observed when the bucket has multiple collections and the load is higher. The steps of the test in which it was observed are listed below.

      Steps to repro:
      1. Created a 4 node cluster (replicas = 3, 50 collections):
         172.23.106.65, 172.23.106.68, 172.23.106.177, 172.23.120.196, 172.23.106.198 (indexer node)
      2. Loaded 2.5 million items (50k items in each of 50 collections, doc size = 1024 B).
      3. Stopped persistence on node 172.23.106.65.
      4. Started new doc ops on the replica nodes (172.23.106.68, 172.23.106.177, 172.23.120.196).
      5. Killed memcached on node 172.23.120.65 to trigger rollbacks on the other nodes.
      6. After the rollback was triggered, ran the following cbstats command on node 172.23.120.196:
         /opt/couchbase/bin/cbstats localhost:11210 -u Administrator -p password -b default failovers
      7. The command failed with the following error:

      Traceback (most recent call last):
        File "/opt/couchbase/lib/python/cbstats", line 999, in <module>
          main()
        File "/opt/couchbase/lib/python/cbstats", line 996, in main
          c.execute()
        File "/opt/couchbase/lib/python/clitool.py", line 71, in execute
          f[0](mc, *args[2:], **opts.__dict__)
        File "/opt/couchbase/lib/python/cbstats", line 38, in g
          f(*args, **kwargs)
        File "/opt/couchbase/lib/python/cli_auth_utils.py", line 67, in g
          mc.sasl_auth_plain(username, password)
        File "/opt/couchbase/lib/python/mc_bin_client.py", line 483, in sasl_auth_plain
          return self.sasl_auth_start('PLAIN', '\0'.join([foruser, user, password]))
        File "/opt/couchbase/lib/python/mc_bin_client.py", line 479, in sasl_auth_start
          return self._doCmd(memcacheConstants.CMD_SASL_AUTH, mech, data)
        File "/opt/couchbase/lib/python/mc_bin_client.py", line 298, in _doCmd
          return self._handleSingleResponse(opaque)
        File "/opt/couchbase/lib/python/mc_bin_client.py", line 291, in _handleSingleResponse
          cmd, opaque, cas, keylen, extralen, data = self._handleKeyedResponse(myopaque)
        File "/opt/couchbase/lib/python/mc_bin_client.py", line 276, in _handleKeyedResponse
          cmd, errcode, opaque, cas, keylen, extralen, rv = self._recvMsg()
        File "/opt/couchbase/lib/python/mc_bin_client.py", line 245, in _recvMsg
          data = self._socketRecv(MIN_RECV_PACKET - len(response))
        File "/opt/couchbase/lib/python/mc_bin_client.py", line 240, in _socketRecv
          raise TimeoutError(30)
      mc_bin_client.TimeoutError: Error: Operation timed out (30 seconds)
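
Because the failure is intermittent, it can help to run the same command several times and record which attempts fail. The following is a minimal sketch, not part of cbstats or the QE test: `run_with_retries`, its parameters, and the injectable `runner` argument are hypothetical names introduced here, and it assumes cbstats exits with a non-zero status when it ends in a traceback like the one above.

```python
import subprocess
import time

# Command copied from the report; host, credentials and bucket are the ones used there.
CBSTATS_CMD = [
    "/opt/couchbase/bin/cbstats", "localhost:11210",
    "-u", "Administrator", "-p", "password",
    "-b", "default", "failovers",
]


def run_with_retries(cmd, attempts=5, delay_s=2.0, timeout_s=60.0,
                     runner=subprocess.run):
    """Run cmd up to `attempts` times (hypothetical helper).

    Returns (output, failures): output is the stdout of the first successful
    run, or None if every attempt failed; failures is a list of
    (attempt_number, reason) tuples, one per failed attempt.
    """
    failures = []
    for attempt in range(1, attempts + 1):
        try:
            result = runner(cmd, capture_output=True, text=True,
                            timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # The wrapper's own timeout fired before the command returned.
            failures.append((attempt, "subprocess timeout"))
        except OSError as exc:
            # The binary could not be started (missing path, permissions, ...).
            failures.append((attempt, f"could not start: {exc}"))
        else:
            if result.returncode == 0:
                return result.stdout, failures
            # Keep only the last stderr line, e.g. the TimeoutError message.
            lines = result.stderr.strip().splitlines()
            reason = lines[-1] if lines else f"exit code {result.returncode}"
            failures.append((attempt, reason))
        if attempt < attempts:
            time.sleep(delay_s)
    return None, failures
```

Calling `run_with_retries(CBSTATS_CMD)` on the affected node returns the stats output together with the list of failed attempts, which gives a rough failure rate for the intermittent timeout. The wrapper timeout defaults to 60 seconds so that the client's own 30 second timeout is reported first.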
      

      The following entry was logged by memcached at the time the command was executed:

      2021-02-03T10:36:34.209271-08:00 INFO 25000: HELO [cbstats unknown version] XERROR, Collections [ {"ip":"::1","port":49080} - {"ip":"::1","port":11210} (<ud>Administrator</ud>) ]
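
To correlate this entry with the 30 second client timeout, the log line can be parsed and the window between the HELO and the expected timeout computed. The sketch below is illustrative only: `parse_helo` and `timeout_window` are hypothetical helpers, the regular expression is derived from this single log line rather than from a documented memcached log format, and the numeric field before `HELO` is assumed to be a connection identifier.

```python
import re
from datetime import datetime, timedelta

# Log line copied from the report.
LOG_LINE = (
    '2021-02-03T10:36:34.209271-08:00 INFO 25000: HELO [cbstats unknown version] '
    'XERROR, Collections [ {"ip":"::1","port":49080} - {"ip":"::1","port":11210} '
    '(<ud>Administrator</ud>) ]'
)

# Pattern inferred from the single line above (an assumption, not a spec).
HELO_RE = re.compile(
    r'^(?P<ts>\S+) (?P<level>\w+) (?P<conn>\d+): HELO \[(?P<agent>[^\]]*)\] '
    r'(?P<features>.*?) \[ '
)


def parse_helo(line):
    """Return the fields of a HELO log line as a dict, or None if no match."""
    m = HELO_RE.match(line)
    if m is None:
        return None
    return {
        "timestamp": datetime.fromisoformat(m["ts"]),
        "level": m["level"],
        "conn": int(m["conn"]),  # assumed to be the connection identifier
        "agent": m["agent"],
        "features": m["features"].split(", "),
    }


def timeout_window(helo, timeout_s=30):
    """Return (start, end) of the period in which the client waited for a reply.

    timeout_s defaults to the 30 seconds reported in the TimeoutError.
    """
    start = helo["timestamp"]
    return start, start + timedelta(seconds=timeout_s)


helo = parse_helo(LOG_LINE)
start, end = timeout_window(helo)
print(helo["agent"], helo["features"])
print(start.isoformat(), "->", end.isoformat())
```

Other memcached log entries that fall between `start` and `end` are the ones most likely to show what the server was doing while the SASL authentication request went unanswered.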
      

      Cluster Info:
      +----------------+----------------+--------------+
      | Nodes          | Services       | Status       |
      +----------------+----------------+--------------+
      | 172.23.106.165 | kv             | Cluster node |
      | 172.23.106.168 | ['kv']         | <--- IN ---  |
      | 172.23.106.177 | ['kv']         | <--- IN ---  |
      | 172.23.106.196 | ['kv']         | <--- IN ---  |
      | 172.23.106.198 | ['n1ql,index'] | <--- IN ---  |
      +----------------+----------------+--------------+
      

      QE test: 
      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.92199.ini bucket_storage=magma,rerun=false,GROUP=P0;crash,randomize_value=True,bucket_eviction_policy=fullEviction,get-cbcollect-info=True,infra_log_level=debug,log_level=debug,dcp_services=n1ql:index,upgrade_version=7.0.0-4350 -t magma.magma_rollback.MagmaRollbackTests.test_crash_during_rollback,num_items=50000,doc_size=1024,nodes_init=4,num_rollbacks=5,vbuckets=1024,rollback_items=2000,replicas=3,key_size=12,init_loading=False,doc_ops=expiry:create:update:delete,num_collections=49,process_concurrency=2,collections_for_rollback=10,thread_to_use=1000,GROUP=P0;crash'
      

          People

            ankush.sharma Ankush Sharma