Details
Bug
Resolution: Fixed
Major
Cheshire-Cat
7.0.0-4350
Untriaged
1
Unknown
KV-Engine Sprint 2021 August
Description
The cbstats command fails intermittently. The issue is mostly observed when the bucket has multiple collections and the load is higher. Below are the steps of the test in which the issue was observed.
Steps to repro:
# Created a 4-node cluster (replicas = 3, 50 collections):
(172.23.106.65, 172.23.106.68, 172.23.106.177, 172.23.120.196, 172.23.106.198 (indexer node))
# Loaded 2.5 million items (50k items in each of 50 collections, doc size = 1024 B).
# Stopped persistence on node 172.23.106.65.
# Started new doc ops on the replica nodes (172.23.106.68, 172.23.106.177, 172.23.120.196).
# Killed memcached on node 172.23.120.65 to trigger rollbacks on the other nodes.
# After the rollback was triggered, ran the cbstats command below on node 172.23.120.196:
/opt/couchbase/bin/cbstats localhost:11210 -u Administrator -p password -b default failovers
The command failed with the following error:
Traceback (most recent call last):
  File "/opt/couchbase/lib/python/cbstats", line 999, in <module>
    main()
  File "/opt/couchbase/lib/python/cbstats", line 996, in main
    c.execute()
  File "/opt/couchbase/lib/python/clitool.py", line 71, in execute
    f[0](mc, *args[2:], **opts.__dict__)
  File "/opt/couchbase/lib/python/cbstats", line 38, in g
    f(*args, **kwargs)
  File "/opt/couchbase/lib/python/cli_auth_utils.py", line 67, in g
    mc.sasl_auth_plain(username, password)
  File "/opt/couchbase/lib/python/mc_bin_client.py", line 483, in sasl_auth_plain
    return self.sasl_auth_start('PLAIN', '\0'.join([foruser, user, password]))
  File "/opt/couchbase/lib/python/mc_bin_client.py", line 479, in sasl_auth_start
    return self._doCmd(memcacheConstants.CMD_SASL_AUTH, mech, data)
  File "/opt/couchbase/lib/python/mc_bin_client.py", line 298, in _doCmd
    return self._handleSingleResponse(opaque)
  File "/opt/couchbase/lib/python/mc_bin_client.py", line 291, in _handleSingleResponse
    cmd, opaque, cas, keylen, extralen, data = self._handleKeyedResponse(myopaque)
  File "/opt/couchbase/lib/python/mc_bin_client.py", line 276, in _handleKeyedResponse
    cmd, errcode, opaque, cas, keylen, extralen, rv = self._recvMsg()
  File "/opt/couchbase/lib/python/mc_bin_client.py", line 245, in _recvMsg
    data = self._socketRecv(MIN_RECV_PACKET - len(response))
  File "/opt/couchbase/lib/python/mc_bin_client.py", line 240, in _socketRecv
    raise TimeoutError(30)
mc_bin_client.TimeoutError: Error: Operation timed out (30 seconds)
The following entry was logged by memcached at the time the command was executed:
2021-02-03T10:36:34.209271-08:00 INFO 25000: HELO [cbstats unknown version] XERROR, Collections [ {"ip":"::1","port":49080} - {"ip":"::1","port":11210} (<ud>Administrator</ud>) ]
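The traceback ends in a blocking socket read that gave up after 30 seconds while waiting for the SASL_AUTH response. The following is a minimal sketch, not the actual mc_bin_client code, of how a fixed-length read with a timeout produces this kind of error. The names recv_exact, RecvTimeoutError and HEADER_LEN are hypothetical; the only protocol fact assumed is that a memcached binary protocol header is 24 bytes long.

```python
import socket

# A memcached binary protocol header is 24 bytes; a client has to read
# at least this much before it can parse the rest of a response.
HEADER_LEN = 24


class RecvTimeoutError(Exception):
    """Hypothetical stand-in for the TimeoutError seen in the traceback."""

    def __init__(self, seconds):
        super().__init__("Operation timed out (%s seconds)" % seconds)


def recv_exact(sock, nbytes, timeout):
    """Read exactly nbytes from sock.

    Raises RecvTimeoutError if any single recv() waits longer than
    `timeout` seconds, and EOFError if the peer closes the connection.
    """
    sock.settimeout(timeout)
    chunks = []
    remaining = nbytes
    while remaining > 0:
        try:
            chunk = sock.recv(remaining)
        except socket.timeout:
            # The server accepted the connection but sent nothing back
            # within the timeout window.
            raise RecvTimeoutError(timeout)
        if not chunk:
            raise EOFError("connection closed by peer")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

In this sketch a server that accepts the connection but never answers the authentication request makes recv_exact(sock, HEADER_LEN, 30) raise RecvTimeoutError. That pattern is consistent with the HELO log entry above followed by a client-side timeout, although the sketch does not by itself identify why the server did not respond.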
Cluster Info:
+----------------+----------------+--------------+
| Nodes          | Services       | Status       |
+----------------+----------------+--------------+
| 172.23.106.165 | kv             | Cluster node |
| 172.23.106.168 | ['kv']         | <--- IN ---  |
| 172.23.106.177 | ['kv']         | <--- IN ---  |
| 172.23.106.196 | ['kv']         | <--- IN ---  |
| 172.23.106.198 | ['n1ql,index'] | <--- IN ---  |
+----------------+----------------+--------------+
QE test:
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.92199.ini bucket_storage=magma,rerun=false,GROUP=P0;crash,randomize_value=True,bucket_eviction_policy=fullEviction,get-cbcollect-info=True,infra_log_level=debug,log_level=debug,dcp_services=n1ql:index,upgrade_version=7.0.0-4350 -t magma.magma_rollback.MagmaRollbackTests.test_crash_during_rollback,num_items=50000,doc_size=1024,nodes_init=4,num_rollbacks=5,vbuckets=1024,rollback_items=2000,replicas=3,key_size=12,init_loading=False,doc_ops=expiry:create:update:delete,num_collections=49,process_concurrency=2,collections_for_rollback=10,thread_to_use=1000,GROUP=P0;crash'
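Because the failure is intermittent, it can help to run the same cbstats command many times after the rollback and record how often it fails. The following is a minimal sketch of such a loop. It is not part of the QE test above, and run_repeatedly and its parameters are hypothetical names chosen for illustration.

```python
import subprocess


def run_repeatedly(cmd, attempts, per_run_timeout=60.0):
    """Run `cmd` (a list of arguments) `attempts` times.

    Returns one (attempt, returncode, last_stderr_line) tuple per failed
    run. returncode is None when the process did not exit within
    per_run_timeout seconds.
    """
    failures = []
    for attempt in range(1, attempts + 1):
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                universal_newlines=True,
                timeout=per_run_timeout,
            )
        except subprocess.TimeoutExpired:
            failures.append(
                (attempt, None, "no exit within %.0f s" % per_run_timeout)
            )
            continue
        if proc.returncode != 0:
            # Keep only the last stderr line, which for a Python traceback
            # is the exception message.
            lines = proc.stderr.strip().splitlines()
            failures.append(
                (attempt, proc.returncode, lines[-1] if lines else "")
            )
    return failures
```

With the command from the steps above, a call might look like run_repeatedly(["/opt/couchbase/bin/cbstats", "localhost:11210", "-u", "Administrator", "-p", "password", "-b", "default", "failovers"], attempts=50). The default per_run_timeout of 60 seconds is an arbitrary choice that leaves room for the 30-second client timeout shown in the traceback.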