Details
-
Bug
-
Resolution: Cannot Reproduce
-
Critical
-
Cheshire-Cat
-
Centos 7 64 bit; Couchbase EE 7.0.0-3938
-
Triaged
-
Centos 64-bit
-
-
1
-
Yes
Description
Script to Repo
/testrunner -i /tmp/testexec.13082.ini GROUP=auto_reprovision_durability,rerun=False,get-cbcollect-info=True,infra_log_level=critical,log_level=error,upgrade_version=7.0.0-3938 -t failover.AutoFailoverTests.AutoFailoverTests.test_autofailover_during_rebalance,upgrade_version=7.0.0-3938,nodes_in=0,timeout=5,rerun=False,auto_reprovision=True,GROUP=auto_reprovision_durability,nodes_out=1,bucket_spec=single_bucket.buckets_all_ephemeral_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,get-cbcollect-info=True,replicas=2,durability=MAJORITY,log_level=error,failover_action=restart_server,nodes_init=4,can_abort_rebalance=False,num_node_failures=1,override_spec_params=durability;replicas,infra_log_level=critical |
Summary:
The test introduces server_restart on a node with autoreprovison enabled, and expects the ongoing rebalance to fail. After node comes up back again, bucket gets autoreprovisioned and a subsequent rebalance is triggered, which in this case failed.
(Not a new test)
Observations and timeline
Inject failure condition of "server_restart" on .119 and expect an ongoing rebalance to fail
2020-12-06 04:12:14,760 | test | ERROR | pool-1-thread-20 | [rest_client:print_UI_logs:2596] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.123.117', u'tstamp': 1607256732759L, u'shortText': u'message', u'serverTime': u'2020-12-06T04:12:12.759Z', u'text': u'Rebalance exited with reason {mover_crashed,\n {unexpected_exit,\n |
.119 node comes up
2020-12-06 04:15:27,631 | test | ERROR | pool-1-thread-6 | [rest_client:print_UI_logs:2596] {u'code': 0, u'module': u'auto_reprovision', u'type': u'info', u'node': u'ns_1@172.23.123.117', u'tstamp': 1607256747350L, u'shortText': u'message', u'serverTime': u'2020-12-06T04:12:27.350Z', u'text': u'auto-reprovision is disabled as maximum number of nodes (1) that can be auto-reprovisioned has been reached.'} |
2020-12-06 04:15:27,631 | test | ERROR | pool-1-thread-6 | [rest_client:print_UI_logs:2596] {u'code': 0, u'module': u'auto_reprovision', u'type': u'info', u'node': u'ns_1@172.23.123.117', u'tstamp': 1607256747348L, u'shortText': u'message', u'serverTime': u'2020-12-06T04:12:27.348Z', u'text': u'Bucket "default" has been reprovisioned on following nodes: [\'ns_1@172.23.123.119\']. Nodes on which the data service restarted: [\'ns_1@172.23.123.119\'].'} |
Subsequent rebalance fails (expected it to pass)
2020-12-06 04:21:37,841 | test | ERROR | pool-1-thread-21 | [rest_client:print_UI_logs:2596] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.123.117', u'tstamp': 1607257293145L, u'shortText': u'message', u'serverTime': u'2020-12-06T04:21:33.145Z', u'text': u'Rebalance exited with reason {mover_crashed,\n {unexpected_exit,\n |
Few error messages
172.23.123.119 : Found exceptionRunLoop message in /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt |
2020-12-06T04:21:32.126048-08:00 WARNING 792: exception occurred in runloop during packet execution. Closing connection: PassiveDurabilityMonitor::completeSyncWrite vb:767 No Prepare waiting for completion, but received commit for key <ud>cid:0xaa:test_collections-161</ud>. Cookies: [{"aiostat":"success","connection":"[ {\"ip\":\"127.0.0.1\",\"port\":56137} - {\"ip\":\"127.0.0.1\",\"port\":11209} (<ud>@ns_server</ud>) ]","engine_storage":"0x0000000000000000","ewouldblock":false,"packet":{"bodylen":38,"cas":0,"datatype":"raw","extlen":16,"key":"<ud>..test_collections-161</ud>","keylen":22,"magic":"ClientRequest","opaque":202,"opcode":"DCP_COMMIT","vbucket":767},"refcount":1}] |
172.23.121.216 : Found ERROR message in /opt/couchbase/var/lib/couchbase/logs/memcached.log.000004.txt |
2020-12-06T04:21:31.944275-08:00 ERROR (default) VBucket::commit (vb:767) failed as no HashTable item found with key:<ud>cid:0xaa:test_collections-161</ud> prepare_seqno:1803, commit_seqno:2032 |
172.23.121.217 : Found ERROR message in /opt/couchbase/var/lib/couchbase/logs/memcached.log.000004.txt |
2020-12-06T04:21:31.945663-08:00 ERROR 57: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.121.217->ns_1@172.23.121.216:default - DcpProducer::handleResponse disconnecting, received unexpected response:{"bodylen":0,"cas":0,"datatype":"raw","extlen":0,"keylen":0,"magic":"ClientResponse","opaque":198,"opcode":"DCP_COMMIT","status":"Not found"} for stream:stream name:eq_dcpq:replication:ns_1@172.23.121.217->ns_1@172.23.121.216:default, vb:767, state:in-memory |
Attachments
Issue Links
- relates to
-
MB-41089 Ephemeral HTCleaner may remove in-flight SyncWrite from the HashTable
- Closed