Details
-
Bug
-
Resolution: Duplicate
-
Critical
-
None
-
7.1.0
-
7.1.0-1068
-
Untriaged
-
Centos 64-bit
-
1
-
No
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops-temp_rebalance_even2-magma.ini rerun=False,get-cbcollect-info=False,quota_percent=99,crash_warning=True,retry_get_process_num=200,bucket_storage=magma,enable_dp=True,log_level=debug,infra_log_level=debug -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_rebalance_out,nodes_init=5,nodes_out=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=before,scrape_interval=5,rebalance_moves_per_node=32,quota_percent=80,skip_validations=False,GROUP=rebalance_with_collection_crud'
|
Steps to Repro
1. Create a 5 node cluster
2021-07-14 02:19:01,244 | test | INFO | pool-5-thread-7 | [table_view:display:72] Rebalance Overview
----------------------------------------------------------------------
Nodes | Services | Version | CPU | Status |
----------------------------------------------------------------------
172.23.106.156 | kv | 7.1.0-1068-enterprise | 1.16541353383 | Cluster node |
172.23.106.159 | None | <--- IN — | ||
172.23.106.163 | None | <--- IN — | ||
172.23.106.164 | None | <--- IN — | ||
172.23.106.165 | None | <--- IN — |
----------------------------------------------------------------------
2. Create Bucket/scopes/collections/data
2021-07-14 02:20:39,635 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
----------------------------------------------------------------------------------------------------------------------------------------------
Bucket | Type | Storage Backend | Replicas | Durability | TTL | Items | RAM Quota | RAM Used | Disk Used | ARR |
----------------------------------------------------------------------------------------------------------------------------------------------
c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000 | couchbase | magma | 2 | none | 0 | 3000000 | 9.77 GiB | 2.73 GiB | 3.04 GiB | 100 |
----------------------------------------------------------------------------------------------------------------------------------------------
3. Rebalance out 2 node
2021-07-14 02:20:47,256 | test | DEBUG | MainProcess | pool-5-thread-13 | [rest_client:rebalance:1338] Rebalance params: {'password': 'password', 'knownNodes': u'ns_1@172.23.106.165,ns_1@172.23.106.163,ns_1@172.23.106.156,ns_1@172.23.106.164,ns_1@172.23.106.159', 'ejectedNodes': u'ns_1@172.23.106.164,ns_1@172.23.106.165', 'user': 'Administrator'}
|
2021-07-14 02:20:47,322 | test | INFO | MainProcess | pool-5-thread-13 | [table_view:display:72] Rebalance Overview
----------------------------------------------------------------------
Nodes | Services | Version | CPU | Status |
----------------------------------------------------------------------
172.23.106.165 | kv | 7.1.0-1068-enterprise | 3.87323943662 | — OUT ---> |
172.23.106.163 | kv | 7.1.0-1068-enterprise | 3.72834546824 | Cluster node |
172.23.106.156 | kv | 7.1.0-1068-enterprise | 4.52374968585 | Cluster node |
172.23.106.164 | kv | 7.1.0-1068-enterprise | 3.46081504702 | — OUT ---> |
172.23.106.159 | kv | 7.1.0-1068-enterprise | 3.43789209536 | Cluster node |
----------------------------------------------------------------------
Rebalance fails as shown below
2021-07-14 02:23:18,486 | test | ERROR | MainProcess | pool-5-thread-13 | [rest_client:_rebalance_status_and_progress:1547] {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'type': u'rebalance', u'masterRequestTimedOut': False, u'statusId': u'cdc7d7ba66d97d2103c7ca44798a42bc', u'statusIsStale': False, u'lastReportURI': u'/logs/rebalanceReport?reportID=a7601ad898297ab91ce100a4e6ef3876', u'status': u'notRunning'} - rebalance failed
|
2021-07-14 02:23:18,529 | test | INFO | MainProcess | pool-5-thread-13 | [rest_client:print_UI_logs:2693] Latest logs from UI on 172.23.106.156:
|
2021-07-14 02:23:18,529 | test | ERROR | MainProcess | pool-5-thread-13 | [rest_client:print_UI_logs:2695] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.106.156', u'tstamp': 1626254597921L, u'shortText': u'message', u'serverTime': u'2021-07-14T02:23:17.921Z', u'text': u'Rebalance exited with reason {mover_crashed,\n {unexpected_exit,\n {\'EXIT\',<0.3710.1>,\n {{dcp_wait_for_data_move_failed,\n "c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000",\n 909,\'ns_1@172.23.106.165\',\n [\'ns_1@172.23.106.159\',\n \'ns_1@172.23.106.163\',\n \'ns_1@172.23.106.156\'],\n {error,no_stats_for_this_vbucket}},\n [{ns_single_vbucket_mover,\n \'-wait_dcp_data_move/5-fun-0-\',5,\n [{file,"src/ns_single_vbucket_mover.erl"},\n {line,459}]},\n {proc_lib,init_p,3,\n [{file,"proc_lib.erl"},{line,234}]}]}}}}.\nRebalance Operation Id = a26bf938559f1ab59e8016f8e6a5e2cc'}
|
2021-07-14 02:23:18,529 | test | ERROR | MainProcess | pool-5-thread-13 | [rest_client:print_UI_logs:2695] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.106.156', u'tstamp': 1626254597884L, u'shortText': u'message', u'serverTime': u'2021-07-14T02:23:17.884Z', u'text': u'Worker <0.3015.1> (for action {move,{909,\n [\'ns_1@172.23.106.165\',\n \'ns_1@172.23.106.159\',\n \'ns_1@172.23.106.163\'],\n [\'ns_1@172.23.106.159\',\n \'ns_1@172.23.106.163\',\n \'ns_1@172.23.106.156\'],\n []}}) exited with reason {unexpected_exit,\n {\'EXIT\',\n <0.3710.1>,\n {{dcp_wait_for_data_move_failed,\n "c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000",\n 909,\n \'ns_1@172.23.106.165\',\n [\'ns_1@172.23.106.159\',\n \'ns_1@172.23.106.163\',\n \'ns_1@172.23.106.156\'],\n {error,\n no_stats_for_this_vbucket}},\n [{ns_single_vbucket_mover,\n \'-wait_dcp_data_move/5-fun-0-\',\n 5,\n [{file,\n "src/ns_single_vbucket_mover.erl"},\n {line,\n 459}]},\n {proc_lib,\n init_p,3,\n [{file,\n "proc_lib.erl"},\n {line,\n 234}]}]}}}'}
|
|
2021-07-14 02:23:18,529 | test | ERROR | MainProcess | pool-5-thread-13 | [rest_client:print_UI_logs:2695] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'info', u'node': u'ns_1@172.23.106.156', u'tstamp': 1626254448139L, u'shortText': u'message', u'serverTime': u'2021-07-14T02:20:48.139Z', u'text': u'Bucket "c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000" rebalance does not seem to be swap rebalance'}
|
2021-07-14 02:23:18,530 | test | ERROR | MainProcess | pool-5-thread-13 | [rest_client:print_UI_logs:2695] {u'code': 0, u'module': u'ns_rebalancer', u'type': u'info', u'node': u'ns_1@172.23.106.156', u'tstamp': 1626254447682L, u'shortText': u'message', u'serverTime': u'2021-07-14T02:20:47.682Z', u'text': u'Started rebalancing bucket c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000'}
|
|
2021-07-14 02:23:18,530 | test | ERROR | MainProcess | pool-5-thread-13 | [rest_client:print_UI_logs:2695] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'info', u'node': u'ns_1@172.23.106.156', u'tstamp': 1626254447262L, u'shortText': u'message', u'serverTime': u'2021-07-14T02:20:47.262Z', u'text': u"Starting rebalance, KeepNodes = ['ns_1@172.23.106.163','ns_1@172.23.106.156',\n 'ns_1@172.23.106.159'], EjectNodes = ['ns_1@172.23.106.164',\n 'ns_1@172.23.106.165'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = a26bf938559f1ab59e8016f8e6a5e2cc"}
|
ERROR's seen on 172.23.106.165
2021-07-14 02:23:40,032 | infra | DEBUG | MainProcess | MainThread | [remote_util:execute_command_raw_jsch:3285] Running command on 172.23.106.165: grep -r ' ERROR ' /opt/couchbase/var/lib/couchbase/logs/memcached.log.000264.txt | grep -v 'XERROR' | grep -v 'compaction failed for vb' | grep -v 'failed - Already exists'
|
2021-07-14 02:23:40,073 | test | CRITICAL | MainProcess | MainThread | [basetestcase:check_coredump_exist:846] 172.23.106.165: Found ' ERROR ' logs - ['2021-07-14T02:23:17.786518-07:00 ERROR 844: (c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000) DCP (Producer) eq_dcpq:replication:ns_1@172.23.106.165->ns_1@172.23.106.156:c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000 - DcpProducer::handleResponse disconnecting, received unexpected response:{"bodylen":84,"cas":0,"datatype":["JSON"],"extlen":0,"keylen":0,"magic":"ClientResponse","opaque":164,"opcode":"DCP_MUTATION","status":"Unknown Collection"} for stream:stream name:eq_dcpq:replication:ns_1@172.23.106.165->ns_1@172.23.106.156:c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000, vb:909, state:backfilling\n', '2021-07-14T02:23:17.786558-07:00 ERROR 844: (c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000) DCP (Producer) eq_dcpq:replication:ns_1@172.23.106.165->ns_1@172.23.106.156:c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000 - DcpProducer::handleResponse disconnecting, received unexpected response:{"bodylen":84,"cas":0,"datatype":["JSON"],"extlen":0,"keylen":0,"magic":"ClientResponse","opaque":164,"opcode":"DCP_MUTATION","status":"Unknown Collection"} for stream:stream name:eq_dcpq:replication:ns_1@172.23.106.165->ns_1@172.23.106.156:c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000, vb:909, state:backfilling\n', '2021-07-14T02:23:17.786589-07:00 ERROR 844: (c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000) DCP (Producer) eq_dcpq:replication:ns_1@172.23.106.165->ns_1@172.23.106.156:c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000 - DcpProducer::handleResponse disconnecting, received unexpected response:{"bodylen":84,"cas":0,"datatype":["JSON"],"extlen":0,"keylen":0,"magic":"ClientResponse","opaque":164,"opcode":"DCP_MUTATION","status":"Unknown Collection"} for stream:stream name:eq_dcpq:replication:ns_1@172.23.106.165->ns_1@172.23.106.156:c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000, vb:909, state:backfilling\n', '2021-07-14T02:23:17.786610-07:00 ERROR 844: (c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000) DCP (Producer) eq_dcpq:replication:ns_1@172.23.106.165->ns_1@172.23.106.156:c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000 - DcpProducer::handleResponse disconnecting, received unexpected response:{"bodylen":84,"cas":0,"datatype":["JSON"],"extlen":0,"keylen":0,"magic":"ClientResponse","opaque":164,"opcode":"DCP_MUTATION","status":"Unknown Collection"} for stream:stream name:eq_dcpq:replication:ns_1@172.23.106.165->ns_1@172.23.106.156:c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000, vb:909, state:backfilling\n', '2021-07-14T02:23:17.786630-07:00 ERROR 844: (c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000) DCP (Producer) eq_dcpq:replication:ns_1@172.23.106.165->ns_1@172.23.106.156:c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000 - DcpProducer::handleResponse disconnecting, received unexpected response:{"bodylen":84,"cas":0,"datatype":["JSON"],"extlen":0,"keylen":0,"magic":"ClientResponse","opaque":164,"opcode":"DCP_MUTATION","status":"Unknown Collection"} for stream:stream name:eq_dcpq:replication:ns_1@172.23.106.165->ns_1@172.23.106.156:c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000, vb:909, state:backfilling\n', '2021-07-14T02:23:17.786649-07:00 ERROR 844: (c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000) DCP (Producer) eq_dcpq:replication:ns_1@172.23.106.165->ns_1@172.23.106.156:c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000 - DcpProducer::handleResponse disconnecting, received unexpected response:{"bodylen":84,"cas":0,"datatype":["JSON"],"extlen":0,"keylen":0,"magic":"ClientResponse","opaque":164,"opcode":"DCP_MUTATION","status":"Unknown Collection"} for stream:stream name:eq_dcpq:replication:ns_1@172.23.106.165->ns_1@172.23.106.156:c%MNWwHodAxDAKixAjh6Tx1c0Y6I9nPBOnCx%8la44VtIgp4m-11-583000, vb:909, state:backfilling\n']
|
cbcollect_info attached.
Attachments
Issue Links
- duplicates
-
MB-42652 [Magma]: After dropping 10 collections out of 21 collections items count are more than expected.
- Closed