Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Not a Bug
-
Cheshire-Cat
-
Untriaged
-
Windows 64-bit
-
-
1
-
Unknown
Description
Build: 7.0.0-5017
Scenario:
- Multi node cluster with all index, n1ql,cbas,eventing,fts services deployed
- Running various rebalances in parallel to constant data load
- Deploying eventing functions when rebalance_in is been triggered
- Rebalance failed during rebalance_in operation of kv node
+----------------+---------------+-----------------------+---------------+--------------+
| Nodes | Services | Version | CPU | Status |
+----------------+---------------+-----------------------+---------------+--------------+
| 172.23.136.114 | index, n1ql | 7.0.0-5017-enterprise | 14.5080915318 | Cluster node |
| 172.23.136.106 | kv | 7.0.0-5017-enterprise | 95.3309016852 | Cluster node |
| 172.23.138.127 | cbas | 7.0.0-5017-enterprise | 23.905 | Cluster node |
| 172.23.136.108 | kv | 7.0.0-5017-enterprise | 94.7126321842 | Cluster node |
| 172.23.136.112 | backup | 7.0.0-5017-enterprise | 1.21166666667 | Cluster node |
| 172.23.136.115 | eventing, fts | 7.0.0-5017-enterprise | 8.803946568 | Cluster node |
| 172.23.136.113 | index, n1ql | 7.0.0-5017-enterprise | 13.9064348928 | Cluster node |
| 172.23.136.110 | kv | 7.0.0-5017-enterprise | 92.4732706106 | Cluster node |
| 172.23.136.105 | kv | 7.0.0-5017-enterprise | 94.7908767427 | Cluster node |
| 172.23.136.107 | ['kv'] | | | <--- IN --- |
+----------------+---------------+-----------------------+---------------+--------------+
Observation:
Seeing eventing rebalance failed with reason,
"Some apps are deploying or resuming on nodeId: c2e16dfe88967da8a18a2a76462c6b93 Apps: map[a2_users_search:2021-04-27 22:27:20.2836704 -0700 PDT"
Rebalance exited with reason {service_rebalance_failed,eventing,
|
{worker_died,
|
{'EXIT',<0.20190.18>,
|
{{badmatch,
|
{error,
|
{bad_nodes,eventing,prepare_rebalance,
|
[{'ns_1@172.23.136.115',
|
{error,
|
{unknown_error,
|
<<"Some apps are deploying or resuming on nodeId: c2e16dfe88967da8a18a2a76462c6b93 Apps: map[a2_users_search:2021-04-27 22:27:20.2836704 -0700 PDT m=+2452.559014101]">>}}}]}}},
|
[{service_rebalancer,rebalance_worker,1,
|
[{file,"src/service_rebalancer.erl"},
|
{line,158}]},
|
{proc_lib,init_p,3,
|
[{file,"proc_lib.erl"},{line,234}]}]}}}}.
|
Rebalance Operation Id = e197b89281485206bb3f29fba4e1f1ca
|
Â
Attachments
Activity
Field | Original Value | New Value |
---|---|---|
Attachment | rebalance_failure_test.log [ 137773 ] |
Description |
*Build*: 7.0.0-5017 *Scenario*: * Multi node cluster with all index, n1ql,cbas,eventing,fts services deployed * Running various rebalances in parallel to constant data load * Rebalance failed during final rebalance_in operation of kv node {noformat} +----------------+---------------+-----------------------+---------------+--------------+ | Nodes | Services | Version | CPU | Status | +----------------+---------------+-----------------------+---------------+--------------+ | 172.23.136.114 | index, n1ql | 7.0.0-5017-enterprise | 14.5080915318 | Cluster node | | 172.23.136.106 | kv | 7.0.0-5017-enterprise | 95.3309016852 | Cluster node | | 172.23.138.127 | cbas | 7.0.0-5017-enterprise | 23.905 | Cluster node | | 172.23.136.108 | kv | 7.0.0-5017-enterprise | 94.7126321842 | Cluster node | | 172.23.136.112 | backup | 7.0.0-5017-enterprise | 1.21166666667 | Cluster node | | 172.23.136.115 | eventing, fts | 7.0.0-5017-enterprise | 8.803946568 | Cluster node | | 172.23.136.113 | index, n1ql | 7.0.0-5017-enterprise | 13.9064348928 | Cluster node | | 172.23.136.110 | kv | 7.0.0-5017-enterprise | 92.4732706106 | Cluster node | | 172.23.136.105 | kv | 7.0.0-5017-enterprise | 94.7908767427 | Cluster node | | 172.23.136.107 | ['kv'] | | | <--- IN --- | +----------------+---------------+-----------------------+---------------+--------------+ {noformat} *Observation:* Seeing eventing rebalance failed with reason, "{{Some apps are deploying or resuming on nodeId: c2e16dfe88967da8a18a2a76462c6b93 Apps: map[a2_users_search:2021-04-27 22:27:20.2836704 -0700 PDT}}*"* {noformat} Rebalance exited with reason {service_rebalance_failed,eventing, {worker_died, {'EXIT',<0.20190.18>, {{badmatch, {error, {bad_nodes,eventing,prepare_rebalance, [{'ns_1@172.23.136.115', {error, {unknown_error, <<"Some apps are deploying or resuming on nodeId: c2e16dfe88967da8a18a2a76462c6b93 Apps: map[a2_users_search:2021-04-27 22:27:20.2836704 -0700 PDT m=+2452.559014101]">>}}}]}}}, [{service_rebalancer,rebalance_worker,1, [{file,"src/service_rebalancer.erl"}, {line,158}]}, {proc_lib,init_p,3, [{file,"proc_lib.erl"},{line,234}]}]}}}}. Rebalance Operation Id = e197b89281485206bb3f29fba4e1f1ca{noformat} |
*Build*: 7.0.0-5017
*Scenario*: * Multi node cluster with all index, n1ql,cbas,eventing,fts services deployed * Running various rebalances in parallel to constant data load * Deploying eventing functions when rebalance_in is been triggered * Rebalance failed during rebalance_in operation of kv node {noformat}+----------------+---------------+-----------------------+---------------+--------------+ | Nodes | Services | Version | CPU | Status | +----------------+---------------+-----------------------+---------------+--------------+ | 172.23.136.114 | index, n1ql | 7.0.0-5017-enterprise | 14.5080915318 | Cluster node | | 172.23.136.106 | kv | 7.0.0-5017-enterprise | 95.3309016852 | Cluster node | | 172.23.138.127 | cbas | 7.0.0-5017-enterprise | 23.905 | Cluster node | | 172.23.136.108 | kv | 7.0.0-5017-enterprise | 94.7126321842 | Cluster node | | 172.23.136.112 | backup | 7.0.0-5017-enterprise | 1.21166666667 | Cluster node | | 172.23.136.115 | eventing, fts | 7.0.0-5017-enterprise | 8.803946568 | Cluster node | | 172.23.136.113 | index, n1ql | 7.0.0-5017-enterprise | 13.9064348928 | Cluster node | | 172.23.136.110 | kv | 7.0.0-5017-enterprise | 92.4732706106 | Cluster node | | 172.23.136.105 | kv | 7.0.0-5017-enterprise | 94.7908767427 | Cluster node | | 172.23.136.107 | ['kv'] | | | <--- IN --- | +----------------+---------------+-----------------------+---------------+--------------+ {noformat} *Observation:* Seeing eventing rebalance failed with reason, "{{Some apps are deploying or resuming on nodeId: c2e16dfe88967da8a18a2a76462c6b93 Apps: map[a2_users_search:2021-04-27 22:27:20.2836704 -0700 PDT}}*"* {noformat}Rebalance exited with reason {service_rebalance_failed,eventing, {worker_died, {'EXIT',<0.20190.18>, {{badmatch, {error, {bad_nodes,eventing,prepare_rebalance, [{'ns_1@172.23.136.115', {error, {unknown_error, <<"Some apps are deploying or resuming on nodeId: c2e16dfe88967da8a18a2a76462c6b93 Apps: map[a2_users_search:2021-04-27 22:27:20.2836704 -0700 PDT m=+2452.559014101]">>}}}]}}}, [{service_rebalancer,rebalance_worker,1, [{file,"src/service_rebalancer.erl"}, {line,158}]}, {proc_lib,init_p,3, [{file,"proc_lib.erl"},{line,234}]}]}}}}. Rebalance Operation Id = e197b89281485206bb3f29fba4e1f1ca{noformat} |
Priority | Major [ 3 ] | Critical [ 2 ] |
Assignee | Jeelan Poola [ jeelan.poola ] | Ankit Prabhu [ ankit.prabhu ] |
Resolution | Not a Bug [ 10200 ] | |
Status | Open [ 1 ] | Resolved [ 5 ] |
Assignee | Ankit Prabhu [ ankit.prabhu ] | Ashwin Govindarajulu [ ashwin.govindarajulu ] |
Status | Resolved [ 5 ] | Closed [ 6 ] |
Fix Version/s | 7.0.0 [ 17233 ] |
Fix Version/s | Cheshire-Cat [ 15915 ] |
Eventing deployment of function "a2_users_search" started around 22:27:20 and completed around 22:28:39.
ns_server.eventing.log:110960:2021-04-27T22:27:20.240-07:00 [Info] SuperSupervisor::SettingsChangeCallback [1] Path => /eventing/appsettings/a2_users_search value => map[string]interface {}{"dcp_stream_boundary":"everything", "deployment_status":true, "description":"", "execution_timeout":60, "language_compatibility":"6.6.2", "log_level":"INFO", "n1ql_consistency":"none", "num_timer_partitions":128, "processing_status":true, "timer_context_size":1024, "user_prefix":"eventing", "worker_count":1}
ns_server.eventing.log:110961:2021-04-27T22:27:20.240-07:00 [Info] SuperSupervisor::SettingsChangeCallback [1] Function: a2_users_search current state: 1 requested status for deployment: true processing: true
ns_server.eventing.log:110962:2021-04-27T22:27:20.240-07:00 [Info] SuperSupervisor::SettingsChangeCallback [1] Function: a2_users_search begin deployment process
ns_server.eventing.log:111240:2021-04-27T22:27:20.283-07:00 [Info] SuperSupervisor::SettingsChangeCallback [1] Function: a2_users_search adding to bootstrap list
ns_server.eventing.log:354804:2021-04-27T22:28:39.912-07:00 [Info] SuperSupervisor::SettingsChangeCallback [2] Function: a2_users_search bootstrap finished
ns_server.eventing.log:354808:2021-04-27T22:28:39.913-07:00 [Info] SuperSupervisor::SettingsChangeCallback [2] Function: a2_users_search added to deployed apps map
ns_server.eventing.log:354810:2021-04-27T22:28:39.913-07:00 [Info] SuperSupervisor::SettingsChangeCallback [2] Function: a2_users_search deleting from bootstrap list
ns_server.eventing.log:354811:2021-04-27T22:28:39.913-07:00 [Info] SuperSupervisor::SettingsChangeCallback [2] Function: a2_users_search deployment done
ns_server sent prepareTopologyChange around 22:27:55 and it is failed since "a2_users_search" is undergoing bootstrapping.
316446:2021-04-27T22:27:55.761-07:00 [Info] ServiceMgr::PrepareTopologyChange change: service.TopologyChange{ID:"05f19c630d7db8d9acc3c605c71eb58c", CurrentTopologyRev:service.Revision(nil), Type:"topology-change-rebalance", KeepNodes:[]struct { NodeInfo service.NodeInfo "json:\"nodeInfo\""; RecoveryType service.RecoveryType "json:\"recoveryType\"" }{struct { NodeInfo service.NodeInfo "json:\"nodeInfo\""; RecoveryType service.RecoveryType "json:\"recoveryType\"" }{NodeInfo:service.NodeInfo{NodeID:"c2e16dfe88967da8a18a2a76462c6b93", Priority:0, Opaque:interface {}(nil)}, RecoveryType:"recovery-full"}}, EjectNodes:[]service.NodeInfo{}}
316448:2021-04-27T22:27:55.761-07:00 [Info] ServiceMgr::PrepareTopologyChange failed: Some apps are deploying or resuming on nodeId: c2e16dfe88967da8a18a2a76462c6b93 Apps: map[a2_users_search:2021-04-27 22:27:20.2836704 -0700 PDT m=+2452.559014101]
Similarly other 2 instance failed due to "a5_users_search" and "a3_users_search" function are deploying.
358669:2021-04-27T22:28:51.700-07:00 [Info] ServiceMgr::PrepareTopologyChange change: service.TopologyChange{ID:"bae21d851732ae1adf845ab827147875", CurrentTopologyRev:service.Revision(nil), Type:"topology-change-rebalance", KeepNodes:[]struct { NodeInfo service.NodeInfo "json:\"nodeInfo\""; RecoveryType service.RecoveryType "json:\"recoveryType\"" }{struct { NodeInfo service.NodeInfo "json:\"nodeInfo\""; RecoveryType service.RecoveryType "json:\"recoveryType\"" }{NodeInfo:service.NodeInfo{NodeID:"c2e16dfe88967da8a18a2a76462c6b93", Priority:0, Opaque:interface {}(nil)}, RecoveryType:"recovery-full"}}, EjectNodes:[]service.NodeInfo{}}
358671:2021-04-27T22:28:51.700-07:00 [Info] ServiceMgr::PrepareTopologyChange failed: Some apps are deploying or resuming on nodeId: c2e16dfe88967da8a18a2a76462c6b93 Apps: map[a5_users_search:2021-04-27 22:28:39.9396245 -0700 PDT m=+2532.217998501]
362272:2021-04-27T22:29:49.092-07:00 [Info] ServiceMgr::PrepareTopologyChange change: service.TopologyChange{ID:"21f93518747d9ae00ad136c2b2b4a66e", CurrentTopologyRev:service.Revision(nil), Type:"topology-change-rebalance", KeepNodes:[]struct { NodeInfo service.NodeInfo "json:\"nodeInfo\""; RecoveryType service.RecoveryType "json:\"recoveryType\"" }{struct { NodeInfo service.NodeInfo "json:\"nodeInfo\""; RecoveryType service.RecoveryType "json:\"recoveryType\"" }{NodeInfo:service.NodeInfo{NodeID:"c2e16dfe88967da8a18a2a76462c6b93", Priority:0, Opaque:interface {}(nil)}, RecoveryType:"recovery-full"}}, EjectNodes:[]service.NodeInfo{}}
362274:2021-04-27T22:29:49.092-07:00 [Info] ServiceMgr::PrepareTopologyChange failed: Some apps are deploying or resuming on nodeId: c2e16dfe88967da8a18a2a76462c6b93 Apps: map[a5_users_search:2021-04-27 22:28:39.9396245 -0700 PDT m=+2532.217998501]
404409:2021-04-27T22:31:05.991-07:00 [Info] ServiceMgr::PrepareTopologyChange change: service.TopologyChange{ID:"612e12abe213e4a5e75b45daa366c8f2", CurrentTopologyRev:service.Revision(nil), Type:"topology-change-rebalance", KeepNodes:[]struct { NodeInfo service.NodeInfo "json:\"nodeInfo\""; RecoveryType service.RecoveryType "json:\"recoveryType\"" }{struct { NodeInfo service.NodeInfo "json:\"nodeInfo\""; RecoveryType service.RecoveryType "json:\"recoveryType\"" }{NodeInfo:service.NodeInfo{NodeID:"c2e16dfe88967da8a18a2a76462c6b93", Priority:0, Opaque:interface {}(nil)}, RecoveryType:"recovery-full"}}, EjectNodes:[]service.NodeInfo{}}
404411:2021-04-27T22:31:05.991-07:00 [Info] ServiceMgr::PrepareTopologyChange failed: Some apps are deploying or resuming on nodeId: c2e16dfe88967da8a18a2a76462c6b93 Apps: map[a3_users_search:2021-04-27 22:30:20.5472091 -0700 PDT m=+2632.828999101]
a5_user_search deploying:
ns_server.eventing.log:354816:2021-04-27T22:28:39.919-07:00 [Info] SuperSupervisor::SettingsChangeCallback [2] Path => /eventing/appsettings/a5_users_search value => map[string]interface {}{"dcp_stream_boundary":"everything", "deployment_status":true, "description":"", "execution_timeout":60, "language_compatibility":"6.6.2", "log_level":"INFO", "n1ql_consistency":"none", "num_timer_partitions":128, "processing_status":true, "timer_context_size":1024, "user_prefix":"eventing", "worker_count":1}
ns_server.eventing.log:354817:2021-04-27T22:28:39.919-07:00 [Info] SuperSupervisor::SettingsChangeCallback [2] Function: a5_users_search current state: 1 requested status for deployment: true processing: true
ns_server.eventing.log:354818:2021-04-27T22:28:39.919-07:00 [Info] SuperSupervisor::SettingsChangeCallback [2] Function: a5_users_search begin deployment process
ns_server.eventing.log:354819:2021-04-27T22:28:39.939-07:00 [Info] SuperSupervisor::SettingsChangeCallback [2] Function: a5_users_search adding to bootstrap list
ns_server.eventing.log:397663:2021-04-27T22:30:20.511-07:00 [Info] SuperSupervisor::SettingsChangeCallback [3] Function: a5_users_search bootstrap finished
ns_server.eventing.log:397667:2021-04-27T22:30:20.511-07:00 [Info] SuperSupervisor::SettingsChangeCallback [3] Function: a5_users_search added to deployed apps map
ns_server.eventing.log:397669:2021-04-27T22:30:20.511-07:00 [Info] SuperSupervisor::SettingsChangeCallback [3] Function: a5_users_search deleting from bootstrap list
ns_server.eventing.log:397670:2021-04-27T22:30:20.511-07:00 [Info] SuperSupervisor::SettingsChangeCallback [3] Function: a5_users_search deployment done
a3_user_search deploying:
ns_server.eventing.log:397677:2021-04-27T22:30:20.518-07:00 [Info] SuperSupervisor::SettingsChangeCallback [3] Path => /eventing/appsettings/a3_users_search value => map[string]interface {}{"dcp_stream_boundary":"everything", "deployment_status":true, "description":"", "execution_timeout":60, "language_compatibility":"6.6.2", "log_level":"INFO", "n1ql_consistency":"none", "num_timer_partitions":128, "processing_status":true, "timer_context_size":1024, "user_prefix":"eventing", "worker_count":1}
ns_server.eventing.log:397678:2021-04-27T22:30:20.518-07:00 [Info] SuperSupervisor::SettingsChangeCallback [3] Function: a3_users_search current state: 1 requested status for deployment: true processing: true
ns_server.eventing.log:397679:2021-04-27T22:30:20.518-07:00 [Info] SuperSupervisor::SettingsChangeCallback [3] Function: a3_users_search begin deployment process
ns_server.eventing.log:397691:2021-04-27T22:30:20.547-07:00 [Info] SuperSupervisor::SettingsChangeCallback [3] Function: a3_users_search adding to bootstrap list
ns_server.eventing.log:509222:2021-04-27T22:31:47.230-07:00 [Info] SuperSupervisor::SettingsChangeCallback [4] Function: a3_users_search bootstrap finished
ns_server.eventing.log:509226:2021-04-27T22:31:47.230-07:00 [Info] SuperSupervisor::SettingsChangeCallback [4] Function: a3_users_search added to deployed apps map
ns_server.eventing.log:509228:2021-04-27T22:31:47.230-07:00 [Info] SuperSupervisor::SettingsChangeCallback [4] Function: a3_users_search deleting from bootstrap list
ns_server.eventing.log:509229:2021-04-27T22:31:47.230-07:00 [Info] SuperSupervisor::SettingsChangeCallback [4] Function: a3_users_search deployment done
Ashwin Govindarajulu this is expected behaviour to reject rebalance when eventing function is deploying/pausing.