Details
-
Bug
-
Resolution: Cannot Reproduce
-
Critical
-
Cheshire-Cat
-
Triaged
-
Centos 64-bit
-
-
1
-
Unknown
Description
Build: 7.0.0-4554
Unexpected server error is found during rebalance after test execution.
Corresponding test log fragment is here:
[2021-02-27 03:18:36,683] - [cluster_helper] [140676780521216] - INFO - rebalancing all nodes in order to remove nodes |
[2021-02-27 03:18:36,685] - [rest_client] [140676780521216] - INFO - rebalance params : {'knownNodes': 'ns_1@172.23.106.20,ns_1@172.23.121.75,ns_1@172.23.97.222', 'ejectedNodes': 'ns_1@172.23.106.20,ns_1@172.23.97.222', 'user': 'Administrator', 'password': 'password'} |
[2021-02-27 03:18:36,713] - [rest_client] [140676780521216] - INFO - rebalance operation started |
[2021-02-27 03:18:36,717] - [rest_client] [140676780521216] - INFO - rebalance percentage : 0.00 % |
[2021-02-27 03:18:46,730] - [rest_client] [140676780521216] - INFO - rebalance percentage : 0.00 % |
[2021-02-27 03:18:56,734] - [rest_client] [140676780521216] - INFO - rebalance percentage : 0.00 % |
[2021-02-27 03:19:06,748] - [rest_client] [140676780521216] - ERROR - {'status': 'none', 'errorMessage': 'Rebalance failed. See logs for detailed reason. You can try again.'} - rebalance failed |
[2021-02-27 03:19:06,758] - [rest_client] [140676780521216] - INFO - Latest logs from UI on 172.23.121.75: |
[2021-02-27 03:19:06,758] - [rest_client] [140676780521216] - ERROR - {'node': 'ns_1@172.23.121.75', 'type': 'critical', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1614424746687, 'shortText': 'message', 'text': "Rebalance exited with reason {prepare_rebalance_failed,\n {error,\n {failed_nodes,\n [{'ns_1@172.23.97.222',{error,timeout}}]}}}.\nRebalance Operation Id = e19e8d25353baa670bdc6d422c8b5212", 'serverTime': '2021-02-27T03:19:06.687Z'} |
[2021-02-27 03:19:06,758] - [rest_client] [140676780521216] - ERROR - {'node': 'ns_1@172.23.121.75', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1614424716686, 'shortText': 'message', 'text': "Starting rebalance, KeepNodes = ['ns_1@172.23.121.75'], EjectNodes = ['ns_1@172.23.106.20',\n 'ns_1@172.23.97.222'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = e19e8d25353baa670bdc6d422c8b5212", 'serverTime': '2021-02-27T03:18:36.686Z'} |
[2021-02-27 03:19:06,758] - [rest_client] [140676780521216] - ERROR - {'node': 'ns_1@172.23.121.75', 'type': 'warning', 'code': 102, 'module': 'menelaus_web', 'tstamp': 1614424716683, 'shortText': 'client-side error report', 'text': 'Client-side error-report for user "Administrator" on node \'ns_1@172.23.121.75\':\nUser-Agent:Python-httplib2/0.13.1 (gzip)\nStarting rebalance from test, ejected nodes [\'ns_1@172.23.106.20\', \'ns_1@172.23.97.222\']', 'serverTime': '2021-02-27T03:18:36.683Z'} |
[2021-02-27 03:19:06,759] - [rest_client] [140676780521216] - ERROR - {'node': 'ns_1@172.23.121.75', 'type': 'info', 'code': 11, 'module': 'menelaus_web', 'tstamp': 1614424716659, 'shortText': 'message', 'text': 'Deleted bucket "default"\n', 'serverTime': '2021-02-27T03:18:36.659Z'} |
[2021-02-27 03:19:06,759] - [rest_client] [140676780521216] - ERROR - {'node': 'ns_1@172.23.121.75', 'type': 'info', 'code': 0, 'module': 'ns_memcached', 'tstamp': 1614424715998, 'shortText': 'message', 'text': 'Shutting down bucket "default" on \'ns_1@172.23.121.75\' for deletion', 'serverTime': '2021-02-27T03:18:35.998Z'} |
[2021-02-27 03:19:06,759] - [rest_client] [140676780521216] - ERROR - {'node': 'ns_1@172.23.106.20', 'type': 'info', 'code': 1, 'module': 'menelaus_web_sup', 'tstamp': 1614424661647, 'shortText': 'web start ok', 'text': 'Couchbase Server has started on web port 8091 on node \'ns_1@172.23.106.20\'. Version: "7.0.0-4554-enterprise".', 'serverTime': '2021-02-27T03:17:41.647Z'} |
[2021-02-27 03:19:06,759] - [rest_client] [140676780521216] - ERROR - {'node': 'ns_1@172.23.121.75', 'type': 'info', 'code': 4, 'module': 'ns_node_disco', 'tstamp': 1614424661473, 'shortText': 'node up', 'text': "Node 'ns_1@172.23.121.75' saw that node 'ns_1@172.23.106.20' came up. Tags: []", 'serverTime': '2021-02-27T03:17:41.473Z'} |
[2021-02-27 03:19:06,759] - [rest_client] [140676780521216] - ERROR - {'node': 'ns_1@172.23.121.75', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1614424624260, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.121.75' saw that node 'ns_1@172.23.106.20' went down. Details: [{nodedown_reason,\n connection_closed}]", 'serverTime': '2021-02-27T03:17:04.260Z'} |
[2021-02-27 03:19:06,759] - [rest_client] [140676780521216] - ERROR - {'node': 'ns_1@172.23.121.75', 'type': 'info', 'code': 0, 'module': 'ns_memcached', 'tstamp': 1614424533594, 'shortText': 'message', 'text': 'Bucket "default" loaded on node \'ns_1@172.23.121.75\' in 0 seconds.', 'serverTime': '2021-02-27T03:15:33.594Z'} |
[2021-02-27 03:19:06,759] - [rest_client] [140676780521216] - ERROR - {'node': 'ns_1@172.23.121.75', 'type': 'info', 'code': 12, 'module': 'menelaus_web', 'tstamp': 1614424533507, 'shortText': 'message', 'text': 'Created bucket "default" of type: couchbase\n[{num_replicas,1},\n {replica_index,true},\n {ram_quota,1727004672},\n {durability_min_level,none},\n {pitr_enabled,false},\n {pitr_granularity,600},\n {pitr_max_history_age,86400},\n {flush_enabled,true},\n {num_threads,3},\n {eviction_policy,value_only},\n {conflict_resolution_type,seqno},\n {storage_mode,couchstore},\n {max_ttl,0},\n {compression_mode,passive}]', 'serverTime': '2021-02-27T03:15:33.507Z'} |
[2021-02-27 03:19:06,759] - [fts_base] [140676780521216] - INFO - closing all ssh connections |
[2021-02-27 03:19:06,760] - [fts_base] [140676780521216] - INFO - closing all memcached connections |
Have attached collected logs, corresponding test execution logs are here:
[2021-03-04 17:17:12,520] - [rest_client:1714] INFO - rebalance params : {'knownNodes': 'ns_1@172.23.107.35,ns_1@172.23.107.56,ns_1@172.23.107.75', 'ejectedNodes': 'ns_1@172.23.107.75', 'user': 'Administrator', 'password': 'password'} |
[2021-03-04 17:17:12,596] - [rest_client:1719] INFO - rebalance operation started |
[2021-03-04 17:17:12,643] - [rest_client:1881] INFO - rebalance percentage : 0.00 % |
[2021-03-04 17:17:12,644] - [task:844] INFO - Rebalance - status: running, progress: 0.00% |
[2021-03-04 17:17:16,677] - [fts_base:654] INFO - Killing cbft on server: ip:172.23.107.56 port:8091 ssh_username:root |
[2021-03-04 17:17:16,682] - [remote_util:298] INFO - SSH Connecting to 172.23.107.56 with username:root, attempt#1 of 5 |
[2021-03-04 17:17:17,115] - [remote_util:336] INFO - SSH Connected to 172.23.107.56 as root |
[2021-03-04 17:17:17,708] - [remote_util:3567] INFO - os_distro: CentOS, os_version: centos 7, is_linux_distro: True |
[2021-03-04 17:17:18,659] - [remote_util:3723] INFO - extract_remote_info-->distribution_type: CentOS, distribution_version: centos 7 |
[2021-03-04 17:17:18,661] - [remote_util:3407] INFO - running command.raw on 172.23.107.56: pkill -e cbft |
[2021-03-04 17:17:18,834] - [remote_util:3455] INFO - command executed successfully with root |
[2021-03-04 17:17:18,835] - [remote_util:3240] INFO - cbft killed (pid 10984) |
[2021-03-04 17:17:22,787] - [rest_client:1864] ERROR - {'status': 'none', 'errorMessage': 'Rebalance failed. See logs for detailed reason. You can try again.'} - rebalance failed |
[2021-03-04 17:17:22,867] - [rest_client:3765] INFO - Latest logs from UI on 172.23.107.35: |
[2021-03-04 17:17:22,868] - [rest_client:3766] ERROR - {'node': 'ns_1@172.23.107.35', 'type': 'info', 'code': 0, 'module': 'auto_rebalance', 'tstamp': 1614907038877, 'shortText': 'message', 'text': 'Rebalance with Id 4e2741c980fc7af4c748558d26defbc4 will be retried after 60 seconds. Retry attempt number 1', 'serverTime': '2021-03-04T17:17:18.877Z'} |
[2021-03-04 17:17:22,869] - [rest_client:3766] ERROR - {'node': 'ns_1@172.23.107.35', 'type': 'critical', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1614907038875, 'shortText': 'message', 'text': 'Rebalance exited with reason {service_rebalance_failed,fts,\n {agent_died,<27720.2739.1>,\n {lost_connection,shutdown}}}.\nRebalance Operation Id = 4e2741c980fc7af4c748558d26defbc4', 'serverTime': '2021-03-04T17:17:18.875Z'} |
[2021-03-04 17:17:22,870] - [rest_client:3766] ERROR - {'node': 'ns_1@172.23.107.56', 'type': 'info', 'code': 0, 'module': 'ns_log', 'tstamp': 1614907038868, 'shortText': 'message', 'text': "Service 'fts' exited with status 143. Restarting. Messages:\n2021-03-04T17:17:13.455-08:00 [INFO] (GOCBCORE) Creating new agent: &{MemdAddrs:[] HTTPAddrs:[127.0.0.1:8091] BucketName:default UserAgent:default_index_1_2465dc3f9bb14079_13aa53f3 UseTLS:false NetworkType: Auth:0x1d09b50 TLSRootCAProvider:<nil> UseMutationTokens:false UseCompression:false UseDurations:false DisableDecompression:false UseOutOfOrderResponses:false DisableXErrors:false DisableJSONHello:false DisableSyncReplicationHello:false UseCollections:true CompressionMinSize:0 CompressionMinRatio:0 HTTPRedialPeriod:0s HTTPRetryDelay:0s CccpMaxWait:0s CccpPollPeriod:0s ConnectTimeout:1m0s KVConnectTimeout:7s KvPoolSize:0 MaxQueueSize:0 HTTPMaxIdleConns:0 HTTPMaxIdleConnsPerHost:0 HTTPIdleConnectionTimeout:0s Tracer:<nil> NoRootTraceSpans:false DefaultRetryStrategy:<nil> CircuitBreakerConfig:{Enabled:false VolumeThreshold:0 ErrorThresholdPercentage:0 SleepWindow:0s RollingWindow:0s CompletionCallback:<nil> CanaryTimeout:0s} UseZombieLogger:false ZombieLoggerInterval:0s ZombieLoggerSampleSize:0 AuthMechanisms:[]}\n2021-03-04T17:17:13.525-08:00 [INFO] feed_dcp_gocbcore: newGocbcoreDCPFeed, name: default_index_1_2465dc3f9bb14079_13aa53f3, indexName: default_index_1, server: http://127.0.0.1:8091, bucketName: default, bucketUUID: abf3fb490b940a4c88f93e6031db8e39\n2021-03-04T17:17:13.525-08:00 [INFO] feed_dcp_gocbcore: Start, name: default_index_1_2465dc3f9bb14079_13aa53f3, num streams: 171, manifestUID: 0, streamOptions: {FilterOptions: &{ScopeID:0 CollectionIDs:[0]}, StreamOptions: &{StreamID:3}}\n2021-03-04T17:17:13.528-08:00 [INFO] janitor: awakes, op: kick, msg: cfg changed, key: curMetaKvPlanKey\n2021-03-04T17:17:13.559-08:00 [INFO] janitor: pindexes to remove: 0\n2021-03-04T17:17:13.559-08:00 [INFO] janitor: pindexes to add: 0\n2021-03-04T17:17:13.559-08:00 [INFO] janitor: pindexes to restart: 0\n2021-03-04T17:17:13.560-08:00 [INFO] janitor: feeds to remove: 0\n2021-03-04T17:17:13.560-08:00 [INFO] janitor: feeds to add: 0\n", 'serverTime': '2021-03-04T17:17:18.868Z'} |
[2021-03-04 17:17:22,871] - [rest_client:3766] ERROR - {'node': 'ns_1@172.23.107.35', 'type': 'info', 'code': 0, 'module': 'ns_vbucket_mover', 'tstamp': 1614907033048, 'shortText': 'message', 'text': 'Bucket "default" rebalance appears to be swap rebalance', 'serverTime': '2021-03-04T17:17:13.048Z'} |
[2021-03-04 17:17:22,871] - [rest_client:3766] ERROR - {'node': 'ns_1@172.23.107.35', 'type': 'info', 'code': 0, 'module': 'ns_rebalancer', 'tstamp': 1614907032874, 'shortText': 'message', 'text': 'Started rebalancing bucket default', 'serverTime': '2021-03-04T17:17:12.874Z'} |
[2021-03-04 17:17:22,872] - [rest_client:3766] ERROR - {'node': 'ns_1@172.23.107.35', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1614907032559, 'shortText': 'message', 'text': "Starting rebalance, KeepNodes = ['ns_1@172.23.107.35','ns_1@172.23.107.56'], EjectNodes = ['ns_1@172.23.107.75'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 4e2741c980fc7af4c748558d26defbc4", 'serverTime': '2021-03-04T17:17:12.559Z'} |
[2021-03-04 17:17:22,873] - [rest_client:3766] ERROR - {'node': 'ns_1@172.23.107.35', 'type': 'info', 'code': 0, 'module': 'ns_memcached', 'tstamp': 1614906881874, 'shortText': 'message', 'text': 'Bucket "default" loaded on node \'ns_1@172.23.107.35\' in 0 seconds.', 'serverTime': '2021-03-04T17:14:41.874Z'} |
[2021-03-04 17:17:22,874] - [rest_client:3766] ERROR - {'node': 'ns_1@172.23.107.35', 'type': 'info', 'code': 12, 'module': 'menelaus_web', 'tstamp': 1614906881762, 'shortText': 'message', 'text': 'Created bucket "default" of type: couchbase\n[{num_replicas,1},\n {replica_index,true},\n {ram_quota,209715200},\n {durability_min_level,none},\n {pitr_enabled,false},\n {pitr_granularity,600},\n {pitr_max_history_age,86400},\n {flush_enabled,true},\n {num_threads,3},\n {eviction_policy,value_only},\n {conflict_resolution_type,seqno},\n {storage_mode,couchstore},\n {max_ttl,0},\n {compression_mode,passive}]', 'serverTime': '2021-03-04T17:14:41.762Z'} |
[2021-03-04 17:17:22,874] - [rest_client:3766] ERROR - {'node': 'ns_1@172.23.107.35', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1614906869020, 'shortText': 'message', 'text': 'Rebalance completed successfully.\nRebalance Operation Id = 0f35eb393a53117d0e179e70e256930c', 'serverTime': '2021-03-04T17:14:29.020Z'} |
[2021-03-04 17:17:22,875] - [rest_client:3766] ERROR - {'node': 'ns_1@172.23.107.35', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1614906868130, 'shortText': 'message', 'text': "Starting rebalance, KeepNodes = ['ns_1@172.23.107.35','ns_1@172.23.107.56',\n 'ns_1@172.23.107.75'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 0f35eb393a53117d0e179e70e256930c", 'serverTime': '2021-03-04T17:14:28.130Z'} |
Collected logs can be found here:
https://cb-jira.s3.us-east-2.amazonaws.com/logs/Evgeny+Makarenko/collectinfo-2021-03-05T012013-ns_1%40172.23.107.35.zip
https://cb-jira.s3.us-east-2.amazonaws.com/logs/Evgeny+Makarenko/collectinfo-2021-03-05T012013-ns_1%40172.23.107.56.zip