Details
- Bug
- Resolution: Fixed
- Major
- Cheshire-Cat
- Untriaged
- 1
- Unknown
Description
Build: 7.0.0-2278
Test suite: centos-fts_stabletopologyP0
http://qa.sc.couchbase.com/job/test_suite_executor/222681/console
Note that this is happening only with upside_down indexes and not with scorch indexes.
Test:
fts.stable_topology_fts.StableTopFTS:
create_simple_default_index,items=1000,cluster=D,F,F,standard_buckets=3,sasl_buckets=3,index_per_bucket=3,GROUP=P0,cluster=D+F,disable_HTP=True,get-cbcollect-info=False,index_type=upside_down,fts_quota=750,GROUP=P0
Steps in the test:
- Create a cluster with n1:fts+kv+index+n1ql and n2:fts
- Create the buckets default, sasl_bucket_1, sasl_bucket_2, sasl_bucket_3, standard_bucket_1, standard_bucket_2, standard_bucket_3
- Create the FTS indexes: default_index_1, default_index_2, default_index_3, sasl_bucket_1_index_1, sasl_bucket_1_index_2, sasl_bucket_1_index_3, sasl_bucket_2_index_1, sasl_bucket_2_index_2, sasl_bucket_2_index_3, sasl_bucket_3_index_1, sasl_bucket_3_index_2, sasl_bucket_3_index_3, standard_bucket_1_index_1, standard_bucket_1_index_2, standard_bucket_1_index_3, standard_bucket_2_index_1, standard_bucket_2_index_2, standard_bucket_2_index_3, standard_bucket_3_index_1, standard_bucket_3_index_2, standard_bucket_3_index_3
- Load all the buckets with 1000 docs and wait for indexing to complete on all the indexes
- Delete all the indexes one after the other and wait for each index delete to complete
- Delete all the buckets created and wait for the bucket deletes to complete
- Rebalance all nodes in order to remove nodes. The rebalance fails with the error below:
2020-06-10 18:46:03 | INFO | MainProcess | test_thread | [cluster_helper.cleanup_cluster] rebalancing all nodes in order to remove nodes
2020-06-10 18:46:03 | INFO | MainProcess | test_thread | [rest_client.rebalance] rebalance params : {'knownNodes': 'ns_1@172.23.120.93,ns_1@172.23.120.95', 'ejectedNodes': 'ns_1@172.23.120.93', 'user': 'Administrator', 'password': 'password'}
2020-06-10 18:46:03 | INFO | MainProcess | test_thread | [rest_client.rebalance] rebalance operation started
2020-06-10 18:46:03 | INFO | MainProcess | test_thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 0.00 %
2020-06-10 18:46:13 | INFO | MainProcess | test_thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 50.00 %
2020-06-10 18:46:23 | INFO | MainProcess | test_thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 50.00 %
2020-06-10 18:46:33 | INFO | MainProcess | test_thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 50.00 %
2020-06-10 18:46:43 | INFO | MainProcess | test_thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 50.00 %
2020-06-10 18:46:53 | INFO | MainProcess | test_thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 50.00 %
2020-06-10 18:47:03 | INFO | MainProcess | test_thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 50.00 %
2020-06-10 18:47:13 | ERROR | MainProcess | test_thread | [rest_client._rebalance_status_and_progress] {'status': 'none', 'errorMessage': 'Rebalance failed. See logs for detailed reason. You can try again.'} - rebalance failed
2020-06-10 18:47:13 | INFO | MainProcess | test_thread | [rest_client.print_UI_logs] Latest logs from UI on 172.23.120.95:
2020-06-10 18:47:13 | ERROR | MainProcess | test_thread | [rest_client.print_UI_logs] {'node': 'ns_1@172.23.120.95', 'type': 'critical', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1591840023463, 'shortText': 'message', 'text': 'Rebalance exited with reason {service_rebalance_failed,fts,\n {agent_died,<0.2923.0>,\n {linked_process_died,<0.3707.0>,\n {timeout,\n {gen_server,call,\n [<0.2973.0>,\n {call,"ServiceAPI.GetTaskList",\n #Fun<json_rpc_connection.0.102434519>},\n 60000]}}}}}.\nRebalance Operation Id = 3dea98d7db7f7f69cc8c82ba52151df2', 'serverTime': '2020-06-10T18:47:03.463Z'}
2020-06-10 18:47:13 | ERROR | MainProcess | test_thread | [rest_client.print_UI_logs] {'node': 'ns_1@172.23.120.95', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1591839963234, 'shortText': 'message', 'text': "Starting rebalance, KeepNodes = ['ns_1@172.23.120.95'], EjectNodes = ['ns_1@172.23.120.93'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 3dea98d7db7f7f69cc8c82ba52151df2", 'serverTime': '2020-06-10T18:46:03.234Z'}
2020-06-10 18:47:13 | ERROR | MainProcess | test_thread | [rest_client.print_UI_logs] {'node': 'ns_1@172.23.120.95', 'type': 'warning', 'code': 102, 'module': 'menelaus_web', 'tstamp': 1591839963227, 'shortText': 'client-side error report', 'text': 'Client-side error-report for user "Administrator" on node \'ns_1@172.23.120.95\':\nUser-Agent:Python-httplib2/0.13.1 (gzip)\nStarting rebalance from test, ejected nodes [\'ns_1@172.23.120.93\']', 'serverTime': '2020-06-10T18:46:03.227Z'}
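A quick check of the timing in the UI log entries above, written as a minimal Python sketch. The two `tstamp` values and the `60000` timeout are copied from the log; the constant and function names are hypothetical and exist only for this illustration. The elapsed time between "Starting rebalance" and "Rebalance exited" comes out just over the `gen_server` call timeout, which is consistent with the `ServiceAPI.GetTaskList` call to FTS not returning at all. That reading is an interpretation of the log, not something the ticket confirms.

```python
# Values copied from the ns_orchestrator entries in the UI log above.
# The names below are hypothetical, chosen only for this illustration.
REBALANCE_START_TSTAMP_MS = 1591839963234   # 'Starting rebalance, KeepNodes = ...'
REBALANCE_EXIT_TSTAMP_MS = 1591840023463    # 'Rebalance exited with reason ...'
GEN_SERVER_CALL_TIMEOUT_MS = 60000          # last element of the gen_server call term


def elapsed_ms(start_ms: int, end_ms: int) -> int:
    """Return the elapsed time between two epoch-millisecond timestamps."""
    return end_ms - start_ms


elapsed = elapsed_ms(REBALANCE_START_TSTAMP_MS, REBALANCE_EXIT_TSTAMP_MS)
overshoot = elapsed - GEN_SERVER_CALL_TIMEOUT_MS

print(f"elapsed={elapsed} ms, timeout={GEN_SERVER_CALL_TIMEOUT_MS} ms, "
      f"overshoot={overshoot} ms")
```

The rebalance ran for 60229 ms, 229 ms longer than the 60000 ms call timeout.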
Log snippet:
Starting rebalance from test, ejected nodes ['ns_1@172.23.121.66']
2020-06-10T19:02:18.533-07:00, ns_orchestrator:0:info:message(ns_1@172.23.121.65) - Starting rebalance, KeepNodes = ['ns_1@172.23.121.65'], EjectNodes = ['ns_1@172.23.121.66'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 29c8525859119169f864efdb41b2b4d0
2020-06-10T19:03:18.636-07:00, ns_orchestrator:0:critical:message(ns_1@172.23.121.65) - Rebalance exited with reason {service_rebalance_failed,fts,
{agent_died,<0.2128.0>,
{linked_process_died,<0.2675.0>,
{timeout,
{gen_server,call,
[<0.2162.0>,
{call,"ServiceAPI.GetTaskList",
#Fun<json_rpc_connection.0.102434519>},
60000]}}}}}.
Rebalance Operation Id = 29c8525859119169f864efdb41b2b4d0
-------------------------------
per_node_processes('ns_1@172.23.121.65') =
{<0.25311.13>,
[{backtrace,
[<<"Program counter: 0x00007f86bcb70178 (diag_handler:'-collect_diag_per_node/1-fun-1-'/2 + 136)">>,
<<"CP: 0x0000000000000000 (invalid)">>,<<"arity = 0">>,<<>>,
<<"0x00007f86b2676048 Return addr 0x00007f8741d5fca0 (proc_lib:init_p/3 + 288)">>,
<<"y(0) <0.25310.13>">>,<<>>,
<<"0x00007f86b2676058 Return addr 0x0000000000942608 (<terminate process normally>)">>,
<<"y(0) Catch 0x00007f8741d5fcc0 (proc_lib:init_p/3 + 320)">>,
<<"y(1) []">>,<<>>]},
{messages,[]},
{dictionary,
[{'$initial_call',
{diag_handler,'-collect_diag_per_node/1-fun-1-',0}},
{'$ancestors',[<0.25310.13>]}]},
{registered_name,[]},
{status,waiting},
{initial_call,{proc_lib,init_p,3}},
{error_handler,error_handler},
{garbage_collection,
[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
{min_bin_vheap_size,46422},
{min_heap_size,233},
{fullsweep_after,512},
{minor_gcs,0}]},
{garbage_collection_info,
[{old_heap_block_size,0},
{heap_block_size,233},
{mbuf_size,0},
{recent_size,0},
{stack_size,5},
{old_heap_size,0},
{heap_size,32},
{bin_vheap_size,0},
{bin_vheap_block_size,46422},
{bin_old_vheap_size,0},
{bin_old_vheap_block_size,46422}]},
{links,[<0.25310.13>]},
{monitors,[{process,<0.290.0>},{process,<0.25310.13>}]},
{monitored_by,[]},
{memory,2888},
{message_queue_len,0},
{reductions,9},
{trap_exit,false},
{current_location,
{diag_handler,'-collect_diag_per_node/1-fun-1-',2,
[{file,"src/diag_handler.erl"},{line,238}]}}]}
{<0.25310.13>,
[{backtrace,
[<<"Program counter: 0x00007f873a644ee0 (unknown function)"
Issue Links
- is caused by GOCBC-868 [gocbcore.v9] If bucket isn't available CreateAgent needs to fail immediately (Resolved)