Loading...

Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: 7.0.0
Affects Version/s: Cheshire-Cat
Component/s: fts
Labels:
- functional-test
- regression

Triage:
Untriaged
Story Points:
1
Is this a Regression?:
Unknown

Description

Build: 7.0.0-2278

Test suite: centos-fts_stabletopologyP0
http://qa.sc.couchbase.com/job/test_suite_executor/222681/console

Note that this is happening only with upside_down indexes and not with scorch indexes.

Test:
fts.stable_topology_fts.StableTopFTS:
create_simple_default_index,items=1000,cluster=D,F,F,standard_buckets=3,sasl_buckets=3,index_per_bucket=3,GROUP=P0,cluster=D+F,disable_HTP=True,get-cbcollect-info=False,index_type=upside_down,fts_quota=750,GROUP=P0

Steps in the test:

Create a cluster with n1:fts+kv+index+n1ql and n2:fts
Create default, sasl_bucket_1, sasl_bucket_2, sasl_bucket_3, standard_bucket_1, standard_bucket_2, standard_bucket_3
Create fts indexes : default_index_1, default_index_2, default_index_3, sasl_bucket_1_index_1, sasl_bucket_1_index_2, sasl_bucket_1_index_3, sasl_bucket_2_index_1, sasl_bucket_2_index_2, sasl_bucket_2_index_3, sasl_bucket_3_index_1, sasl_bucket_3_index_2, sasl_bucket_3_index_3,standard_bucket_1_index_1,standard_bucket_1_index_2,standard_bucket_1_index_3,standard_bucket_2_index_1,standard_bucket_2_index_2,standard_bucket_2_index_3,standard_bucket_3_index_1,standard_bucket_3_index_2,standard_bucket_3_index_3
* Load all the buckets with 1000 docs and wait for all the indexes to complete
delete all the indexes one after the other and wait for index delete to complete
delete all the buckets created and wait for delete bucket to complete

rebalancing all nodes in order to remove nodes. We see below error:

2020-06-10 18:46:03 | INFO | MainProcess | test_thread | [cluster_helper.cleanup_cluster] rebalancing all nodes in order to remove nodes

2020-06-10 18:46:03 | INFO | MainProcess | test_thread | [rest_client.rebalance] rebalance params : {'knownNodes': 'ns_1@172.23.120.93,ns_1@172.23.120.95', 'ejectedNodes': 'ns_1@172.23.120.93', 'user': 'Administrator', 'password': 'password'}

2020-06-10 18:46:03 | INFO | MainProcess | test_thread | [rest_client.rebalance] rebalance operation started

2020-06-10 18:46:03 | INFO | MainProcess | test_thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 0.00 %

2020-06-10 18:46:13 | INFO | MainProcess | test_thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 50.00 %

2020-06-10 18:46:23 | INFO | MainProcess | test_thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 50.00 %

2020-06-10 18:46:33 | INFO | MainProcess | test_thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 50.00 %

2020-06-10 18:46:43 | INFO | MainProcess | test_thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 50.00 %

2020-06-10 18:46:53 | INFO | MainProcess | test_thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 50.00 %

2020-06-10 18:47:03 | INFO | MainProcess | test_thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 50.00 %

2020-06-10 18:47:13 | ERROR | MainProcess | test_thread | [rest_client._rebalance_status_and_progress] {'status': 'none', 'errorMessage': 'Rebalance failed. See logs for detailed reason. You can try again.'} - rebalance failed

2020-06-10 18:47:13 | INFO | MainProcess | test_thread | [rest_client.print_UI_logs] Latest logs from UI on 172.23.120.95:

2020-06-10 18:47:13 | ERROR | MainProcess | test_thread | [rest_client.print_UI_logs] {'node': 'ns_1@172.23.120.95', 'type': 'critical', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1591840023463, 'shortText': 'message', 'text': 'Rebalance exited with reason {service_rebalance_failed,fts,\n                              {agent_died,<0.2923.0>,\n                               {linked_process_died,<0.3707.0>,\n                                {timeout,\n                                 {gen_server,call,\n                                  [<0.2973.0>,\n                                   {call,"ServiceAPI.GetTaskList",\n                                    #Fun<json_rpc_connection.0.102434519>},\n                                   60000]}}}}}.\nRebalance Operation Id = 3dea98d7db7f7f69cc8c82ba52151df2', 'serverTime': '2020-06-10T18:47:03.463Z'}

2020-06-10 18:47:13 | ERROR | MainProcess | test_thread | [rest_client.print_UI_logs] {'node': 'ns_1@172.23.120.95', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1591839963234, 'shortText': 'message', 'text': "Starting rebalance, KeepNodes = ['ns_1@172.23.120.95'], EjectNodes = ['ns_1@172.23.120.93'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 3dea98d7db7f7f69cc8c82ba52151df2", 'serverTime': '2020-06-10T18:46:03.234Z'}

2020-06-10 18:47:13 | ERROR | MainProcess | test_thread | [rest_client.print_UI_logs] {'node': 'ns_1@172.23.120.95', 'type': 'warning', 'code': 102, 'module': 'menelaus_web', 'tstamp': 1591839963227, 'shortText': 'client-side error report', 'text': 'Client-side error-report for user "Administrator" on node \'ns_1@172.23.120.95\':\nUser-Agent:Python-httplib2/0.13.1 (gzip)\nStarting rebalance from test, ejected nodes [\'ns_1@172.23.120.93\']', 'serverTime': '2020-06-10T18:46:03.227Z'}

Log snippet:

Starting rebalance from test, ejected nodes ['ns_1@172.23.121.66']

2020-06-10T19:02:18.533-07:00, ns_orchestrator:0:info:message(ns_1@172.23.121.65) - Starting rebalance, KeepNodes = ['ns_1@172.23.121.65'], EjectNodes = ['ns_1@172.23.121.66'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 29c8525859119169f864efdb41b2b4d0

2020-06-10T19:03:18.636-07:00, ns_orchestrator:0:critical:message(ns_1@172.23.121.65) - Rebalance exited with reason {service_rebalance_failed,fts,

                              {agent_died,<0.2128.0>,

                               {linked_process_died,<0.2675.0>,

                                {timeout,

                                 {gen_server,call,

                                  [<0.2162.0>,

                                   {call,"ServiceAPI.GetTaskList",

                                    #Fun<json_rpc_connection.0.102434519>},

                                   60000]}}}}}.

Rebalance Operation Id = 29c8525859119169f864efdb41b2b4d0

-------------------------------

per_node_processes('ns_1@172.23.121.65') =

     {<0.25311.13>,

      [{backtrace,

           [<<"Program counter: 0x00007f86bcb70178 (diag_handler:'-collect_diag_per_node/1-fun-1-'/2 + 136)">>,

            <<"CP: 0x0000000000000000 (invalid)">>,<<"arity = 0">>,<<>>,

            <<"0x00007f86b2676048 Return addr 0x00007f8741d5fca0 (proc_lib:init_p/3 + 288)">>,

            <<"y(0)     <0.25310.13>">>,<<>>,

            <<"0x00007f86b2676058 Return addr 0x0000000000942608 (<terminate process normally>)">>,

            <<"y(0)     Catch 0x00007f8741d5fcc0 (proc_lib:init_p/3 + 320)">>,

            <<"y(1)     []">>,<<>>]},

       {messages,[]},

       {dictionary,

           [{'$initial_call',

                {diag_handler,'-collect_diag_per_node/1-fun-1-',0}},

            {'$ancestors',[<0.25310.13>]}]},

       {registered_name,[]},

       {status,waiting},

       {initial_call,{proc_lib,init_p,3}},

       {error_handler,error_handler},

       {garbage_collection,

           [{max_heap_size,#{error_logger => true,kill => true,size => 0}},

            {min_bin_vheap_size,46422},

            {min_heap_size,233},

            {fullsweep_after,512},

            {minor_gcs,0}]},

       {garbage_collection_info,

           [{old_heap_block_size,0},

            {heap_block_size,233},

            {mbuf_size,0},

            {recent_size,0},

            {stack_size,5},

            {old_heap_size,0},

            {heap_size,32},

            {bin_vheap_size,0},

            {bin_vheap_block_size,46422},

            {bin_old_vheap_size,0},

            {bin_old_vheap_block_size,46422}]},

       {links,[<0.25310.13>]},

       {monitors,[{process,<0.290.0>},{process,<0.25310.13>}]},

       {monitored_by,[]},

       {memory,2888},

       {message_queue_len,0},

       {reductions,9},

       {trap_exit,false},

       {current_location,

           {diag_handler,'-collect_diag_per_node/1-fun-1-',2,

               [{file,"src/diag_handler.erl"},{line,238}]}}]}

     {<0.25310.13>,

      [{backtrace,

           [<<"Program counter: 0x00007f873a644ee0 (unknown function)"

Attachments

Issue Links

is caused by

GOCBC-868 [gocbcore.v9] If bucket isn't available CreateAgent needs to fail immediately

Resolved

FTS: Rebalance failure : service_rebalance_failed while ejecting nodes - upside_down

Details

Description

Attachments

Issue Links

Activity

People

Dates

PagerDuty