Details
Type: Bug
Resolution: Fixed
Priority: Test Blocker
Fix Version: Cheshire-Cat
Triage: Untriaged
Is this a Regression?: Unknown
Description
Issue observed in: 7.0.0-4025
Test:
./testrunner -i node_conf.ini -p get-cbcollect-info=True,get-couch-dbinfo=True,skip_cleanup=False,skip_log_scan=False -t ent_backup_restore.enterprise_backup_restore_test.EnterpriseBackupRestoreTest.test_backup_restore_sanity,items=1000
From diag.log:
2020-12-14T11:05:49.498-08:00, memcached_config_mgr:0:info:message(ns_1@172.23.105.153) - Hot-reloaded memcached.json for config change of the following keys: [<<"scramsha_fallback_salt">>]

2020-12-14T11:05:50.109-08:00, ns_orchestrator:0:info:message(ns_1@172.23.105.151) - Starting rebalance, KeepNodes = ['ns_1@172.23.105.151','ns_1@172.23.105.153'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 4f9a3be20d968903fc7ea27ccb5b3b56

2020-12-14T11:05:52.360-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.105.151) - Rebalance exited with reason {{badmatch,failed},
    [{ns_rebalancer,rebalance_body,5,
      [{file,"src/ns_rebalancer.erl"},{line,532}]},
     {async,'-async_init/4-fun-1-',3,
      [{file,"src/async.erl"},{line,197}]}]}.
Rebalance Operation Id = 4f9a3be20d968903fc7ea27ccb5b3b56

2020-12-14T11:06:00.305-08:00, menelaus_web:102:warning:client-side error report(ns_1@172.23.105.151) - Client-side error-report for user "<ud>Administrator</ud>" on node 'ns_1@172.23.105.151':
User-Agent:Python-httplib2/0.13.1 (gzip)
Starting rebalance from test, ejected nodes ['ns_1@172.23.105.153']

2020-12-14T11:06:00.313-08:00, ns_orchestrator:0:info:message(ns_1@172.23.105.151) - Starting rebalance, KeepNodes = ['ns_1@172.23.105.151'], EjectNodes = ['ns_1@172.23.105.153'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 0045d716e47be11e253d0725577b86cf

2020-12-14T11:06:10.439-08:00, ns_cluster:1:info:message(ns_1@172.23.105.153) - Node 'ns_1@172.23.105.153' is leaving cluster.

2020-12-14T11:06:10.447-08:00, ns_orchestrator:0:info:message(ns_1@172.23.105.151) - Rebalance completed successfully.
Rebalance Operation Id = 0045d716e47be11e253d0725577b86cf

2020-12-14T11:06:10.644-08:00, ns_node_disco:5:warning:node down(ns_1@172.23.105.151) - Node 'ns_1@172.23.105.151' saw that node 'ns_1@172.23.105.153' went down. Details: [{nodedown_reason,connection_closed}]

2020-12-14T11:07:01.831-08:00, ns_cookie_manager:3:info:cookie update(ns_1@172.23.105.151) - Initial otp cookie generated: {sanitized,<<"VOL7MTlDuCj/QIAJDPiYpZNWoVQkVkznD/h9HETT13E=">>}

2020-12-14T11:07:01.957-08:00, menelaus_sup:1:info:web start ok(ns_1@172.23.105.151) - Couchbase Server has started on web port 8091 on node 'ns_1@172.23.105.151'. Version: "7.0.0-4025-enterprise".

2020-12-14T11:07:02.094-08:00, mb_master:0:info:message(ns_1@172.23.105.151) - I'm the only node, so I'm the master.

2020-12-14T11:07:02.170-08:00, compat_mode_manager:0:warning:message(ns_1@172.23.105.151) - Changed cluster compat mode from undefined to [7,0]

2020-12-14T11:07:02.203-08:00, auto_failover:0:info:message(ns_1@172.23.105.151) - Enabled auto-failover with timeout 120 and max count 1

2020-12-14T11:07:08.878-08:00, menelaus_web:102:warning:client-side error report(ns_1@172.23.105.151) - Client-side error-report for user "<ud>Administrator</ud>" on node 'ns_1@172.23.105.151':
User-Agent:Python-httplib2/0.13.1 (gzip)
2020-12-14 11:07:08.856707 : test_backup_restore_sanity finished
-------------------------------
per_node_processes('ns_1@172.23.105.151') =
    {<0.5656.0>,
     [{backtrace,
       [<<"Program counter: 0x00007f261dcf6ff0 (diag_handler:'-collect_diag_per_node/1-fun-1-'/2 + 112)">>,
        <<"CP: 0x0000000000000000 (invalid)">>,<<>>,
        <<"0x00007f25d7f7a470 Return addr 0x00007f26653d6390 (proc_lib:init_p/3 + 200)">>,
        <<"y(0) <0.5655.0>">>,<<>>,
        <<"0x00007f25d7f7a480 Return addr 0x0000000000986fa8 (<terminate process normally>)">>,
        <<"y(0) []">>,<<"y(1) []">>,
        <<"y(2) Catch 0x00007f26653d63a0 (proc_lib:init_p/3 + 216)">>,
        <<>>]},
      {messages,[]},
      {dictionary,
       [{'$ancestors',[<0.5655.0>]},
        {'$initial_call',
         {diag_handler,'-collect_diag_per_node/1-fun-1-',0}}]},
      {registered_name,[]},
      {status,waiting},
      {initial_call,{proc_lib,init_p,3}},
      {error_handler,error_handler},
      {garbage_collection,
       [{max_heap_size,#{error_logger => true,kill => true,size => 0}},
        {min_bin_vheap_size,46422},
        {min_heap_size,233},
        {fullsweep_after,512},
        {minor_gcs,0}]},
      {garbage_collection_info,
       [{old_heap_block_size,0},
        {heap_block_size,233},
        {mbuf_size,0},
        {recent_size,0},
        {stack_size,6},
        {old_heap_size,0},
        {heap_size,32},
        {bin_vheap_size,0},
        {bin_vheap_block_size,46422},
        {bin_old_vheap_size,0},
        {bin_old_vheap_block_size,46422}]},
      {links,[<0.5655.0>]},
      {monitors,[{process,<0.339.0>},{process,<0.5655.0>}]},
      {monitored_by,[]},
      {memory,2860},
      {message_queue_len,0},
      {reductions,13},
      {trap_exit,false},
      {current_location,
       {diag_handler,'-collect_diag_per_node/1-fun-1-',2,
        [{file,"src/diag_handler.erl"},{line,228}]}}]}
We added the backup service to the build sanity suite and started seeing this failure; logs attached.
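
For readers less familiar with Erlang crash reports: the {{badmatch,failed}} reason above means ns_rebalancer:rebalance_body/5 pattern-matched an expected success value against the atom 'failed' returned by one of its steps, which crashes the process and aborts the rebalance. Below is a minimal sketch of that failure shape; the module and function names are hypothetical stand-ins, not the real ns_rebalancer code.

%% Minimal sketch of how a {badmatch,failed} crash arises.
%% All names here are hypothetical illustrations.
-module(badmatch_sketch).
-export([run/0]).

%% Stand-in for the rebalance body: it asserts success by
%% pattern-matching 'ok' against the result of a step.
rebalance_body() ->
    ok = failing_step(),   %% crashes with {badmatch,failed}
    done.

%% Hypothetical step that reports failure as an atom
%% instead of returning 'ok'.
failing_step() ->
    failed.

run() ->
    try
        rebalance_body()
    catch
        error:{badmatch, Reason} ->
            %% Mirrors the shape seen in diag.log:
            %% "Rebalance exited with reason {{badmatch,failed}, ...}"
            io:format("crashed with {badmatch,~p}~n", [Reason])
    end.

The try/catch is only for the demo; in the server the crash propagates out of the rebalance process and surfaces as the "Rebalance exited with reason" message from ns_orchestrator seen above.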