[user:error,2021-07-18T04:42:41.940-07:00,ns_1@172.23.104.155:<0.19338.177>:ns_orchestrator:log_rebalance_completion:1460]Rebalance exited with reason {mover_crashed,
{{{function_clause,
[{compaction_daemon,
process_scheduler_message,
[compact_views,
{compaction_state,
[<<"default">>],
<25421.30680.1299>,
{state,undefined,undefined,30,
compact_views},
#Fun<compaction_daemon.2.62703019>,
{config,
{30,undefined},
{30,undefined},
undefined,false,false,
{daemon_config,30,131072,20971520}}}],
[{file,"src/compaction_daemon.erl"},
{line,1292}]},
{compaction_daemon,handle_info,2,
[{file,"src/compaction_daemon.erl"},
{line,334}]},
{gen_server,try_dispatch,4,
[{file,"gen_server.erl"},{line,616}]},
{gen_server,handle_msg,6,
[{file,"gen_server.erl"},{line,686}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,247}]}]},
{gen_server,call,
[compaction_daemon,
{uninhibit_view_compaction,<<"default">>,
#Ref<25421.2849894141.663486468.234167>},
infinity]}},
{gen_server,call,
[{'janitor_agent-default',
'ns_1@172.23.108.103'},
{if_rebalance,<0.32327.1435>,
{uninhibit_view_compaction,
#Ref<25421.2849894141.663486468.234167>}},
infinity]}}}.
Rebalance Operation Id = 91b0323dcb3c7456460b65e64e9d9d01
[ns_server:error,2021-07-18T06:05:01.738-07:00,ns_1@172.23.104.155:<0.6577.1484>:ns_config_rep:synchronize_remote:299]Failed to synchronize config to some nodes:
[{'ns_1@172.23.104.5',
{exit,
{shutdown,
{gen_server,call,
[{ns_config_rep,'ns_1@172.23.104.5'},
synchronize_everything,infinity]}}}}]
...
[ns_server:error,2021-07-18T06:05:01.739-07:00,ns_1@172.23.104.155:<0.5251.1484>:misc:try_with_maybe_ignorant_after:1427]Eating exception from ignorant after-block:
Looking at the task endpoint the totalChanges are changing for views updater(this is captured when the view updater starts). That means views are trying to index on new snapshot and old one is completed. This percentage of index build is on snapshot not on the entire bucket data.
There are "update_after" query going on in the background and also there are ops on bucket on which views are defined.
Sujay Gad could you please confirm this by stopping the update-after query or stopping ops on bucket?
Stopping update after query will run the updater every 5 seconds(updateInterval) if unindexed seq number is more than 5000(updateMinChanges)?
Ankit Prabhu
added a comment - Looking at the task endpoint the totalChanges are changing for views updater(this is captured when the view updater starts). That means views are trying to index on new snapshot and old one is completed. This percentage of index build is on snapshot not on the entire bucket data.
There are "update_after" query going on in the background and also there are ops on bucket on which views are defined.
[couchdb:info,2021-07-19T00:40:59.333-07:00,couchdb_ns_1@cb.local:couch_query_logger<0.189.0>:couch_log:info:41]Query-Volume
<ud>"/default/_design/scale/_view/padd"</ud> | internal={stale: {ok: 0, upd_after: 600, false: 0}, ResultBytes: 89727192} | external={stale: {ok: 0, upd_after: 0, false: 0}, ResultBytes: 0}
<ud>"/default/_design/scale/_view/stats"</ud> | internal={stale: {ok: 0, upd_after: 600, false: 0}, ResultBytes: 9000} | external={stale: {ok: 0, upd_after: 0, false: 0}, ResultBytes: 0}
task endpoint result:
[{"statusId":"bc622ce7c5df20c72d69a2c68eb37448","type":"rebalance","status":"notRunning","statusIsStale":false,"masterRequestTimedOut":false,"lastReportURI":"/logs/rebalanceReport?reportID=df593ad583b28b5990e143507559511d"},{"cancelURI":"/controller/cancelXDCR/48e39a1f0a3383317117c525188efd29%2Fdefault%2Fremote","settingsURI":"/settings/replications/48e39a1f0a3383317117c525188efd29%2Fdefault%2Fremote","status":"running","replicationType":"xmem","continuous":true,"filterBypassExpiry":false,"filterDeletion":false,"filterExpiration":false,"filterExpression":"","id":"48e39a1f0a3383317117c525188efd29/default/remote","pauseRequested":false,"source":"default","target":"/remoteClusters/48e39a1f0a3383317117c525188efd29/buckets/remote","type":"xdcr","recommendedRefreshPeriod":10,"changesLeft":12,"docsChecked":877750378,"docsWritten":64888858,"maxVBReps":null,"errors":[]},{"type":"indexer","recommendedRefreshPeriod":2,"status":"running","bucket":"default","designDocument":"_design/scale","changesDone":3,"totalChanges":238,"progress":1},{"type":"bucket_compaction","recommendedRefreshPeriod":2,"status":"running","bucket":"ITEM","changesDone":26,"totalChanges":159,"progress":16},{"type":"bucket_compaction","recommendedRefreshPeriod":2,"status":"running","bucket":"NEW_ORDER","changesDone":464,"totalChanges":789,"progress":58},
...
[{"statusId":"bc622ce7c5df20c72d69a2c68eb37448","type":"rebalance","status":"notRunning","statusIsStale":false,"masterRequestTimedOut":false,"lastReportURI":"/logs/rebalanceReport?reportID=df593ad583b28b5990e143507559511d"},{"cancelURI":"/controller/cancelXDCR/48e39a1f0a3383317117c525188efd29%2Fdefault%2Fremote","settingsURI":"/settings/replications/48e39a1f0a3383317117c525188efd29%2Fdefault%2Fremote","status":"running","replicationType":"xmem","continuous":true,"filterBypassExpiry":false,"filterDeletion":false,"filterExpiration":false,"filterExpression":"","id":"48e39a1f0a3383317117c525188efd29/default/remote","pauseRequested":false,"source":"default","target":"/remoteClusters/48e39a1f0a3383317117c525188efd29/buckets/remote","type":"xdcr","recommendedRefreshPeriod":10,"changesLeft":3,"docsChecked":881450287,"docsWritten":65188405,"maxVBReps":null,"errors":[]},{"type":"indexer","recommendedRefreshPeriod":2,"status":"running","bucket":"default","designDocument":"_design/scale","changesDone":3,"totalChanges":29,"progress":10},
...[{"statusId":"bc622ce7c5df20c72d69a2c68eb37448","type":"rebalance","status":"notRunning","statusIsStale":false,"masterRequestTimedOut":false,"lastReportURI":"/logs/rebalanceReport?reportID=df593ad583b28b5990e143507559511d"},{"cancelURI":"/controller/cancelXDCR/48e39a1f0a3383317117c525188efd29%2Fdefault%2Fremote","settingsURI":"/settings/replications/48e39a1f0a3383317117c525188efd29%2Fdefault%2Fremote","status":"running","replicationType":"xmem","continuous":true,"filterBypassExpiry":false,"filterDeletion":false,"filterExpiration":false,"filterExpression":"","id":"48e39a1f0a3383317117c525188efd29/default/remote","pauseRequested":false,"source":"default","target":"/remoteClusters/48e39a1f0a3383317117c525188efd29/buckets/remote","type":"xdcr","recommendedRefreshPeriod":10,"changesLeft":2,"docsChecked":882204174,"docsWritten":65196075,"maxVBReps":null,"errors":[]},{"type":"indexer","recommendedRefreshPeriod":2,"status":"running","bucket":"default","designDocument":"_design/scale","changesDone":2,"totalChanges":17,"progress":11},
...
[{"statusId":"bc622ce7c5df20c72d69a2c68eb37448","type":"rebalance","status":"notRunning","statusIsStale":false,"masterRequestTimedOut":false,"lastReportURI":"/logs/rebalanceReport?reportID=df593ad583b28b5990e143507559511d"},{"cancelURI":"/controller/cancelXDCR/48e39a1f0a3383317117c525188efd29%2Fdefault%2Fremote","settingsURI":"/settings/replications/48e39a1f0a3383317117c525188efd29%2Fdefault%2Fremote","status":"running","replicationType":"xmem","continuous":true,"filterBypassExpiry":false,"filterDeletion":false,"filterExpiration":false,"filterExpression":"","id":"48e39a1f0a3383317117c525188efd29/default/remote","pauseRequested":false,"source":"default","target":"/remoteClusters/48e39a1f0a3383317117c525188efd29/buckets/remote","type":"xdcr","recommendedRefreshPeriod":10,"changesLeft":2,"docsChecked":882215626,"docsWritten":65200308,"maxVBReps":null,"errors":[]},{"type":"indexer","recommendedRefreshPeriod":2,"status":"running","bucket":"default","designDocument":"_design/scale","changesDone":2,"totalChanges":16,"progress":12},
Sujay Gad could you please confirm this by stopping the update-after query or stopping ops on bucket?
Stopping update after query will run the updater every 5 seconds(updateInterval) if unindexed seq number is more than 5000(updateMinChanges)?
Initially rebalance failed due to MB-47477. Issue of view index getting stuck is not observed after stopping ops on default bucket.
Sujay Gad
added a comment - - edited Initially rebalance failed due to MB-47477 . Issue of view index getting stuck is not observed after stopping ops on default bucket.
Rebalance failed
[user:error,2021-07-18T04:42:41.940-07:00,ns_1@172.23.104.155:<0.19338.177>:ns_orchestrator:log_rebalance_completion:1460]Rebalance exited with reason {mover_crashed,
{{{function_clause,
[{compaction_daemon,
process_scheduler_message,
[compact_views,
{compaction_state,
[<<"default">>],
<25421.30680.1299>,
{state,undefined,undefined,30,
compact_views},
#Fun<compaction_daemon.2.62703019>,
{config,
{30,undefined},
{30,undefined},
undefined,false,false,
{daemon_config,30,131072,20971520}}}],
[{file,"src/compaction_daemon.erl"},
{line,1292}]},
{compaction_daemon,handle_info,2,
[{file,"src/compaction_daemon.erl"},
{line,334}]},
{gen_server,try_dispatch,4,
[{file,"gen_server.erl"},{line,616}]},
{gen_server,handle_msg,6,
[{file,"gen_server.erl"},{line,686}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,247}]}]},
{gen_server,call,
[compaction_daemon,
{uninhibit_view_compaction,<<"default">>,
#Ref<25421.2849894141.663486468.234167>},
infinity]}},
{gen_server,call,
[{'janitor_agent-default',
'ns_1@172.23.108.103'},
{if_rebalance,<0.32327.1435>,
{uninhibit_view_compaction,
#Ref<25421.2849894141.663486468.234167>}},
infinity]}}}.
Rebalance Operation Id = 91b0323dcb3c7456460b65e64e9d9d01
[stats:error,2021-07-18T04:55:08.087-07:00,ns_1@172.23.104.155:<0.29601.1443>:stats_reader:log_bad_responses:238]Some nodes didn't respond: ['ns_1@172.23.106.100']
[stats:error,2021-07-18T04:55:08.464-07:00,ns_1@172.23.104.155:<0.27916.1443>:stats_reader:log_bad_responses:238]Some nodes didn't respond: ['ns_1@172.23.106.100']
[stats:error,2021-07-18T04:55:08.517-07:00,ns_1@172.23.104.155:<0.10220.1444>:stats_reader:log_bad_responses:238]Some nodes didn't respond: ['ns_1@172.23.106.100']
[stats:error,2021-07-18T04:55:08.520-07:00,ns_1@172.23.104.155:<0.5552.1444>:stats_reader:log_bad_responses:238]Some nodes didn't respond: ['ns_1@172.23.106.100']
[stats:error,2021-07-18T04:55:08.545-07:00,ns_1@172.23.104.155:<0.29738.1443>:stats_reader:log_bad_responses:238]Some nodes didn't respond: ['ns_1@172.23.106.100']
[stats:error,2021-07-18T04:56:23.407-07:00,ns_1@172.23.104.155:<0.1185.1445>:stats_reader:log_bad_responses:238]Some nodes didn't respond: ['ns_1@172.23.106.100']
[stats:error,2021-07-18T04:56:23.412-07:00,ns_1@172.23.104.155:<0.7634.1445>:stats_reader:log_bad_responses:238]Some nodes didn't respond: ['ns_1@172.23.106.100']
[stats:error,2021-07-18T04:59:48.457-07:00,ns_1@172.23.104.155:<0.13759.1448>:stats_reader:log_bad_responses:238]Some nodes didn't respond: ['ns_1@172.23.106.100']
[ns_server:error,2021-07-18T06:05:01.738-07:00,ns_1@172.23.104.155:<0.6577.1484>:ns_config_rep:synchronize_remote:299]Failed to synchronize config to some nodes:
[{'ns_1@172.23.104.5',
{exit,
{shutdown,
{gen_server,call,
[{ns_config_rep,'ns_1@172.23.104.5'},
synchronize_everything,infinity]}}}}]
...
[ns_server:error,2021-07-18T06:05:01.739-07:00,ns_1@172.23.104.155:<0.5251.1484>:misc:try_with_maybe_ignorant_after:1427]Eating exception from ignorant after-block:
{error,
{badmatch,
[{<0.11916.1484>,
{failed_to_update_vbucket_map,"default",244,
{error,
[{'ns_1@172.23.104.5',
{exit,
{shutdown,
{gen_server,call,
[{ns_config_rep,'ns_1@172.23.104.5'},
synchronize_everything,infinity]}}}}]}}}]},
[{misc,sync_shutdown_many_i_am_trapping_exits,1,
[{file,"src/misc.erl"},{line,1393}]},
{misc,try_with_maybe_ignorant_after,2,
[{file,"src/misc.erl"},{line,1425}]},
{ns_single_vbucket_mover,mover,6,
[{file,"src/ns_single_vbucket_mover.erl"},{line,52}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
Could someone from ns_server team take a look?