Details
- Bug
- Resolution: Fixed
- Critical
- Cheshire-Cat
- 7.0.0-5133-enterprise
- Untriaged
- 1
- No
- KV-Engine CC Final Sprint
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops.ini rerun=False,get-cbcollect-info=True,quota_percent=99,crash_warning=True,create_metakv_entries=True -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_hard_failover_rebalance_out,nodes_init=5,services_init=kv-fts-kv-kv-kv,nodes_failover=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=before,scrape_interval=5,rebalance_moves_per_node=32,quota_percent=80,skip_validations=True,GROUP=failover_with_collection_crud'
Steps to Repro
1) Create a 5-node cluster
2021-05-10 20:21:19,243 | test | INFO | pool-5-thread-7 | [table_view:display:72] Rebalance Overview
-----------------------------------------------------------------------------------
Nodes          | Services | Version               | CPU           | Status       |
-----------------------------------------------------------------------------------
172.23.98.196  | kv       | 7.0.0-5133-enterprise | 3.35251438579 | Cluster node |
172.23.98.195  | ['fts']  |                       |               | <--- IN ---  |
172.23.121.10  | ['kv']   |                       |               | <--- IN ---  |
172.23.104.186 | ['kv']   |                       |               | <--- IN ---  |
172.23.120.206 | ['kv']   |                       |               | <--- IN ---  |
-----------------------------------------------------------------------------------
2) Create buckets/scopes/collections/data
2021-05-10 20:25:08,065 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
-------------------------------------------------------------------------
Bucket | Type | Replicas | Durability | TTL | Items | RAM Quota | RAM Used | Disk Used |
-------------------------------------------------------------------------
bucket1 | couchbase | 3 | none | 0 | 3000 | 838860800 | 205419072 | 300403800 |
bucket2 | ephemeral | 3 | none | 0 | 3000 | 838860800 | 316814512 | 136 |
default | couchbase | 3 | none | 0 | 500000 | 8388608000 | 755490896 | 590212284 |
-------------------------------------------------------------------------
3) Set the following settings
2021-05-10 20:25:16,298 | test | INFO | MainThread | [collections_rebalance:setUp:58] Changing scrape interval to 5
2021-05-10 20:25:18,355 | test | INFO | MainThread | [cluster_ready_functions:set_rebalance_moves_per_nodes:129] Changed Rebalance settings: {u'rebalanceMovesPerNode': 32}
4) Create metakv entries by creating and dropping 200 fts indexes
2021-05-10 20:25:18,355 | test | INFO | MainThread | [collections_rebalance:setUp:78] Creating metakv entries start
2021-05-10 20:27:37,470 | test | INFO | MainThread | [collections_rebalance:setUp:80] Creating metakv entries end
5) Start CRUD on collections
2021-05-10 20:27:37,474 | test | INFO | MainThread | [bucket_ready_functions:perform_tasks_from_spec:4651] Performing scope/collection specific operations
2021-05-10 20:27:44,384 | test | INFO | MainThread | [bucket_ready_functions:perform_tasks_from_spec:4741] Done Performing scope/collection specific operations
6) Start a hard failover of two nodes; the failover request for the first node fails as shown below.
2021-05-10 20:27:44,589 | test | INFO | MainThread | [collections_rebalance:rebalance_operation:388] Starting rebalance operation of type : hard_failover_rebalance_out
2021-05-10 20:27:44,591 | test | INFO | MainThread | [collections_rebalance:rebalance_operation:632] failing over nodes [ip:172.23.104.186 port:8091 ssh_username:root, ip:172.23.120.206 port:8091 ssh_username:root]
2021-05-10 20:27:54,937 | test | ERROR | pool-5-thread-9 | [rest_client:_http_request:748] POST http://172.23.98.196:8091/controller/failOver body: otpNode=ns_1%40172.23.104.186&allowUnsafe=false headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==\n', 'Content-Type': 'application/x-www-form-urlencoded'} error: 500 reason: unknown ["Unexpected server error, request logged."] auth: Administrator:password
2021-05-10 20:27:54,940 | test | ERROR | pool-5-thread-9 | [rest_client:fail_over:1276] ns_1@172.23.104.186 - Failover error: ["Unexpected server error, request logged."]
ERROR
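To replay the failing call outside the test framework, the failover request logged above can be rebuilt with the Python standard library. This is a minimal sketch: the endpoint, form fields, and Basic auth scheme are taken from the logged request, while the helper name `build_failover_request` and its parameters are hypothetical and not part of testrunner.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_failover_request(host, otp_node, user, password,
                           allow_unsafe=False, port=8091):
    """Build (but do not send) a POST /controller/failOver request.

    Hypothetical helper: mirrors the form body and headers seen in the
    logged request, nothing more.
    """
    # Form-encode the body; '@' in the otpNode name becomes %40.
    body = urlencode({
        "otpNode": otp_node,
        "allowUnsafe": str(allow_unsafe).lower(),
    }).encode("ascii")
    # HTTP Basic auth token: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        f"http://{host}:{port}/controller/failOver",
        data=body,
        method="POST",
        headers={
            "Accept": "*/*",
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends it; a 500 response like the one in this ticket surfaces as `urllib.error.HTTPError`, whose body can be read for the server's error message.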
debug.log at the time of failure
[ns_server:error,2021-05-10T20:27:54.907-07:00,ns_1@172.23.98.196:<0.25585.1>:menelaus_util:reply_server_error:206]Server error during processing: ["web request failed",
 {path,"/controller/failOver"},
 {method,'POST'},
 {type,exit},
 {what,
  {{function_clause,
    [{ns_orchestrator,rebalancing,
      [{request_janitor_run,
        {bucket,"bucket1"}},
       {rebalancing_state,<0.25739.1>,
        <0.25737.1>,[],[],[],[],undefined,
        ['ns_1@172.23.104.186'],
        undefined,failover,
        <<"5173d70bbd2afe55f32eb5e976d59df5">>,
        undefined,
        {<0.25585.1>,
         #Ref<0.2378229137.562823169.70811>}}],
      [{file,"src/ns_orchestrator.erl"},
       {line,887}]},
     {gen_statem,loop_state_callback,11,
      [{file,"gen_statem.erl"},{line,1120}]},
     {proc_lib,init_p_do_apply,3,
      [{file,"proc_lib.erl"},{line,249}]}]},
   {gen_statem,call,
    [{via,leader_registry,ns_orchestrator},
     {failover,['ns_1@172.23.104.186'],false},
     infinity]}}},
 {trace,
  [{gen,do_call,4,
    [{file,"gen.erl"},{line,177}]},
   {gen,do_for_proc,2,
    [{file,"gen.erl"},{line,238}]},
   {gen_statem,call_dirty,4,
    [{file,"gen_statem.erl"},{line,623}]},
   {menelaus_web_cluster,handle_failover,1,
    [{file,"src/menelaus_web_cluster.erl"},
     {line,782}]},
   {request_throttler,do_request,3,
    [{file,"src/request_throttler.erl"},
     {line,58}]},
   {menelaus_util,handle_request,2,
    [{file,"src/menelaus_util.erl"},
     {line,217}]},
   {mochiweb_http,headers,6,
    [{file,
      "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl"},
     {line,150}]},
   {proc_lib,init_p_do_apply,3,
    [{file,"proc_lib.erl"},{line,249}]}]}]
cbcollect_info attached.
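When scanning debug.log from the attached cbcollect_info for similar failures, it can help to reduce each "web request failed" entry to the request path, HTTP method, exit reason, and crashing function. The following is a rough sketch only: `summarize_server_error` is a hypothetical helper, and the regular expressions assume the entry layout shown in this ticket rather than any documented log format.

```python
import re


def summarize_server_error(entry):
    """Summarize one 'web request failed' debug.log entry.

    Hypothetical helper. Returns a dict whose values are None for any
    field that is not present in the entry text.
    """
    path = re.search(r'\{path,"([^"]+)"\}', entry)
    method = re.search(r"\{method,'([^']+)'\}", entry)
    # Matches "{{Reason, [{Module,Function," at the top of the 'what' term,
    # tolerating any whitespace or line breaks between the two parts.
    crash = re.search(r"\{\{(\w+),\s*\[\{(\w+),(\w+),", entry)
    return {
        "path": path.group(1) if path else None,
        "method": method.group(1) if method else None,
        "reason": crash.group(1) if crash else None,
        "module": crash.group(2) if crash else None,
        "function": crash.group(3) if crash else None,
    }
```

Applied to the entry above, the summary points at `ns_orchestrator:rebalancing` exiting with `function_clause`, which gives a compact signature for matching this ticket against other occurrences.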
Issue Links
- duplicates MB-46255 Rebalance failures observed in custom-map-n1ql-rqg-scorch-match-phrase (Closed)