Details
- Bug
- Resolution: Duplicate
- Critical
- None
- 7.1.0
- 7.1.0-1363
- Untriaged
- 1
- Unknown
- KV 2021-Oct-21
Description
Steps To Reproduce:
- Create a 10-node KV cluster.
- Create a magma bucket with 1 replica. Create 20 collections.
- Load 10M (0-10M, 0-50k per collection) items and upsert them once.
- Load another 1M (10M-20M, 10M-20M per collection) items and upsert them.
- Start the CRUD load per collection as below:
Read Start: 0
Read End: 500000
Update Start: 1000000
Update End: 10000000
Expiry Start: 0
Expiry End: 0
Delete Start: 500000
Delete End: 1000000
Create Start: 1000000
Create End: 10000000
Final Start: 1000000
Final End: 10000000
- Rebalance in one node. Abort and resume the rebalance at 20%, 40%, 60%, and 80%. The rebalance passed.
- Crash Magma/memcached on all 10 nodes while docs are loading, sleeping random.randint(60, 120) between kills. After every kill, wait for bucket warmup. This step went fine: no crashes were found and there were no critical messages in memcached.log.
- Rebalance out one node. The rebalance failed:
{u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.120.170', u'tstamp': 1632909123361L, u'shortText': u'message', u'serverTime': u'2021-09-29T02:52:03.361Z', u'text': u'Rebalance exited with reason {mover_crashed,\n {unexpected_exit,\n {\'EXIT\',<0.5065.41>,\n {{{badmatch,\n {error,\n {setup_replications_failed,\n [{\'ns_1@172.23.120.170\',\n {errors,[{10,64}]}}]}}},\n [{janitor_agent,handle_apply_vbucket_state,\n 2,\n [{file,"src/janitor_agent.erl"},\n {line,1074}]},\n {janitor_agent,\n apply_vbucket_states_worker_loop,0,\n [{file,"src/janitor_agent.erl"},\n {line,1063}]},\n {proc_lib,init_p,3,\n [{file,"proc_lib.erl"},{line,234}]}]},\n {gen_server,call,\n [{\'janitor_agent-GleamBookUsers0\',\n \'ns_1@172.23.121.127\'},\n {if_rebalance,<0.3860.41>,\n {wait_dcp_data_move,\n [\'ns_1@172.23.121.129\',\n \'ns_1@172.23.121.115\'],\n 698}},\n infinity]}}}}}.\nRebalance Operation Id = 694a80c21b7d0a2eb1c7118d1781ff67'}
2021-09-29 02:52:12,555 | test | ERROR | pool-3-thread-4 | [rest_client:print_UI_logs:2786] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.120.170', u'tstamp': 1632909123311L, u'shortText': u'message', u'serverTime': u'2021-09-29T02:52:03.311Z', u'text': u'Worker <0.5714.41> (for action {move,{698,\n [\'ns_1@172.23.121.127\',\n \'ns_1@172.23.121.129\'],\n [\'ns_1@172.23.121.129\',\n \'ns_1@172.23.121.115\'],\n []}}) exited with reason {unexpected_exit,\n {\'EXIT\',\n <0.5065.41>,\n {{{badmatch,\n {error,\n {setup_replications_failed,\n [{\'ns_1@172.23.120.170\',\n {errors,\n [{10,\n 64}]}}]}}},\n [{janitor_agent,\n handle_apply_vbucket_state,\n 2,\n [{file,\n "src/janitor_agent.erl"},\n {line,\n 1074}]},\n {janitor_agent,\n apply_vbucket_states_worker_loop,\n 0,\n [{file,\n "src/janitor_agent.erl"},\n {line,\n 1063}]},\n {proc_lib,\n init_p,3,\n [{file,\n "proc_lib.erl"},\n {line,\n 234}]}]},\n {gen_server,\n call,\n [{\'janitor_agent-GleamBookUsers0\',\n \'ns_1@172.23.121.127\'},\n {if_rebalance,\n <0.3860.41>,\n {wait_dcp_data_move,\n [\'ns_1@172.23.121.129\',\n \'ns_1@172.23.121.115\'],\n 698}},\n infinity]}}}}'}
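For triage, the two entries above can be reduced to the few fields that matter. The following is a minimal sketch, not part of the test framework: `summarize_move_failure` and `quoted_names` are hypothetical helpers, and the key names `first_list` and `second_list` are neutral placeholders because the report does not say what the two node lists in the move action represent.

```python
import re

# Excerpt of the 'text' field from the ns_vbucket_mover entry above,
# unescaped and with whitespace collapsed; the stack trace is cut off.
SAMPLE = (
    "Worker <0.5714.41> (for action {move,{698, "
    "['ns_1@172.23.121.127', 'ns_1@172.23.121.129'], "
    "['ns_1@172.23.121.129', 'ns_1@172.23.121.115'], []}}) "
    "exited with reason {unexpected_exit, {'EXIT', <0.5065.41>, "
    "{{{badmatch, {error, {setup_replications_failed, "
    "[{'ns_1@172.23.120.170', {errors, [{10, 64}]}}]}}},"
)


def quoted_names(chunk):
    """Return every single-quoted atom in an Erlang list fragment."""
    return re.findall(r"'([^']+)'", chunk)


def summarize_move_failure(text):
    """Pull the vBucket, both node lists and the error atom out of a log text."""
    # Undo repr-style escapes (\n and \') and collapse whitespace runs.
    flat = text.replace("\\n", " ").replace("\\'", "'")
    flat = re.sub(r"\s+", " ", flat)
    move = re.search(r"\{move,\s*\{(\d+),\s*\[(.*?)\],\s*\[(.*?)\]", flat)
    if move is None:
        return None
    error = re.search(r"\{error,\s*\{(\w+),", flat)
    return {
        "vbucket": int(move.group(1)),
        "first_list": quoted_names(move.group(2)),
        "second_list": quoted_names(move.group(3)),
        "error": error.group(1) if error else None,
    }


print(summarize_move_failure(SAMPLE))
```

The sketch accepts both the raw repr form shown in the logs (with literal `\n` and `\'` sequences) and an already unescaped string.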
Expected Result:
Rebalance should progress and should not fail.
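For anyone re-running the crash step from the reproduction, its shape can be sketched roughly as below. This is an assumption-laden outline and not the TAF implementation: `crash_loop` and its `kill`, `wait_for_warmup`, and `sleep` callables are hypothetical stand-ins, the order (kill on every node, wait for warmup, then pause) is one reading of the step, and the pause is presumably in seconds.

```python
import random


def crash_loop(nodes, kill, wait_for_warmup, rounds, sleep, rng=random):
    """Kill the target process on every node, wait for warmup, then pause.

    All callables are injected so the loop can be exercised without a cluster.
    Returns the list of pauses that were used.
    """
    delays = []
    for _ in range(rounds):
        for node in nodes:
            kill(node)                  # hypothetical: crash Magma/memcached on this node
        wait_for_warmup()               # hypothetical: block until the bucket is warmed up
        delay = rng.randint(60, 120)    # same bounds as in the reproduction step
        sleep(delay)
        delays.append(delay)
    return delays


# Dry run with stubs in place of real cluster operations.
events = []
pauses = crash_loop(
    nodes=["node%d" % i for i in range(10)],
    kill=lambda node: events.append(("kill", node)),
    wait_for_warmup=lambda: events.append(("warmup",)),
    rounds=2,
    sleep=lambda amount: None,
)
print(len(events), pauses)
```

Injecting `sleep` and `rng` keeps the dry run instant and lets a seeded `random.Random` make the pauses repeatable.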
QE Test:
git fetch "http://review.couchbase.org/TAF" refs/changes/97/162297/1 && git checkout FETCH_HEAD
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/magma_temp_job4.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False,iterations=2,sdk_timeout=60,log_level=debug,infra_log_level=debug,skip_cleanup=True -t aGoodDoctor.Hospital.Murphy.SystemTestMagma,nodes_init=10,graceful=True,skip_cleanup=True,num_items=500000,num_buckets=1,bucket_names=GleamBook,doc_size=2048,key_size=18,assert_crashes_on_load=True,num_collections=20,maxttl=10,num_indexes=20,pc=10,index_nodes=0,query_nodes=0,cbas_nodes=0,fts_nodes=0,ops_rate=50000,doc_ops=create:update:delete:read,durability=Majority,crashes=10,max_commit_points=0 -m rest'
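The `-t` argument of the command above is long enough that it helps to split it before comparing it with the reproduction steps. A small sketch, assuming the argument is a test name followed by comma-separated `key=value` pairs; `parse_test_arg` is a hypothetical helper, and treating `num_items` as a per-collection count is an assumption, not something the report states.

```python
# The -t argument copied from the QE Test command above.
T_ARG = (
    "aGoodDoctor.Hospital.Murphy.SystemTestMagma,nodes_init=10,graceful=True,"
    "skip_cleanup=True,num_items=500000,num_buckets=1,bucket_names=GleamBook,"
    "doc_size=2048,key_size=18,assert_crashes_on_load=True,num_collections=20,"
    "maxttl=10,num_indexes=20,pc=10,index_nodes=0,query_nodes=0,cbas_nodes=0,"
    "fts_nodes=0,ops_rate=50000,doc_ops=create:update:delete:read,"
    "durability=Majority,crashes=10,max_commit_points=0"
)


def parse_test_arg(arg):
    """Split 'test.name,k=v,k=v' into the test name and a dict of string values."""
    name, _, rest = arg.partition(",")
    params = dict(item.split("=", 1) for item in rest.split(",") if item)
    return name, params


name, params = parse_test_arg(T_ARG)
print(name)
# num_items * num_collections, for comparison with the 10M items in the steps.
print(int(params["num_items"]) * int(params["num_collections"]))  # 10000000
```

The values stay as strings on purpose, since the report does not define the type of each parameter.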
Daniel Owen, the plan wasn't to run this test at this stage, but I ended up running it because I had to verify another magma bug, and then I encountered this one.
Test Category: Unbounded Volume test that includes rebalance aborts and crashes: https://docs.google.com/spreadsheets/d/1AKutwtUlGX4UckfGPkJSKZu_7wfz_EwMMuoajCYUub8/edit#gid=1608573032&range=G7
Issue Links
- duplicates MB-48533 [Magma] Memcached keeps getting disconnected while data loading magma buckets to dgm (Closed)