Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
7.1.0, 7.0.3, 7.0.2, 6.6.5
-
RC1 - Enterprise Edition 7.1.0 build 2475
-
Untriaged
-
-
1
-
No
Description
1. Load 1 billion 5k(body field contains the major chunk of data) items in a default magma bucket.
2. Create 50% fragmentation in KV
3. Create an secondary index idx0 and wait for it to build.
4. Start a data load as below:
Read Start: 0
|
Read End: 500000000
|
Update Start: 0
|
Update End: 500000000
|
Expiry Start: 0
|
Expiry End: 0
|
Delete Start: 500000000
|
Delete End: 1000000000
|
Create Start: 1000000000
|
Create End: 1500000000
|
5. Rebalance IN 1 KV node. Rebalance finished in 11019.869 sec
6. Create another index idx1 on body field while the above rebalance was at 35%. Index was building while KV rebalances were running. Total plasma disk size was at ~2TB.
7. Rebalance OUT 1 KV node. Rebalance finished in 10870.6289999 sec
8. Rebalance IN 2 and out 1 KV node. Rebalance failed at 80%.
{u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.110.64', u'tstamp': 1647477583397L, u'shortText': u'message', u'serverTime': u'2022-03-16T17:39:43.397Z', u'text': u"Rebalance exited with reason {service_rebalance_failed,index,\n {agent_died,<26112.18949.96>,\n {lost_connection,\n {'ns_1@172.23.110.66',shutdown}}}}.\nRebalance Operation Id = 6e737e537f7cc57e8962bec1a189e45e"} |
2022-03-16 17:39:52,405 | test | ERROR | pool-3-thread-6 | [rest_client:print_UI_logs:2708] {u'code': 0, u'module': u'ns_log', u'type': u'info', u'node': u'ns_1@172.23.110.66', u'tstamp': 1647477583388L, u'shortText': u'message', u'serverTime': u'2022-03-16T17:39:43.388Z', u'text': u"Service 'index' exited with status 2. Restarting. Messages:\nruntime.gopark(0x14603b8, 0x0, 0x1809, 0x1)\n\t/home/couchbase/.cbdepscache/exploded/x86_64/go-1.16.5/go/src/runtime/proc.go:336 +0xe5 fp=0xc018ea99e0 sp=0xc018ea99c0 pc=0x441ae5\nruntime.selectgo(0xc018ea9c50, 0xc018ea9ba4, 0x0, 0x0, 0x3, 0x1459601, 0x0, 0x1)\n\t/home/couchbase/.cbdepscache/exploded/x86_64/go-1.16.5/go/src/runtime/select.go:327 +0xef7 fp=0xc018ea9b18 sp=0xc018ea99e0 pc=0x452a97\ngithub.com/couchbase/indexing/secondary/indexer.(*Rebalancer).processDropIndexQueue(0xc018e96000)\n\t/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/rebalancer.go:1383 +0x1b2 fp=0xc018ea9fd8 sp=0xc018ea9b18 pc=0xfe9cb2\nruntime.goexit()\n\t/home/couchbase/.cbdepscache/exploded/x86_64/go-1.16.5/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc018ea9fe0 sp=0xc018ea9fd8 pc=0x4784c1\ncreated by github.com/couchbase/indexing/secondary/indexer.NewRebalancer\n\t/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/rebalancer.go:171 +0x545\n"} |
2022-03-16 17:39:52,407 | test | ERROR | pool-3-thread-6 | [rest_client:print_UI_logs:2708] {u'code': 0, u'module': u'ns_memcached', u'type': u'info', u'node': u'ns_1@172.23.110.65', u'tstamp': 1647477562188L, u'shortText': u'message', u'serverTime': u'2022-03-16T17:39:22.188Z', u'text': u'Shutting down bucket "GleamBookUsers0" on \'ns_1@172.23.110.65\' for deletion'} |
2022-03-16 17:39:52,407 | test | ERROR | pool-3-thread-6 | [rest_client:print_UI_logs:2708] {u'code': 0, u'module': u'ns_log', u'type': u'info', u'node': u'ns_1@172.23.110.66', u'tstamp': 1647476292642L, u'shortText': u'message', u'serverTime': u'2022-03-16T17:18:12.642Z', u'text': u"Service 'index' exited with status 2. Restarting. Messages:\n\t/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/plasma/mvcc.go:824 +0x3bd fp=0xc05021feb8 sp=0xc05021fc70 pc=0xe2c33d\ngithub.com/couchbase/plasma.(*Plasma).VisitPartition(0xc005ae2000, 0x46, 0xc014640000, 0xc014641300, 0xc0136419c0, 0x0, 0x0)\n\t/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/plasma/page_visitor.go:87 +0x24a fp=0xc05021ff38 sp=0xc05021feb8 pc=0xddbdca\ngithub.com/couchbase/plasma.(*Plasma).PageVisitor.func1(0xc048f98530, 0xc005ae2000, 0xc0136419c0, 0xc0007c4900, 0x48, 0x48, 0x46, 0xc014640000, 0xc014641300)\n\t/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/plasma/page_visitor.go:41 +0x88 fp=0xc05021ff98 sp=0xc05021ff38 pc=0xe2d1c8\nruntime.goexit()\n\t/home/couchbase/.cbdepscache/exploded/x86_64/go-1.16.5/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc05021ffa0 sp=0xc05021ff98 pc=0x4784c1\ncreated by github.com/couchbase/plasma.(*Plasma).PageVisitor\n\t/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/plasma/page_visitor.go:39 +0x18b\n"} |
Note: This is found while trying out a recently reported customer use case(CBSE-11583). The env is much better in this case. These are physical machines with 256GB RAM and 72 cores.
All these rebalance passed(but the test got aborted due to jython process kill. Planning to re-run on RC3):
In the last run where this defect was found the test failed in step 3 above.
Here are the logs from the latest run on 7.1.0 build 2521:
s3://cb-customers-secure/index_7.1.0-2521/2022-03-25/collectinfo-2022-03-25t165930-ns_1@172.23.110.64.zip
s3://cb-customers-secure/index_7.1.0-2521/2022-03-25/collectinfo-2022-03-25t165930-ns_1@172.23.110.65.zip
s3://cb-customers-secure/index_7.1.0-2521/2022-03-25/collectinfo-2022-03-25t165930-ns_1@172.23.110.67.zip