Details

Type: Bug
Resolution: Duplicate
Priority: Critical
Fix Version/s: 7.6.0, 7.2.1
Affects Version/s: 7.2.1
Component/s: secondary-index
Labels:
- zenith
Environment:
7.2.1-5878 ( upgraded from 7.1.5-3877)

Triage:
Untriaged
Story Points:
0
Is this a Regression?:
Unknown

Description

This is a 7-node cluster (3 KV + 2 Index + 2 Query nodes). The cluster is running with a very low index resident ratio (RR) of 0 to 1%. Unfortunately, the nodes were not responsive, and I could collect logs for only one of the index nodes.

https://cb-engineering.s3.amazonaws.com/SysTest24Jul_1/collectinfo-2023-07-26T112447-ns_1%40svc-d-node-008.x6xy5nt4xj5p-vn.sandbox.nonprod-project-avengers.com.zip

https://cb-engineering.s3.amazonaws.com/SysTest24Jul_1/collectinfo-2023-07-26T112447-ns_1%40svc-d-node-009.x6xy5nt4xj5p-vn.sandbox.nonprod-project-avengers.com.zip

https://cb-engineering.s3.amazonaws.com/SysTest24Jul_1/collectinfo-2023-07-26T112447-ns_1%40svc-d-node-010.x6xy5nt4xj5p-vn.sandbox.nonprod-project-avengers.com.zip

https://cb-engineering.s3.amazonaws.com/SysTest24Jul_1/collectinfo-2023-07-26T112447-ns_1%40svc-i-node-013.x6xy5nt4xj5p-vn.sandbox.nonprod-project-avengers.com.zip

https://cb-engineering.s3.amazonaws.com/SysTest24Jul_1/collectinfo-2023-07-26T112447-ns_1%40svc-q-node-011.x6xy5nt4xj5p-vn.sandbox.nonprod-project-avengers.com.zip

https://cb-engineering.s3.amazonaws.com/SysTest24Jul_1/collectinfo-2023-07-26T112447-ns_1%40svc-q-node-014.x6xy5nt4xj5p-vn.sandbox.nonprod-project-avengers.com.zip

Supportal snapshot ->

https://supportal.couchbase.com/customer/systest_24jul_1/cluster/015c70994c958293105e7e7c7b45d071

2023-07-26T11:09:47.862+00:00 [Info] default6/idx3_CvN3/Mainstore#10548483258518200711:1 Plasma: SMR reclaim pending is higher than expected: pending = 19 KB (expected = 12 KB), wCtxCnt = 11, objCnt 3, changed reclaimList flush threshold from 5 to 0, changed reclaimSize flush threshold from 1 KB to 0 KB.

2023-07-26T11:09:47.863+00:00 [Info] default6/idx3_CvN3/Mainstore#10903970996576475501:4 Plasma: SMR reclaim pending is higher than expected: pending = 70 KB (expected = 12 KB), wCtxCnt = 6, objCnt 7, changed reclaimList flush threshold from 0 to 0, changed reclaimSize flush threshold from 1 KB to 1 KB.

2023-07-26T11:09:47.926+00:00 [Warn] AutofailoverServiceManager::HealthCheck: Slow heartbeat 2.106421341s. priorTime: 2023-07-26 11:09:45.820183389 +0000 UTC m=+136955.683533216, callTime: 2023-07-26 11:09:47.92660473 +0000 UTC m=+136957.789954557, healthInfo: {DiskFailures:0}

2023-07-26T11:09:48.068+00:00 [Info] default8/idx8_XW91S5K4YY_idxprefix/Mainstore#3684940844532413095:1 Plasma: SMR reclaim pending is higher than expected: pending = 43 KB (expected = 12 KB), wCtxCnt = 10, objCnt 19, changed reclaimList flush threshold from 1 to 0, changed reclaimSize flush threshold from 0 KB to 0 KB.

2023-07-26T11:09:48.083+00:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 10.0.0.204:41666.  Error = read tcp4 10.0.0.204:9100->10.0.0.204:41666: i/o timeout. Kill Pipe.

2023-07-26T11:09:48.083+00:00 [Error] PeerListener.handleConnection error in authfn Server Error : SyncProxy.listen(): channel closed. Terminate for conn 10.0.0.204:9100:10.0.0.204:41666

2023-07-26T11:09:48.189+00:00 [Info] default6/idx1_T30z/Mainstore#14406419825967849331:4 Plasma: SMR reclaim pending is higher than expected: pending = 76 KB (expected = 12 KB), wCtxCnt = 8, objCnt 13, changed reclaimList flush threshold from 2 to 0, changed reclaimSize flush threshold from 1 KB to 1 KB.

2023-07-26T11:09:48.258+00:00 [Info] default8/idx12_rA0XDr/Mainstore#6150617141440904616:3 Plasma: SMR reclaim pending is higher than expected: pending = 70 KB (expected = 12 KB), wCtxCnt = 12, objCnt 11, changed reclaimList flush threshold from 0 to 0, changed reclaimSize flush threshold from 0 KB to 0 KB.

2023-07-26T11:09:48.599+00:00 [Info] default2/idx9_n8CtS/Mainstore#8671557997859106209:2 Plasma: Warning: not enough memory to hold records in memory. MemStats: {"memory_size":902995,"memory_size_index":770171,"buf_memused":23714463,"mvcc_purge_ratio":1.00000,"resident_ratio":0.00000,"alloc_size":644425855,"free_size":642752689,"items_count":1985086,"recs_in_mem":0,"reclaimed":642752689,"reclaim_pending":0}

2023-07-26T11:09:48.640+00:00 [Info] AutofailoverServiceManager::IsSafe: Called with nodeUUIDs [4a5c0a660929b26912d512e460777943]

2023-07-26T11:09:48.695+00:00 [Info] default8/idx3_E6CCEDY1DJ_idxprefix/Mainstore#10757568671253448994:0 Plasma: SMR reclaim pending is higher than expected: pending = 27 KB (expected = 12 KB), wCtxCnt = 11, objCnt 5, changed reclaimList flush threshold from 0 to 0, changed reclaimSize flush threshold from 0 KB to 0 KB.

fatal error: runtime: out of memory

runtime stack:

runtime.throw({0x135df5b?, 0x21d8260?})

	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.20.4/go/src/runtime/panic.go:1047 +0x5d fp=0x7fca1a1f2d20 sp=0x7fca1a1f2cf0 pc=0x43dd1d

runtime.sysMapOS(0xc3bc000000, 0x400000?)

	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.20.4/go/src/runtime/mem_linux.go:187 +0x11b fp=0x7fca1a1f2d68 sp=0x7fca1a1f2d20 pc=0x41ef7b

runtime.sysMap(0x21c0a40?, 0x433a7a?, 0x21d0bd8?)

	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.20.4/go/src/runtime/mem.go:142 +0x35 fp=0x7fca1a1f2d98 sp=0x7fca1a1f2d68 pc=0x41e955

runtime.(*mheap).grow(0x21c0a40, 0x2000?)

	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.20.4/go/src/runtime/mheap.go:1522 +0x252 fp=0x7fca1a1f2e10 sp=0x7fca1a1f2d98 pc=0x42f1b2

runtime.(*mheap).allocSpan(0x21c0a40, 0x1, 0x0, 0x52?)

	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.20.4/go/src/runtime/mheap.go:1243 +0x1b7 fp=0x7fca1a1f2ea8 sp=0x7fca1a1f2e10 pc=0x42e8f7

runtime.(*mheap).alloc.func1()

	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.20.4/go/src/runtime/mheap.go:961 +0x65 fp=0x7fca1a1f2ef0 sp=0x7fca1a1f2ea8 pc=0x42e3a5

runtime.systemstack()

	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.20.4/go/src/runtime/asm_amd64.s:496 +0x49 fp=0x7fca1a1f2ef8 sp=0x7fca1a1f2ef0 pc=0x470f89

goroutine 231963 [running]:

runtime.systemstack_switch()

	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.20.4/go/src/runtime/asm_amd64.s:463 fp=0xc00a171258 sp=0xc00a171250 pc=0x470f20

runtime.(*mheap).alloc(0x437910?, 0xdd0546?, 0x0?)

	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.20.4/go/src/runtime/mheap.go:955 +0x65 fp=0xc00a1712a0 sp=0xc00a171258 pc=0x42e2e5

runtime.(*mcentral).grow(0xc0d0d203f0?)

	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.20.4/go/src/runtime/mcentral.go:246 +0x57 fp=0xc00a1712e0 sp=0xc00a1712a0 pc=0x41e2b7

runtime.(*mcentral).cacheSpan(0x21d1558)

	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.20.4/go/src/runtime/mcentral.go:166 +0x306 fp=0xc00a171338 sp=0xc00a1712e0 pc=0x41e106

runtime.(*mcache).refill(0x7fd5723316b8, 0xc?)

	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.20.4/go/src/runtime/mcache.go:182 +0x152 fp=0xc00a171378 sp=0xc00a171338 pc=0x41d852

runtime.(*mcache).nextFree(0x7fd5723316b8, 0xc)
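
For triage, the MemStats payload in the Plasma "not enough memory" warning above can be pulled out of the log line and decoded as JSON. The following is a minimal sketch in Go, not indexer or Plasma code: the `memStats` struct, the `parseMemStats` helper, and the `sampleLine` constant are hypothetical names introduced here, and the struct covers only some of the fields that appear in the logged payload.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// memStats is a hypothetical struct mirroring a subset of the fields in the
// logged MemStats payload. It is not a type from the Plasma source.
type memStats struct {
	MemorySize    int64   `json:"memory_size"`
	BufMemUsed    int64   `json:"buf_memused"`
	ResidentRatio float64 `json:"resident_ratio"`
	AllocSize     int64   `json:"alloc_size"`
	FreeSize      int64   `json:"free_size"`
	ItemsCount    int64   `json:"items_count"`
	RecsInMem     int64   `json:"recs_in_mem"`
}

// marker is the text that precedes the JSON payload in the warning line.
const marker = "MemStats: "

// sampleLine is the warning copied from the log excerpt in this ticket.
const sampleLine = `2023-07-26T11:09:48.599+00:00 [Info] default2/idx9_n8CtS/Mainstore#8671557997859106209:2 Plasma: Warning: not enough memory to hold records in memory. MemStats: {"memory_size":902995,"memory_size_index":770171,"buf_memused":23714463,"mvcc_purge_ratio":1.00000,"resident_ratio":0.00000,"alloc_size":644425855,"free_size":642752689,"items_count":1985086,"recs_in_mem":0,"reclaimed":642752689,"reclaim_pending":0}`

// parseMemStats finds the marker in a log line and decodes the JSON after it.
// Fields not declared in memStats are ignored by encoding/json.
func parseMemStats(line string) (memStats, error) {
	var ms memStats
	i := strings.Index(line, marker)
	if i < 0 {
		return ms, fmt.Errorf("no %q marker in line", marker)
	}
	err := json.Unmarshal([]byte(line[i+len(marker):]), &ms)
	return ms, err
}

func main() {
	ms, err := parseMemStats(sampleLine)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("resident_ratio=%.5f recs_in_mem=%d items_count=%d\n",
		ms.ResidentRatio, ms.RecsInMem, ms.ItemsCount)
	fmt.Printf("alloc_size - free_size = %d bytes\n", ms.AllocSize-ms.FreeSize)
}
```

Running the same helper over every matching line in indexer.log would give a per-index view of resident ratio and item counts leading up to the crash. How these counters relate to the process-level out-of-memory error is not established by this sketch.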

Issue Links

relates to

MB-57814 [System Test on cloud] Index/Query nodes getting failed over and added back frequently

Closed

[System Test on cloud] Panic in indexer - fatal error: runtime: out of memory