Description
I'm running a system test with 3 KV nodes, 6 index nodes, and 1 query node. The test has 3 active buckets and a few indexes, with a continuous query load running against them. The cluster goes through cycles of index and KV nodes being rebalanced in and out, with service kills and rebalance retries. At no point do I kill anything on the query node, so I expect it to keep running without any issues.
However, I found that the query node was unresponsive and unavailable. I had to manually start the couchbase service to bring the node back up.
I'm not sure what led to the query node going down. I see many files like these in the /opt/couchbase/var/lib/couchbase/logs directory:
-rw-rw---- 1 couchbase couchbase 1437 Mar 28 03:55 query_ffdc_MRE_areq_00310988_2024-03-28-035546.770.gz
-rw-rw---- 1 couchbase couchbase 1440 Mar 28 03:57 query_ffdc_MRE_areq_00310988_2024-03-28-035716.841.gz
-rw-rw---- 1 couchbase couchbase 1154 Mar 28 04:01 query_ffdc_MRE_areq_00310988_2024-03-28-040116.861.gz
-rw-rw---- 1 couchbase couchbase 45405 Mar 28 03:55 query_ffdc_MRE_grtn_00310988_2024-03-28-035546.770.gz
-rw-rw---- 1 couchbase couchbase 46653 Mar 28 03:57 query_ffdc_MRE_grtn_00310988_2024-03-28-035716.841.gz
-rw-rw---- 1 couchbase couchbase 48336 Mar 28 04:01 query_ffdc_MRE_grtn_00310988_2024-03-28-040116.861.gz
-rw-rw---- 1 couchbase couchbase 150214 Mar 28 03:55 query_ffdc_MRE_heap_00310988_2024-03-28-035546.770.gz
-rw-rw---- 1 couchbase couchbase 151079 Mar 28 03:57 query_ffdc_MRE_heap_00310988_2024-03-28-035716.841.gz
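For triage, it may help to group these files by capture timestamp to see which dump types were written for each event. The following is a minimal sketch, not a Couchbase tool: the helper name group_ffdc and the field names (reason, kind, ident, stamp) are hypothetical and are inferred only from the file names listed above.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the listing only:
#   query_ffdc_<reason>_<kind>_<ident>_<YYYY-MM-DD-HHMMSS.mmm>.gz
FFDC_NAME = re.compile(
    r"query_ffdc_(?P<reason>[A-Za-z0-9]+)_(?P<kind>[A-Za-z0-9]+)_(?P<ident>\d+)_"
    r"(?P<stamp>\d{4}-\d{2}-\d{2}-\d{6}\.\d{3})\.gz"
)


def group_ffdc(lines):
    """Map each capture timestamp to the sorted list of dump kinds seen for it.

    `lines` may be bare file names or full `ls -l` output lines;
    lines that do not contain an FFDC file name are ignored.
    """
    groups = defaultdict(list)
    for line in lines:
        match = FFDC_NAME.search(line)
        if match:
            groups[match["stamp"]].append(match["kind"])
    return {stamp: sorted(kinds) for stamp, kinds in sorted(groups.items())}


if __name__ == "__main__":
    import sys

    # Example: ls -l /opt/couchbase/var/lib/couchbase/logs | python group_ffdc.py
    for stamp, kinds in group_ffdc(sys.stdin).items():
        print(stamp, ", ".join(kinds))
```

Piping the directory listing into the script prints one line per capture timestamp, which makes it easier to match each set of dumps to the corresponding window in the query log.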
I've also run cbcollect manually to collect the logs.