Details
Bug
Resolution: Cannot Reproduce
Major
7.1.1
Untriaged
1
Unknown
Description
We observed the indexer node becoming unreachable during the 10% RR compression test, even when the indexer is given 50% of the available memory.
We ran a few tests to validate this:
Kernel limit | Available | Indexer quota | RR (after scans) | Throughput | Hang observed | Job |
---|---|---|---|---|---|---|
12G | 9G | 3100MB | 3% | 51072.0 | No | http://perf.jenkins.couchbase.com/job/hemera/5288/console |
10G | 7G | 3100MB | 5% | 32644.7 | No | http://perf.jenkins.couchbase.com/job/hemera/5292/consoleFull (completed in 2:30 hr) |
9.5G | 6290MB | 3100MB | 5% | 19757.8 | Partially | http://perf.jenkins.couchbase.com/job/hemera/5293/consoleFull (completed in 5:30 hr) |
9G | 6230MB | 3100MB | 5% | NA | Yes | http://perf.jenkins.couchbase.com/job/hemera/5294/consoleFull |
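To make the "50% of available memory" claim concrete, the sketch below computes the indexer quota as a share of available memory for each run in the table above. The 1G = 1024 MB conversion for the "9G" and "7G" entries is an assumption, and `quota_share` is a hypothetical helper name, not part of any Couchbase tooling.

```python
# Runs from the table above: (kernel limit, available memory in MB, indexer quota in MB).
# Assumption: "9G" and "7G" are converted with 1G = 1024 MB.
RUNS = [
    ("12G", 9 * 1024, 3100),
    ("10G", 7 * 1024, 3100),
    ("9.5G", 6290, 3100),
    ("9G", 6230, 3100),
]


def quota_share(available_mb: float, quota_mb: float) -> float:
    """Return the indexer quota as a percentage of available memory."""
    return 100.0 * quota_mb / available_mb


if __name__ == "__main__":
    for kernel_limit, available_mb, quota_mb in RUNS:
        share = quota_share(available_mb, quota_mb)
        print(f"{kernel_limit:>5}: quota is {share:.1f}% of available memory")
```

Under that assumption, the quota is roughly a third of available memory in the 12G run and close to half in the 9.5G and 9G runs, which are the runs where the hang was observed.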
We are seeing the following CBAuth errors in the logs:
Service 'goxdcr' exited with status 1. Restarting. Messages:
2022-06-30T10:45:54.369-07:00 INFO GOXDCR.SecuritySvc: Received security change notification. code 7
2022-06-30T10:45:54.634-07:00 ERRO GOXDCR.SecuritySvc: GetClusterEncryptionConfig returned error: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connect: connection refused
2022-06-30T10:45:54.643-07:00 WARN GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get "http://127.0.0.1:8091/_metakv/remoteCluster/": CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connect: connection refused
2022-06-30T10:45:54.733-07:00 ERRO GOXDCR.SecuritySvc: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connect: connection refused
2022/06/30 10:45:54 revrpc: Got error (dial tcp 127.0.0.1:8091: connect: connection refused) and will retry in 1s
2022-06-30T10:45:54.734-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed after max retry. path=/remoteCluster/
2022-06-30T10:45:54.734-07:00 WARN GOXDCR.Utils: GetAllMetadataFromCatalog(remoteCluster) took 20.805129325s
Metadata service not available after 30 retries.
ns_log 000 ns_1@cen-s705.perf.couchbase.com 10:46:10 AM 30 Jun, 2022
IP address seems to have changed. Unable to listen on 'ns_1@cen-s705.perf.couchbase.com'. (POSIX error code: 'nxdomain')
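The log excerpt above shows two distinct symptoms: connections to 127.0.0.1:8091 being refused, and the node hostname failing to resolve (`nxdomain`). A minimal sketch for checking both conditions from the affected node might look like the following. The helper names `port_accepts_connections` and `hostname_resolves` are hypothetical, and this is only a diagnostic aid under those assumptions, not the method used in the test runs.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def hostname_resolves(hostname: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        # Name resolution failed (for example, the name does not exist).
        return False


if __name__ == "__main__":
    # Address and hostname taken from the log excerpt above.
    print("8091 reachable:", port_accepts_connections("127.0.0.1", 8091))
    print("hostname resolves:", hostname_resolves("cen-s705.perf.couchbase.com"))
```

Running this periodically during a test would help separate the case where the local REST port stops accepting connections from the case where name resolution itself fails.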
Another job did not complete even after 8 hr: http://perf.jenkins.couchbase.com/job/hemera/4982/console
We are currently rerunning it with a 12G kernel limit.