Couchbase Server / MB-52814

Compression: Indexer node becomes unreachable with 10% RR test


Details

    Description

      We observed the indexer node becoming unreachable during the 10% RR (resident ratio) compression test, even when the indexer was given 50% of the available memory.

      We ran a few tests to validate this:

       

      Kernel limit | Available memory | Indexer quota | RR (after scans) | Throughput | Hang observed | Job
      12G          | 9G               | 3100MB        | 3%               | 51072.0    | No            | http://perf.jenkins.couchbase.com/job/hemera/5288/console
      10G          | 7G               | 3100MB        | 5%               | 32644.7    | No            | http://perf.jenkins.couchbase.com/job/hemera/5292/consoleFull (completed in 2:30 hr)
      9.5G         | 6290MB           | 3100MB        | 5%               | 19757.8    | Partially     | http://perf.jenkins.couchbase.com/job/hemera/5293/consoleFull (completed in 5:30 hr)
      9G           | 6230MB           | 3100MB        | 5%               | NA         | Yes           | http://perf.jenkins.couchbase.com/job/hemera/5294/consoleFull
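      As a quick arithmetic check on the "50% of available memory" statement, the fixed 3100MB indexer quota can be expressed as a share of the available memory in each run. This is a minimal sketch that assumes 1G = 1024MB for the rows reported in G; the function name quota_share is illustrative only. In these four runs, the partial and full hangs occur where the quota is roughly half of the available memory.

```python
# Rows transcribed from the test table: (kernel limit, available memory in MB,
# indexer quota in MB, hang observed). Rows given in G are converted assuming
# 1G = 1024MB, which is an assumption about the units used in the report.
runs = [
    ("12G", 9 * 1024, 3100, "No"),
    ("10G", 7 * 1024, 3100, "No"),
    ("9.5G", 6290, 3100, "Partially"),
    ("9G", 6230, 3100, "Yes"),
]


def quota_share(available_mb: float, quota_mb: float) -> float:
    """Return the indexer quota as a percentage of available memory."""
    return 100.0 * quota_mb / available_mb


for limit, available_mb, quota_mb, hung in runs:
    share = quota_share(available_mb, quota_mb)
    print(f"{limit:>5}: quota = {share:4.1f}% of available memory, hang observed: {hung}")
```

      With these inputs the quota works out to about 33.6%, 43.2%, 49.3% and 49.8% of available memory for the 12G, 10G, 9.5G and 9G runs respectively.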

      We are seeing the following CBAuth errors:

      Service 'goxdcr' exited with status 1. Restarting. Messages:
      2022-06-30T10:45:54.369-07:00 INFO GOXDCR.SecuritySvc: Received security change notification. code 7
      2022-06-30T10:45:54.634-07:00 ERRO GOXDCR.SecuritySvc: GetClusterEncryptionConfig returned error: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connect: connection refused
      2022-06-30T10:45:54.643-07:00 WARN GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get "http://127.0.0.1:8091/_metakv/remoteCluster/": CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connect: connection refused
      2022-06-30T10:45:54.733-07:00 ERRO GOXDCR.SecuritySvc: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connect: connection refused
      2022/06/30 10:45:54 revrpc: Got error (dial tcp 127.0.0.1:8091: connect: connection refused) and will retry in 1s
      2022-06-30T10:45:54.734-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed after max retry. path=/remoteCluster/
      2022-06-30T10:45:54.734-07:00 WARN GOXDCR.Utils: GetAllMetadataFromCatalog(remoteCluster) took 20.805129325s
      Metadata service not available after 30 retries.
       
       
      ns_log 000 ns_1@cen-s705.perf.couchbase.com 10:46:10 AM 30 Jun, 2022
       
       
      IP address seems to have changed. Unable to listen on 'ns_1@cen-s705.perf.couchbase.com'. (POSIX error code: 'nxdomain') 

      Another job did not complete even after 8 hours: http://perf.jenkins.couchbase.com/job/hemera/4982/console

      Currently running it with a 12G kernel limit.

       


          People

            vikas.chaudhary Vikas Chaudhary
            Votes: 0
            Watchers: 4
