Couchbase Server / MB-56274

[System Test][Magma CDC] Rebalance exited with reason pre_rebalance_janitor_run_failed, wait_for_memcached_failed

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • 7.2.0
    • couchbase-bucket
    • Enterprise Edition 7.2.0 build 5275
    • Triaged
    • Centos 64-bit
    • 0
    • Unknown
    • KV 2023-4

    Description

      QE TEST

      -test tests/integration/7.2/test_7.2.yml -scope tests/integration/7.2/scope_7.2_magma.yml
      

      Day - 3
      Cycle - 1
      Scale - 3

      TEST STEP
      Hard failover + full recovery + rebalance of an Indexer node.

      [2023-03-31T13:49:54-07:00, sequoiatools/couchbase-cli:7.1:92e6b8] failover -c 172.23.108.103:8091 --server-failover 172.23.96.252:8091 -u Administrator -p password --hard
      [2023-03-31T13:50:04-07:00, sequoiatools/couchbase-cli:7.1:cdcb56] recovery -c 172.23.108.103:8091 --server-recovery 172.23.96.252:8091 --recovery-type full -u Administrator -p password
      [2023-03-31T13:50:10-07:00, sequoiatools/couchbase-cli:7.1:22cd52] rebalance -c 172.23.108.103:8091 -u Administrator -p password
       
      Error occurred on container - sequoiatools/couchbase-cli:7.1:[rebalance -c 172.23.108.103:8091 -u Administrator -p password]
       
      docker logs 22cd52
      docker start 22cd52
       
      WARNING: couchbase-cli version 7.1.0-1345-enterprise does not match couchbase server version 7.2.0-5275-enterprise
      Unable to display progress bar on this os
      ERROR: Rebalance failed. See logs for detailed reason. You can try again.
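
The three couchbase-cli invocations in this test step can be replayed outside the sequoia harness. The following is a minimal sketch, assuming a `couchbase-cli` binary is on the PATH (the original run used the sequoiatools/couchbase-cli:7.1 container instead). The helper names `build_steps` and `run_steps` are hypothetical; the subcommands and flags are copied from the commands logged above.

```python
import subprocess


def build_steps(cluster, node, user, password):
    """Return the argv lists for hard failover, full recovery, and rebalance."""
    auth = ["-u", user, "-p", password]
    return [
        ["couchbase-cli", "failover", "-c", cluster,
         "--server-failover", node, *auth, "--hard"],
        ["couchbase-cli", "recovery", "-c", cluster,
         "--server-recovery", node, "--recovery-type", "full", *auth],
        ["couchbase-cli", "rebalance", "-c", cluster, *auth],
    ]


def run_steps(steps):
    """Run each step in order; raises CalledProcessError on a non-zero exit."""
    for argv in steps:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Addresses and credentials are the ones from this report.
    for argv in build_steps("172.23.108.103:8091", "172.23.96.252:8091",
                            "Administrator", "password"):
        print(" ".join(argv))
```

Running the file only prints the commands; call `run_steps` explicitly to execute them against a cluster.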
      

      REBALANCE FAILURE

      2023-03-31T13:50:21.596-07:00, ns_orchestrator:0:critical:message(ns_1@172.23.108.103) - Rebalance exited with reason {pre_rebalance_janitor_run_failed,"default",
                                       {error,wait_for_memcached_failed,
                                           ['ns_1@172.23.99.25']}}.
      Rebalance Operation Id = c9aa5e8b04077b106b65664ea7bd116a
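
When triaging many runs, the failure reason, bucket, and failing nodes can be pulled out of the ns_orchestrator message automatically. This is a rough sketch, assuming the message always has the shape shown above; the function name `parse_rebalance_failure` and the regular expression are illustrative, not part of any Couchbase tooling.

```python
import re

# Matches: {<reason>,"<bucket>",{error,<error>,[<nodes>]}}
# Whitespace and line breaks between the terms are tolerated.
FAILURE_RE = re.compile(
    r'Rebalance exited with reason \{(?P<reason>\w+),"(?P<bucket>[^"]+)",\s*'
    r"\{error,(?P<error>\w+),\s*\[(?P<nodes>[^\]]*)\]\}\}"
)


def parse_rebalance_failure(text):
    """Return a dict describing the failure, or None if the text does not match."""
    match = FAILURE_RE.search(text)
    if match is None:
        return None
    nodes = [n.strip().strip("'") for n in match.group("nodes").split(",") if n.strip()]
    return {
        "reason": match.group("reason"),
        "bucket": match.group("bucket"),
        "error": match.group("error"),
        "nodes": nodes,
    }


if __name__ == "__main__":
    sample = (
        'Rebalance exited with reason {pre_rebalance_janitor_run_failed,"default",\n'
        "    {error,wait_for_memcached_failed,\n"
        "        ['ns_1@172.23.99.25']}}."
    )
    print(parse_rebalance_failure(sample))
```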
      

      OBSERVATION
      Just before the rebalance failure, node 172.23.99.25 logged 'no memory' and slow-runtime warnings for the default bucket.

      2023-03-31T13:50:09.083551-07:00 WARNING 335: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.99.20->ns_1@172.23.99.25:default - vb:848 Got error 'no memory' while trying to process mutation with seqno:511887
      2023-03-31T13:50:09.084774-07:00 WARNING 41666: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.97.119->ns_1@172.23.99.25:default - vb:503 Got error 'no memory' while trying to process mutation with seqno:82682
      2023-03-31T13:50:09.134771-07:00 WARNING 335: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.99.20->ns_1@172.23.99.25:default - vb:846 Got error 'no memory' while trying to process mutation with seqno:589699
      2023-03-31T13:50:09.179071-07:00 WARNING 335: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.99.20->ns_1@172.23.99.25:default - vb:852 Got error 'no memory' while trying to process mutation with seqno:621411
      2023-03-31T13:50:09.375396-07:00 WARNING 335: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.99.20->ns_1@172.23.99.25:default - vb:845 Got error 'no memory' while trying to process mutation with seqno:439580
      2023-03-31T13:50:09.778039-07:00 WARNING 335: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.99.20->ns_1@172.23.99.25:default - vb:847 Got error 'no memory' while trying to process mutation with seqno:513110
      2023-03-31T13:50:09.803088-07:00 WARNING 335: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.99.20->ns_1@172.23.99.25:default - vb:849 Got error 'no memory' while trying to process mutation with seqno:444234
      2023-03-31T13:50:18.136060-07:00 WARNING (default) Slow runtime for 'Backfilling items for eq_dcpq:cbas:Local:default:9f34e8ceb48e8e1ca55594a6a2826422:1' on thread AuxIoPool5: 44 s
      2023-03-31T13:50:18.934246-07:00 WARNING (default) Slow runtime for 'Backfilling items for eq_dcpq:cbas:Local:default:9f34e8ceb48e8e1ca55594a6a2826422:5' on thread AuxIoPool6: 18 s
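
To see how these warnings are distributed, the log lines can be summarized per producer node and per thread. This is a small sketch, assuming the two line formats shown above; the patterns and the function name `summarize_warnings` are hypothetical and cover only 'no memory' DCP consumer errors and slow-runtime durations reported in seconds.

```python
import re
from collections import Counter

# DCP consumer 'no memory' errors: capture the producer node and the vBucket.
NO_MEMORY_RE = re.compile(
    r"eq_dcpq:replication:(?P<producer>\S+?)->(?P<consumer>[^:\s]+):(?P<bucket>\S+)"
    r" - vb:(?P<vb>\d+) Got error 'no memory'"
)
# Slow task warnings: capture the thread name and the duration in seconds.
SLOW_RE = re.compile(
    r"Slow runtime for '(?P<task>[^']+)' on thread (?P<thread>\w+): (?P<secs>\d+) s"
)


def summarize_warnings(lines):
    """Return (no-memory count per producer, list of (thread, seconds))."""
    per_producer = Counter()
    slow_tasks = []
    for line in lines:
        no_memory = NO_MEMORY_RE.search(line)
        if no_memory:
            per_producer[no_memory.group("producer")] += 1
            continue
        slow = SLOW_RE.search(line)
        if slow:
            slow_tasks.append((slow.group("thread"), int(slow.group("secs"))))
    return per_producer, slow_tasks
```

Feeding it the log file line by line (for example `summarize_warnings(open(path))`, where `path` is wherever the node's log was collected) gives a quick view of which replication streams hit the memory limit.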
      

      NOTE
      This issue was not observed in any previous system test run on 7.2 builds. The last run was on build 7.2.0-5263.

          People

            sujay.gad Sujay Gad
            sujay.gad Sujay Gad