Couchbase Server / MB-51477

[Magma] Rebalance exited with reason buckets_shutdown_wait_failed

Details

    • Untriaged
    • Centos 64-bit
    • 1
    • Unknown

    Description

      Seeing this issue during cluster teardown:

      rebalance_report_20220314T223148.json:

      {
        "stageInfo": {
          "data": {
            "startTime": "2022-03-14T15:30:48.099-07:00",
            "completedTime": false,
            "timeTaken": 60280
          }
        },
        "rebalanceId": "966b008ec2d7d181ce4f40571d563618",
        "nodesInfo": {
          "active_nodes": ["ns_1@172.23.100.38", "ns_1@172.23.100.39"],
          "keep_nodes": ["ns_1@172.23.100.38"],
          "eject_nodes": ["ns_1@172.23.100.39"],
          "delta_nodes": [],
          "failed_nodes": []
        },
        "masterNode": "ns_1@172.23.100.38",
        "startTime": "2022-03-14T15:30:48.098-07:00",
        "completedTime": "2022-03-14T15:31:48.379-07:00",
        "timeTaken": 60281,
        "completionMessage": "Rebalance exited with reason {buckets_shutdown_wait_failed,\n                              [{'ns_1@172.23.100.38',\n                                {'EXIT',\n                                 {old_buckets_shutdown_wait_failed,\n                                  [\"default\"]}}}]}."
      }
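For triage, the report above can be reduced to the failure reason, the node that reported it, and the affected bucket. The following is a minimal sketch, not a Couchbase tool: the function name `summarize_rebalance_report` and the regular expressions are assumptions for illustration, and `SAMPLE_REPORT` is a trimmed copy of the report in this ticket.

```python
import json
import re

# Trimmed copy of the report from this ticket (only the fields used below).
SAMPLE_REPORT = r'''{"nodesInfo":{"eject_nodes":["ns_1@172.23.100.39"]},"timeTaken":60281,"completionMessage":"Rebalance exited with reason {buckets_shutdown_wait_failed,\n [{'ns_1@172.23.100.38',\n {'EXIT',\n {old_buckets_shutdown_wait_failed,\n [\"default\"]}}}]}."}'''


def summarize_rebalance_report(report_text):
    """Extract the failure reason, reporting nodes, and bucket names (hypothetical helper)."""
    report = json.loads(report_text)
    message = report.get("completionMessage", "")
    # The reason is the first atom after "reason {" in the Erlang-style term.
    reason = re.search(r"reason \{(\w+)", message)
    # Node names appear as single-quoted atoms such as 'ns_1@172.23.100.38'.
    nodes = re.findall(r"'(ns_1@[^']+)'", message)
    # Bucket names appear as double-quoted strings inside the term.
    buckets = re.findall(r'"([^"]+)"', message)
    return {
        "reason": reason.group(1) if reason else None,
        "nodes": nodes,
        "buckets": buckets,
        "ejected": report.get("nodesInfo", {}).get("eject_nodes", []),
        "time_taken_ms": report.get("timeTaken"),
    }


summary = summarize_rebalance_report(SAMPLE_REPORT)
print(summary)
```

Here the rebalance was ejecting `ns_1@172.23.100.39`, and the remaining node `ns_1@172.23.100.38` reported that the old `default` bucket did not finish shutting down.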
      
      

      Attachments

        1. 172.23.100.38.zip
          24.32 MB
        2. 172.23.100.39.zip
          28.18 MB

          Activity

            ritam.sharma Ritam Sharma added a comment -

            Apaar Gupta - We tried a dedicated disk without the toy build and with a 10-minute sleep. The results were not good: the tests failed with a bucket deletion timeout.
            Pavithra Mahamani - Please confirm the above.

            apaar.gupta Apaar Gupta added a comment -

            Ritam Sharma, does that mean the toy build did better than the 10-minute timeout build? With the toy build we saw this issue only once. If we see a measurable benefit from the toy build, we will look into merging the fix for neo too. If we are not sure, we can merge just to master.

            apaar.gupta Apaar Gupta added a comment -

            We have merged patches to mitigate this issue. It cannot be completely solved, since Magma performs I/O during shutdown and extremely slow disks can still cause us to exceed the 5-minute threshold.


            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.2.0-1069 contains magma commit c768712 with commit message:
            MB-51477 magma: Don't checkpoint on shutdown based on a threshold

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.2.0-1069 contains magma commit 757617d with commit message:
            MB-51477 magma: Cancel inflight SSTable writes during shutdown
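The idea named in the second commit message, cancelling in-flight writes so that shutdown is not held up by slow disk I/O, can be illustrated with cooperative cancellation. The sketch below is a generic Python illustration and not Magma's implementation, which is not shown in this ticket. The names `Writer`, `shutdown`, and `deadline_seconds` are hypothetical, and `time.sleep` stands in for a slow disk write.

```python
import threading
import time


class Writer:
    """Hypothetical background writer that flushes chunks until cancelled."""

    def __init__(self, chunks, seconds_per_chunk):
        self.chunks = chunks
        self.seconds_per_chunk = seconds_per_chunk
        self.written = 0
        self.cancel = threading.Event()
        self.thread = threading.Thread(target=self._run)

    def _run(self):
        for _ in range(self.chunks):
            # Check the flag between chunks so shutdown waits for at most one chunk.
            if self.cancel.is_set():
                return
            time.sleep(self.seconds_per_chunk)  # stands in for slow disk I/O
            self.written += 1

    def start(self):
        self.thread.start()


def shutdown(writers, deadline_seconds):
    """Cancel in-flight writes, then wait up to the deadline. True if all stopped."""
    for w in writers:
        w.cancel.set()
    deadline = time.monotonic() + deadline_seconds
    for w in writers:
        w.thread.join(timeout=max(0.0, deadline - time.monotonic()))
    return all(not w.thread.is_alive() for w in writers)


writer = Writer(chunks=1000, seconds_per_chunk=0.01)
writer.start()
time.sleep(0.05)
stopped_in_time = shutdown([writer], deadline_seconds=1.0)
print(stopped_in_time, writer.written)
```

Without the cancel check, this writer would need about 10 seconds to finish and the 1-second deadline would be missed. That mirrors the failure mode in this ticket, where the deadline is the 5-minute threshold mentioned above. As the comment above notes, a single sufficiently slow write can still exceed the deadline, so cancellation mitigates the problem rather than eliminating it.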

            People

              apaar.gupta Apaar Gupta
              pavithra.mahamani Pavithra Mahamani
