Couchbase Server / MB-43343

[System Test]: Handlers hung in deploying state


    Details

    • Triage:
      Untriaged
    • Story Points:
      1
    • Is this a Regression?:
      Yes

      Description

      Build: 6.6.1-9213, passed on 6.6.1-9207

      Test: Eventing component

      Day: 3rd 

      Cycle: 15

      Test Step: A rebalance failed, and the handlers then hung in the deploying state, which caused the subsequent rebalance to fail as well.

      [2020-12-16T08:39:18-08:00, sequoiatools/couchbase-cli:6.5:ec0df2] failover -c 172.23.104.16:8091 --server-failover 172.23.104.18:8091 -u Administrator -p password --force
      [2020-12-16T08:39:50-08:00, sequoiatools/couchbase-cli:6.5:2a00f3] failover -c 172.23.104.16:8091 --server-failover 172.23.97.77:8091 -u Administrator -p password --force
      [2020-12-16T08:40:03-08:00, sequoiatools/couchbase-cli:6.5:54cd98] rebalance -c 172.23.104.16:8091 -u Administrator -p password
       
      Error occurred on container - sequoiatools/couchbase-cli:6.5:[rebalance -c 172.23.104.16:8091 -u Administrator -p password]
       
      docker logs 54cd98
      docker start 54cd98
       
      WARNING: couchbase-cli version 6.5.0-3216-enterprise does not match couchbase server version 6.6.1-9213-enterprise
      Unable to display progress bar on this os
      ERROR: Rebalance failed. See logs for detailed reason. You can try again.
      [2020-12-16T09:06:51-08:00, sequoiatools/cmd:fc08b5] 60
      [2020-12-16T09:08:07-08:00, sequoiatools/couchbase-cli:6.5:08d84f] server-add -c 172.23.104.16:8091 --server-add https://172.23.104.17 -u Administrator -p password --server-add-username Administrator --server-add-password password --services data
      [2020-12-16T09:08:32-08:00, sequoiatools/couchbase-cli:6.5:42690f] server-add -c 172.23.104.16:8091 --server-add https://172.23.104.18 -u Administrator -p password --server-add-username Administrator --server-add-password password --services data
      [2020-12-16T09:08:52-08:00, sequoiatools/couchbase-cli:6.5:e27bf0] server-add -c 172.23.104.16:8091 --server-add https://172.23.104.23 -u Administrator -p password --server-add-username Administrator --server-add-password password --services eventing
      [2020-12-16T09:09:04-08:00, sequoiatools/couchbase-cli:6.5:5a6788] rebalance -c 172.23.104.16:8091 -u Administrator -p password
       
      Error occurred on container - sequoiatools/couchbase-cli:6.5:[rebalance -c 172.23.104.16:8091 -u Administrator -p password]
       
      docker logs 5a6788
      docker start 5a6788
       
      WARNING: couchbase-cli version 6.5.0-3216-enterprise does not match couchbase server version 6.6.1-9213-enterprise
      Unable to display progress bar on this os
      ERROR: Rebalance failed. See logs for detailed reason. You can try again.


            Activity

            Vikas Chaudhary added a comment (edited)

            The system recovered after the producer processes on the eventing nodes were killed one by one; the rebalance then passed.

            logs: http://supportal.couchbase.com/snapshot/80d6eeb26726b35ea7abfb43a3070ecb::1

            Subsequent rebalances and lifecycle operations also passed.

            Ankit Prabhu added a comment (edited)

            A function can be deployed in two ways: from SettingsChangeCallback and from TopologyChangeCallback. During a rebalance-in of an eventing node, these two callbacks can race, so two different functions can start watching the same bucket at the same time. Because the bucketmap is accessed concurrently without synchronization, Eventing updates it with the second function and the first function is left out of the bucketMap. When the first function then calls GetBucket, the call fails and the function exits, but it remains in the bootstrapping list. Users can run into this issue when they have multiple functions on the same source bucket and rebalance in an eventing node.

            2020-12-16T09:14:15.807-08:00 [Info] Supervisor::New bucket_op_curl: Failed service 'consumer => app: bucket_op_curl name: worker_bucket_op_curl_0 tcpPort: /tmp/127.0.0.1:8091_0_680253833.sock ospid: 0 dcpEventProcessed:  v8EventProcessed: ' (1.000000 failures of 5.000000), restarting: true, error: "{consumer => app: bucket_op_curl name: worker_bucket_op_curl_0 tcpPort: /tmp/127.0.0.1:8091_0_680253833.sock ospid: 0 dcpEventProcessed:  v8EventProcessed:  consumer => app: bucket_op_curl name: worker_bucket_op_curl_0 tcpPort: /tmp/127.0.0.1:8091_0_680253833.sock ospid: 0 dcpEventProcessed:  v8EventProcessed: } returned unexpectedly", stacktrace: [unknown stack trace]
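The lost update described in this comment can be replayed deterministically with a small Go model. The issue does not show Eventing's actual bucketmap types, so `bucketEntry`, `snapshot`, `getBucket`, and the function names `fn1`/`fn2` are all hypothetical; the sketch only assumes that each callback builds its new entry from a copy it read before the other callback wrote back.

```go
package main

import "fmt"

// bucketEntry is a hypothetical stand-in for one bucketmap entry:
// the set of functions currently watching a single source bucket.
type bucketEntry struct {
	functions map[string]bool
}

// snapshot returns a copy of the entry for bucket, or an empty entry,
// modelling a callback that reads the map before modifying it.
func snapshot(m map[string]*bucketEntry, bucket string) *bucketEntry {
	e := &bucketEntry{functions: map[string]bool{}}
	if old, ok := m[bucket]; ok {
		for fn := range old.functions {
			e.functions[fn] = true
		}
	}
	return e
}

// getBucket models the lookup that fails for a function missing from the map.
func getBucket(m map[string]*bucketEntry, bucket, fn string) error {
	if e, ok := m[bucket]; ok && e.functions[fn] {
		return nil
	}
	return fmt.Errorf("function %q not registered for bucket %q", fn, bucket)
}

func main() {
	bucketMap := map[string]*bucketEntry{}

	// Replay the racy interleaving step by step (no goroutines, so the
	// outcome is deterministic): both callbacks read before either writes.
	fromSettings := snapshot(bucketMap, "src") // SettingsChangeCallback deploying fn1
	fromTopology := snapshot(bucketMap, "src") // TopologyChangeCallback deploying fn2

	fromSettings.functions["fn1"] = true
	bucketMap["src"] = fromSettings

	fromTopology.functions["fn2"] = true
	bucketMap["src"] = fromTopology // overwrites the entry that contained fn1

	fmt.Println(getBucket(bucketMap, "src", "fn1")) // fn1 was lost
	fmt.Println(getBucket(bucketMap, "src", "fn2"))
}
```

Running it prints an error for `fn1` followed by `<nil>` for `fn2`: the second write silently replaces the entry holding the first function, which mirrors the symptom of the first function failing GetBucket while still being listed as bootstrapping.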
            

            Couchbase Build Team added a comment

            Build couchbase-server-7.0.0-4073 contains eventing commit b036e1f with commit message:
            MB-43343: Lock the bucketmap before connecting to the bucket

            Vikas Chaudhary added a comment

            Not seen on 7.0.0-4669


              People

              Assignee:
              Ankit Prabhu
              Reporter:
              Vikas Chaudhary
              Votes:
              0
              Watchers:
              7

