Couchbase Server / MB-34846

Rebalance failure in CentOS longevity with failed_to_update_vbucket_map error

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: 6.5.0
    • Fix Version/s: 6.5.0
    • Component/s: test-execution
    • Labels: None
    • Triage: Untriaged
    • Is this a Regression?: Unknown

      Description

      Build 6.5.0-3633, CentOS longevity test: the following rebalance error was observed twice. Latest occurrence:

      2019-07-01T19:12:44.503-07:00, ns_orchestrator:0:critical:message(ns_1@172.23.97.74) - Rebalance exited with reason {mover_crashed,
      {unexpected_exit,
      {'EXIT',<0.18825.179>,
      {failed_to_update_vbucket_map,"HISTORY",685,
      {error,
      [{'ns_1@172.23.96.219',
      {exit,
      {{nodedown,'ns_1@172.23.96.219'},
      {gen_server,call,
      [{ns_config_rep,'ns_1@172.23.96.219'},
      synchronize_everything,
      infinity]}}}}]}}}}}.
      Rebalance Operation Id = 0bffe4a5a8c1da5801868807f2a46387

      Logs collected within the hour:

      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.104.113.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.104.114.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.104.148.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.104.68.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.104.78.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.122.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.14.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.18.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.183.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.190.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.191.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.207.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.209.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.210.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.212.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.214.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.215.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.216.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.219.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.220.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.221.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.223.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.254.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.96.48.zip
      https://s3.amazonaws.com/bugdb/jira/systestmon-1562035042/collectinfo-2019-07-02T023723-ns_1%40172.23.97.74.zip

            Activity

            steve.watanabe Steve Watanabe added a comment -

            Girish Benakappa, what is the test doing that leads to rebalance operations? For example, are nodes being taken down?

            steve.watanabe Steve Watanabe added a comment -

            I see that Ajit has analyzed this issue and suggested that the test not add a node back into the cluster until the rebalance has completed, and that ns_server should prevent node additions during a rebalance.

            steve.watanabe Steve Watanabe added a comment -

            Assigning to Ajit, as he is looking at a change where node adds would go through the orchestrator, which could reject them with an error if a rebalance is in progress.

            ajit.yagaty Ajit Yagaty [X] (Inactive) added a comment -

            I have opened a ticket to disallow node additions while a rebalance is in progress. Assigning this ticket back to QE to fix the test script.

            arunkumar Arunkumar Senthilnathan added a comment -

            Our longevity script always polls rebalance progress and does not attempt to add nodes until any in-progress rebalance is done. If the log timestamps indicate that the script tried to add a node while a rebalance was in progress, there might be a bug in the REST API used to monitor rebalance progress or in the script's polling mechanism. In any case, this issue has not been seen in the past two runs, so I am resolving it for now. Will reopen and investigate if it occurs again.
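
            The polling approach described above could look roughly like the sketch below. This is not the actual longevity script. The function name, its parameters, and the idea of requiring two consecutive idle readings (to guard against a single stale reading) are assumptions made for illustration. It also assumes the status source returns a dict with a "status" field that is "none" when no rebalance is running, as the /pools/default/rebalanceProgress REST endpoint is understood to do.

```python
import time
from typing import Callable, Dict


def wait_for_rebalance_idle(
    fetch_status: Callable[[], Dict],
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
    stable_polls: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no rebalance is reported for `stable_polls` polls in a row.

    fetch_status is a hypothetical callable supplied by the caller; it is
    assumed to return a dict such as {"status": "none"} or
    {"status": "running"}. Returns True when the cluster looks idle and
    False if `timeout` seconds of waiting elapse first.
    """
    idle_seen = 0   # consecutive polls that reported no rebalance
    waited = 0.0    # total time spent sleeping between polls
    while True:
        status = fetch_status().get("status")
        # Any non-idle reading resets the streak, so one stale "none"
        # followed by "running" does not count as finished.
        idle_seen = idle_seen + 1 if status == "none" else 0
        if idle_seen >= stable_polls:
            return True
        if waited >= timeout:
            return False
        sleep(poll_interval)
        waited += poll_interval
```

            A test script would call this before every node addition and only proceed when it returns True. fetch_status would wrap an authenticated GET against the cluster, and a False result would be treated as a test failure rather than a signal to add the node anyway.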


              People

              Assignee:
              arunkumar Arunkumar Senthilnathan
              Reporter:
              arunkumar Arunkumar Senthilnathan
              Votes:
              0
              Watchers:
              5
